CouchDb: document oriented persistence

If you’re looking for something “interesting” to mess around with, Damien Katz’s CouchDb project has reached the working prototype stage: the server is implemented in Erlang (a hot topic in some places) and there’s a demo client application (a simple forum) in PHP.

Firing up the CouchDb server on Windows is a breeze – follow the README. PHP-wise, you need the new http extension; on Win32 the easiest way to get it is to grab the most recent PHP 5 release (5.1.6) and the corresponding collection of PECL modules. Alternatively, the most recent XAMPP (apparently) packs the extension.

The interface between CouchDb and PHP is REST (XML + HTTP), so you can also point your browser directly at the CouchDb server (default – localhost:8080) and get around with a little help from the CouchDb wiki.

What is CouchDb, and why is it interesting given relational DBs etc.? To an extent it’s hard to define – the best starting point is probably Damien’s discussion of Document Oriented Development. There’s a quick overview here, but it’s still difficult to find a truly compelling argument. How about some code instead? Here’s a snippet from the demo app (couchthread2.php), which handles a form post:


    if ($_SERVER['REQUEST_METHOD'] == 'POST') {    
        // someone is creating a new response    
        
        // Set a field named Type to "response". This is a simple
        // way to identify the "Type" of the document. (but we could
        // have as easily used Form, Class, Category etc as a field
        // name)
        $_POST['Type'] = "response";
        
        // Add a creation date, and use a format that will sort correctly as text
        $_POST['CreationDate'] = date(DATE_ATOM);
        
        // add the threadid from query arg
        $_POST['threadid'] = $_GET['threadid'];
    
        // just take all the posted fields and save them as a new document
        if (couch_create_doc('http://localhost:8888/couchtest/', $_POST)) {    
            header('Location: ' . $_SERVER['REQUEST_URI']); // reload the page
            exit;
        }
    }

Let’s just zoom in there on that last part…


        if (couch_create_doc('http://localhost:8888/couchtest/', $_POST)) {    
            header('Location: ' . $_SERVER['REQUEST_URI']); // reload the page
            exit;
        }

…just pass the $_POST (at least for this simple example). Getting interested yet? And since everything goes over HTTP, how about a reverse proxy between PHP and the db(s) to make load balancing transparent?
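
Passing `$_POST` straight through is fine for a demo, but a real application would probably want to pick out the fields it expects first. Here is a minimal sketch of that, assuming a hypothetical helper `build_response_doc()` and hypothetical form fields `author` and `body` (neither is part of the CouchDb demo):

```php
<?php
// Hypothetical helper, not part of the CouchDb demo: build the document
// to store from an explicit list of allowed form fields.
function build_response_doc(array $post, string $threadId, array $allowed): array
{
    // Keep only the posted fields we expect; anything else is dropped.
    $doc = array_intersect_key($post, array_flip($allowed));

    // Same bookkeeping fields as the demo snippet.
    $doc['Type'] = 'response';
    $doc['CreationDate'] = date(DATE_ATOM); // same date format as the demo
    $doc['threadid'] = $threadId;

    return $doc;
}

// Example input; 'Type' => 'admin' simulates a field injected by the client.
$doc = build_response_doc(
    ['author' => 'harry', 'body' => 'Nice!', 'Type' => 'admin'],
    '42',
    ['author', 'body']
);
```

The resulting array could then be handed to `couch_create_doc()` in place of `$_POST`.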

From what I’ve seen in Dokuwiki, where wiki pages are stored directly, as-is, on the filesystem, there’s a lot to be said for keeping the “raw resources” in a form that makes them easy to identify. Working out the last modification time (caching), replication / mirroring, administration and a whole host of other stuff gets much easier to manage vs. a relational database, where what constitutes a complete “document” may be spread across multiple tables. Of course the downside is that stuff like searching, sorting and relations gets harder – enter CouchDb where (if I’ve understood right) you can “compile” tables from the contents of your raw documents using its fabric formula language. Assuming the processing done to create the tables is reproducible, replicating databases across systems would then “only” be a matter of copying the raw documents.
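
To make the idea concrete, here is a rough PHP sketch of “compiling” a table from raw documents. It is not CouchDb’s formula language; `compile_table()` and the sample documents are made up for illustration. The point is only that the table is a pure function of the documents, so it can be rebuilt anywhere the documents have been copied to.

```php
<?php
// Illustration only: derive a sorted "table" from raw documents.
// compile_table() and the field names are hypothetical, not CouchDb's API.
function compile_table(array $docs, string $type, array $columns, string $sortBy): array
{
    $rows = [];
    foreach ($docs as $doc) {
        // Select only documents of the requested type.
        if (($doc['Type'] ?? null) !== $type) {
            continue;
        }
        // Project the requested columns; missing fields become null.
        $row = [];
        foreach ($columns as $col) {
            $row[$col] = $doc[$col] ?? null;
        }
        $rows[] = $row;
    }
    // Order rows by the chosen column, compared as text.
    usort($rows, function (array $a, array $b) use ($sortBy): int {
        return strcmp((string) ($a[$sortBy] ?? ''), (string) ($b[$sortBy] ?? ''));
    });
    return $rows;
}

// Made-up raw documents: two responses and one thread, in no particular order.
$docs = [
    ['Type' => 'response', 'threadid' => '1', 'CreationDate' => '2006-09-02T10:00:00+00:00', 'body' => 'second'],
    ['Type' => 'thread', 'title' => 'Hello', 'CreationDate' => '2006-09-01T08:00:00+00:00'],
    ['Type' => 'response', 'threadid' => '1', 'CreationDate' => '2006-09-01T09:00:00+00:00', 'body' => 'first'],
];

$table = compile_table($docs, 'response', ['CreationDate', 'body'], 'CreationDate');
```

Running `compile_table()` over a copy of the same documents on another machine gives the same rows, which is why only the raw documents would need to be replicated.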

The other side of this is what Erlang enables – it was designed for telephone switches, and being a functional programming language means no squirrels. A worthwhile read to help things click is Functional Programming For The Rest of Us:

A functional program is ready for concurrency without any further modifications. You never have to worry about deadlocks and race conditions because you don’t need to use locks! No piece of data in a functional program is modified twice by the same thread, let alone by two different threads. That means you can easily add threads without ever giving conventional problems that plague concurrency applications a second thought!

…and stuff like transactions (apparently) gets easier with functional programming – no awkward state hanging around after you rollback.
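
A small sketch of that last point, written in PHP rather than Erlang purely for illustration and assuming a made-up `apply_transfer()` function: because the function returns a new state instead of modifying the old one, “rollback” just means carrying on with the original value.

```php
<?php
// Illustration only: a pure "transaction" over an array of balances.
// apply_transfer() is hypothetical; it never modifies the caller's array.
function apply_transfer(array $balances, string $from, string $to, int $amount): ?array
{
    if (!isset($balances[$from], $balances[$to]) || $balances[$from] < $amount) {
        return null; // transfer rejected, nothing to undo
    }
    // PHP arrays are passed by value, so this changes only the local copy.
    $balances[$from] -= $amount;
    $balances[$to] += $amount;
    return $balances;
}

$before = ['alice' => 50, 'bob' => 10];

$after = apply_transfer($before, 'alice', 'bob', 30);   // accepted
$failed = apply_transfer($before, 'alice', 'bob', 500); // rejected

// "Rollback" is simply continuing to use $before.
$current = $failed ?? $before;
```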

Anyway – one to watch I think.

  • http://www.phppatterns.com HarryF
  • Damien Katz

    Great writeup Harry.

    enter CouchDb where (if I’ve understood right) you can “compile” tables from the contents of your raw documents using its fabric formula language. Assuming the processing done to create the tables is reproducible, replicating databases across systems would then “only” be a matter of copying the raw documents.

    Yup, nailed it, that’s pretty much exactly how it’s meant to work.

    Over the next few weeks I’m going to cook up a few more demos to show what kinds of applications can be built on this system.

  • http://www.errewf.it RaS!

    Looks not so useful….

  • http://www.phppatterns.com HarryF

    Great writeup Harry

    Many thanks.

    Looks not so useful…

    As a prototype, understood. But as a project – where it’s headed strikes me as very useful.

    For web developers, being able to replicate databases easily, over HTTP, is a big one. Imagine being able to get yourself 10 cheap hosting accounts from different companies, of the $5/month variety, and running multiple replicating copies of your database in parallel on each, with one more expensive account running a reverse proxy in front of all 10 – assuming the latency between the reverse proxy and the backend servers is not too significant, it’s a cheap way to run a fault-tolerant site. It’s worth listening to the podcast to hear Damien’s thoughts on offline storage.

    Also, as the above example with $_POST attempts to point out, being document oriented may simplify development of certain categories of application. Given that the web is primarily resource / document oriented, using a database engine that reflects that means creating / updating might be as simple as passing through a POST. Much of this ORM / ActiveRecord “joy” falls away.

    Another potential benefit would be simplifying search – as all documents are effectively a single “unit” of data, they’re easier to index – vs. building a search index out of multiple columns of multiple tables.

    To an extent I guess you could say, compared to relational DBs, CouchDb is storing data in a de-normalized form, from which a normalized form can be generated, but the issue of normalization becomes transparent to the code creating / updating the data, hence simplification. That also probably means you can do most writes in a single operation – reduced need for transactions.

  • Damien Katz

    Wow Harry, I should get you to write CouchDb promo. You get it and really do a great job of explaining this stuff. Better than I can.

    Thanks again.

  • webdevguy

    This sounds VERY much like the attributes of Lotus Notes but a lot less expensive.