Watch out for CouchDB

I’ve been watching the CouchDB database project since it started. It’s the pet project of programmer Damien Katz. I was excited to read an update today announcing that it will be accessible via a RESTful JSON API and that the query language for CouchDB will be JavaScript. How cool is that?!

CouchDB is not a relational database but rather a distributed document database. Instead of inserting a row of column values into a table, you save a document, with any number of named fields and values (now represented as JSON objects) into the database where it exists in a kind of addressable pool. Documents can be created, retrieved, updated and deleted without having to worry about schema-design as there is none.
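To make the "addressable pool" idea concrete, here is a minimal sketch in Python. It is not CouchDB's API; the DocumentStore class, its method names and the sample field values are all made up for illustration. It only shows documents with different fields living side by side, each reachable by an ID, with no schema declared anywhere.

```python
import uuid


class DocumentStore:
    """Toy in-memory pool of schema-free documents (hypothetical, not CouchDB's API)."""

    def __init__(self):
        self._docs = {}  # maps document ID -> dict of named fields

    def create(self, fields):
        """Save a new document and return the ID it can be addressed by."""
        doc_id = uuid.uuid4().hex
        self._docs[doc_id] = dict(fields)
        return doc_id

    def retrieve(self, doc_id):
        """Return a copy of the stored document."""
        return dict(self._docs[doc_id])

    def update(self, doc_id, fields):
        """Add or overwrite fields; no schema change is needed for new field names."""
        self._docs[doc_id].update(fields)

    def delete(self, doc_id):
        """Remove the document from the pool."""
        del self._docs[doc_id]


store = DocumentStore()

# Two documents with entirely different fields share the same pool.
post_id = store.create({"type": "post", "title": "Watch out for CouchDB"})
bug_id = store.create({"type": "bug", "severity": 2, "tags": ["ui"]})

# Adding a field later requires no migration.
store.update(post_id, {"author": "example-author"})
```

A real document database also has to deal with persistence, concurrency and replication, none of which this sketch attempts.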

If you do require a little structure you can create views. A view is a dynamic structure that acts like a search query, providing a virtual table of documents matching the query. The query, previously expressed in a proprietary language, is now a JavaScript function which is used to determine which documents to include in the view. Because the views are completely virtual you can have as many as you like and you can add or remove them at any time without touching any of your data. Views are indexed and regularly updated to keep in step with the state of the documents in the database.
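The idea of a view as a function over documents can be sketched roughly like this. CouchDB's view functions are written in JavaScript; this Python version is only an illustration, and the build_view helper, the posts_by_year function and the "return a key or None" convention are my own assumptions rather than anything taken from CouchDB.

```python
def build_view(docs, select):
    """Build a 'virtual table' of (key, document) rows.

    `select` is called once per document and returns a sort key if the
    document belongs in the view, or None if it should be left out.
    Sorting by key stands in for the index a real database would maintain.
    """
    rows = []
    for doc in docs:
        key = select(doc)
        if key is not None:
            rows.append((key, doc))
    rows.sort(key=lambda row: row[0])
    return rows


# Sample documents (made up); note that they do not share a schema.
docs = [
    {"type": "post", "title": "Hello", "year": 2007},
    {"type": "bug", "severity": 2},
    {"type": "post", "title": "Again", "year": 2006},
]


def posts_by_year(doc):
    """View function: include only posts, keyed by their year."""
    if doc.get("type") == "post":
        return doc["year"]
    return None


view = build_view(docs, posts_by_year)
titles = [doc["title"] for _, doc in view]  # ["Again", "Hello"]
```

Because the view is computed from the documents rather than stored in them, adding another view is just a matter of writing another function; the documents themselves are never touched.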

CouchDB has some other impressive attributes: it’s fully ACID compliant and has a built-in security model, bi-directional incremental replication and conflict resolution.

But what is it good for? Well the documentation is quick to point out this is not a replacement for relational databases but it is well suited to document style applications like blogs, document management, bug tracking, forums and so on. The CouchDB wiki says:

With very little database work, it is possible to build a distributed document management application with granular security and full revision histories.

CouchDB is still only in Alpha, but there are already demos available and client libraries written in PHP, Ruby and others. You can also download the source code under the GPL license.

I think CouchDB will be one to watch!

  • Jan

    Andrew, thanks for the superb write-up!

  • http://www.realityedge.com.au mrsmiley

    Well the documentation is quick to point out this is not a replacement for relational databases but it is well suited to document style applications like blogs, document management, bug tracking, forums and so on

    Brilliant! So for most web developers that leaves what tasks for relational db’s? Heavy traffic’ed versions of those apps perhaps? Most web devs only work with content management, blogging or forums software anyway, so I’m not seeing how this isn’t being positioned as a potential replacement in similar style to how SQLite was.

  • http://tetlaw.id.au atetlaw

    @Jan, you’re welcome!
    @mrsmiley, exactly. Because of its design it’d be a very good performer for those kinds of web apps, and quick to implement. The replication and conflict resolution features would also make it an excellent candidate for a Google Gears/offline client type application.

    I also dig the free-form data aspect. In an RDBMS it’s often the case that you have to be attentive to your schema design so that you can support future query requirements. In CouchDB you just make views whenever you feel like it, giving you any number of different views on the same data.

    I’ve noticed a few blog posts recently about ‘de-normalisation’ – so it seems other people are also seeing the need to relax the ‘relational’ part of database design for web apps that are mainly collections of documents.

  • http://www.realityedge.com.au mrsmiley

    The need for de-normalisation, particularly in document management is due to volume more than anything else. I work with a few DB’s that are several terabytes in size each, and the only way to get the performance increases we need is to de-normalise data by splitting it into tables of similarly grouped data (eg. date based).

    I might be missing something, but I’m pretty sure there is no performance information on their website. Any idea how quick the queries are?

  • Jan

    @mrsmiley It’d be quite hard to give out any numbers that make much sense. From the architecture point of view though, a view on a table is much like a (multi-column) index on a table in an RDBMS that _just_ performs a quick look-up. So this theoretically should be pretty quick. Of course, we haven’t done any profiling and optimization yet, but the demos are pretty speedy running on a not that big machine (http://demos.couchdb.com).

    The major advantage of the architecture is, however, that it is designed for high traffic. No locking occurs in the storage module (MVCC for the win), allowing any number of parallel readers as well as serialized writes. With replication, you can even set up multiple machines for a horizontal scale-out, and data partitioning (in the future) will let you cope with huge volumes of data. (See http://jan.prima.de/~jan/plok/archives/72-Some-Context.html slide 13 for more on the storage module or the whole post for detailed info in general).

    For de-normalization, yeah, this is what you have got to do if you require any performance. So CouchDb does that by default :-)

  • Smith

    Very cool idea

  • Mike Borozdin

    I see no point in this. Document management can still be done with any relational database and I bet this will work much faster than with CouchDb, which seems to be more of a technology prospect now than an actual implementation, because if the idea proves valuable it will be taken over by the huge database producers.

    The need for de-normalisation, particularly in document management is due to volume more than anything else. I work with a few DB’s that are several terabytes in size each, and the only way to get the performance increases we need is to de-normalise data by splitting it into tables of similarly grouped data (eg. date based).

    This doesn’t prove that it will work any faster with CouchDB at least with this implementation.

  • Jan Lehnardt

    @Mike CouchDb is not meant to replace relational databases. It uses just a different approach to storing and retrieving data that might or might not suit your requirements, beliefs or preconceptions.

    …and I bet this will work much faster than with CouchDb …

    is not much of an argument either, sorry :-)

    CouchDb is just another tool in your workbench.

  • Sam

    Sounds kind of like Lotus Notes. Notes was fine for getting data into it and working with it. However when you wanted to get the data out it was a massive pain.

  • Jan Lehnardt

    @Sam excellent. Damien, CouchDb’s author worked on the Notes core server. Go figure :-) And getting data out _is_ easy.

  • Daniel Lyons

    It doesn’t seem entirely fair to me to claim ACID compliance when you do not support connections. If I cannot stay connected, in what sense is my view of the data consistent and in what sense are my changes isolated to my connection? Furthermore, durability is a matter of implementation (is there any way that you could lose the data after you tell me my transaction is complete?) If compound operations of my devising cannot be bundled into atomic transactions, a claim of transaction support is extremely misleading. So I’d like to know in what sense CouchDB is ACID compliant, or even transactional. These properties are not automagically inherited from merely using Mnesia or Dets as your storage backend.

  • http://www.panesofglass.org/ aranwe

    I realize this may be a pretty stupid point to make, but how is this any different than just creating static html or using xml with xquery? I’m having a hard time seeing the point of building ‘yet another document storage format.’ Don’t we have enough to choose from yet?

    Also, while the openness of the format may work well with only one or a few people updating the documents, trying to help a large number of people create views into an undefined schema would be a nightmare. I am sure this will find a small following, but I can’t imagine this making huge waves.

  • Jan

    @Daniel CouchDb doesn’t use Mnesia or Dets but a custom storage module. Your view of the data remains consistent while you are fetching or adding. Not having permanent connections doesn’t mean you don’t have connections. CouchDb’s storage module does all that is promised by ACID and there’s no way, other than disk failure, that CouchDb loses your data after it reports a safe write. Also, you’re the first to mention transactions here or anywhere in the CouchDb documentation.

  • Jan

    @aranwe CouchDb handles distribution, replication, conflict-detection, fast reporting, fulltext search, authentication, quick (as in concurrent) storage and a lot more. I don’t see where HTML or XML files give you all this. Above all, just convert your XML documents to JSON and dump them into CouchDb and you’re all fine. JSON is just what CouchDb uses internally. This is a good thing because it keeps our code smaller and cleaner since we don’t have to do all the XML handling in the DB. You can still do that, though, and JSON is an easy, yet complete enough, format to do so.

  • Damien Katz

    Daniel, CouchDb can support multi-document commit transactions. It’s trivial to implement, but doesn’t work with the distributed model. If you only think relationally, CouchDb doesn’t help you.

    CouchDb is transactional. Documents, when saved, are either all or nothing. You cannot have a half saved document, or a file attachment partially updated. File attachments are written concurrently, and then committed. If there is a conflict, the update is aborted. Compare this to ordinary updating of a file on the file system that is read by multiple users.

    As for ACID compliance, CouchDb has all the properties:
    Atomic – Commits are all or nothing.
    Consistent – Documents and views are always a complete representation of a point in time.
    Isolated – Writes by multiple concurrent updates aren’t seen by others until completely committed.
    Durable – The disk design of CouchDb is append-only storage, which removes the need for shutdown logic or fixup logic.