SQL or NoSQL: Google App Engine – Part 1

NoSQL is a trending topic and pretty much everyone from Google to Facebook has some flavor of it. In this two part post we try to answer the dilemma of Sql or NoSQL. First part will explain benefits and mechanics of each plain old Sql and new shiny NoSQL. In second part we will specifically look at Google App Engine Datastore and try to answer if it’s a better choice to a given business problem

The Traditional RDBMS

It’s probably safe to say, that the majority of real-world applications rely on some type of RDBMS to store and retrieve their data. There might be plenty of reasons for your application to not choose NoSQL and stick to RDBMS instead. Let’s take a look at the key points RDBMS excel at:

  • Query flexibility
  • Maintaining consistency across the dataset
  • Managing transactions
  • Separation of concerns / Dealing with an ever evolving application(s) underneath it

In order to maintain a consistent dataset, the RDBMS enforces integrity constraints. Every action on the dataset has to take place within a transaction with ACID properties. This guarantees, that whatever happens inside the transaction will never break the consistency of your data.

Whatever new challenges your application might face, as long as your data-model stays the same, the RDBMS will happily process your queries. This separation between database and application is ideal for multiple application accessing the same database. The structure (the relations) in your data  is handled by the RDBMS. This of course requires complete knowledge of how your data is connected (provided by the relations you create), as well as strict compliance with the 12 principles of the relational model. These principles provide a solid theoretical foundation for simple and highly structured storage of (ideally) well-defined data.

A Disturbance in the Force

Unfortunately, all these great features come with a few downsides, to mention a significant few:

  • Entities with variable or complex attributes are not supported well
  • Weak support for hierarchical or graph data
  • No easy way to scale

First and second are both functional deficits: RDBMS need structure, and it can only structure what fits logically inside a relation. You could store complex attributes as binary strings for example, but the RDBMS won’t be able to operate efficiently on them. Variable attributes don’t go well with the static schema and every row is forced  to contain every attribute. Schema updates are slow and require scheduled downtime. From a “relational point of view”, this makes absolute sense. Remember, the RDBMS handles tasks that require knowledge about the structure of your data. It has to be “informed” whenever you intend to change that structure (by adding or removing attributes for example).

Relational databases are simply not great for hierarchical or graph data. These types of modelling require lot’s of one-to-many and many-to-many relationships, which can’t be modeled efficiently in a relational database. Figuratively speaking, you are trying to fit a tree or a mesh into what is essentially a table.

The third weakness is performance related. RDBMS can’t easily scale out horizontally, a huge (quite literally) problem for today’s multi-million user “web-scale” applications.

At a very basic level , databases scale by sharding. If one machine can’t handle the volume anymore, the dataset is split into subsets, shards, that can then be stored on multiple machines. A master-server handles load-balancing and routes each request to the appropriate machine (“slave”). This master-slave configuration is not the only one possible, but for the sake of brevity I won’t go into other models.

And this is the point, RDBMS can’t automatically shard data into subsets, because the information for one application-entity is (usually) stored across several database relations. If you are not familiar with relational data modelling, normalization is a good point to start. Yes, some giant web-scale applications like Twitter run on RDBMS, but developers have to implement an application-specific sharding layer, this is not automatically handled by the RDBMS.

A New Breed: NoSQL

NoSQL, “not only SQL” is a generic term used to describe a variety of databases, many of which are by no means new. But they’ve been experiencing a kind of renaissance lately. NoSQL databases aim to solve the performance problems of RDBMS by putting the structuring work back into the hands of the application programmer – You. Fortunately, applications don’t always require aggregation and structure at database level, which is why those apps can safely benefit from the vast performance improvements NoSQL databases can provide. Our goal will be to find out if your app is one of those, but first let me present a few NoSQL databases (roughly in order of complexity):

  • (Ordered) Key-Value stores (Apache Cassandra, Dynamo, Project Voldemort)
  • Object Stores (AppEngine Datastore)
  • Document stores (CouchDB)
  • Tree & Graph databases (Neo4J, Twitter’s FlockDB)

All of these solve some of the problems, and of course bring some of their own.

We now have a fairly solid understanding of mechanics and constraints of a relational and a NoSQL database. With that as a context we will take a look at Google App Engine datastore in next part of this post. Keep watching.

Free book: Jump Start HTML5 Basics

Grab a free copy of one our latest ebooks! Packed with hints and tips on HTML5's most powerful new features.

  • Hadi

    you forgot the biggest player!! MongoDB. how can one write an article about NOSQL and not mentioning MongoDB?

    I stopped reading this site after this article.
    weak

    • http://techtrace.wordpress.com Vishal Biyani

      Hadi, thanks for the comment. The list was never meant to be exhaustive – only indicative. MongoDB is definitely a major game changer, but not mentioning a DB does not make the article or content any weaker. Idea of article was to give a introduction to NOSQL and then actually talk about GAE- DataStore which IMHO is done well here!

  • Steven

    NoSQL is a new concept to me. What can you recommended I read to learn more about it

    • http://techtrace.wordpress.com Vishal Biyani

      I would highly recommend the book “NoSQL distilled”. It gives you a basic intro to various types of NoSQL without overwhelming you too much! It’s concise too.

  • Szabolcs Zajdo

    Above MongoDB, I would also recommend mentioning Memcache(d) and Redis. I think they are more commonly used key-value stores like Cassandra, which can only be used in very specific circumstances.

  • Szabolcs Zajdo

    Above MongoDB, I would also recommend to mention Memcache(d) and Redis. I think they are more commonly used key-value stores than Cassandra, which can only be used in very specific circumstances.

    • http://techtrace.wordpress.com Vishal Biyani

      Szabolcs, thanks a lot for your comment. I agree that some important players like Redis have been missed out of this discussion – completely unintentional! We would soon have a post detailing on NoSQL, and that time we will cover it all around NoSQL.
      Although I don’t see Memcache(d) as a NoSQL data store, IMHO the primary purpose there is caching and I would see that differently than a datastore. We also will have a post covering all memcache implementations soon!

  • Pingback: | SQL or NoSQL: Google App Engine Part - 2

  • Solburn

    To be fair, I know nearly nothing about NoSQL…

    “We now have a fairly solid understanding of mechanics and constraints of a relational and a NoSQL database.”

    I’m sorry man, but I actually don’t have a fairly solid understanding of mechanics and constraints of a NoSQL database. You went into good detail for a relational DB, but there were hardly any details about the mechanics or constraints of a NoSQL DB.

    Several NoSQL DB’s were mentioned and then you said, “All of these solve some of the problems, and of course bring some of their own.”

    I’m not sure what problems they solve or bring. Or am I missing something really basic here?

    I do look forward to reading the rest of this series of articles. Thanks!

    • http://techtrace.wordpress.com Vishal Biyani

      Solburn, Thanks for your comments. I think article has explained the concept of shard well, but other concepts of NoSQL could not be explained all in one article. We will soon have some posts on NoSQL DBs so we can cover and explain concepts in detail!