Google App Engine: Database Strategies
Traditional enterprise systems have well-developed designs & strategies built around relational databases. Relational database products are mature not only from a data consistency point of view (ACID) but also from a database administration perspective. Databases are maintained in-house or outsourced to a competent vendor with well-established processes for backup, migration and security.
In contrast, emerging cloud technologies more often than not use NoSQL databases, and the database is maintained in the cloud provider’s infrastructure. Database administration, especially backup, is loosely defined and still maturing. For a business moving from a traditional relational DB to a cloud-based DB, taking all of these factors into account can be quite overwhelming.
Knowing the right tools and understanding the risks is what makes a cloud database strategy worth the effort. Let’s go through the options we have with GAE today. We won’t go into development-level details in this article; the aim is to understand the scenario at a high level.
Implementation Strategy: SQL & NoSQL!
If you are planning to move an existing application to GAE and it already has a JPA/JDO layer, the task is fairly easy. JPA/JDO interfaces let you store object data in a database without database-specific code, so the same code can work with different kinds of databases and the application is easy to port to a different kind of storage. This strategy is good even for new application development, especially if you want to keep the option of moving back to a traditional model/database later. There are certainly some limitations or “unsupported features” in GAE, such as many-to-many relationships or joins in queries, and if you are migrating an existing app you will have to account for the resulting model changes.
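The portability argument boils down to application code depending on an interface rather than on a particular store. Below is a minimal sketch of that separation, written as a hand-rolled repository interface rather than with JPA/JDO themselves; the `Customer`, `CustomerRepository`, `InMemoryCustomerRepository` and `CustomerService` names are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical domain object: a plain POJO with no storage-specific code.
class Customer {
    final long id;
    final String name;

    Customer(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical storage-neutral interface. Application code depends only on this,
// much as JPA/JDO code depends on EntityManager/PersistenceManager rather than
// on a particular database.
interface CustomerRepository {
    void save(Customer c);
    Optional<Customer> findById(long id);
}

// One interchangeable backend; a datastore-backed or SQL-backed class
// would implement the same interface.
class InMemoryCustomerRepository implements CustomerRepository {
    private final Map<Long, Customer> rows = new HashMap<>();

    @Override
    public void save(Customer c) {
        rows.put(c.id, c);
    }

    @Override
    public Optional<Customer> findById(long id) {
        return Optional.ofNullable(rows.get(id));
    }
}

// Business logic written once against the interface.
class CustomerService {
    private final CustomerRepository repo;

    CustomerService(CustomerRepository repo) {
        this.repo = repo;
    }

    String greet(long id) {
        return repo.findById(id)
                .map(c -> "Hello, " + c.name)
                .orElse("Unknown customer");
    }
}
```

Swapping in a different implementation of `CustomerRepository` would leave `CustomerService` untouched; JPA/JDO give you the same property at the level of a standard API.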
For a complete NoSQL approach, you have more options. You can use the Google datastore directly through the GAE-provided APIs, but that means extra work: the datastore deals with entities while your application probably uses POJOs, transaction management with the low-level datastore API is somewhat complicated, and you have to deal with untyped keys. Fortunately, a few frameworks can help here, such as Objectify, Twig and Slim3.
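To make the entity/POJO gap concrete, here is a sketch of the mapping code you end up writing by hand against a low-level, property-bag style API. `RawEntity`, `TypedKey`, `Order` and `OrderMapper` are hypothetical stand-ins, not the GAE classes; the typed key only illustrates the idea behind frameworks like Objectify, which wrap keys in a generic type.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a low-level datastore entity: a kind, an id,
// and an untyped bag of named properties.
class RawEntity {
    final String kind;
    final long id;
    final Map<String, Object> properties = new HashMap<>();

    RawEntity(String kind, long id) {
        this.kind = kind;
        this.id = id;
    }
}

// Hypothetical typed key: it records which POJO class the key points at,
// so a TypedKey<Order> cannot be passed where a TypedKey<Customer> is expected.
final class TypedKey<T> {
    final Class<T> type;
    final long id;

    TypedKey(Class<T> type, long id) {
        this.type = type;
        this.id = id;
    }
}

// The application's POJO.
class Order {
    long id;
    String item;
    int quantity;
}

// Hand-written mapping code: the boilerplate that persistence frameworks hide.
class OrderMapper {
    static RawEntity toEntity(Order o) {
        RawEntity e = new RawEntity("Order", o.id);
        e.properties.put("item", o.item);
        e.properties.put("quantity", o.quantity); // autoboxed to Integer
        return e;
    }

    static Order fromEntity(RawEntity e) {
        Order o = new Order();
        o.id = e.id;
        // Values come back as Object and must be cast by hand;
        // a typo in a property name only shows up at runtime.
        o.item = (String) e.properties.get("item");
        o.quantity = ((Number) e.properties.get("quantity")).intValue();
        return o;
    }

    static TypedKey<Order> keyOf(Order o) {
        return new TypedKey<>(Order.class, o.id);
    }
}
```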
Although Google has announced a SQL database to work with Google App Engine, it’s still a “Labs” feature. If you decide to go SQL, the decision is simpler because it follows existing enterprise models: you place an ORM layer, using JDO/JPA or a similar framework, between the database and the application.
Migration & Backup
Migration frameworks for Google App Engine have not yet reached the desired level of maturity. One option is to write custom programs that run as Task Queue tasks and periodically export data to files, which you can then import into your own DB. There are also frameworks like AppRocket, which offers DB backup for Python-based applications, and AppScale, which implements the GAE APIs (among others) and runs as a virtual machine, so the application and its data can run on infrastructure you control.
But all of the above approaches look unreliable once the database grows really big, and that is a gap that still needs work.
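A common way to keep a Task Queue export job within request deadlines is to split it into batches: each task exports one batch after a cursor and then enqueues the next task. The sketch below simulates that pattern in plain Java; the in-memory map, queue and export list are hypothetical stand-ins for the datastore, the Task Queue and the export file, and batching by itself does not remove the reliability concerns for very large datasets.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Queue;

// Hypothetical simulation of a chained export job. Each "task" exports one
// batch starting after a cursor and enqueues the next task, so no single
// request has to process the whole dataset.
class ChainedExporter {
    private final NavigableMap<Long, String> store;               // stand-in for the datastore, ordered by key
    private final Queue<Runnable> taskQueue = new ArrayDeque<>(); // stand-in for a task queue
    private final List<String> exportFile = new ArrayList<>();    // stand-in for the export file
    private final int batchSize;
    int tasksRun = 0;

    ChainedExporter(NavigableMap<Long, String> store, int batchSize) {
        this.store = store;
        this.batchSize = batchSize;
    }

    // Start with an empty cursor (null means "begin at the first key").
    void start() {
        enqueue(null);
    }

    private void enqueue(Long cursor) {
        taskQueue.add(() -> exportBatch(cursor));
    }

    private void exportBatch(Long cursor) {
        tasksRun++;
        NavigableMap<Long, String> rest =
                (cursor == null) ? store : store.tailMap(cursor, false);
        Long last = null;
        int n = 0;
        for (Map.Entry<Long, String> e : rest.entrySet()) {
            if (n == batchSize) break;
            exportFile.add(e.getKey() + "," + e.getValue()); // one CSV line per entity
            last = e.getKey();
            n++;
        }
        // A full batch means there may be more data: chain the next task.
        if (n == batchSize) {
            enqueue(last);
        }
    }

    // Drain the simulated queue (a real queue would run tasks asynchronously).
    List<String> runAll() {
        while (!taskQueue.isEmpty()) {
            taskQueue.poll().run();
        }
        return exportFile;
    }
}
```

Because each task carries only a cursor, a failed task can be retried from where it stopped instead of restarting the whole export.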
App Engine offers its storage in two variants, which differ in their availability and consistency guarantees. The High Replication Datastore provides higher read and write availability at the cost of higher write latency and higher price. Data is replicated across multiple data centres, but most queries are only eventually consistent: a query may briefly return stale results right after a write, while lookups by key and ancestor queries remain strongly consistent.
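The practical effect of eventual consistency can be shown with a toy model: a read by key sees a write immediately, while a query runs against an index that catches up later. This is a hypothetical simulation (`EventuallyConsistentStore` is not a GAE class), not how the datastore is actually implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model: lookups by key see the latest write immediately,
// while queries read from an index that is only brought up to date later.
class EventuallyConsistentStore {
    private final Map<String, String> entities = new HashMap<>();   // key -> status
    private final Map<String, String> queryIndex = new HashMap<>(); // lagging copy used by queries
    private final List<String> pendingIndexUpdates = new ArrayList<>();

    void put(String key, String status) {
        entities.put(key, status);    // committed and visible to get()
        pendingIndexUpdates.add(key); // index catches up asynchronously
    }

    // Strongly consistent: always reads the committed value.
    String get(String key) {
        return entities.get(key);
    }

    // Eventually consistent: may miss recent writes until the index catches up.
    List<String> queryByStatus(String status) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : queryIndex.entrySet()) {
            if (e.getValue().equals(status)) result.add(e.getKey());
        }
        return result;
    }

    // Simulates the passage of time: pending writes reach the query index.
    void applyPendingIndexUpdates() {
        for (String key : pendingIndexUpdates) {
            queryIndex.put(key, entities.get(key));
        }
        pendingIndexUpdates.clear();
    }
}
```

An application that must show a user their own write straight away would therefore read it back by key rather than through a global query.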
The Master/Slave Datastore option uses one master and one slave data center. Writes go to the master and are replicated asynchronously to the slave; because reads and queries are served by the master, they are strongly consistent. But since only the master serves data, the datastore may become read-only during datastore issues or planned downtime, so you also need to design the application to support a “read-only, no write” mode.
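One way to support a “read-only, no write” mode is to catch write failures in the application layer and degrade gracefully instead of failing the request. A minimal sketch, assuming a hypothetical `ReadOnlyException` that stands in for whatever error the datastore actually raises when writes are disabled; `MaintenanceAwareStore` and `ProfileService` are hypothetical as well.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical exception standing in for the error raised when writes are disabled.
class ReadOnlyException extends RuntimeException {
    ReadOnlyException(String msg) {
        super(msg);
    }
}

// Hypothetical datastore stand-in that can be switched into read-only mode.
class MaintenanceAwareStore {
    private final Map<String, String> data = new HashMap<>();
    boolean readOnly = false;

    String get(String key) {
        return data.get(key);
    }

    void put(String key, String value) {
        if (readOnly) throw new ReadOnlyException("datastore is read-only");
        data.put(key, value);
    }
}

// Application layer designed for read-only periods: reads keep working,
// and writes are parked for replay instead of failing the request.
class ProfileService {
    private final MaintenanceAwareStore store;
    final List<String[]> deferredWrites = new ArrayList<>();

    ProfileService(MaintenanceAwareStore store) {
        this.store = store;
    }

    String show(String user) {
        String v = store.get(user);
        return v == null ? "(no profile)" : v;
    }

    // Returns a message the UI can display in either case.
    String update(String user, String profile) {
        try {
            store.put(user, profile);
            return "saved";
        } catch (ReadOnlyException e) {
            deferredWrites.add(new String[] {user, profile});
            return "saved later: site is in read-only mode";
        }
    }

    // Replay parked writes once the datastore accepts writes again.
    void replayDeferred() {
        List<String[]> pending = new ArrayList<>(deferredWrites);
        deferredWrites.clear();
        for (String[] w : pending) {
            update(w[0], w[1]);
        }
    }
}
```

Keeping parked writes in memory is only for illustration; App Engine instances can be shut down at any time, so a real application would need a durable place for them or would simply ask the user to retry later.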
The Last Word: Performance
No topic on databases is complete without talking about performance! A detailed treatment of performance needs a separate discussion, but the key point is this: the GAE datastore is not a traditional relational DB, so it needs differently designed applications, different indexing techniques and, most importantly, a good understanding of the kind of store you are opting for.
For example, in the master/slave datastore all entities updated in a single transaction must belong to the same entity group, while the High Replication Datastore relaxes this with cross-group (XG) transactions that can span a limited number of entity groups. Applications need to be designed from the datastore’s perspective to leverage its scale and infrastructure!
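Since transaction rules are expressed in terms of entity groups, it can help to check which groups a set of keys spans before starting a transaction. In the sketch below an entity group is identified by the root ancestor of a key; `GroupKey`, `TransactionPlanner` and the `maxGroups` parameter are hypothetical, with `maxGroups = 1` modelling a single-group transaction and a larger value modelling a cross-group limit (the actual limit comes from the platform documentation).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical key with an optional parent. Entities that share the same
// root ancestor belong to the same entity group.
final class GroupKey {
    final String kind;
    final String name;
    final GroupKey parent; // null for a root entity

    GroupKey(String kind, String name, GroupKey parent) {
        this.kind = kind;
        this.name = name;
        this.parent = parent;
    }

    // Walk up the ancestor chain to the root, which names the entity group.
    GroupKey root() {
        GroupKey k = this;
        while (k.parent != null) k = k.parent;
        return k;
    }

    String path() {
        return kind + ":" + name;
    }
}

// Hypothetical pre-flight check run before starting a transaction.
class TransactionPlanner {
    static int countEntityGroups(List<GroupKey> keys) {
        Set<String> roots = new HashSet<>();
        for (GroupKey k : keys) roots.add(k.root().path());
        return roots.size();
    }

    static boolean fitsInTransaction(List<GroupKey> keys, int maxGroups) {
        return countEntityGroups(keys) <= maxGroups;
    }
}
```

Designing parent/child relationships so that data updated together shares a root is what keeps such checks passing with `maxGroups = 1`.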