CloudSpring | SQL or NoSQL: Google App Engine Part 2

Key Takeaways

Google App Engine Datastore is a schemaless object store that handles data differently from a Relational Database Management System (RDBMS). It is infinitely scalable and offers solutions for many of an RDBMS’s shortcomings, but it also comes with its own set of restrictions and different ways of modeling and building the data access layer of your application.
The Datastore only guarantees strong consistency for lookups by key and “ancestor queries”; every other query is eventually consistent. There are ways to trade performance for strong consistency. Sacrificing perfect consistency means there is no need to synchronize all machines immediately, whereas an RDBMS would have to wait until every machine finishes updating the data.
Choosing between a SQL and a NoSQL datastore depends on your specific needs. If your application requires complex queries, multi-row transactions, and strict consistency, a SQL database like Google Cloud SQL may be the best choice. However, if your application needs to handle a variety of data types and requires high performance and scalability, a NoSQL database like Google Cloud Datastore or Firestore may be more suitable.

In the first part of this series, we looked at relational databases and how NoSQL differs from them. In this part we will look at the “Google App Engine Datastore”, which is one of the options for storing your data with Google App Engine. There are two other options. The first is Google Cloud SQL, a relational database in the cloud based on MySQL. The second is Google Cloud Storage, a storage service for files and objects up to terabytes in size.

Google App Engine Datastore

The Datastore is an infinitely scalable, schemaless object store, right at your disposal. It handles data quite differently from an RDBMS, which is why it provides solutions for many of an RDBMS’s shortcomings. At the same time, it comes with its own set of restrictions and different ways of modeling and building the data access layer of your application. Let’s look at some features of the Google App Engine Datastore.

Schemaless

The Datastore doesn’t require a fixed schema for your data. It’s an object store: you can throw objects at it and it’ll store them. Let’s work through a practical example. Say we need to store business card information for an application. First name, last name and email address are mandatory fields, and there are optional fields such as mobile number, LinkedIn URL and Twitter handle. When we store an entity of type “Business”, we of course need to supply the mandatory fields, but optional fields are stored only when they are available. So one entity might have only a Twitter handle stored, while another might have a Twitter handle and a mobile number. Let’s look at a JSON representation of two such entities to see what they look like:

Entity 1

{
  "firstName" : "John",
  "LastName" : "Taylor",
  "Email" : "john@gmail.com",
  "twitter_id" : "john_t"
}

Entity 2

{
  "firstName" : "Tom",
  "LastName" : "Rogers",
  "Email" : "trogers@gmail.com",
  "twitter_id" : "trogers",
  "mobileNumber" : "567 555 1256"
}

Let’s compare this with how it would be modeled in a relational DB. A table would be created with columns for all required and all optional fields. For entities that have no value for an optional field, the column would hold NULL. Adding a new field to the entity would mean changing the table structure and populating that field’s value for all existing rows.
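The contrast can be sketched in a few lines of Python. This is only an illustration that uses plain dictionaries rather than the Datastore API: the helper names make_entity and to_relational_row are hypothetical, and so is the property name linkedinUrl, which stands in for the optional LinkedIn URL field mentioned earlier.

```python
from typing import Any, Dict, List, Optional

# Property names follow the JSON examples; "linkedinUrl" is an assumed name.
REQUIRED_FIELDS = ("firstName", "LastName", "Email")
OPTIONAL_FIELDS = ("mobileNumber", "linkedinUrl", "twitter_id")


def make_entity(**properties: Any) -> Dict[str, Any]:
    """Build a schemaless entity: check mandatory fields, keep optional ones only if given."""
    missing = [name for name in REQUIRED_FIELDS if not properties.get(name)]
    if missing:
        raise ValueError(f"missing mandatory fields: {missing}")
    # Properties without a value are simply left out, not stored as placeholders.
    return {name: value for name, value in properties.items() if value is not None}


def to_relational_row(entity: Dict[str, Any]) -> List[Optional[Any]]:
    """Project an entity onto a fixed column list, padding absent columns with None (NULL)."""
    columns = REQUIRED_FIELDS + OPTIONAL_FIELDS
    return [entity.get(column) for column in columns]


john = make_entity(firstName="John", LastName="Taylor",
                   Email="john@gmail.com", twitter_id="john_t")
tom = make_entity(firstName="Tom", LastName="Rogers", Email="trogers@gmail.com",
                  twitter_id="trogers", mobileNumber="567 555 1256")

print(sorted(john))             # only the properties John actually has
print(to_relational_row(john))  # the same data as a table row, with None for empty columns
```

Adding a new optional property to one entity changes nothing for the others, whereas the row representation needs a new column for every row.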

Infinitely Scalable

You can store as much data in the Datastore as you desire (leaving the per-GB cost aside for a moment) and none of your queries will slow down. Fetching five entities from 50 is no different from fetching five entities from 50 million, performance-wise. Query runtime increases only with the size of your result set, not with the size of the data set.
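One way to see why runtime can follow the result set is to picture a query as a scan over a sorted index. The sketch below is a simplified model in plain Python, not the Datastore’s actual implementation; build_index and query_equals are hypothetical names, and the card keys are made up for the demonstration.

```python
import bisect
from typing import Dict, List, Tuple

IndexRow = Tuple[str, str]  # (property value, entity key)


def build_index(entities: Dict[str, dict], prop: str) -> List[IndexRow]:
    """Return sorted (value, key) rows, a toy version of a single-property index."""
    return sorted((props[prop], key) for key, props in entities.items() if prop in props)


def query_equals(index: List[IndexRow], value: str, limit: int) -> List[str]:
    """Return up to `limit` keys whose indexed property equals `value`."""
    # Binary search jumps to the first candidate row in O(log n) steps.
    # This relies on keys being non-empty strings, so (value, "") sorts first.
    position = bisect.bisect_left(index, (value, ""))
    keys: List[str] = []
    # From there, only rows that belong to the result set are visited.
    while position < len(index) and len(keys) < limit:
        indexed_value, key = index[position]
        if indexed_value != value:
            break
        keys.append(key)
        position += 1
    return keys


def sample_cards(count: int) -> Dict[str, dict]:
    """Made-up data: every tenth card belongs to a Rogers, the rest to a Taylor."""
    return {f"card{i:06d}": {"LastName": "Rogers" if i % 10 == 0 else "Taylor"}
            for i in range(count)}


small_index = build_index(sample_cards(50), "LastName")
large_index = build_index(sample_cards(50_000), "LastName")

# Both queries visit the same handful of index rows after the initial search.
print(query_equals(small_index, "Rogers", limit=5))
print(query_equals(large_index, "Rogers", limit=5))
```

The only part that depends on the size of the data set is the binary search, which grows logarithmically; the walk over matching rows stops as soon as the limit is reached.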

If you recall the discussion we had in part 1 of this series, you’ll quickly ask yourself how the Datastore is able to shard automatically when an RDBMS can’t. This property is due to the way data is modeled inside the Datastore. Instead of spreading attributes across several relations, all information about a single entity is kept in one place. All entities are then ordered by their unique id. A simple algorithm can now split this list of entities by their ids and store the resulting shards on separate machines. The same algorithm can then be used to route every request to the appropriate machine.
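The splitting and routing described above can be sketched as range-based sharding over sorted ids. This is a minimal model under stated assumptions: the ShardedStore class, its fixed split points and the in-memory dictionaries standing in for machines are all hypothetical, and the real Datastore chooses and moves its split points automatically.

```python
import bisect
from typing import Dict, List, Optional


class ShardedStore:
    """Toy key-range sharding: each shard owns a contiguous range of ids."""

    def __init__(self, split_points: List[str]) -> None:
        # N sorted split points define N + 1 shards; each dict stands in for a machine.
        self.split_points = sorted(split_points)
        self.shards: List[Dict[str, dict]] = [{} for _ in range(len(self.split_points) + 1)]

    def shard_for(self, entity_id: str) -> int:
        # The number of split points <= entity_id is the index of the owning shard.
        return bisect.bisect_right(self.split_points, entity_id)

    def put(self, entity_id: str, entity: dict) -> None:
        self.shards[self.shard_for(entity_id)][entity_id] = entity

    def get(self, entity_id: str) -> Optional[dict]:
        # Reads are routed with exactly the same function as writes.
        return self.shards[self.shard_for(entity_id)].get(entity_id)


store = ShardedStore(split_points=["h", "p"])
store.put("alice", {"firstName": "Alice"})
store.put("john", {"firstName": "John"})
store.put("tom", {"firstName": "Tom"})

print([store.shard_for(name) for name in ("alice", "john", "tom")])
print(store.get("john"))
```

Because a whole entity lives under one id, a single routing decision is enough to find everything about it, which is what a normalized relational schema cannot offer.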

Strong consistency vs. eventual consistency

The Datastore only guarantees strong consistency for lookups by key and “ancestor queries”; every other query is eventually consistent. So there is a slight chance that a user might not see the most up-to-date version of an entity when it was updated very recently. This is not a big deal for a lot of use cases (“Gosh! This Tweet only showed up now, when it was posted two seconds ago!”). There are ways to trade performance for strong consistency, but I won’t go into them here.

Sacrificing perfect consistency means there is no need to synchronize all machines immediately, whereas an RDBMS would have to wait until every machine finishes updating the data. The Datastore also won’t check referential integrity, because that would mean having to read data from other machines to ensure a valid update. Another benefit of being able to scale horizontally is that the Datastore is impressively fault-tolerant. Many machines in the data center could fail, and your data would most likely still be served without interruption.
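To make the difference between the two kinds of reads tangible, here is a deliberately simplified simulation. It is not how the Datastore is implemented: the EventuallyConsistentStore class is hypothetical, and the explicit apply_pending call stands in for the background propagation that the real system performs on its own.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Set, Tuple


class EventuallyConsistentStore:
    """Toy model: key lookups see writes at once, queries see them after index propagation."""

    def __init__(self) -> None:
        self.entities: Dict[str, dict] = {}              # read by key lookups
        self.index: Dict[Tuple[str, str], Set[str]] = {}  # (property, value) -> keys
        self.pending: Deque[Tuple[str, Optional[dict], dict]] = deque()

    def put(self, key: str, props: dict) -> None:
        old = self.entities.get(key)
        self.entities[key] = dict(props)
        # The index update is queued instead of being applied synchronously.
        self.pending.append((key, old, dict(props)))

    def get(self, key: str) -> Optional[dict]:
        """Strongly consistent: always returns the latest write."""
        return self.entities.get(key)

    def query(self, prop: str, value: str) -> List[str]:
        """Eventually consistent: answers from the index, which may lag behind."""
        return sorted(self.index.get((prop, value), set()))

    def apply_pending(self) -> None:
        """Stand-in for background propagation of index updates."""
        while self.pending:
            key, old, new = self.pending.popleft()
            for prop, value in (old or {}).items():
                self.index.get((prop, value), set()).discard(key)
            for prop, value in new.items():
                self.index.setdefault((prop, value), set()).add(key)


tweets = EventuallyConsistentStore()
tweets.put("tweet1", {"author": "john_t"})
print(tweets.get("tweet1"))              # visible immediately by key
print(tweets.query("author", "john_t"))  # not yet visible to the query
tweets.apply_pending()
print(tweets.query("author", "john_t"))  # visible once the index has caught up
```

The write returns without waiting for the index, which is the performance side of the trade; the brief window in which the query misses the tweet is the consistency side.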

Modeling your data

In the Datastore, you’ll have to model your data based on the queries you’ll want to run in your application. Because all the data for one entity is kept in one place, the Datastore has to build an index for every query you need before such queries can be served. This implies that ad-hoc queries aren’t an option with the Datastore. App Engine provides (great) tools like MapReduce to crunch huge datasets and perform analytical tasks, but such tasks take significantly more time to implement than the elegant SQL statement you might be used to.
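The “one index per query” rule can be illustrated with a small in-memory model. It is a sketch only: IndexedStore and its methods are hypothetical names, equality filters are the only query shape supported, and updates and deletes are left out for brevity.

```python
from typing import Dict, Iterable, List, Tuple


class IndexedStore:
    """Toy store that can only answer queries for which an index was declared up front."""

    def __init__(self, declared_indexes: Iterable[Tuple[str, ...]]) -> None:
        # Each index is identified by the sorted tuple of property names it covers.
        self.indexes: Dict[Tuple[str, ...], Dict[tuple, List[str]]] = {
            tuple(sorted(props)): {} for props in declared_indexes
        }

    def put(self, key: str, props: dict) -> None:
        # Every write maintains all declared indexes the entity qualifies for.
        for index_props, rows in self.indexes.items():
            if all(name in props for name in index_props):
                values = tuple(props[name] for name in index_props)
                rows.setdefault(values, []).append(key)

    def query(self, **filters: str) -> List[str]:
        index_props = tuple(sorted(filters))
        if index_props not in self.indexes:
            # No matching index means no ad-hoc fallback such as a full scan.
            raise LookupError(f"no index declared for {index_props}")
        values = tuple(filters[name] for name in index_props)
        return list(self.indexes[index_props].get(values, []))


cards = IndexedStore(declared_indexes=[("LastName",), ("LastName", "firstName")])
cards.put("john", {"firstName": "John", "LastName": "Taylor", "Email": "john@gmail.com"})
cards.put("tom", {"firstName": "Tom", "LastName": "Rogers", "Email": "trogers@gmail.com"})

print(cards.query(LastName="Rogers"))
print(cards.query(LastName="Taylor", firstName="John"))
```

Asking this store for cards by Email raises a LookupError, because nobody declared that index; a relational database would instead answer the same question with a table scan.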

Choosing between SQL and NoSQL datastore

To summarize the discussion so far:

“The Datastore provides a way to persist ‘dumb’ data, which the application turns into information; an RDBMS provides a way to persist structured data, which the application can make use of directly.”

Remember the discussion we had in part 1 about an RDBMS needing information about the structure of your data in order to provide application-independent services, such as data aggregation, at the database level? The Datastore won’t do that. It only cares about the pieces of data it needs to build indexes; the rest of your entity is seen as a sealed blob of bytes.
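A short comparison shows where the aggregation work ends up in each case. The sketch uses Python’s built-in sqlite3 module as a stand-in for a relational database and a plain list of dictionaries as a stand-in for entities fetched from the Datastore; the third card (Ann) and both function names are made up for the example.

```python
import sqlite3
from collections import Counter
from typing import Dict, List

cards: List[dict] = [
    {"firstName": "John", "LastName": "Taylor", "Email": "john@gmail.com"},
    {"firstName": "Tom", "LastName": "Rogers", "Email": "trogers@gmail.com"},
    {"firstName": "Ann", "LastName": "Taylor", "Email": "ann@example.com"},  # hypothetical
]


def count_by_last_name_sql(rows: List[dict]) -> Dict[str, int]:
    """Relational style: the database knows the structure and aggregates for us."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE card (first_name TEXT, last_name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO card VALUES (?, ?, ?)",
        [(row["firstName"], row["LastName"], row["Email"]) for row in rows],
    )
    result = conn.execute("SELECT last_name, COUNT(*) FROM card GROUP BY last_name").fetchall()
    conn.close()
    return dict(result)


def count_by_last_name_app(entities: List[dict]) -> Dict[str, int]:
    """Datastore style: the application fetches entities and aggregates them itself."""
    return dict(Counter(entity["LastName"] for entity in entities))


print(count_by_last_name_sql(cards))
print(count_by_last_name_app(cards))
```

Both functions return the same counts, but only the first one lets a second application reuse the aggregation without sharing any code.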

Let’s look at some scenarios and use cases and evaluate whether an RDBMS or a NoSQL store is the better solution:

Can a single server provide the performance we need? Maybe by utilizing caching? In this case an RDBMS is the way to go. Look, for example, into Cloud SQL, App Engine’s purely relational offering.
Do you plan on growing a multi-application environment working on the same dataset? Depending on your volume, an RDBMS might be what you need, because it separates the database layer very strictly from the application.
Do you need ad-hoc queries? In terms of query flexibility, SQL is the clear winner.
Do you require perfect consistency? Even though there are ways to achieve strong consistency in the Datastore, it’s not what it was designed for. Again, an RDBMS is the better choice.
Are you expecting millions of reads and writes per second? The Datastore provides automatic scaling to infinity and beyond, and it’s right there for you to use with App Engine.
Do you need a simple, scalable way to persist entities with variable attributes? Even though you’ll have to handle consistency and data aggregation yourself, the schemaless Datastore should be what you need. And it’s integrated right into App Engine, so it’s the ideal choice for quick prototypes with changing entities.
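If it helps to keep the checklist next to your project notes, the questions above can be written down as a small helper. This is only a restatement of the list in code: the Needs class, its field names and the reasons_for function are hypothetical, and the output groups reasons rather than making the decision for you.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Needs:
    """Yes/no answers to the questions in the checklist (field names are made up)."""
    single_server_suffices: bool = False
    multi_application_dataset: bool = False
    ad_hoc_queries: bool = False
    perfect_consistency: bool = False
    massive_read_write_volume: bool = False
    variable_attributes: bool = False


# (field name, store it speaks for, short reason)
CHECKLIST = [
    ("single_server_suffices", "Cloud SQL", "a single server is fast enough"),
    ("multi_application_dataset", "Cloud SQL", "several applications share the dataset"),
    ("ad_hoc_queries", "Cloud SQL", "ad-hoc queries are needed"),
    ("perfect_consistency", "Cloud SQL", "perfect consistency is required"),
    ("massive_read_write_volume", "Datastore", "very high read and write volume"),
    ("variable_attributes", "Datastore", "entities have variable attributes"),
]


def reasons_for(needs: Needs) -> Dict[str, List[str]]:
    """Group the reasons that apply under the store each one speaks for."""
    reasons: Dict[str, List[str]] = {"Cloud SQL": [], "Datastore": []}
    for field_name, store, reason in CHECKLIST:
        if getattr(needs, field_name):
            reasons[store].append(reason)
    return reasons


prototype = Needs(variable_attributes=True, massive_read_write_volume=True)
print(reasons_for(prototype))
```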

Choosing the right database is a vast topic, but with two great options at hand (Cloud SQL and the Datastore), you at least won’t have to steer away from App Engine. I hope this article made the decision easier for you, and I wish you all the best with whatever great application you have in mind.

Final words

Choosing the right datastore for your application and understanding NoSQL are both big topics. What we have covered in this two-part series is just the tip of the iceberg. Any omission of a major or emerging player in this space is purely inadvertent. We at CloudSpring hope to cover more of this great subject in the near future, so keep watching.

Frequently Asked Questions (FAQs) about SQL or NoSQL on Google App Engine

What are the key differences between SQL and NoSQL databases on Google App Engine?

SQL and NoSQL databases on Google App Engine have several key differences. SQL databases, such as Google Cloud SQL, are relational databases that use structured query language (SQL) for defining and manipulating the data. They are ideal for applications that require multi-row transactions with complex querying, strict consistency, and a fixed schema. On the other hand, NoSQL databases, like Google Cloud Datastore, are non-relational and do not require a fixed schema. They are perfect for applications that require scalability, high performance, and the ability to handle a variety of data types.

How does Google Cloud Datastore work?

Google Cloud Datastore is a highly scalable NoSQL database for web and mobile applications. It automatically handles sharding and replication, providing a highly reliable and durable database. Datastore provides a myriad of features such as ACID transactions, SQL-like queries, indexes, and much more. It’s designed to easily scale to handle large amounts of data across many Google Cloud servers.

What is the role of Google Cloud Firestore?

Google Cloud Firestore is a flexible, scalable NoSQL cloud database that can store and sync data for client- and server-side development. Firestore is designed to provide powerful querying, offline enabled SDKs, real-time updates, and easy integration with the rest of the Google Cloud platform.

How does Google Cloud Bigtable function?

Google Cloud Bigtable is a NoSQL big data database service that is designed for large analytical and operational workloads. It’s ideal for businesses that need to quickly store, analyze, and access large amounts of data. Bigtable is designed to seamlessly integrate with popular big data tools like Hadoop and Cloud Dataflow, and it supports the open-source, industry-standard HBase API.

What are the benefits of using NoSQL databases on Google Cloud Platform?

NoSQL databases on Google Cloud Platform offer several benefits. They provide high performance and scalability, making them ideal for handling large amounts of data. They also offer flexibility in handling a variety of data types, including structured, semi-structured, and unstructured data. Additionally, NoSQL databases on Google Cloud Platform are fully managed, meaning they handle tasks like sharding and replication automatically.

How can I choose between SQL and NoSQL for my Google App Engine project?

The choice between SQL and NoSQL for your Google App Engine project depends on your specific needs. If your application requires complex queries, multi-row transactions, and strict consistency, a SQL database like Google Cloud SQL may be the best choice. However, if your application needs to handle a variety of data types and requires high performance and scalability, a NoSQL database like Google Cloud Datastore or Firestore may be more suitable.

Can I use both SQL and NoSQL databases in my Google App Engine project?

Yes, it’s possible to use both SQL and NoSQL databases in your Google App Engine project. This approach is known as polyglot persistence. It allows you to use the right database for the right job, taking advantage of the strengths of each type of database.
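As a rough sketch of what polyglot persistence can look like inside one service, the example below keeps schemaless business cards in a dictionary (standing in for the Datastore) and a credit balance in SQLite (standing in for Cloud SQL), where a multi-row transaction is needed. The CardService class, the credit table and its columns are all invented for this illustration.

```python
import sqlite3
from typing import Any, Dict


class CardService:
    """Toy service that keeps each kind of data in the store that suits it."""

    def __init__(self) -> None:
        # Relational side: used where several rows must change together.
        self.sql = sqlite3.connect(":memory:")
        self.sql.execute(
            "CREATE TABLE credit (owner TEXT PRIMARY KEY, amount INTEGER NOT NULL)")
        # Schemaless side: cards with whatever optional properties they have.
        self.cards: Dict[str, Dict[str, Any]] = {}

    def add_card(self, email: str, credits: int, **optional: Any) -> None:
        self.cards[email] = {"Email": email, **optional}
        with self.sql:  # commits on success
            self.sql.execute("INSERT INTO credit VALUES (?, ?)", (email, credits))

    def credits(self, email: str) -> int:
        row = self.sql.execute(
            "SELECT amount FROM credit WHERE owner = ?", (email,)).fetchone()
        return row[0]

    def transfer(self, src: str, dst: str, amount: int) -> None:
        # Both updates succeed or neither does: the block rolls back on any exception.
        with self.sql:
            self.sql.execute(
                "UPDATE credit SET amount = amount - ? WHERE owner = ?", (amount, src))
            self.sql.execute(
                "UPDATE credit SET amount = amount + ? WHERE owner = ?", (amount, dst))
            if self.credits(src) < 0:
                raise ValueError("insufficient credits")


service = CardService()
service.add_card("john@gmail.com", 100, twitter_id="john_t")
service.add_card("trogers@gmail.com", 20, mobileNumber="567 555 1256")
service.transfer("john@gmail.com", "trogers@gmail.com", 30)
print(service.credits("john@gmail.com"), service.credits("trogers@gmail.com"))
```

The price of this approach is that the application now has to keep two stores in step, so it is worth reserving for cases where each store clearly earns its place.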

What is the pricing for Google Cloud SQL, Datastore, Firestore, and Bigtable?

The pricing for Google Cloud SQL, Datastore, Firestore, and Bigtable varies based on several factors, including the amount of storage used, the number of reads and writes, and the region in which your data is stored. Detailed pricing information can be found on the Google Cloud Pricing page.

How secure are Google Cloud databases?

Google Cloud databases are designed with multiple layers of security to protect your data. This includes encryption at rest and in transit, identity and access management features, and network security measures. Google also complies with various industry standards and regulations to ensure the privacy and protection of your data.

How can I migrate my existing database to Google Cloud?

Google Cloud provides several tools and services to help you migrate your existing database to the cloud. This includes the Cloud SQL import and export functionality for SQL databases, and the Datastore and Firestore import/export functionality for NoSQL databases. Google also offers the Cloud Data Transfer service for moving large amounts of data to Google Cloud.
