SQL vs NoSQL: How to Choose

Key Takeaways

SQL databases are ideal for projects with well-defined, related data requirements and where data integrity is critical. They are often used for online stores and banking systems. NoSQL databases are better suited for projects with unrelated, evolving or indeterminate data requirements, where speed and scalability are key. They are commonly used for social networks, customer management and web analytics systems.
NoSQL databases offer flexibility in data storage, allowing for the addition or removal of fields at will. They store all data about an individual in a single document, making data retrieval and full-text search simpler. However, they do not implement data integrity rules or support transactions across multiple documents.
SQL databases are necessary for projects requiring robust data integrity and transaction support, like a warehouse management system. They store related data in tables, require a schema prior to use, and support table JOINs. However, the schema is rigid and the data can become fragmented, making it challenging for developers or system administrators to examine the database.

In the previous article we discussed the primary differences between SQL and NoSQL databases. In this follow-up, we’ll apply our knowledge to specific scenarios and determine the best option.

To recap:

SQL databases:

store related data in tables
require a schema which defines tables prior to use
encourage normalization to reduce data redundancy
support table JOINs to retrieve related data from multiple tables in a single command
implement data integrity rules
provide transactions to guarantee two or more updates succeed or fail as an atomic unit
can be scaled (with some effort)
use a powerful declarative language for querying
offer plenty of support, expertise and tools.

NoSQL databases:

store related data in JSON-like, name-value documents
can store data without specifying a schema
must usually be denormalized so information about an item is contained in a single document
should not require JOINs (presuming denormalized documents are used)
permit any data to be saved anywhere at anytime without verification
guarantee updates to a single document — but not multiple documents
provide excellent performance and scalability
use JSON data objects for querying
are a newer, exciting technology.

SQL databases are ideal for projects where requirements can be determined and robust data integrity is essential. NoSQL databases are ideal for unrelated, indeterminate or evolving data requirements where speed and scalability are more important. In simpler terms:

SQL is digital. It works best for clearly defined, discrete items with exact specifications. Typical use cases are online stores and banking systems.
NoSQL is analog. It works best for organic data with fluid requirements. Typical use cases are social networks, customer management and web analytics systems.

Few projects will be an exact fit. Either option could be viable if you have shallower or naturally denormalized data. But please be aware these simplified example scenarios with sweeping generalizations! You know more about your project than I do, and I wouldn’t recommend switching from SQL to NoSQL or vice versa unless it offers considerable benefits. It’s your choice. Consider the pros and cons at the start of your project and you can’t go wrong.

Scenario One: a Contact List

Let’s re-invent the wheel and implement an SQL-based address book system. Our initial naive contact table is defined with the following fields:

id
title
firstname
lastname
gender
telephone
email
address1
address2
address3
city
region
zipcode
country

Problem one: few people have a single telephone number. We probably need at least three for land-line, mobile and workplace, but it doesn’t matter how many we allocate — someone, somewhere will want more. Let’s create a separate telephone table so contacts can have as many as they like. This also normalizes our data — we don’t need a NULL for contacts without a number:

contact_id
name (text such as land-line, work mobile, etc.)
number

Problem two: we have the same issue with email addresses, so let’s create a similar email table:

contact_id
name (text such as home email, work email, etc.)
address

Problem three: we may not wish to enter a (geographic) address, or we may want to enter multiple addresses for work, home, holiday homes, etc. We therefore need a new address table:

contact_id
name (text such as home, office, etc.)
address1
address2
address3
city
region
zipcode
country

Our original contact table has been reduced to:

id
title
firstname
lastname
gender

Great — we have a normalized database which can store any number of telephone numbers, email addresses and addresses for any contact. Unfortunately …

The schema is rigid
We’ve not considered the contact’s middle name(s), date of birth, company or job role. It doesn’t matter how many fields we add, we’ll soon receive update requests for notes, anniversaries, relationship statuses, social media accounts, inside leg measurements, favorite type of cheese etc. It’s impossible to foresee every option, so we’d possibly create an otherdata table with name-value pairs to cope.

The data is fragmented
It’s not easy to for developers or system administrators to examine the database. The program logic will also become slower and more complex, because it’s not practical to retrieve a contact’s data in a single SELECT statement with multiple JOIN clauses. (You could, but the result would contain every combination of telephone, email and address: if someone had three telephone numbers, five emails and two addresses, the SQL query would generate thirty results.)

Finally, full-text search is difficult. If someone enters the string “SitePoint”, we must check all four tables to see if it’s part of a contact name, telephone, email or address and rank the result accordingly. If you’ve ever used WordPress’s search, you’ll understand how frustrating that can be.

The NoSQL Alternative

Our contact data concerns people. They are unpredictable and have differing requirements at different times. The contact list would benefit from using a NoSQL database, which stores all data about an individual in a single document in the contacts collection:



{

  name: [

    "Billy", "Bob", "Jones"

  ],

  company: "Fake Goods Corp",

  jobtitle: "Vice President of Data Management",

  telephone: {

    home: "0123456789",

    mobile: "9876543210",

    work: "2244668800"

  },

  email: {

    personal: "bob@myhomeemail.net",

    work: "bob@myworkemail.com"

  },

  address: {

    home: {

      line1: "10 Non-Existent Street",

      city: "Nowhere",

      country: "Australia"

    }

  },

  birthdate: ISODate("1980-01-01T00:00:00.000Z"),

  twitter: '@bobsfakeaccount',

  note: "Don't trust this guy",

  weight: "200lb",

  photo: "52e86ad749e0b817d25c8892.jpg"

}

In this example, we haven’t stored the contact’s title or gender, and we’ve added data which need not apply to anyone else. It doesn’t matter — our NoSQL database won’t mind, and we can add or remove fields at will.

Because the contact’s data is contained in a single document, we can retrieve some or all information using a single query. A full-text search is also simpler; in MongoDB we can define an index on all contact text fields using:



db.contact.createIndex({ "$**": "text" });

then perform a full-text search using:



db.contact.find({

  $text: { $search: "something" }

});

A social network may use similar contact data stores, but it expands on the feature set with options such as relationship links, status updates, messaging and “likes”. These facilities may be implemented and be dropped in response to user demand — it’s impossible to predict how they will evolve.

In addition:

Most data updates have a single point of origin: the user. It’s unlikely we’ll need to update two or more records at any one time, so transaction-like functionality is not required.
Despite what some users may think, a failed status update is unlikely to cause a global meltdown or financial loss. The application’s interface and performance take a higher priority than robust data integrity.

NoSQL appears to be a good fit. The database allows us to quickly implement features storing different types of data. For example, all the user’s dated status updates could be placed in a single document in the status collection:



{

  user_id: ObjectID("65f82bda42e7b8c76f5c1969"),

  update: [

    {

      date: ISODate("2015-09-18T10:02:47.620Z"),

      text: "feeling more positive today"

    },

    {

      date: ISODate("2015-09-17T13:14:20.789Z"),

      text: "spending far too much time here"

    }

    {

      date: ISODate("2015-09-17T12:33:02.132Z"),

      text: "considering my life choices"

    }

  ]

}

While this document could become long, we can fetch a subset of the array, such as the most recent update. The whole status history for every user can also be searched quickly.

Now presume we wanted to introduce an emoticon choice when posting an update. This would be a matter of adding a graphic reference to new entries in the update array. Unlike an SQL store, there’s no need to set previous message emoticons to NULL — our program logic can show a default or no image if an emoticon isn’t set.

Scenario Three: a Warehouse Management System

Consider a system which monitors warehoused goods. We need to record:

products arriving at the warehouse and being allocated to a specific location/bay
movements of goods within the warehouse, e.g. rearranging stock so the same products are in adjacent locations
orders and the subsequent removal of products from the warehouse for delivery.

Our data requirements:

Generic product information such as box quantities, dimensions and color can be stored, but it’s discrete data we can identify and apply to anything. We’re unlikely to be concerned with specifics, such as laptop processor speed or estimated smartphone battery life.
It’s imperative to minimize mistakes. We can’t have products disappearing or being moved to a location where different products are already being stored.
In its simplest form, we’re recording the transfer of items from one physical area to another — or removing from location A and placing in location B. That’s two updates for the same action.

We need a robust store with enforced data integrity and transaction support. Only an SQL database will (currently) satisfy those requirements.

Expose Yourself!

I hope these scenarios help, but every project is different and, ultimately, you need to make your own decision. (Although, we developers are adept at justifying our technological choices, regardless of how good they are!)

The best advice: expose yourself to as many technologies as possible. That knowledge will allow you to make a reasoned and emotionally impartial judgment regarding SQL or NoSQL. Best of luck.

Frequently Asked Questions (FAQs) about SQL vs NoSQL

What are the key differences between SQL and NoSQL databases?

SQL and NoSQL databases differ in several ways. SQL databases are relational, meaning they organize data in tables and rows. They use structured query language (SQL) for defining and manipulating the data. On the other hand, NoSQL databases are non-relational and can store data in several ways: document-based, column-based, graph-based or key-value pairs. They are particularly useful for working with large sets of distributed data.

When should I use SQL over NoSQL?

SQL databases are a good choice when you have a clear schema and the data integrity is of utmost importance. They are also beneficial when you need to perform complex queries. SQL databases are ACID compliant ensuring reliable transactions.

When is NoSQL a better choice than SQL?

NoSQL databases are a better choice when you need to store large volumes of data or when the data structure is unclear or changing over time. They provide flexibility, scalability, and speed, making them a good choice for real-time applications and big data.

Can SQL and NoSQL coexist in the same project?

Yes, SQL and NoSQL can coexist in the same project. This is known as a polyglot persistence architecture. The choice of database depends on the specific requirements of each part of your application.

What are some popular SQL and NoSQL databases?

Popular SQL databases include MySQL, Oracle, and PostgreSQL. Popular NoSQL databases include MongoDB, Cassandra, and Redis.

How does data modeling differ between SQL and NoSQL?

In SQL databases, data is typically modeled using a structured, schema-based approach with tables and relationships. In NoSQL databases, data modeling can be done in various ways depending on the type of NoSQL database: document, key-value, columnar, or graph.

How do SQL and NoSQL handle scalability?

SQL databases typically scale vertically by adding more powerful hardware, while NoSQL databases scale horizontally by adding more servers to handle more traffic.

What is CAP theorem and how does it apply to SQL and NoSQL?

CAP theorem states that it’s impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees: Consistency, Availability, and Partition tolerance. SQL databases prioritize consistency and availability, while NoSQL databases prioritize availability and partition tolerance.

How do SQL and NoSQL handle transactions?

SQL databases use ACID (Atomicity, Consistency, Isolation, Durability) properties for transactions. NoSQL databases, on the other hand, do not typically provide all ACID properties. Instead, they focus on the BASE (Basically Available, Soft state, Eventually consistent) model.

What are some use cases for SQL and NoSQL?

SQL databases are ideal for applications requiring multi-row transactions like accounting systems or systems that require complex queries. NoSQL databases are ideal for applications needing to handle large amounts of data and scale horizontally like content management systems, real-time analytics, and IoT applications.