Beyond Rails Abstractions: A Dive into Database Internals

By David Bush

Abstractions are wonderful things. Take a look at Rails: we can achieve a huge amount of functionality with comparatively few lines of code. I don’t necessarily need to know a great deal about how my database is implemented in order to be up and running quickly, for example.

The downside of this level of insulation from core functionality is that developers don’t learn all of the things they perhaps should. Only last week I was able to fix a slow query by removing an index that someone had added without understanding cardinality or how indexes work. Could you implement a database yourself? Let’s dig into another piece of technology we take for granted!

How do Indexes Work?

We expect relatively fast lookup times when querying a database. If you were implementing a function to pull in every value between two numbers, you might be tempted to check each value against the criterion, but this would be prohibitively inefficient for large datasets, as it runs in linear time (O(n)). What you need is a way to store records in a specified order to facilitate quicker access times. A structure like a Binary Search Tree (BST) does exactly this, by ensuring that any given record is greater than all the records in its left subtree, but lower than all the records in its right subtree. This brings our search time down to a much more respectable O(log n), and it is the backbone of a database index.
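
To make that concrete, here’s a minimal Postgres sketch, assuming the users table and credits column that appear in the query later in this article (the index name is purely illustrative):

-- Build a B-Tree index over credits (Postgres's default index type).
CREATE INDEX index_users_on_credits ON users (credits);

-- With the index in place, a lookup like this can descend the tree in
-- O(log n) instead of scanning every row (though for tiny tables the
-- planner may still prefer a sequential scan).
SELECT email, credits FROM users WHERE credits = 5;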

Where a BST lets us down is that we seldom want a single specific record. Rather, we’re more likely to want all records matching a particular criterion. A naive solution would again be to check each record, which brings us right back up to linear time, or O(n). What we need is a way to store all records in order, traverse that list, and then stop when a condition is met (in this case, the upper bound of our query). A data structure well suited to this task is the B+ Tree, which brings access times down to O(m + log n), where m is the number of matching records returned.
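
Because the leaf nodes of a B+ Tree are kept in sorted order and linked together, the same index sketched above can serve a range query: descend to the lower bound, then walk the leaves until the upper bound is passed.

-- Find the first leaf entry with credits >= 5, then walk the ordered
-- leaf entries until credits exceeds 100.
SELECT email, credits FROM users WHERE credits BETWEEN 5 AND 100;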

While researching this article, I was struck by the apparent inefficiency of this approach. Ostensibly, all that’s required is fast indexing of an ordered list — a task well suited to a criminally under-used and little-known data structure, the Skip List. It turns out that, although B+ Tree operations typically require more CPU work than the equivalent operations in other data structures, they require far fewer transfers from disk to memory, which is the natural bottleneck. This Google Blog post provides some context as to why. B+ Trees are also worst-case efficient — a huge factor when datasets get larger.

Why Is Using Too Many Indexes a Bad Idea?

Let’s imagine removing a row from our database. This means we’d also need to remove the corresponding node in the B+ Tree. We can’t simply delete the node, because we’re relying on the ordered nature of the data structure: we need to preserve the order while also keeping the overall height of the tree as low as it can be. To manage this, the database engine uses smart insertion and deletion routines, but these take logarithmic (O(log n)) time. Crucially, every field you index is another tree that has to be maintained and rebalanced each time you perform an insert or delete operation.
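
In other words, every additional index amplifies the cost of writes. A rough sketch (index name illustrative, and assuming the other users columns allow a row this sparse):

-- Adding a second index means a second B+ Tree to maintain,
-- alongside index_users_on_credits from earlier.
CREATE INDEX index_users_on_email ON users (email);

-- This single write now updates the table itself plus both trees,
-- each of which may need rebalancing.
INSERT INTO users (email, credits) VALUES ('new@example.com', 10);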

A field with few possible values (such as boolean, trilean, etc.) is said to have low cardinality. Here is a great IBM article that warns against the pitfalls of indexing fields with low cardinality, but the gist of it is that you should usually avoid doing it. Anecdotally, however, I’ve cut query times by 80% before simply by indexing a True/False column, so as always: profile, profile, profile!
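
If you do end up indexing a boolean in Postgres, one option worth profiling is a partial index, which only covers the rows you actually filter on. A sketch, assuming admin = true rows are the small minority you query for:

-- Index only the admin rows; queries filtering on admin = true can use
-- it, and the (presumably far more numerous) admin = false rows never
-- enter the tree at all.
CREATE INDEX index_users_on_admin ON users (admin) WHERE admin = true;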

Query Parser

Let’s take a standard query and follow its journey:

SELECT email, admin, credits FROM users WHERE credits > 2 + 3;

In a fashion not dissimilar to a compiler, a parser changes your text into a parse tree:

>>> SELECT email, admin, credits FROM users WHERE credits > 2 + 3                                                         
LOG:  parse tree:
  :commandType 1
  :querySource 0
  :canSetTag true
  :utilityStmt <>
  :resultRelation 0
  :hasAggs false
  :hasWindowFuncs false
  :hasSubLinks false
  :hasDistinctOn false
  :hasRecursive false
  :hasModifyingCTE false
  :hasForUpdate false
  :hasRowSecurity false
  :cteList <>
  :rtable (
    :alias <>
      :aliasname users
      :colnames ("id" "created_at" "updated_at" "email" "encrypted_password
      " "confirmation_token" "remember_token" "admin" "" "credits")
    :rtekind 0
    :relid 27043
    :relkind r
    :tablesample <>
    :lateral false
    :inh true
    :inFromCl true
    :requiredPerms 2
    :checkAsUser 0
    :selectedCols (b 12 16 18)
    :insertedCols (b)
    :updatedCols (b)
    :securityQuals <>
  Time: 0.001s

For those of you playing along at home, I’m using the excellent pgcli with Postgres and the following two settings:

SET client_min_messages=debug1;
SET debug_print_parse=on;

Back to the query parser. During this process, it checks that your input is legal. Keywords in the wrong order? Typo? It’s the query parser that kicks an error back to you (note the WHRE in the following example):

SELECT email, admin, credits FROM users WHRE credits > 2 + 3;
syntax error at or near "credits"

Next, the query parser checks the database metadata: does your table exist? Does the column exist? Is the operation you’re about to perform possible? Do you have authorization via the database permission system?

SELECT email, admin, credits FROM users WHERE credits > 'Ostrich';
invalid input syntax for integer: "Ostrich"

If everything’s looking good, then the query parser passes the tree along to the next step.

Query Rewriter

With our tree successfully built, it’s time to optimize it. The query rewriter takes our tree and applies a list of rules to it. For example, if you’ve put a DISTINCT on a column that already has a unique constraint, the DISTINCT is discarded. Constant arithmetic is folded (the 2 + 3 in our example is replaced with a 5 here). There are a host of other rules, but the end goal is to ensure that the clock cycles spent executing the query are kept to a minimum.
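
The transformation happens on the parse tree rather than on SQL text, but conceptually the rewriter treats our example as if we had written the simpler form:

-- What we wrote:
SELECT email, admin, credits FROM users WHERE credits > 2 + 3;

-- What effectively gets planned and executed:
SELECT email, admin, credits FROM users WHERE credits > 5;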

The end result is a query plan, and it is, theoretically, a heavily optimized and efficient SQL query. But how does the DB engine know which are the most efficient joins and so forth? For that, we need to take a quick detour …


Statistics

A good database engine will collect statistics: things like the number of records in a table, how many distinct values are in a column, minimum/maximum values, and so on. These feed the query optimization process, and they’re a big part of the reason your queries are quick. For example, if you’re joining a table with a million rows to one with a hundred, the planner can use those counts to pick a join order and algorithm that does drastically less work. It actually goes into much more depth than this, computing histograms containing information like quantiles and standard deviation, but you get the idea.
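
In Postgres you can inspect these statistics yourself through the pg_stats view; for example:

-- What the planner currently believes about each column of users:
SELECT attname, n_distinct, null_frac, most_common_vals, histogram_bounds
FROM pg_stats
WHERE tablename = 'users';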

There’s nothing less conducive to quick queries than erroneous statistics. If the query optimizer thinks there are 200 records when there are actually 200 million, and has computed histograms and statistics on this basis, this can be the difference between minutes and hours for query execution. For this reason, it’s important to trigger statistical collection periodically.
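
In Postgres, that collection is the job of ANALYZE. Autovacuum normally runs it for you, but after a bulk load or a large migration it can be worth triggering by hand:

-- Refresh planner statistics for one table...
ANALYZE users;

-- ...or for every table in the current database.
ANALYZE;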

Query Executor

Finally, it’s time to execute our query! The query manager checks that it has sufficient resources (memory and CPU) and, if so, begins execution. First it checks an in-memory cache of data (memory is much quicker than disk, remember) and triggers the cache manager to prefetch the next chunk of data required, as indicated by the query plan. It discards data that’s no longer required, and you can check whether it’s doing a good job by looking at the buffer/cache hit ratio. Around 99% is what you’re aiming for.
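
In Postgres, one rough way to eyeball that ratio is the pg_statio_user_tables view; a sketch:

-- Approximate buffer cache hit ratio across your tables; ~0.99 is healthy.
SELECT sum(heap_blks_hit)::float
       / NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS cache_hit_ratio
FROM pg_statio_user_tables;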

If this were a write/update/delete operation, your Transaction Manager would kick in here to protect your data integrity, but since we’re just doing a simple read, all that’s left is for the database’s client manager to return a beautifully paged response to the client.

Wrapping Up

There are many great resources for better understanding databases, and there’s a lot I haven’t covered here. But this should hopefully provide enough of an overview to help you begin to know what to look for when profiling. At the very least, you might think twice before blindly adding an index!
