LevelDB in Ruby
Typically when you set out with a Rails application, your data lives inside a MySQL (or PostgreSQL, or SQLite, or Oracle) database. But it turns out that the traditional relational database is not a great fit for all types of data. For example, if you want very fast access to data that isn’t flushed to disk all that often, you could hold that information in RAM. In this case, you might want Redis or Memcached. If your data can be represented as a graph, you’ll want to check out Neo4j or OrientDB. But sometimes you don’t want or need a full-blown database server; instead, you can make do with a simple library that can be packaged along with your app. This is where LevelDB fits in: it is a key-value storage library that’s meant to be fast.
In this article, we’ll cover how to use LevelDB with a Ruby wrapper and also discuss where you should and shouldn’t use LevelDB by presenting common use cases.
Why Not a Hash?
LevelDB is a key-value storage library, meaning you can associate keys with values and query them later (LevelDB treats both as arbitrary byte strings). Wait a second: isn’t that just a Ruby hash? What’s the point of adding another dependency if Ruby already has what we want? It turns out that a Ruby hash can’t really be used as an on-disk key-value store.
First of all, the LevelDB “hash” is stored in files on disk instead of being held in memory. So when your app exits or crashes, the data is still there when the app restarts. In addition, LevelDB includes tools for dealing with the problems that can, and probably will, occur. All sorts of things can go wrong when you write to a database, and LevelDB lets you know when something does.
It also lets us apply updates to the key-value store atomically: either the whole update completes or none of it does (we’ll see an example shortly). Another great feature is optional synchronous writes, meaning a write doesn’t return until the data has actually been flushed to the underlying device (e.g., a hard drive). With synchronous writes, we can build a system that loses, at most, the updates that were in flight at the instant a crash occurs.
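LevelDB also handles some synchronization for us. Within a single process, multiple threads can share the same database object and read from or write to it concurrently; LevelDB does the required locking internally. (Other objects, such as iterators and write batches, still need external synchronization, and only one process can open a given database at a time.) Finally, LevelDB is built for speed, with features such as an in-memory block cache and optional Snappy compression.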
Taken as a whole, there’s clearly a lot more to a key-value store than taking a hash and serializing it. We always have to think about failure and, most of the time, concurrency. LevelDB takes all this complexity and lets us think about the key-value store as a simple hash, and that’s what makes it awesome.
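To make that concrete, here is roughly what the do-it-yourself approach looks like. This is a pure-Ruby sketch, and the `NaiveStore` class is hypothetical (not part of any library): it marshals the whole Hash to disk on every write, using the write-to-a-temp-file-then-rename trick so that, on a POSIX filesystem, a crash leaves either the old snapshot or the new one. Even so, it rewrites the entire file on every put and does nothing about concurrent writers, which is exactly the kind of work LevelDB does for us.

```ruby
require "tmpdir"

# Hypothetical, minimal "persistent hash": the whole Hash is rewritten on every put.
class NaiveStore
  def initialize(path)
    @path = path
    # Load the last snapshot if one exists, otherwise start empty.
    @data = File.exist?(path) ? Marshal.load(File.binread(path)) : {}
  end

  def [](key)
    @data[key]
  end

  def []=(key, value)
    @data[key] = value
    flush
  end

  private

  # Write the snapshot to a temp file, fsync it, then rename it over the old file.
  # On POSIX filesystems the rename replaces the file in one step, so a crash
  # leaves either the previous snapshot or the new one, never a half-written file.
  def flush
    tmp = "#{@path}.tmp"
    File.open(tmp, "wb") do |f|
      f.write(Marshal.dump(@data))
      f.flush
      f.fsync
    end
    File.rename(tmp, @path)
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "store.bin")
  store = NaiveStore.new(path)
  store["dhaivat"] = "pandya"

  reopened = NaiveStore.new(path) # simulates an app restart
  puts reopened["dhaivat"]        # prints "pandya"
end
```

Every put here costs time proportional to the size of the whole store, and two processes writing at once can silently overwrite each other’s changes.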
LevelDB-Ruby Basics
LevelDB’s native API is in C++, but thankfully, someone has written a binding for us Rubyists! Before we can install it, we need a copy of the native library. Installation depends on your platform (or you can just compile it from source). If you’re on Mac OS X with Homebrew, you can run:
brew install leveldb
If you are on a Debian-derivative system (e.g. Ubuntu):
sudo apt-get install libleveldb-dev
Now we can install the gem:
gem install leveldb-ruby
Unfortunately, the gem is a bit old and lacking a few features, but it is pretty useful nonetheless. Let’s jump right in with a simple example:
require 'leveldb'
db = LevelDB::DB.new("my-database.db")
db.put "dhaivat", "pandya"
First, we create a LevelDB database (which is actually a directory containing a bunch of files) and associate the key “dhaivat” with the value “pandya”. We can get values out of the “hash” just as easily:
db.get "dhaivat"
The binding makes our life even easier by allowing us to use the standard hash syntax on the LevelDB database:
db["dhaivat"] = "pandya"
p db["dhaivat"]
We also get a pretty useful utility method called includes? that tells us whether or not the database contains a given key:
db.includes?("dhaivat") # => true
We can get the keys and values just as easily:
db.keys
db.values
Iteration
Under the hood, LevelDB stores its data in a log-structured merge-tree (LSM tree), which keeps keys in sorted order. This means that the “hash” LevelDB represents is an ordered one: each key-value pair is sorted according to a specific rule, whereas a Ruby Hash only remembers insertion order. By default, LevelDB orders the pairs by comparing the raw bytes of their keys (roughly alphabetical, except that uppercase letters sort before lowercase ones), so we can iterate over them in that order:
db.each do |key, value|
  puts "#{key}, #{value}"
end
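The difference between insertion order and LevelDB’s byte order is easy to see without LevelDB at all. This plain-Ruby sketch (the sample keys are made up) shows that a Hash keeps keys in the order they were inserted, and then sorts them on their binary form with `String#b`. Ruby’s string comparison works byte by byte, with a shorter prefix sorting first, which is the same rule LevelDB’s default bytewise comparator uses, so the sorted order should match what `db.each` would yield for the same keys.

```ruby
# A Ruby Hash remembers insertion order, not sorted order.
pairs = { "banana" => "2", "Apple" => "1", "cherry" => "3", "apple" => "4" }
p pairs.keys # => ["banana", "Apple", "cherry", "apple"]

# Sort on the raw bytes of each key, mirroring LevelDB's default comparator.
sorted = pairs.sort_by { |key, _value| key.b }
sorted.each do |key, value|
  puts "#{key}, #{value}"
end
# Apple, 1
# apple, 4
# banana, 2
# cherry, 3
```

Note how “Apple” lands before “apple”: the byte for “A” (0x41) is smaller than the byte for “a” (0x61), which is why “alphabetical” is only an approximation.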
We can even map over the database just as we would for a regular Ruby hash:
db.map do |key, value|
  [key, " #{value} "]
end
Atomicity
So far, we’ve just been stringing together some pretty simple operations without thinking much about what would happen if one of them were to fail. Let’s take a look at this scenario:
db.put "fred", "smith"
db.put "john", db["fred"]
db.delete "fred"
What if we somehow fail just before deleting “fred”? Then the value “smith” would be associated with both “fred” and “john”. In many cases, this sort of intermediate state can break your business logic. Instead, we need a way to make sure that either all three operations complete or none of them do. We need atomicity.
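LevelDB provides the concept of a “WriteBatch” to do this. Put all of the operations into a WriteBatch, and LevelDB applies them as a single atomic operation. Note that a batch only records writes (puts and deletes); it can’t read values back, so any reads have to happen before the batch. Here’s an example: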
db.batch do |b|
  b.put "fred", "smith"
  b.put "john", "smith" # a batch only records writes, so we can't read "fred" back here
  b.delete "fred"
end
This ensures that all three of our operations are applied atomically: either all of them take effect or none of them do. By default, however, the write is asynchronous, meaning the data has not necessarily been flushed to disk when the call returns. Fortunately, there’s a simple way to make the batch synchronous:
db.batch(:sync => true) do |b|
  b.put "fred", "smith"
  b.put "john", "smith"
  b.delete "fred"
end
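To see what “all or nothing” means mechanically, here is an in-memory toy. `ToyDB` and its `Batch` are hypothetical classes written only for illustration; they are not the leveldb-ruby API, and they only mimic the observable behavior (real LevelDB also guarantees it across crashes, which this toy doesn’t attempt). The batch just records operations, and the database applies them to a copy and swaps it in only if the block finishes without raising.

```ruby
# Hypothetical in-memory stand-in for a LevelDB database, used only to
# illustrate how a write batch behaves. Not the leveldb-ruby API.
class ToyDB
  # Records operations without touching the store.
  class Batch
    attr_reader :ops

    def initialize
      @ops = []
    end

    def put(key, value)
      @ops << [:put, key, value]
    end

    def delete(key)
      @ops << [:delete, key]
    end
  end

  def initialize
    @data = {}
  end

  def [](key)
    @data[key]
  end

  def []=(key, value)
    @data[key] = value
  end

  # Collects operations in the block and applies them only if the block
  # finishes normally. If the block raises, the batch is discarded.
  def batch
    b = Batch.new
    yield b
    updated = @data.dup
    b.ops.each do |op, key, value|
      if op == :put
        updated[key] = value
      else
        updated.delete(key)
      end
    end
    @data = updated # swap in every change at once
  end
end

db = ToyDB.new
db["fred"] = "smith"
value = db["fred"] # read before the batch, since a batch can't read
db.batch do |b|
  b.put "john", value
  b.delete "fred"
end
p db["john"] # => "smith"
p db["fred"] # => nil

begin
  db.batch do |b|
    b.put "jane", "doe"
    raise "simulated failure"
  end
rescue RuntimeError
  # the failed batch left no trace
end
p db["jane"] # => nil
```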
Use Cases
Alright, so we now have a basic handle on LevelDB. It almost seems too easy, but that’s where LevelDB shines: it hides all of the complicated optimizations and algorithms and makes it seem as if you’re just accessing a weird little Ruby Hash.
But, when should you use it?
The first thing to understand very clearly is that LevelDB is not a database server. There is no server process involved; it is just a library. This gives it a tremendous advantage: you don’t need to know anything about the deployment target’s setup in order to store data on disk efficiently. So, if you’re writing an application that runs in an environment that may not already have Postgres or MySQL ready to go, LevelDB is definitely worth considering. For example, Google Chrome uses LevelDB to back IndexedDB, the client-side storage API available to JavaScript in the browser, where no relational database is available.
One might think that LevelDB is a poor fit as soon as your data involves anything more complicated than a simple key-value relationship. Fortunately, this is not the case. With a little thought, you can design keys (for example, by packing several fields into one structured key) that encode much more complicated relationships within your data. Another major benefit of LevelDB is that it is fairly low level, so you can reason precisely about how much time (at least asymptotically) queries and updates will take. That makes it a good foundation for your own abstractions, each with the specific trade-offs you need.
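Here’s one way the structured-key idea might look. The `user:<id>:<field>` layout is a hypothetical convention, not something LevelDB prescribes, and the sketch emulates LevelDB’s sorted key space with a sorted Ruby array so it runs without the gem. Because keys are kept in byte order, every field of one record sits next to the others, so fetching a record is a seek to the first key at or after the prefix followed by a short forward scan, rather than a pass over the whole store.

```ruby
# Hypothetical key scheme "user:<id>:<field>": one record becomes several pairs.
# IDs are zero-padded so that byte order matches numeric order.
store = {
  "user:0042:name"  => "Dhaivat",
  "user:0042:email" => "dhaivat@example.com",
  "user:0007:name"  => "Fred",
  "user:0042:lang"  => "ruby",
}

# Stand-in for LevelDB's ordered key space: all keys in byte order.
sorted_keys = store.keys.sort

# Seek to the first key >= prefix (binary search), then walk forward while
# keys still share the prefix.
def prefix_scan(sorted_keys, store, prefix)
  start = sorted_keys.bsearch_index { |key| key >= prefix } || sorted_keys.size
  sorted_keys.drop(start)
             .take_while { |key| key.start_with?(prefix) }
             .map { |key| [key, store[key]] }
             .to_h
end

p prefix_scan(sorted_keys, store, "user:0042:")
```

The zero-padding matters: without it, “user:42” would sort before “user:7”, because keys are compared byte by byte rather than numerically.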
Wrapping It Up
I hope you’ve enjoyed this tour of LevelDB through Ruby. If you have any questions, drop them in the comments below!