Using a Graph Database with Ruby. Part I: Introduction
Disclaimer: There are several Graph database projects available, including open-source and commercial projects, projects that aim to be APIs to Graph databases, and projects that are specialized in Distributed Graph Processing. For the articles in the series “Using a Graph Database with Ruby”, I’m going to be discussing and using Neo4j exclusively.
Before we delve into the benefits, differences and intricacies of Neo4j and how it can be used with Ruby, we first need to understand what a Graph Database is, its advantages over traditional databases, and what are some of the scenarios in which we can successfully use a Graph database.
What is a Graph Database?
A Graph Database is not a database to store graphics or images as its name may suggest. It’s a database that uses graph structures such as nodes, properties and edges to represent and store data. In addition, it allows you to represent any kind of data without the limitation of regular databases.
To better understand, let’s talk about each of these structures individually and what they represent:
Node: A node can be used to represent any type of entity that you can think of, be it a business, a blog post, a location, an oil rig, a city, etc. Graph databases don’t care what type of data they’re representing.
Properties: Properties, sometimes called attributes, are named values that relate to nodes. For example, if we take into consideration our City representation of a node, one of the properties would be “name”, another would be “population”, and so on.
Edges: Edges, sometimes called Relationships, connect nodes-to-nodes and organize them into arbitrary structures such as a Map, List or a Tree. It’s important to note that when a node is the start node of a relationship, the relationship from that node’s perspective will be an outgoing relationship. And when a node is at the end of a relationship, the relationship from that node’s perspective will be an incoming relationship. Understanding this will make it easier for you to follow the examples.
What can a Graph Database be used for?
There are many scenarios for which one could consider using a Graph Database. Before I list a few examples, as with any architectural and technical decision, you need to analyze all possible solutions. Then, you can select those that are best for you, according to your specifications.
Some of these scenarios include: social networking, fraud detection, people/movie/music recommendation, manufacturing, etc. Using the social networking example, it would be somewhat trivial to do something like “given the fact that Bob is my friend, give me all friends that are friend’s of friend’s of friend’s of Bob”. This is possible because of the path-finding algorithms involved are easy to implement by traversing through the graph. Imagine doing that through a relational database? A nightmare!
Another advantage when using a Graph database is that you can easily and more naturally model a domain using a whiteboard or a piece of paper. Specifically, nouns that are used become nodes, verbs become relationships, and adjectives and adverbs become properties.
Graph Database Example
Let’s examine the following scenario and how we would structure it using a Graph Database: John is friends with Bob, and Bob is friends with Mark. Visually, this is how we could represent the scenario:
You notice that the visual representation is pretty much how we verbally expressed our scenario. This is a very simple example that we can use to compare the implementation of a traditional SQL database structure versus a Graph database structure:
In a traditional SQL structure, each row in the users table represents a user, and each row in the friends table represents a relationship between two users. If we decide to add additional properties to the users table at a later time, we would have to alter the base structure of that table. And if we wanted to add new properties to only a subset of users we would still have to alter the entire users table, or create a new table to accommodate the new values just for the subset of users. Not an ideal scenario when dealing with tens of millions of records.
On the other hand, a Graph database has no set structure or schema for the data, much like a NoSQL database. Now, let’s understand what the graph above is representing: each node represents an entity – a User in our case; each node contains property values, in this case it’s the User’s name; and each line represents a relationship between the nodes. This is as simple as it gets. And to complement the scenario described in the traditional SQL example, if we did decide to add additional properties to a subset of users, we could easily perform this action on a per-node level instead of a table-wide transaction.
What is Neo4j
Neo4j is “The World’s Leading Graph Database” according to Neo Technology, the developer behind the project. It’s a commercially supported open-source graph database implemented in Java and some of the key characteristics include:
- Data representation using an intuitive graph-oriented model.
- Binding for a number of languages, including Ruby, Python and Closure.
- Disk-based, native storage manager completely optimized for storing graph structures for maximum performance and scalability.
- Massive scalability, it can handle graphs of several billion nodes/relationships/properties on a single machine.
- Easy to use and convenient object-oriented API.
- Handles large graphs that don’t fit in memory with durability and a fully persistent transactional store.
- Powerful traversal framework for high-speed traversals in the node space.
- Optimized for highly connected data.
- Small footprint, only about 750k jar file.
- It provides a REST interface for languages not supported by its bindings.
- It’s capable of traversing depths of over 1000 levels at millisecond speeds.
- It provides a dual license: open source and commercial.
- It integrates seamlessly with Lucene, providing full-text search to nodes and relationships, including phrase queries, wildcard queries, proximity queries, ranked searching, sorting, and more.
Using Neo4j with Ruby
There are different ways of integrating Neo4j with a Ruby (or Rails) application and before I list them here, below are a few of the combined features from the available solutions:
- Object Oriented Mapping
- “Drop-in” replacement for ActiveModel
- Embedded database
- Thin Ruby wrapper to the Neo4j REST Api
- Indexing for full-text search
- Chainable methods, e.g: method(x).method(y).method(z)…
- When using with Rails, the syntax is very similar to ActiveRecord
Neat, isn’t it? And without much further ado, here are the gems that we’ll be using an evaluating in the next article:
- Neo4j.rb: A graph database for JRuby
- Neoid: Uses Neo4j.rb to power its searchable objects
- Neography: Rest API for the Neo4j server
If you’re curious, I encourage you to look into those projects to have a better understanding of what they do and how complex or simple they are. This way you’ll have a better chance to comprehend the examples in the next article.
The Part II of this series will cover the different ways that Neo4j can be integrated with Ruby by using the aforementioned methods. Trust me, you don’t want to miss it.