Create a Neo4j Graph Database Using the REST API


Although the roots of Graph Theory can be traced back nearly 300 years, databases built on Graph Theory are hardly into their second decade. These databases are fairly young, yet they’ve found widespread implementation in some of the biggest online companies around, including Google, Facebook, and eBay. Additionally, big industries like healthcare, retail, gaming and energy are taking advantage of graph databases.

Why the popularity? Graph databases are built around relationships between entities, and this paradigm fits well with the way human brains think about things, especially social interactions. For example, Facebook “Friends” emphasizes relationships between people.

What’s wrong with traditional relational databases? Ironically, relational databases exhibit poor performance when querying for relational information. The poor performance is due to expensive join operations, which are used to define relations. A join operation causes the relational database to search through large volumes of information, only to discard all but the related entries. For example, a search through four degrees of separation (e.g., friends of friends of friends of friends) would all but render a relational database useless, because the required CPU cycles and memory increase exponentially with each degree.

By contrast, graph databases barely break a sweat when searching through four degrees of separation. In fact, the limiting factor is often not the search itself but the volume of results, which could easily run to millions of entries.

In this two-part article, we’ll work with the most mature and popular graph database, known as Neo4j. Neo4j is written in Java, but has interfaces for many languages, including Ruby. In the spirit of flexibility, we’ll eschew the Ruby interface and instead take advantage of the REST API, which allows us to work with Neo4j using nothing more than the rest-client and json gems. The code in this article can therefore be ported to other languages, like PHP, Python, and perhaps even JavaScript, without much effort.

To make this journey, this article will focus on setting up Neo4j to implement a simple CRM (Customer Relationship Management) for a sales department. We’ll briefly look at the Cypher query language that is used to communicate with Neo4j, and then we’ll put information into the database using the API.

Background: Sticks and Bubbles

Look inside any engineering conference room and you’re likely to see stick-and-bubble diagrams on the white board. More specifically, you’ll see circles drawn around concepts, with arrows pointing to other circles. These types of diagrams are very effective, because human brains perceive the world as interconnected entities. Each entity, represented by a circle, contains any number of attributes, like names, dates, computer IP addresses, etc. The arrows that interconnect the circles represent relationships, such as the flow of money, data, IP datagrams, etc.

In our CRM example, we’ll show a relationship between an account territory manager and a sales account manager. The relationship appears in the simple drawing below:

Graph of the relationship Linda manages Jeff

Notice how each node (circle) has several attributes, such as label, name, and title. The label plays a special role, as it describes the nature of the node. The relationship is simply described as :Manages, and in this case, its intent is obvious: Linda manages Jeff.

We use a form of “ASCII-Art” to represent this relationship as follows:

(Linda)-[manages]->(Jeff)

In this case, encasing Linda and Jeff in parentheses roughly represents circular nodes, and their relationship is represented with an arrow (-->) to indicate that this is a directed graph. The relationship is further described by the word manages inside the square brackets.

Neo4j Cypher Query Language

The Neo4j Cypher query language uses a syntax that closely resembles the above ASCII-Art. For example, we create the Linda and Jeff nodes by using the keyword CREATE, followed by the definition of the nodes.

CREATE (:Person {name:"Linda Barnes", title:"Territory Manager"} );
CREATE (:Person {name:"Jeff Dudley", title:"Account Manager"} );

The node label is identified using any word of your choice, preceded with a colon. In this case, we specify both Linda and Jeff as persons, and denote that designation using the label :Person. Instead of :Person, we could have used :Human, or :individual, or :Dude or anything you feel appropriate. The rule for selecting label names is to pick terms with well-understood meanings, and use them consistently throughout.

Notice that the node attributes name and title are created in a manner that is highly similar to a symbolic-key hash in Ruby. There is no practical limit to the attributes you place within a node. Also, you don’t have to specify all the attributes when creating the nodes. You can modify a node anytime to provide further clarity and meaning. This is far more flexible than relational databases like MySQL or PostgreSQL. Relational databases force you to change the table design, which affects all rows within the table. In graph databases, you simply make a change to one or more nodes, and it affects only the nodes you’re changing.

Now it’s time to tie these two nodes together into a directed graph relationship that implies the fact that Linda manages Jeff. This is accomplished in a two-step process: 1) create a reference to each node, and 2) define the relationship. We use the MATCH Cypher keyword to create the reference.

MATCH ( a:Person {name:"Linda Barnes"} ), ( b:Person {name:"Jeff Dudley"} )
CREATE (a)-[:Manages]->(b);

Notice how the unique references to each node are established by identifying the :Person label and the name attribute. The references, a and b, are subsequently used in the CREATE command to establish a :Manages relationship between the two nodes. The direction is conveyed using the > symbol. (Note that Neo4j stores every relationship with a direction, so CREATE requires the arrowhead; a MATCH pattern can ignore direction by leaving off the > symbol. In the sense of hierarchical management, it would be meaningless to say that Linda and Jeff manage each other, so the direction matters here.)

Note that, similar to nodes, relationships can also contain attributes, although in this particular case, we chose not to include any attributes within the :Manages relationship.

Running Neo4j with Ruby

Let’s see how this would look using the Ruby rest_client API.

Before we begin, however, we’ll need to get access to a Neo4j database. For learning purposes, we’ll install it on our local machine in accordance with the instructions on the Neo4j website.

Go to the Neo4j site, and click on the download page. The correct software for your machine should download automatically after opening this page. While the package downloads, you can read the step-by-step instructions to install it on your machine. If you don’t already have the Oracle JDK (Java Development Kit) installed, you’ll have to locate it and install it before installing Neo4j. Fortunately, the Neo4j download page contains a link to the JDK download page, so it’s a simple matter to locate, download, and install the JDK.

We won’t go into the installation details here, as this process is documented on the Neo4j download page. Just follow the installation instructions. It’s surprisingly easy!

After everything is downloaded and installed, cd to the Neo4j extracted folder and start the server with the command:

bin/neo4j start

Connect to the Neo4j server by opening a browser to port 7474 on your local host and see the welcome screen, as shown below.

Neo4j Opening Page

This page turns out to be highly useful, as it can display your current database as the classic “stick and bubble” diagram. It also allows you to test out your Cypher commands manually and see instant results.

To prepare your Ruby code, install the gems for rest_client and json, if they’re not already installed on your system:

gem install rest-client
gem install json

We’re finally at a point where we can execute some Ruby code and talk to the Neo4j server!

We’ll begin by creating a class, RGraph. We’ll build on this class over the course of this article.

require 'json'
require 'rest_client'

class RGraph

  def initialize
    @url = 'http://localhost:7474/db/data/cypher'
  end

end

Notice the URL address, /db/data/cypher. This is the base address for all your API calls that will use the Cypher query language.

And now let’s define a method, inside the RGraph class, to create a Neo4j graph database node.

def create_node(label, attr = {})
  # Create a node
  query = ''  # holds the final query string
  attributes = '' # holds the attribute portion of the query, if any
  if attr.empty?
    # No attributes, so create a simple node
    query += "CREATE (:#{label});"
  else
    # Create the attribute clause portion of the query
    attributes += '{ '
    attr.each do |key, value|
      attributes += "#{key}: '#{value}',"
    end
    attributes.chomp!(',') # Neo4j hates extra commas!
    attributes += ' }'
    query += "CREATE (:#{label} " + attributes + ');'
  end
  c = {
      "query" => query,
      "params" => {}
  }
  RestClient.post @url, c.to_json, :content_type => :json, :accept => :json
end

Note the label argument on line 1, which this method requires. Attributes are optional; however, it would be highly unusual to create a node without attributes. The attributes carry the information for that node. In our simple CRM example, the attributes carry the person’s name and title. They could also contain other information, such as birth date, salary, favorite color, or anything that helps identify this unique individual.
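To see what kind of query string this logic produces without a running server, here is a minimal sketch. The build_create_query helper is hypothetical and not part of the article's RGraph class; it mirrors create_node but returns the string instead of posting it. It also escapes single quotes and backslashes in values, which the method above does not do.

```ruby
# Hypothetical helper (not part of RGraph): builds the same kind of CREATE
# statement as create_node, but returns it instead of sending it.
def build_create_query(label, attr = {})
  return "CREATE (:#{label});" if attr.empty?

  pairs = attr.map do |key, value|
    # Escape backslashes and single quotes so the value cannot
    # terminate its own string literal inside the query.
    escaped = value.to_s.gsub('\\') { '\\\\' }.gsub("'") { "\\'" }
    "#{key}: '#{escaped}'"
  end
  "CREATE (:#{label} { #{pairs.join(', ')} });"
end

puts build_create_query('Person', name: 'Linda Barnes', title: 'Territory Manager')
# prints: CREATE (:Person { name: 'Linda Barnes', title: 'Territory Manager' });
```

Separating query construction from the REST call like this also makes the string easy to paste into the Neo4j browser page for a manual check.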

Note on line 18 the creation of a temporary hash that contains the actual Cypher query as well as other optional parameters. We could define the content of the attributes within the params element rather than including the attributes directly in the query string. This could be useful in situations where you have some complicated parameters and would like to break them out separately. In our simple case, however, we include all parameters within the query itself and pass in an empty params element.

The actual REST call is made on line 22.
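The params alternative mentioned above could look roughly like the sketch below. The create_node_payload helper is hypothetical, and the {props} placeholder is an assumption: it is the syntax documented for the older Cypher endpoint, while newer Neo4j releases write the placeholder as $props. The sketch only builds the request body, so it can be run and inspected without a server.

```ruby
require 'json'

# Hypothetical helper (not part of RGraph): builds a request body that passes
# the node attributes through "params" instead of splicing them into the query.
# Assumption: the server accepts the {props} placeholder syntax.
def create_node_payload(label, attr = {})
  {
    'query'  => "CREATE (n:#{label} {props})",
    'params' => { 'props' => attr }
  }
end

payload = create_node_payload('Person', name: 'Linda Barnes', title: 'Territory Manager')
puts JSON.pretty_generate(payload)
```

Because the values travel as JSON data rather than as part of the query text, quoting characters inside a name or title no longer need special handling.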

That was fairly simple, wasn’t it? You may suspect that the creation of a relationship is equally simple. Well, this is almost the case. Remember that a relationship connects two nodes. Consequently, we’ll have to use a Cypher MATCH statement to locate the two nodes and reference those nodes in a subsequent CREATE statement.

def create_directed_relationship(from_node, to_node, rel_type)
  # Create a directed relationship between nodes
  query = ''  # Holds the final query string
  attributes = '' # holds identifying attributes, if any
  # First put together the two matching statements to find the
  # source and destination nodes
  query += "MATCH ( a:#{from_node[:type]} "
  from_node.each do |key, value|
    next if key == :type # Don't count "type" as an attribute
    attributes += "#{key}: '#{value}',"
  end
  attributes.chomp!(',') # Get rid of extra comma
  query += "{ #{attributes} }),"
  attributes = '' # Reset to process next MATCH statement
  query += " ( b:#{to_node[:type]} "
  to_node.each do |key, value|
    next if key == :type # Don't count "type" as an attribute
    attributes += "#{key}: '#{value}',"
  end
  attributes.chomp!(',') # Get rid of extra comma
  query += "{ #{attributes} }) "
  # The "a" and "b" nodes are now identified, so now create the relationship
  query += "CREATE (a)-[:#{rel_type}]->(b);"
  c = {
      "query" => query,
      "params" => {}
  }
  RestClient.post @url, c.to_json, :content_type => :json, :accept => :json
end

The from_node and to_node arguments on line 1 are actually hashes that uniquely describe the two nodes being connected. In this case, we adopted a simple (perhaps too simple for production release!) rule for the input node specifications, requiring them to contain a key-value pair that uses the :type key to specify the node label. Any extra keys are used to help further identify the unique node. All of these attributes are folded into Cypher MATCH statements and then added to the query string.
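As an illustration of the input hashes and the query they lead to, here is a minimal sketch. The node_pattern helper is hypothetical and not part of the article's class; it factors the two near-identical loops into one function and only builds the string, so no server is needed to try it.

```ruby
# Hypothetical helper (not part of RGraph): turns a node description hash such
# as { type: 'Person', name: 'Linda Barnes' } into a Cypher node pattern.
def node_pattern(ref, node)
  attrs = node.reject { |key, _| key == :type } # :type is the label, not an attribute
              .map { |key, value| "#{key}: '#{value}'" }
              .join(', ')
  "(#{ref}:#{node[:type]} { #{attrs} })"
end

linda = { type: 'Person', name: 'Linda Barnes' }
jeff  = { type: 'Person', name: 'Jeff Dudley' }

query = "MATCH #{node_pattern('a', linda)}, #{node_pattern('b', jeff)} " \
        "CREATE (a)-[:Manages]->(b);"
puts query
# prints: MATCH (a:Person { name: 'Linda Barnes' }), (b:Person { name: 'Jeff Dudley' }) CREATE (a)-[:Manages]->(b);
```

The same hashes could be passed to create_directed_relationship as from_node and to_node, with 'Manages' as rel_type.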

Line 23 folds in the CREATE command, specifying the last argument, rel_type, as the type of the relationship.

Line 28 makes the REST call, causing the relationship to be stored in the database.

The CRM Database

The diagram below shows where we’re heading with the above code during the next installment of this article. We will fill out the remaining methods within the RGraph class and then use these methods to fill out all the nodes and relationships for a very simple sales CRM.

Sales CRM Graph

The structure of this CRM will allow territory managers (Linda, in this case) to make sophisticated inquiries into the data. For example, Linda could ask the question, “Of all the companies in my territory, who are all the customer managers that we have NOT yet contacted, and who are the associated responsible account managers?”

Summary

In this first of two articles, we briefly looked at the Neo4j database and constructed some simple queries using the Cypher query language. We then created a Ruby class to leverage our knowledge of Cypher to run REST API calls. In the next article, we will follow through with a simple CRM implementation to show how a sales organization might represent their customer relationships in Neo4j, allowing the territory manager to make complex queries and gain valuable insight into customer accounts.


  • sebastiaan hilbers

    Awesome, just can’t figure where there might be a good spot for a relational database.. A product store is not a good fit right? But a ACL system?

    • danofsocal

      Hello Sebastiaan,

      Thanks for the feedback.

      I suppose it’s all in how you anticipate relating the products in a way that benefits the user. For example, I’m currently doing work for a startup where we use Neo4j within the product store. We’re pulling several different products together into fashion “LookBooks” and selling them as a unit. These LookBooks are also “followed” by shopping clients and may also be “tagged” within other LookBooks or by professional stylists. Though it’s still early in the design phase, I can already see how interrelated the products are becoming, and therefore the graph database – in this case – was a good choice.

      By “ACL” do you mean “Access Control List?” I have a very limited experience with ACLs (I used them mainly when configuring Cisco routers way back in a previous life), and it seemed to be a fairly straightforward list with no real need for a database. But my knowledge in this area is limited, so I can’t provide a very qualified opinion. :-)

      Again, thanks for the feedback.

      Best,
      Dan