Create a Neo4j Graph Database Using the REST API

Although the roots of Graph Theory can be traced back nearly 300 years, databases built on Graph Theory are hardly into their second decade. These databases are fairly young, yet they’ve found widespread implementation in some of the biggest online companies around, including Google, Facebook, and eBay. Additionally, big industries like healthcare, retail, gaming and energy are taking advantage of graph databases.

Why the popularity? Graph databases are built around relationships between entities, and this paradigm fits well with the way human brains think about things, especially social interactions. For example, Facebook “Friends” emphasizes relationships between people.

What’s wrong with traditional relational databases? Ironically, relational databases perform poorly when querying for relational information. The poor performance is due to expensive join operations, which are used to define relations. A join operation causes the relational database to search through large volumes of information, only to discard all but the related entries. For example, a search through four degrees of separation (e.g., friends of friends of friends of friends) would all but render a relational database useless, because the required CPU cycles and memory increase exponentially with each degree.

By contrast, graph databases barely break a sweat when searching through four degrees of separation. In fact, the limiting factor is often not the search itself but the volume of results, which could easily run to millions of entries.

In this two-part article, we’ll work with the most mature and popular graph database, known as Neo4j. Neo4j is written in Java, but has interfaces for many languages, including Ruby. In the spirit of flexibility, we’ll eschew the Ruby interface and instead take advantage of the REST API, which allows us to work with Neo4j using nothing more than the rest_client and json gems. The code in this article can therefore be ported to other languages, like PHP, Python, and perhaps even JavaScript, without much effort.

To make this journey, this article will focus on setting up Neo4j to implement a simple CRM (Customer Relationship Management) for a sales department. We’ll briefly look at the Cypher query language that is used to communicate with Neo4j, and then we’ll put information into the database using the API.

Background: Sticks and Bubbles

Look inside any engineering conference room and you’re likely to see stick-and-bubble diagrams on the white board. More specifically, you’ll see circles drawn around concepts, with arrows pointing to other circles. These types of diagrams are very effective, because human brains perceive the world as interconnected entities. Each entity, represented by a circle, contains any number of attributes, like names, dates, computer IP addresses, etc. The arrows that interconnect the circles represent relationships, such as the flow of money, data, IP datagrams, etc.

In our CRM example, we’ll show a relationship between an account territory manager and a sales account manager. The relationship appears in the simple drawing below:

Graph of the relationship Linda manages Jeff

Notice how each node (circle) has several attributes, such as label, name, and title. Strictly speaking, the label attribute is the only necessary attribute, as it describes the nature of the node. The relationship is simply described as :Manages, and in this case, its intent is obvious: Linda manages Jeff.

We use a form of “ASCII-Art” to represent this relationship as follows:

(Linda)-[manages]->(Jeff)

In this case, encasing Linda and Jeff in parentheses roughly represents circular nodes, and their relationship is represented with an arrow, ->, to indicate that this is a directed graph. The relationship is further described by the word manages inside the square brackets.

Neo4j Cypher Query Language

The Neo4j Cypher query language uses a syntax that closely resembles the above ASCII-Art. For example, we create the Linda and Jeff nodes by using the keyword CREATE, followed by the definition of the nodes.

CREATE (:Person {name:"Linda Barnes", title:"Territory Manager"} );
CREATE (:Person {name:"Jeff Dudley", title:"Account Manager"} );

The node label is identified using any word of your choice, preceded with a colon. In this case, we specify both Linda and Jeff as persons, and denote that designation using the label :Person. Instead of :Person, we could have used :Human, or :individual, or :Dude or anything you feel appropriate. The rule for selecting label names is to pick terms with well-understood meanings, and use them consistently throughout.

Notice that the node attributes name and title are created in a manner that is highly similar to a symbolic-key hash in Ruby. There is no practical limit to the attributes you place within a node. Also, you don’t have to specify all the attributes when creating the nodes. You can modify a node anytime to provide further clarity and meaning. This is far more flexible than relational databases like MySQL or PostgreSQL. Relational databases force you to change the table design, which affects all rows within the table. In graph databases, you simply make a change to one or more nodes, and it affects only the nodes you’re changing.
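As a rough illustration of adding an attribute after the fact, the sketch below builds a Cypher MATCH ... SET statement as a Ruby string. The helper name set_attribute_query and the phone attribute are made up for this example; they are not part of Neo4j or of the code later in this article.

```ruby
# Hypothetical helper: builds a Cypher statement that adds (or overwrites)
# one attribute on the :Person node with the given name.
def set_attribute_query(name, key, value)
  "MATCH (p:Person {name:\"#{name}\"}) SET p.#{key} = \"#{value}\";"
end

# Only Jeff's node is touched; every other :Person node stays as it was.
puts set_attribute_query('Jeff Dudley', 'phone', '555-0100')
```

The resulting string can be pasted into the Neo4j browser or sent through the REST API like any other Cypher statement.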

Now it’s time to tie these two nodes together into a directed graph relationship that expresses the fact that Linda manages Jeff. This is accomplished in a two-step process: 1) find each node and give it a reference, and 2) define the relationship. We use the MATCH Cypher keyword to find the nodes and bind the references.

MATCH ( a:Person {name:"Linda Barnes"} ), ( b:Person {name:"Jeff Dudley"} )
CREATE (a)-[:Manages]->(b);

Notice how the unique references to each node are established by identifying the :Person label and the name attribute. The references, a and b, are subsequently used in the CREATE command to establish a :Manages relationship between the two nodes. The direction is conveyed using the > symbol. (Note that Neo4j stores every relationship with a direction, so CREATE needs the arrowhead; leaving off the > symbol is only allowed when matching, where it means “in either direction.” In the sense of hierarchical management, it would be meaningless to say that Linda and Jeff manage each other, so the direction matters here.)

Note that, similar to nodes, relationships can also contain attributes, although in this particular case, we chose not to include any attributes within the :Manages relationship.
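For comparison, here is a small sketch of what a relationship with attributes could look like. The relationship_clause helper and the since attribute are invented for illustration and do not appear in the RGraph class built below.

```ruby
# Hypothetical helper: builds a CREATE clause for a relationship that
# carries its own attributes, e.g. the date the reporting line began.
def relationship_clause(rel_type, attrs = {})
  props = attrs.map { |key, value| "#{key}:\"#{value}\"" }.join(', ')
  props = " {#{props}}" unless props.empty?
  "CREATE (a)-[:#{rel_type}#{props}]->(b);"
end

puts relationship_clause('Manages')                      # no attributes
puts relationship_clause('Manages', since: '2014-01-15') # one attribute
```

The attributes sit inside the square brackets, using the same curly-brace form that nodes use.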

Running Neo4j with Ruby

Let’s see how this would look using the Ruby rest_client API.

Before we begin, however, we’ll need to get access to a Neo4j database. For learning purposes, we’ll install it on our local machine in accordance with the instructions on the Neo4j website.

Go to the Neo4j site, and click on the download page. The correct software for your machine should download automatically after opening this page. While the package downloads, you can read the step-by-step instructions to install it on your machine. If you don’t already have the Oracle JDK (Java Development Kit) installed, you’ll have to locate it and install it before installing Neo4j. Fortunately, the Neo4j download page contains a link to the JDK download page, so it’s a simple matter to locate, download, and install the JDK.

We won’t go into the installation details here, as this process is documented on the Neo4j download page. Just follow the installation instructions. It’s surprisingly easy!

After everything is downloaded and installed, cd to the Neo4j extracted folder and start the server with the command:

bin/neo4j start

Connect to the Neo4j server by opening a browser to port 7474 on your local host and see the welcome screen, as shown below.

Neo4j Opening Page

This page turns out to be highly useful, as it can display your current database as the classic “stick and bubble” diagram. It also allows you to test out your Cypher commands manually and see instant results.

To prepare your Ruby code, install the gems for rest_client and json, if they’re not already installed on your system:

gem install rest-client
gem install json

We’re finally at a point where we can execute some Ruby code and talk to the Neo4j server!

We’ll begin by creating a class, RGraph. We’ll build on this class over the course of this article.

require 'json'
require 'rest_client'

class RGraph

  def initialize
    @url = 'http://localhost:7474/db/data/cypher'
  end

end

Notice the URL address, /db/data/cypher. This is the base address for all your API calls that will use the Cypher query language.

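And now let’s define a method to create a Neo4j graph database node. Like the other methods in this article, it belongs inside the RGraph class so that it can use @url.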

def create_node(label, attr = {})
  # Create a node
  query = ''  # holds the final query string
  attributes = '' # holds the attribute portion of the query, if any
  if attr.size == 0
    # No attributes, so create a simple node
    query += "CREATE (:#{label});"
  else
    # Create the attribute clause portion of the query
    attributes += '{ '
    attr.each do |key,value|
      attributes += "#{key.to_s}: '#{value}',"
    end
    attributes.chomp!(',') # Neo4j hates extra commas!
    attributes += ' }'
    query += "CREATE (:#{label} " + attributes + ');'
  end
  c = {
      "query" => "#{query}",
      "params" => {}
  }
  RestClient.post @url, c.to_json, :content_type => :json, :accept => :json
end

Note the label argument in line 1, which is a required part of the node. Attributes are optional; however, it would be highly unusual to create a node without attributes. The attributes carry the information for that node. In our simple CRM example, the attributes carry the person’s name and title. They could also contain other information, such as birth date, salary, favorite color, or anything that helps identify this unique individual.

Note on line 18 the creation of a temporary hash that contains the actual Cypher query as well as other optional parameters. We could pass the attribute values in the params element rather than writing them directly into the query string. This is useful when the values are complicated, since they no longer have to be quoted by hand inside the query. In our simple case, however, we include all values within the query itself and pass in an empty params element.

The actual REST call is made on line 22.
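As a hedged sketch of the params alternative, the helper below builds the same kind of request body but leaves the attribute values out of the query string. It assumes the legacy {props} placeholder syntax accepted by the /db/data/cypher endpoint; newer Neo4j releases write parameters as $props. The name create_node_payload and the sample values are hypothetical.

```ruby
require 'json'

# Sketch: build the request body for a parameterized CREATE.
# The {props} placeholder is the legacy Cypher parameter syntax (assumed
# here); adjust it to $props if your server version expects that form.
def create_node_payload(label, attr = {})
  {
    'query'  => "CREATE (n:#{label} {props});",
    'params' => { 'props' => attr }
  }
end

# The apostrophe in the name needs no escaping, because the value
# travels in params rather than inside the query text.
payload = create_node_payload('Person', name: "Miles O'Brien", title: 'Account Manager')
puts payload.to_json
```

The JSON produced here could be posted with the same RestClient.post call used in create_node.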

That was fairly simple, wasn’t it? You may suspect that the creation of a relationship is equally simple. Well, this is almost the case. Remember that a relationship connects two nodes. Consequently, we’ll have to use a Cypher MATCH statement to locate the two nodes and reference those nodes in a subsequent CREATE statement.

def create_directed_relationship(from_node, to_node, rel_type)
  # Create a directed relationship between nodes
  query = ''  # Holds the final query string
  attributes = '' # holds identifying attributes, if any
  # First put together the two matching statements to find the
  # source and destination nodes
  query += "MATCH ( a:#{from_node[:type]} "
  from_node.each do |key,value|
    next if key == :type # Don't count "type" as an attribute
    attributes += "#{key.to_s}: '#{value}',"
  end
  attributes.chomp!(',') # Get rid of extra comma
  query += "{ #{attributes} }),"
  attributes = '' # Reset to process next MATCH statement
  query += " ( b:#{to_node[:type]} "
  to_node.each do |key,value|
    next if key == :type # Don't count "type" as an attribute
    attributes += "#{key.to_s}: '#{value}',"
  end
  attributes.chomp!(',') # Get rid of extra comma
  query += "{ #{attributes} }) "
  # The "a" and "b" nodes are now identified, so now create the relationship
  query += "CREATE (a)-[:#{rel_type}]->(b);"
  c = {
      "query" => "#{query}",
      "params" => {}
  }
  RestClient.post @url, c.to_json, :content_type => :json, :accept => :json
end

The from_node and to_node arguments in line 1 are actually hashes that uniquely describe the two nodes being connected. In this case, we adopted a simple (perhaps too simple for production release!) rule for the input node specifications, requiring each hash to contain a :type key whose value specifies the node label. Any extra keys are used to help further identify the unique node. All of these attributes are folded into Cypher MATCH patterns and then added to the query string.

Line 23 folds in the CREATE command, specifying the last argument, rel_type, as the label for the relationship.

Line 28 makes the REST call, causing the relationship to be stored in the database.
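The two attribute loops in this method are nearly identical, so one possible tidy-up is to move the pattern building into a helper. The sketch below only assembles the query string and makes no REST call; node_pattern and relationship_query are hypothetical names, not part of the class above.

```ruby
# Hypothetical helper: turns a node hash such as
# { type: 'Person', name: 'Linda Barnes' } into a Cypher node pattern.
def node_pattern(ref, node)
  attrs = node.reject { |key, _| key == :type } # :type is the label, not an attribute
              .map { |key, value| "#{key}: '#{value}'" }
              .join(', ')
  "(#{ref}:#{node[:type]} { #{attrs} })"
end

# Combine two patterns and the CREATE clause into one statement.
def relationship_query(from_node, to_node, rel_type)
  "MATCH #{node_pattern('a', from_node)}, #{node_pattern('b', to_node)} " \
    "CREATE (a)-[:#{rel_type}]->(b);"
end

linda = { type: 'Person', name: 'Linda Barnes' }
jeff  = { type: 'Person', name: 'Jeff Dudley' }
puts relationship_query(linda, jeff, 'Manages')
```

The hashes for Linda and Jeff follow the same :type convention that create_directed_relationship expects.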

The CRM Database

The diagram below shows where we’re heading with the above code during the next installment of this article. We will fill out the remaining methods within the RGraph class and then use these methods to fill out all the nodes and relationships for a very simple sales CRM.

Sales CRM Graph

The structure of this CRM will allow territory managers (Linda, in this case) to make sophisticated inquiries into the data. For example, Linda could ask the question, “Of all the companies in my territory, who are all the customer managers that we have NOT yet contacted, and who are the associated responsible account managers?”

Summary

In this first of two articles, we briefly looked at the Neo4j database and constructed some simple queries using the Cypher query language. We then created a Ruby class to leverage our knowledge of Cypher to run REST API calls. In the next article, we will follow through with a simple CRM implementation to show how a sales organization might represent their customer relationships in Neo4j, allowing the territory manager to make complex queries and gain valuable insight into customer accounts.

Frequently Asked Questions (FAQs) about Creating a Neo4j Graph Database Using REST API

What is the Neo4j Graph Database and why should I use it?

Neo4j is a highly scalable, native graph database that is designed to leverage data relationships as first-class entities. It’s an open-source, NoSQL, ACID-compliant database that provides high performance and zero downtime. The main reason to use Neo4j is its excellent capacity to handle data relationships. Unlike traditional relational databases, where data is stored in rows and columns, Neo4j stores data in nodes and relationships, which allows for faster and more complex queries.

How does the REST API work with the Neo4j Graph Database?

The REST API provides a way to interact with the Neo4j database over HTTP. It allows you to create, read, update, and delete data in your Neo4j database using HTTP methods like GET, POST, PUT, and DELETE. This makes it possible to interact with your Neo4j database from any programming language that can send HTTP requests.

What are the prerequisites for creating a Neo4j Graph Database using REST API?

Before you can create a Neo4j Graph Database using the REST API, you need to have Neo4j installed and running on your system. You also need to have a basic understanding of HTTP methods and JSON, as these are used in the API requests and responses.

How can I create a node in Neo4j using the REST API?

To create a node in Neo4j using the REST API, you need to send a POST request to the /db/data/node endpoint. The body of the request can be empty, as nodes in Neo4j do not require any properties to be created, or it can contain a JSON object with the node’s initial properties.

How can I add properties to a node in Neo4j using the REST API?

To add properties to a node in Neo4j using the REST API, you need to send a PUT request to the /db/data/node/{nodeId}/properties/{propertyName} endpoint. The body of the request should contain the property value encoded as JSON (for example, a quoted string).

How can I create a relationship between nodes in Neo4j using the REST API?

To create a relationship between nodes in Neo4j using the REST API, you need to send a POST request to the /db/data/node/{nodeId}/relationships endpoint, where {nodeId} is the node the relationship starts from. The body of the request should contain a JSON object with the type of the relationship and the URI of the node to which the relationship is directed.
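A minimal sketch of such a request body follows. It assumes the legacy API layout in which the target node is given as a full node URL in a "to" field; the base URL and node id are placeholders, and relationship_body is a hypothetical helper name.

```ruby
require 'json'

# Sketch: body for POST /db/data/node/{nodeId}/relationships.
# Assumes the target node is identified by its URL in the "to" field.
def relationship_body(base_url, to_node_id, rel_type)
  {
    'to'   => "#{base_url}/db/data/node/#{to_node_id}",
    'type' => rel_type
  }.to_json
end

# Placeholder values: a local server and a target node with id 2.
puts relationship_body('http://localhost:7474', 2, 'Manages')
```

The id of the starting node goes in the request URL, while the target node goes in the body.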

How can I update properties of a node in Neo4j using the REST API?

To update properties of a node in Neo4j using the REST API, you need to send a PUT request to the /db/data/node/{nodeId}/properties endpoint. The body of the request should contain a JSON object with the new property values.

How can I delete a node in Neo4j using the REST API?

To delete a node in Neo4j using the REST API, you need to send a DELETE request to the /db/data/node/{nodeId} endpoint. Note that you can only delete a node if it has no relationships.

How can I delete a relationship in Neo4j using the REST API?

To delete a relationship in Neo4j using the REST API, you need to send a DELETE request to the /db/data/relationship/{relationshipId} endpoint.

How can I handle errors when using the Neo4j REST API?

When using the Neo4j REST API, errors are returned as HTTP status codes. For example, a 404 status code means that the requested resource could not be found, and a 500 status code means that there was a server error. The body of the response usually contains a JSON object with more information about the error.
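One way to make such errors readable is sketched below. It assumes the error body is a JSON object with "exception" and "message" keys, which may differ between Neo4j versions, and it falls back to the raw body otherwise. describe_error and the sample bodies are hypothetical.

```ruby
require 'json'

# Sketch: turn a failed response into a one-line summary.
# The "exception" and "message" keys are an assumption about the error
# body; anything that is not a JSON object is reported as raw text.
def describe_error(status, body)
  details = JSON.parse(body)
  return "HTTP #{status}: #{body}" unless details.is_a?(Hash)
  "HTTP #{status}: #{details['exception']}: #{details['message']}"
rescue JSON::ParserError
  "HTTP #{status}: #{body}"
end

puts describe_error(400, '{"message":"Invalid input","exception":"SyntaxException"}')
puts describe_error(500, 'Internal Server Error')
```

With the rest-client gem, the status and body would typically come from a rescued RestClient::ExceptionWithResponse, via its http_code and http_body methods.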