A CRM with Neo4j and REST

Tweet

logo

In a previous post, we briefly looked at the Neo4j database and created some simple Ruby API methods that leveraged our knowledge of the Cypher query language. We started to put together a simple CRM (Customer Relationship Management) application in an attempt to show how easily graph databases can navigate in a highly connected world. In this article, we’ll continue building on the CRM example by zooming in on the management structure of a fictional sales organization and see how easy it is to make complex queries of the database.

Let’s Build the Database!

First, let’s build a simple database. Data for this database will be expressed within a Ruby hash structure. This hash structure could easily come from a JSON file, which, in turn, could come from a set of online web forms or an Excel spreadsheet.

This particular dataset will describe the structure of a sales organization and its relations to several client companies. In this structure:

  • There is one territory manager who manages –
  • Three account managers, each of which manages –
  • One company account. Each company account has –
  • One or more employees, most of which –
  • Are managed by a manager.
  • The database persists meetings between account managers and customers at their respective accounts.

The graph diagram below, which also appeared in part 1 of this article, describes the nodes and their relationships.

Sales CRM Graph

The Ruby hash structure that we’ll use to populate the database can be seen in the listing below. Note that, in the interest of simplicity, this is just a partial listing of the database. It contains sufficient information for you to understand where we’re heading. We include the @data variable within the initialize method of the RGraph class, introduced in previous article.

@data = {
    nodes: [
        {
            label: 'Person',
            title: 'Territory Manager',
            name: 'Linda Barnes'
        },
        {
            label: 'Person',
            title: 'Account Manager',
            name: 'Jeff Dudley',
        },
        # ...
        {
            label: 'Company',
            name: 'OurCompany, Inc.'
        },
        {
            label: 'Company',
            name: 'Acme, Inc.'
        },
        {
            label: 'Company',
            name: 'Wiley, Inc.'
        },
        {
            label: 'Company',
            name: 'Coyote, Ltd.'
        },
    ],
    relationships: [
        {
            type: 'MANAGES',
            source: 'Linda Barnes',
            destination: ['Jeff Dudley', 'Mike Wells', 'Vanessa Jones']
        },
        {
            type: 'MANAGES',
            source: 'Jesse Hoover',
            destination: ['Ralph Green', 'Patricia McDonald']
        },
        # ...
        {
            type: 'WORKS_FOR',
            destination: 'OurCompany, Inc.',
            source: ['Linda Barnes', 'Jeff Dudley', 'Mike Wells', 'Vanessa Jones']
        },
        {
            type: 'WORKS_FOR',
            destination: 'Acme, Inc.',
            source: ['Jesse Hoover', 'Ralph Green', 'Sheila Foxworthy', 'Janet Huxley-Smith',
                     'Tim Reynolds', 'Zachary Meyer', 'Milton Stacey', 'Steve Nauman', 'Patricia McDonald']
        },
        # ...
        {
            type: 'ACCOUNT_MANAGES',
            source: 'Jeff Dudley',
            destination: 'Acme, Inc.'
        },
        {
            type: 'ACCOUNT_MANAGES',
            source: 'Mike Wells',
            destination: 'Wiley, Inc.'
        },
        {
            type: 'ACCOUNT_MANAGES',
            source: 'Vanessa Jones',
            destination: 'Coyote, Ltd.'
        },
        {
            type: 'HAS_MET_WITH',
            source: 'Jeff Dudley',
            destination: ['Tim Reynolds', 'Zachary Meyer', 'Janet Huxley-Smith', 'Patricia McDonald']
        },
        {
            type: 'HAS_MET_WITH',
            source: 'Mike Wells',
            destination: ['Francine Gonzalez', 'Tsunomi Ito', 'Frank Cutler']
        },
        {
            type: 'HAS_MET_WITH',
            source: 'Vanessa Jones',
            destination: 'Tracey Stankowski'
        }
    ]

Using the RGraph class, we’ll create some code to read in all the data and construct the graph database using the create_node and create_directed_relationship methods. The code for these methods was provided in the previous article.

Because there’s a lot of data to process, we’ll create plural forms of the above methods. More specifically, we’ll create two additional methods: create_nodes and create_directed_relationships that will be used to process each node and relationship expressed in the @data hash. These plural forms will call their singular counterparts within looping structures.

def create_nodes
  # Scan file, find each node and create it in Neo4j
  @data.each do |key,value|
    if key == :nodes
      @data[key].each do |node| # Cycle through each node
        next unless node.has_key?(:label) # Make sure this node has a label
        # We have sufficient data to create a node
        label = node[:label]
        attr = Hash.new
        node.each do |k,v|  # Hunt for additional attributes
          next if k == :label # Don't create an attribute for "label"
          attr[k] = v
        end
        create_node(label,attr)
      end
    end
  end
end

A main loop is set up in line 3 of the create_nodes listing above. This loop simply looks for the node array, which provides information for each node. Once this array is discovered, an inner loop begins on line 5. It filters out any nodes that do not have a label attribute. The loop on line 10 is used to accumulate information for any node attributes. Finally, after gathering all necessary information, it calls the create_node method, which, in turn, will push all this information into the Neo4j database.

A similar set of loops is used to create the relationships between the nodes. However, relationships are a bit more complicated because they need to unambiguously identify the nodes to which they’re attached.

The relationships are created in the create_directed_relationships listing below:

def create_directed_relationships
  # Scan file, look for relationships and their respective nodes
  @data.each do |key,value|
    if key == :relationships
      @data[key].each do |relationship| # Cycle through each relationship
        next unless relationship.has_key?(:type) &&
            relationship.has_key?(:source) &&
            relationship.has_key?(:destination)
        rel_type = relationship[:type]
        case rel_type
          # Handle the different types of cases
          when 'MANAGES', 'ACCOUNT_MANAGES', 'HAS_MET_WITH'
            # in all cases, we have one :Person source and one or more destinations
            from_node = {type: 'Person', name: relationship[:source]}
            to_node = (rel_type == 'ACCOUNT_MANAGES') ? {type: 'Company'} : {type: 'Person'}
            if relationship[:destination].class == Array
              # multiple destinations
              relationship[:destination].each do |dest|
                to_node[:name] = dest
                create_directed_relationship(from_node,to_node,rel_type)
              end
            else
              to_node[:name] = relationship[:destination]
              create_directed_relationship(from_node,to_node,rel_type)
            end
          when 'WORKS_FOR'
            # one destination, one or more sources
            to_node = {type: 'Company', name: relationship[:destination]}
            from_node = {type: 'Person'}
            rel_type = 'WORKS_FOR'
            if relationship[:source].class == Array
              # multiple sources
              relationship[:source].each do |src|
                from_node[:name] = src
                create_directed_relationship(from_node,to_node,rel_type)
              end
            else
              from_node[:name] = relationship[:source]
            end
        end
      end
    end
  end
end

The array of relationships is located within the loop on line 3. Each relationship is then cycled, beginning with the loop on line 5. Note the filtering that takes place in lines 6 through 8, where we ensure that each relationship has, at a minimum, an identified type, a source node and a destination node.

The case structure on line 10 differentiates between two types of relationships: those that have one source and multiple destinations, and those that have one destination with multiple sources. In cases where there are multiple sources or destinations, loops are used to cycle through each pair, and at the center of each loop is a call to the create_directed_relationship method.

To finally populate the database with all this information, we instantiate RGraph and create the nodes and relationships as shown below. Note that we must create the nodes before we create the relationships.

rGraph = RGraph.new

rGraph.create_nodes

rGraph.create_directed_relationships

The final result of all this work is shown in the Neo4j window below.

Neo4j Database Representation

Before moving on to the final point, take stock of what we’ve done so far. We’ve designed a database and expressed the data in a Ruby hash. We then used the Ruby REST API to populate all this information into a graph database. Now, it’s time to ask the database some questions.

Query for Essential Business Information

Salespeople who work in B2B environments are very interested in understanding the hierarchical structure within their target accounts. The only way to get familiar with the hierarchy is to dig in and meet directly with the managers.

This particular database persists meetings between salespeople and individuals within their target accounts through the HAS_MET_WITH relationship. The territory manager would therefore like to ask the following question: What managers have we not met within our target accounts? More specifically, the territory manager wants a list of managers that haven’t been contacted yet, the name of the company, and the responsible account manager.

For the sake of simplicity, we’ll abbreviate the Account Manager who works at OurCompany, Inc. as am. Similarly, we’ll abbreviate each manager at the target account as tm. Finally, we’ll abbreviate each targeted account company as tc.

The Cypher query might look something like this:

MATCH (am:Person), (tm:Person), (tc:Company)
WHERE (am {title:"Account Manager"})-[:WORKS_FOR]->(:Company {name:"OurCompany, Inc."})
AND (am)-[:ACCOUNT_MANAGES]->(tc)
AND (tm)-[:WORKS_FOR]->(tc)
AND (tm)-[:MANAGES]->()
AND NOT (am)-[:HAS_MET_WITH]->(tm)
return am.name,tm.name,tc.name;

Okay, I admit I cheated here by throwing a WHERE clause at you without warning. Similar to classic SQL, the WHERE clause helps filter the results by adding a set of qualifications to the MATCH.

The WHERE clause breaks down as follows:

  • The account manager (am) is qualified by specifying that he/she is someone with the “Account Manager” title who works for OurCompany, Inc.
    (am {title:"Account Manager"})-[:WORKS_FOR]->(:Company {name:"OurCompany, Inc."})

  • The target company (tc) is identified by virtue of the fact that it is account-managed by the account manager.
    (am)-[:ACCOUNT_MANAGES]->(tc)

  • The target account manager is identified by the fact that he/she works for the targeted company (tc)
    (tm)-[:WORKS_FOR]->(tc)
    Also, the target account manager is further identified by the fact that he/she manages somebody.
    (tm)-[:MANAGES]->()
    The final parenthesis is empty in this case, because we don’t want to specify who is being managed; we just want to ensure that the targeted tm is a manager.

  • Finally, we filter on the fact that the account manager has not met with the manager at the target account. We use the keyword NOT to convey the negative relationship.
    NOT (am)-[:HAS_MET_WITH]->(tm)

We write a quick method within Ruby that executes this query and returns the results as a JSON structure:

def find_managers_not_met
  query =  'MATCH (am:Person), (tm:Person), (tc:Company)'
  query += 'WHERE (am {title:"Account Manager"})-[:WORKS_FOR]->(:Company {name:"OurCompany, Inc."})'
  query += 'AND (am)-[:ACCOUNT_MANAGES]->(tc)'
  query += 'AND (tm)-[:WORKS_FOR]->(tc)'
  query += 'AND (tm)-[:MANAGES]->()'
  query += 'AND NOT (am)-[:HAS_MET_WITH]->(tm)'
  query += 'return am.name,tm.name,tc.name;'
  c = {
      "query" => "#{query}",
      "params" => {}
  }
  response = RestClient.post @url, c.to_json, :content_type => :json, :accept => :json
  puts JSON.parse(response)
end

The final results appear as a Ruby hash as shown below:

{
  "columns"=>["am.name", "tm.name", "tc.name"],
  "data"=>[
    ["Jeff Dudley", "Jesse Hoover", "Acme, Inc."],
    ["Jeff Dudley", "Ralph Green", "Acme, Inc."],
    ["Mike Wells", "Mary Galloway", "Wiley, Inc."],
    ["Vanessa Jones", "George Quincy", "Coyote, Ltd."]
  ]
}

You can interpret the first data array as, “Jeff Dudley has not yet met with Jesse Hoover, who is a manager at Acme, Inc.”

Summary

Graph databases are optimized to work with relationships between nodes. This objective immensely speeds up database queries when the critical information is relationship-based. Neo4j is one of the most mature and stable graph databases on the market. Although Neo4j provides libraries that port their database to every major language, including Ruby, it’s also possible – and fairly easy – to simply use the REST API instead of dealing with libraries. This necessitates becoming familiar with Neo4j’s Cypher query language. Fortunately, Cypher is fairly easy to learn, and it’s simple to place these queries directly in the REST API calls.

Free JavaScript: Novice to Ninja Sample

Get a free 32-page chapter of JavaScript: Novice to Ninja and receive updates on exclusive offers from SitePoint.

  • sebastiaan hilbers

    How does neoj scale? Does it replicate? I’m looking at it from a mongodb perspective.