
Why You Should Use Neo4j in Your Next Ruby App

By Brian Underwood


I have needed to store a lot of data in my time and I’ve used a lot of the big contenders: PostgreSQL, MySQL, SQLite, Redis, and MongoDB. While I’ve built up extensive experience with these tools, I wouldn’t say that any of them have ever made the task fun. I fell in love with Ruby because it was fun and because it let me do more powerful things by not getting in my way. While I didn’t realize it, the usual suspects of data persistence were getting in my way. But I’ve found a new love: let me tell you about Neo4j.

What is Neo4j?

Neo4j is a graph database! That means that it is optimized for managing and querying connections (relationships) between entities (nodes) as opposed to something like a relational database which uses tables.

Why is this great? Imagine a world with no foreign keys. Each entity in your database can have many relationships referring directly to other entities. If you want to explore the relationships there are no table or index scans, just a few connections to follow. This matches up well with the typical object model. It is more powerful, though, because Neo4j, while providing a lot of the database functionality that we expect, gives us tools to query for complex patterns in our data.
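To make the node-and-relationship idea concrete, here is a toy in-memory sketch in plain Ruby. It is not how Neo4j stores data, and the Node, Relationship, connect, and follow names are invented for this illustration. The only point is that each node holds direct references to its neighbours, so following a relationship needs no lookup by key.

```ruby
# Toy model only: these structs and helpers are hypothetical, not part of any gem.
Node = Struct.new(:label, :properties, :relationships)
Relationship = Struct.new(:type, :target)

# Attach a typed, directed relationship from one node to another.
def connect(from, type, to)
  from.relationships << Relationship.new(type, to)
end

# Return the nodes reached by following relationships of the given type.
def follow(node, type)
  node.relationships.select { |rel| rel.type == type }.map(&:target)
end

ruby = Node.new(:Asset, { title: 'Ruby' }, [])
languages = Node.new(:Category, { name: 'Languages' }, [])
connect(ruby, :HAS_CATEGORY, languages)

follow(ruby, :HAS_CATEGORY).map { |c| c.properties[:name] } # => ["Languages"]
```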

Introducing ActiveNode

To connect to Neo4j we’ll be using the neo4j gem. You can find instructions for connecting to Neo4j in your Rails application in the gem’s documentation. The code shown below is also available as a running Rails app in this GitHub repository (use the sitepoint Git branch). Once your database is up and running, use the rake load_sample_data command to populate it.

Here is a basic example of an Asset model from an asset management Rails app:

app/models/asset.rb

class Asset
  include Neo4j::ActiveNode

  property :title

  has_many :out, :categories, type: :HAS_CATEGORY
end

Let’s break this down:

  • The neo4j gem gives us the Neo4j::ActiveNode module, which we include to make a model.
  • The class name Asset means that this model will be responsible for all nodes in Neo4j labeled Asset (labels play a similar role to table names except that a node can have many labels).
  • We have a title property to describe the individual nodes.
  • We have an outgoing has_many association for categories. This association helps us find Category objects by following HAS_CATEGORY relationships in the database.

With this model we can perform a basic query to find an asset and get its categories:

2.2.0 :001 > asset = Asset.first
 => #<Asset uuid: "0098d2b7-a577-407a-a9f2-7ec4153cfa60", title: "ICC World Cup 2015 ">
2.2.0 :002 > asset.categories.to_a
 => [#<Category uuid: "91cd5369-605c-4aff-aad1-b51d8aa9b5f3", name: "Classification">]

Anybody familiar with ActiveRecord or Mongoid will have seen this hundreds of times. To get a bit more interesting, let’s define a Category model:

class Category
  include Neo4j::ActiveNode

  property :name

  has_many :in, :assets, origin: :categories
end

Here our association has an origin option to reference the categories association on the Asset model. We could instead specify type: :HAS_CATEGORY again if we wanted to.

Creating Recommendations

What if we wanted to get all assets that share a category with our asset?

2.2.0 :003 > asset.categories.assets.to_a
 => [#<Asset uuid: "d2ef17b5-4dbf-4a99-b814-dee2e96d4a09", title: "WineGraph">, ...]

So what just happened? ActiveNode generated a query to the database which specified a path from our asset to all other assets which share a category. The database then returned just those assets to us. Here’s the query that it used:

MATCH
  asset436, asset436-[rel1:`HAS_CATEGORY`]->(node3:`Category`),
  node3<-[rel2:`HAS_CATEGORY`]-(result_assets:`Asset`)
WHERE (ID(asset436) = {ID_asset436})
RETURN result_assets

Parameters: {ID_asset436: 436}

This is a query language called Cypher, which is Neo4j’s equivalent to SQL. Note particularly the ASCII art style of parentheses surrounding node definitions and arrows representing relationships. This Cypher query is a bit more verbose because ActiveNode generated it algorithmically. If a human were to write the query it would look something like:

MATCH source_asset-[:HAS_CATEGORY]->(:Category)<-[:HAS_CATEGORY]-(result_assets:Asset)
WHERE ID(source_asset) = {source_asset_id}
RETURN result_assets

Parameters: {source_asset_id: 436}

I find Cypher easier and more powerful than SQL, but we won’t worry too much about Cypher in this article. If you want to learn more later you can find great tutorials and a thorough refcard.

As you can see, we can use Neo4j to span across our entities. Big deal! We can also do this in SQL with a couple of JOINs. While Cypher seems cool, we’re not breaking any major ground yet. What if we wanted to use this query to make some asset recommendations based on shared categories? We’ll want to sort the assets to rank those with the most categories in common. Let’s create a method on our model:

class Asset
  ...

  Recommendation = Struct.new(:asset, :categories, :score)

  def asset_recommendations_by_category(common_links_required = 3)
    categories(:c)
      .assets(:asset)
      .order('count(c) DESC')
      .pluck('asset, collect(c), count(c)').reject do |_, _, count|
      count < common_links_required
    end.map do |other_asset, categories, count|
      Recommendation.new(other_asset, categories, count)
    end
  end
end

There are a few interesting things to note here:

  • We are defining variables as part of our chain to use later (c and asset).
  • We are using the Cypher collect function to give us a result column containing an array of the shared categories (see the table below). Also note that we are getting full objects, not just columns/properties:
    asset      collect(c)                       count(c)
    #<Asset>   [#<Category>]                    1
    #<Asset>   [#<Category>, #<Category>, …]    4
    #<Asset>   [#<Category>, #<Category>]       2

Did you notice that there is not a GROUP BY clause? Neo4j is smart enough to realize that collect and count are aggregation functions and it groups by the non-aggregation columns in our result (in this case that’s just the asset variable).

Take that SQL!
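For readers who think in Ruby rather than Cypher, the implicit grouping can be approximated with plain collections. This is only an analogy, assuming each matched path arrives as an [asset, category] pair; the group_shared_categories helper is hypothetical and not part of the neo4j gem.

```ruby
# Rough Ruby analogy of `RETURN asset, collect(c), count(c) ORDER BY count(c) DESC`.
# Each row stands for one matched path: [asset, category].
def group_shared_categories(rows)
  rows
    .group_by { |asset, _category| asset }                         # the non-aggregated column is the grouping key
    .map { |asset, pairs| [asset, pairs.map(&:last), pairs.size] } # collect(c) and count(c)
    .sort_by { |_asset, _categories, count| -count }               # highest count first
end

rows = [
  ['WineGraph', 'Classification'],
  ['Rails Guide', 'Classification'],
  ['Rails Guide', 'Tutorials']
]

group_shared_categories(rows)
# => [["Rails Guide", ["Classification", "Tutorials"], 2], ["WineGraph", ["Classification"], 1]]
```

The asset titles and category names in the sample rows are made up, apart from "WineGraph" and "Classification", which appear in the console output earlier.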

As a last step we can make recommendations on more than just categories in common. Imagine that we have the following sub-graph in Neo4j:

In addition to shared categories, let’s account for how many creators and viewers assets have in common:

class Asset
  ...
  Recommendation = Struct.new(:asset, :score)

  def secret_sauce_recommendations
    query_as(:source)
      .match('source-[:HAS_CATEGORY]->(category:Category)<-[:HAS_CATEGORY]-(asset:Asset)').break
      .optional_match('source<-[:CREATED]-(creator:User)-[:CREATED]->asset').break
      .optional_match('source<-[:VIEWED]-(viewer:User)-[:VIEWED]->asset')
      .limit(5)
      .order('score DESC')
      .pluck(
        :asset,
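        # DISTINCT keeps each count from being inflated when several paths match the same pair of assets
        '(count(DISTINCT category) * 2) + (count(DISTINCT creator) * 4) + (count(DISTINCT viewer) * 0.1) AS score').map do |other_asset, score|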
      Recommendation.new(other_asset, score)
    end
  end
end

Here we delve deeper and start forming our own query. The structure is the same but, rather than finding just one path between two assets via a shared category, we also specify two more optional paths. We could make all three paths optional, but then Neo4j would need to compare our asset with every other asset in the database. By using a match rather than an optional_match for our path through Category nodes we require that there be at least one shared category. This vastly limits our search space.

In the diagram there is one shared category, zero shared creators, and two shared viewers. This means that the score between “Ruby” and “Ruby on Rails” would be:

(1 * 2) + (0 * 4) + (2 * 0.1) = 2.2
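The same arithmetic can be checked in plain Ruby. This sketch treats each count as the number of distinct shared nodes and reuses the weights from the query (2, 4, and 0.1); the SCORE_WEIGHTS constant and recommendation_score method are hypothetical names used only here.

```ruby
# Weights taken from the scoring expression in the query above.
SCORE_WEIGHTS = { category: 2, creator: 4, viewer: 0.1 }.freeze

# counts: hash of how many categories, creators, and viewers two assets share.
# Missing keys are treated as zero shared nodes.
def recommendation_score(counts)
  SCORE_WEIGHTS.sum { |kind, weight| counts.fetch(kind, 0) * weight }
end

# One shared category, zero shared creators, two shared viewers.
recommendation_score(category: 1, creator: 0, viewer: 2)
```

Keeping the weights in one hash makes it easy to experiment with how much each kind of connection should matter.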

Also note that we’re doing a calculation (and sorting) on a count aggregation of these three paths. That’s so cool to me that it makes me tingle a little to think about it…

Easy Authorization

Let’s tackle another common problem. Suppose your CEO comes by your desk and says “We’ve built a great app, but customers want to be able to control who can see their stuff. Could you build in some privacy controls?” It seems simple enough. Let’s just throw on a flag to allow for private assets:

class Asset
  ...
  property :public, default: true

  def self.visible_to(user)
    query_as(:asset)
      .match_nodes(user: user)
      .where("asset.public OR asset<-[:CREATED]-user")
      .pluck(:asset)
  end
end

With this you can display all of the assets which a user can see either because the asset is public or because the viewer owns it. No problem, but again not a big deal. In another database you could just do a query on two columns/properties. Let’s get a bit crazier!

The Product Manager comes to you and says “Hey, thanks for that, but now people want to be able to give other users direct access to their private stuff”. No problem! You can build a UI to let users add and remove VIEWABLE_BY relationships for their assets and then query them like so:

class Asset
  ...

  def self.visible_to(user)
    query_as(:asset)
      .match_nodes(user: user)
      .where("asset.public OR asset<-[:CREATED]-user OR asset-[:VIEWABLE_BY]->user")
      .pluck(:asset)
  end
end

That would have been a join table otherwise. Here you just throw in another path by which users can have access to an asset. You take a moment to appreciate Neo4j’s schemaless nature.

Satisfied with your day’s work you lean back in your chair and sip your afternoon coffee. Of course, that’s when the Social Media Customer Care Representative drops by to say “Users love the new feature, but they want to be able to create groups and assign access to groups. Can you do that? Oh, also, could you allow for an arbitrary hierarchy of groups?” You stare deeply into their eyes for a few minutes before responding: “Sure!”. Since this is starting to get complicated, let’s look at an example:

If both of the assets are private, your code so far gives Matz and tenderlove access to Ruby and DHH access to Ruby on Rails. To add group support you start by following directly assigned groups:

class Asset
  ...

  def self.visible_to(user)
    query_as(:asset)
      .match_nodes(user: user)
      .where("asset.public OR asset<-[:CREATED]-user OR asset-[:VIEWABLE_BY]->user OR asset-[:VIEWABLE_BY]->(:Group)<-[:BELONGS_TO]-user")
      .pluck('DISTINCT asset')
  end
end

That was pretty easy, since you just needed to add another path. It’s two hops, sure, but that’s old hat for us by now. Tenderlove and Yehuda will be able to see the “Ruby on Rails” asset because they are members of the “Railsists” group. Also note: now that some users have multiple paths to an asset (like Matz to Ruby via the Rubyists group and via the CREATED relationship) you need to return DISTINCT asset.
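The need for DISTINCT can be pictured in plain Ruby: every matching path produces its own result row, so an asset reachable in two ways shows up twice unless duplicates are removed. The sketch below is only an analogy; the visible_assets helper and the shape of the path list are assumptions made for illustration.

```ruby
# Each entry is one way a user reaches an asset: [user, asset, via].
# Matz reaches Ruby both as its creator and through a group, as described above.
ACCESS_PATHS = [
  ['Matz', 'Ruby', :created],
  ['Matz', 'Ruby', :group]
].freeze

# Collect the assets a user can see; `distinct: false` mimics a query without DISTINCT.
def visible_assets(paths, user, distinct: true)
  assets = paths.select { |path_user, _asset, _via| path_user == user }
                .map { |_user, asset, _via| asset }
  distinct ? assets.uniq : assets
end

visible_assets(ACCESS_PATHS, 'Matz', distinct: false) # => ["Ruby", "Ruby"]
visible_assets(ACCESS_PATHS, 'Matz')                  # => ["Ruby"]
```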

Specifying an arbitrary path through a hierarchy of groups takes you a bit more time, though. You look through the Neo4j documentation until you find something called “variable-length relationships” and give it a shot:

class Asset
  ...

  def self.visible_to(user)
    query_as(:asset)
      .match_nodes(user: user)
      .where("asset.public OR asset<-[:CREATED]-user OR asset-[:VIEWABLE_BY]->user OR asset-[:VIEWABLE_BY]->(:Group)<-[:HAS_SUBGROUP*0..5]-(:Group)<-[:BELONGS_TO]-user")
      .pluck('DISTINCT asset')
  end
end

Here you’ve done it! This query will find assets accessible to a group and traverse any set of zero to five HAS_SUBGROUP relationships, finally ending on a check to see if the user is in the last group. You’re the hero of the story and your company showers you with bonuses for getting the job done so quickly!
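To see what the *0..5 part of the pattern is doing, here is a plain-Ruby sketch of the same bounded walk over an in-memory hierarchy. It is an illustration, not how Neo4j evaluates the query: the parents hash (mapping a group to the groups that list it as a direct subgroup), the group names, and both helper methods are assumptions.

```ruby
require 'set'

MAX_HOPS = 5 # mirrors the upper bound in HAS_SUBGROUP*0..5

# Collect the start groups (zero hops) plus every group reachable by walking
# up to max_hops steps from a subgroup to the groups that contain it.
def groups_within_hops(start_groups, parents, max_hops = MAX_HOPS)
  seen = Set.new(start_groups)
  frontier = start_groups
  max_hops.times do
    frontier = frontier.flat_map { |group| parents.fetch(group, []) }
                       .reject { |group| seen.include?(group) }
                       .uniq
    break if frontier.empty?
    seen.merge(frontier)
  end
  seen
end

# True when the user belongs to any group within range of the asset's groups.
def visible_via_groups?(user_groups, asset_groups, parents)
  reachable = groups_within_hops(asset_groups, parents)
  user_groups.any? { |group| reachable.include?(group) }
end

# Hypothetical hierarchy: Programmers contains Rubyists, which contains Railsists.
parents = { 'Railsists' => ['Rubyists'], 'Rubyists' => ['Programmers'] }
visible_via_groups?(['Programmers'], ['Railsists'], parents) # => true
```

The explicit hop limit plays the same role as the upper bound in the Cypher pattern: it keeps the traversal from wandering through an arbitrarily deep hierarchy.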

Conclusion

There are many awesome things that you can do with Neo4j (including using its amazing web interface to explore your data with Cypher) which I’m not able to cover. Not only is it a great way to store your data in an easy and intuitive way, it provides a lot of benefits for efficient querying of highly connected data (and believe me your data is highly connected, even if you don’t realize it). I encourage you to check out Neo4j and give it a try for your next project!

More:
  • Tyler S

    In your experience, do many Rails apps use Neo4j as well as postgresql or use Neo4j as a replacement for postgres?

    • http://www.brian-underwood.codes/ Brian Underwood

      That’s a good question and I wish I had a good answer ;) I know I get asked a lot about if it’s possible to use both Neo4j and ActiveRecord in the same app (it is), so I think a lot of people either use Neo4j for just parts of their app or are transitioning. I do know of a few Rails apps which are all Neo4j

  • Nathan Shane

    Hey Brian – Thanks for the in depth introduction to Neo4j—it looks fascinating and I hope to try it out soon.

    I’m left with one main question after reading your article:

    When you start a new project, how do you choose between using Neo4j or a traditional table-based database technology like mysql? Are there certain kinds of data which you would definitely structure with Neo4j, and other kinds of data you would definitely structure with tables?

    • http://www.brian-underwood.codes/ Brian Underwood

      My typical answer is that something like a database where you’re doing logging of lots of repetitive data is usually a better fit for an RDBMS (or even Mongo, since you wouldn’t generally have foreign keys there). For a graph database, obviously things that are already graphy are on the other side of the spectrum (e.g. social networks or hierarchical structures).

      Another typical answer I’ve seen is that you’d want Neo4j for when your data has a lot of relationships. I’ve found, though, that relationships start coming from places that you don’t expect when you have the ability to create them easily ;)

  • •● avnerner ●•

    Neo4j requires a license (http://neo4j.com/subscriptions/), and last I checked, it’s in the region of many Ks of $. Be sure to check this if you plan to have even a very basic cluster setup.

    • http://brandon.bayer.ws Brandon Bayer

      The Startup license is completely free if you have under 20 employees or under $3M in annual revenue. If you are beyond that, you can still get a substantial discount until you have over 50 employees.

      • •● avnerner ●•

        Just to clarify, I have no issue with Neo4j or their work, I just want to help people save the time that we spent earlier on. We are a mature startup (~150 employees, round D or so). We use MongoDB, where we only pay for MMS; we use Redis, where we only pay for the RedisLabs tier; we use postgres and mysql free of charge (paying for consultancy where relevant). We hoped to see something similar in Neo4j. However, after spending some time testing, to actually get a basic production-grade setup we were looking at >100K. Compared to the other alternatives, this was not relevant for us. It could very well be that the issue is wrong expectations on our end.

        • https://redislabs.com Itamar Haber

          Cool ;)

  • Josh

    @disqus_8S0EvcIZtV:disqus – Is there anything you would change about this model or its code now that 3.0 is out?

    • http://www.brian-underwood.codes/ Brian Underwood

      Looking through the article again I think that everything should be fine the way it is. Some things you might want to know about Neo4j 3.0 with the neo4j.rb project:

      * Neo4j 3.0 requires parens around `MATCH` arguments, so we had to fix some compatibility issues there. Make sure you’re using the latest versions of `neo4j` and `neo4j-core`
      * Bolt isn’t yet supported, but I’m working on that currently. It will require `neo4j` 8.0 / `neo4j-core` 7.0 when they are released. Those new versions will have a few breaking changes, though I plan on releasing a blog post / docs page which lay out all of the things that will need to be adjusted
