A Look at OrientDB: The Graph-Document NoSQL

Glenn Goodrich
Ruby Editor

In this post, I am going to give you a brief introduction to OrientDB. In later posts in this series, I will take you through using OrientDB with Ruby.

You have probably heard of document databases, like MongoDB, and you may have heard of graph databases, like Neo4j. I am willing to bet you have not heard of very many, if any, Graph-Document databases.

That’s right. OrientDB touts itself as “The Graph-Document NoSQL.” From the website:

OrientDB is an Open Source GraphDB with a mix of features taken from Document Databases and Object Orientation.

Sound interesting? Or crazy? Or both?

We found OrientDB after going fairly far down the path with Neo4j[1], only to find out that Neo4j’s pricing for its “enterprise” features (namely, HA) was beyond our startup-limited reach. OrientDB has an Apache 2 license, which is more permissive than Neo4j’s.[2]

I love the GraphDB concepts, such as directed property relationships as well as the flexibility that comes with the graph approach. If you are unfamiliar with graph database concepts, check out these two posts (one and two) by Thiago Jackiw.

If you need a primer on document databases, check out this post from 10gen.

In essence, OrientDB holds the promise of a flexible schema from its Document DB roots, along with relationships as first-class citizens, like a graph database. A good example of both worlds is how OrientDB supports Embedded and Referenced Relationships. The former are contained inside the record and only accessible via that container. Referenced relationships are like edges from the Graph DB world, accessible as first-class objects with a start vertex, end vertex, and properties.

OrientDB makes some impressive claims, such as more than 150,000 inserts per second, and it already has a cloud service behind it in Nuvolabase. In fact, Nuvolabase offers a demo of the “studio” app that comes with OrientDB, which you can see here.

Not Made in America

If you looked at the Nuvolabase pricing, you probably noticed that it’s not in good ol’ ‘Merican dollars. Currently, it seems the vast majority of OrientDB customers are in Europe. Also, the company behind OrientDB, Orient Technologies, is in London. I only mention it because 1) I am in the States, and 2) I had not heard much about OrientDB before I stumbled upon it. Other than that, I don’t really see it as an issue.

Installation

Enough background, let’s install OrientDB and play with it. OrientDB is written in Java, so you should be able to run it anywhere. It comes with a server, console, and Gremlin console. Don’t worry about Gremlin right now, I’ll talk about it later.

The available versions to download are here. Currently, I would recommend the 1.5.0 Stable release. Just click the big green button on that page to get a zip of the release.

Extract that file to a directory, which I’ll presume is named orientdb. The directory structure of OrientDB is fairly typical:

[Screenshot: OrientDB directory listing]

I’ll only touch on the config, bin and databases directories today.

Config

There are several files in the config directory. We just want to run the “standard” server, so open up the orientdb-server-config.xml file.

Remember XML? It is still XML and it still dominates Java configuration. Regardless, there are a bunch of settings in here that you can research on your own (or I’ll post about later, perhaps). For now, scroll down until you see the <users> section.

This section defines (you guessed it) the users that can access the server. By default, OrientDB provides a “root” and “guest” user. I like to add my own user with a much more reasonable password, so feel free to add:

<user resources="*" password="password" name="user"/>

The resources attribute controls what the user can do. A value of * makes that user all-powerful, so govern yourself accordingly. There is some documentation on the OrientDB wiki, but it doesn’t feel complete.
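Since a * in resources is so powerful, it can be worth checking which users have it. Here is a minimal Python sketch that does so. It is not part of OrientDB. Only the <user> line for "user" comes from this post; the wrapper elements, the other two users' attribute values, and the all_powerful_users helper are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed shape of the relevant part of orientdb-server-config.xml.
# Only the last <user> line comes from this post; the wrapper elements and
# the root/guest attribute values are placeholders.
CONFIG_SNIPPET = """
<orient-server>
  <users>
    <user resources="*" password="CHANGE_ME" name="root"/>
    <user resources="connect" password="guest" name="guest"/>
    <user resources="*" password="password" name="user"/>
  </users>
</orient-server>
"""


def all_powerful_users(xml_text):
    """Return the names of users whose resources attribute is '*'."""
    root = ET.fromstring(xml_text)
    return [
        user.get("name")
        for user in root.iter("user")
        if user.get("resources") == "*"
    ]


print(all_powerful_users(CONFIG_SNIPPET))
```

To check a real installation, read the contents of your own config file and pass them in instead of the snippet.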

Save the config file and close it.

Databases

The databases directory holds (you guessed it again) the databases served by this server. Any databases in this directory are visible to the server without configuration. When you create a new database, a new folder will be created in this directory with the same name as your database. We’ll do that in a bit.

This directory is, however, simply the default directory for databases. You can add other storage sites via configuration.

OrientDB supports three storage “engines”: memory, local, and plocal. Memory is self-explanatory.

The local and plocal storage types use the filesystem. Local is the “old way,” it seems, and is the most feature rich right now. Plocal is the “new way,” but is missing some big-time features, such as transactions. You can read more here.

OrientDB ships with a demo database called tinkerpop, which you can play with when we fire up the server.

Bin: Fire it Up

The bin directory, as is so often the case, has the fun stuff. Namely, it contains scripts to launch the various types of servers and consoles. Today, we’ll just launch the vanilla standalone server, using server.sh (or server.bat if you are on Windows).

Note: I had to chmod +x *.sh inside that directory, as none of my shell scripts were executable.

Once you fire up the server, you’ll see something like:

[Screenshot: server startup output with the OrientDB ASCII art banner]

Ahhh…don’t you love ASCII art? I do. As long as you don’t see any errors, everything is ready to go. OrientDB comes with a handy web application where you can browse the database, create classes, records, indices, etc. By default, it lives at http://localhost:2480. When you visit that page, it will ask you to log in. Be sure to use the username and password we defined in the configuration above. You should only have the tinkerpop database in the database drop-down right now.

Once logged in, you should see:

[Screenshot: the Studio app on the “Schema” tab]

The Studio app starts you off on the “Schema” tab in the “Database” section. This shows the classes available along with some other metadata. The important ones to notice are V and E, which stand for Vertex and Edge, respectively. These are the base classes of your graph database, and when you create a new class (also known as a “vertex type” or “edge type”) it will inherit from one of these classes.

The tinkerpop database does not have any vertex subclasses, but it does have three edge subclasses: followed_by, sung_by, and written_by. The indication that they are edge subclasses is the ‘E’ value in the superclass column. Vertex subclasses, as you probably guessed, have a ‘V’ in that column. Oh, and you can create subclasses of subclasses.

You can think of classes much like classes in the Object-Oriented world. They define a “type of record.” As a brief demo, select the “V” class in the table and click the “Query” button. You’ll be taken to the Query page, which looks like:

[Screenshot: the Query page showing the records of the V class]

Here you can see the records and properties of the V class. Any attribute that starts with a ‘@’ (so, @rid, @version, and @class) is an OrientDB system property, meaning all classes have them. The rest are either user defined or defined as a part of a relationship. The relationship properties start with in_ or out_, each holding the ids of the edge records coming into or out of each vertex for each relationship type.

Your first look at OrientDB can be, well, disorienting. The relationship properties and odd ids (like “#11:0”) can take you right out of your comfort zone. A quick chat about how the data is structured in OrientDB may help.

Structure

As previously mentioned, OrientDB breaks the data into OO-like classes. These classes have inheritance and, in the graph world, can be either a (V)ertex or an (E)dge subclass. Each class has one or more clusters, which are “a generic way to group records”. You can split a class’s records across clusters based on an attribute of your choosing.

For example, if you had an Invoice class, you could group the 2012 invoices into an Invoice2012 cluster and the 2013 invoices into an Invoice2013 cluster. You specify which cluster to use for a given record when you create that record. Every class has at least one default physical cluster that is used if none is specified on record creation.

The point of clusters is to group data that you will want to query together. We haven’t done much with them yet, but plan to use them extensively as we grow with OrientDB.
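To make the cluster idea concrete, here is a toy model in Python. It is not OrientDB code and says nothing about how OrientDB actually stores data. The ToyClass name, its methods, and the invoice fields are all made up for illustration, and I am assuming the default cluster is named after the class in lowercase. The only behavior taken from the description above is that a record goes to the cluster you name, or to the class’s default cluster if you name none.

```python
from collections import defaultdict


class ToyClass:
    """Toy stand-in for an OrientDB class; hypothetical, for illustration only."""

    def __init__(self, name):
        self.name = name
        # Assumption: the default cluster is named after the class, lowercased.
        self.default_cluster = name.lower()
        self.clusters = defaultdict(list)
        self.clusters[self.default_cluster]  # make sure the default exists

    def create_record(self, fields, cluster=None):
        """Store a record in the named cluster, or the default one if omitted."""
        target = cluster or self.default_cluster
        self.clusters[target].append(dict(fields))
        return target

    def records_in(self, cluster):
        """Read back only the records of one cluster."""
        return list(self.clusters.get(cluster, []))


invoice = ToyClass("Invoice")
invoice.create_record({"number": 1, "year": 2012}, cluster="Invoice2012")
invoice.create_record({"number": 2, "year": 2013}, cluster="Invoice2013")
invoice.create_record({"number": 3, "year": 2013})  # no cluster given

print(sorted(invoice.clusters))
print(invoice.records_in("Invoice2013"))
```

Reading one cluster never touches the records in the others, which is the whole appeal when you know in advance how you will slice the data.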

Records are what you would expect: an instance of a class. They are documents in the Document DB sense, as well as nodes in the Graph DB sense. Records live in a cluster and have the schema defined by the class. Classes, Clusters, and Records are the lion’s share of OrientDB’s data structure.
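This structure also explains the odd-looking ids. As far as I can tell from the OrientDB docs, a record id such as #11:0 is the id of the cluster followed by the record’s position inside that cluster. Here is a small Python sketch of that reading. The parse_rid helper and the RecordId type are my own names, not part of any OrientDB client, and the pattern is an assumption based on the ids shown in this post (including ids with a negative cluster part, such as #-2:1, which show up in console query output).

```python
import re
from typing import NamedTuple


class RecordId(NamedTuple):
    cluster_id: int
    position: int


# Assumed format: '#<cluster-id>:<position>', e.g. '#11:0'.
# Both parts are allowed to be negative in this sketch.
_RID_PATTERN = re.compile(r"^#(-?\d+):(-?\d+)$")


def parse_rid(text):
    """Split a record id string into cluster id and position (hypothetical helper)."""
    match = _RID_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a record id: {text!r}")
    return RecordId(int(match.group(1)), int(match.group(2)))


print(parse_rid("#11:0"))
print(parse_rid("#11:1"))
```

Two records with the same first number live in the same cluster, which is handy to know when you are staring at a page full of ids.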

Relationships, as I’ve alluded to, are the “edges” of the property graph and are first-class citizens. They have direction (meaning, an “in” vertex and an “out” vertex) and can have properties. Much of a graph database’s speed comes from the ease of traversing the graph from vertex to edge to vertex and so on, which is why graphs are a common choice for social networking sites. These kinds of relationships are meant to avoid the “join pain” of the traditional RDBMS, where the weight of millions of joins causes massive performance issues.
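Here is a rough Python sketch of why that traversal is cheap. It is a toy model, not OrientDB internals: the people, the edge_table and out_edges structures, and the function names are all invented for illustration. The join-style lookup has to scan every edge row to find one person’s relationships, while the graph-style lookup follows references the vertex already holds.

```python
# Toy data: (out_vertex, label, in_vertex) triples, like rows in a join table.
edge_table = [
    ("alice", "follows", "bob"),
    ("bob", "follows", "carol"),
    ("alice", "likes", "carol"),
]


def neighbors_by_scan(vertex, label):
    """Join-style: look at every edge row to find the ones that match."""
    return [dst for src, lbl, dst in edge_table if src == vertex and lbl == label]


# Graph-style: each vertex keeps its own outgoing edges, grouped by label.
out_edges = {}
for src, lbl, dst in edge_table:
    out_edges.setdefault(src, {}).setdefault(lbl, []).append(dst)


def neighbors_by_traversal(vertex, label):
    """Graph-style: follow the references stored on the vertex itself."""
    return out_edges.get(vertex, {}).get(label, [])


def walk(start, label, max_hops):
    """Follow one edge label hop by hop, vertex to vertex."""
    path, current = [start], start
    for _ in range(max_hops):
        nxt = neighbors_by_traversal(current, label)
        if not nxt:
            break
        current = nxt[0]
        path.append(current)
    return path


print(neighbors_by_scan("alice", "follows"))
print(walk("alice", "follows", max_hops=5))
```

Both lookups return the same neighbors. The difference is that the scan gets slower as the edge table grows, while each hop of the walk only touches the vertex it is standing on.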

Playing with Data

The bin directory also has a console script (either .sh or .bat, depending on your OS). Open a new terminal and fire up the console.

OrientDB has an extensive console command list. Today, we’ll create a database, add a vertex type, create some data, then query for that data.

Create a Database

create database plocal:databases/sitepoint-test user pass plocal graph

(Note: I started my console.sh from the root of the OrientDB installation, so the relative path to my databases folder is reflected above. If you started the console from the bin dir, you’ll need to put in a relative or absolute path to get the database to be created in the right place.)

[Screenshot: console output after creating the database]

Notice that the console has made this new database “current.” This means commands and queries will be run against this database.

Create a Vertex Type and Vertex

Let’s create a “Person” vertex type and make two people.

create class Person extends V
=> Class created successfully. Total classes in database now: 11

Remember, a vertex type is just a subclass of Vertex.

Now, we can add a property.

create property Person.name string
=> Property created successfully with id=1

And now, create our lovers.

create vertex Person set name='Joanie'
=> Created vertex 'Person#11:0{name:Joanie} v0' in 0.076000 sec(s).

create vertex Person set name='Chachie'
=> Created vertex 'Person#11:1{name:Chachie} v0' in 0.001000 sec(s).

Create an Edge Type and Edge

Creating our edge/relationship is very similar to creating our vertex type and vertices. However, you don’t have to create an edge type if your relationship is not going to have any properties. We’ll keep it simple today and just mimic what we did with vertices.

create class loves extends E
=> Class created successfully. Total classes in database now: 12

create edge loves from #11:1 to #11:0
=> Created edge '[loves{in:#11:0,out:#11:1}]' in 0.003000 sec(s).

Querying

select from Person

----+-----+-------
#   |@RID |name
----+-----+-------
0   |#11:0|Joanie
1   |#11:1|Chachie
----+-----+-------

select name as subject, out_loves.name as loves from person

----+-----+-------+------
#   |@RID |subject|loves
----+-----+-------+------
0   |#-2:1|Joanie |null
1   |#-2:2|Chachie|Joanie
----+-----+-------+------

You’ll notice that OrientDB has a very SQL-ish syntax, which is nice when you’re trying to digest all the new concepts.
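If the out_loves.name part of the second query looks like magic, this is roughly what it does, written out as plain Python over dictionaries. It is only a sketch. The records mirror the two Person vertices created earlier, and I am assuming that out_loves points straight at the target vertex, which is what the query result suggests. The select_subject_and_loves function is a made-up name.

```python
# The two Person records from the console session, keyed by record id.
people = {
    "#11:0": {"@class": "Person", "name": "Joanie",
              "in_loves": ["#11:1"], "out_loves": []},
    "#11:1": {"@class": "Person", "name": "Chachie",
              "in_loves": [], "out_loves": ["#11:0"]},
}


def select_subject_and_loves(records):
    """Mimic: select name as subject, out_loves.name as loves from person."""
    rows = []
    for rid in sorted(records):
        record = records[rid]
        # Follow each outgoing 'loves' link and read the name on the far side.
        loved = [records[target]["name"] for target in record["out_loves"]]
        rows.append({"subject": record["name"],
                     "loves": loved[0] if loved else None})
    return rows


for row in select_subject_and_loves(people):
    print(row)
```

The dot in out_loves.name is doing the traversal for you: no join clause, just a hop along the edge.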

There is tons more you can do, obviously. This example just serves to whet your appetite.

Conclusion

That’s as far as I want to go with OrientDB today. In the next post, I’ll talk about the orientdb-jruby gem as well as a new gem (codename: “oriented”) we are writing as a part of our work. Until then, explore the OrientDB console and studio application and check out the Google Group (linked below).

Resources


1: Which is really, really great. Really, if it weren’t for budget issues, I probably would not have looked beyond Neo4j.

2: To be fair, Neo4j licenses their commercial and enterprise offerings. In my opinion, you couldn’t really deploy a production app without their enterprise features. OrientDB also charges for their “enterprise” offering, as shown here, but it doesn’t remove any of the features that allow scaling. This is not a post about Neo4j vs OrientDB, so that’s all I have to say about that.

  • dimitri

    Awesome! I can’t wait for the “oriented” gem!

  • Anonymous

    Very cool. What are your plans for “oriented”?

  • Anonymous

    Oriented will be another layer on top of your gem, making it a bit easier to use it in, say, a Rails app. However, we are not going down the route of using AR now. Basically, we want it to have feature parity with the neo4j gem, which is gonna take awhile.

  • Josep

    I want to test a graph database for a project. After reading web pages and blogs, I’ve narrowed it down to Neo4j or OrientDB, and now I’m taking a closer look at them.
    As you’ve worked with both of them, I would like you to compare them, if that’s possible.
    I’m not asking for a feature comparison or a performance benchmark. I’m more interested in your subjective experience of what you like and dislike about them.

  • Anonymous

    Josep, we switched almost entirely due to the license issues. If you are going to do anything significant with Neo4j, you’ll likely need HA, and that is big time $$$. Other than that:

    – The community around Neo4j is more mature, especially in the States. You can get classes, etc. from Neotechnology pretty close to your home, I’d guess. Also, many of the Neo4j devs are accessible via groups/skype/etc. It’s a good community. OrientDB, basically, has a Google Group. The devs are responsive, but it’s not as lively.
    – This is also true of the gems. If you are looking at using Ruby, the Neo4j ruby gems are very nice. For OrientDB, we had to kind of resuscitate the orientdb-jruby gem (it still used the 1.3.0 jars; we upgraded it to 1.5.0 and added some more Tinkerpop jars). We’re also writing the oriented gem to try and reach some kind of feature parity with the Neo4j gems.
    – They both use Gremlin, but Neo4j also uses Cypher, which is very nice. OrientDB has a SQL-ish query language, which we are still learning. I miss Cypher.
    – OrientDB has a binary protocol, and Neo4j does not. This is a big one for us, as it means we can have >1 clients on an instance without dropping to a REST api or using HA.

    I like them both. If I had the $$, I probably would still be on Neo4j. However, I am happy (so far) on OrientDB and the saved $$$ is a boon.

    I am sure there are other things, but I hope that answers it for you.

  • Josep

    Thank you very much for your answer, Glenn. You mentioned some things I wasn’t aware of and also confirmed other thoughts I had. I really appreciate your comments on the communities around them, Gremlin, Cypher, and the binary protocol.

    I’m not using Ruby, but I haven’t decided what language to use yet. In fact, my concern now is how I’ll import all the data (with good performance) from Oracle to test whether a graph DB could solve a problem that’s hard to solve with a relational database. If it works, it won’t be an HA-critical system, just a batch process, so going without HA would be fine, but having it would be nice.

  • Philip

    Neo4j does have a Neo4j Enterprise for Startups offering! It’s probably not communicated in the best way – pricing isn’t published on the web site for various reasons. But quite a few startups are using it, and it’s a super friendly way to get your startup up and running with Neo4j for startup prices. Best way to reach out about this is through here: http://info.neotechnology.com/startups.html

  • Devver

    You’re either pregnant or you’re not. You either support startups or you don’t. If you’re hiding it on the website, it means you intend to bilk the startup ultimately. Hence open source options like OrientDB or ArangoDB are a better fit for FOSS startups.