Discover Graph Databases with Neo4j and PHP
In this post, we’ll be learning about Neo4j, the leading graph database, and ways to use it with PHP. In a followup post, we’ll be building a proper graph application powered by Silex.
Graph databases are now one of the core technologies of companies dealing with highly connected data.
Business graphs, social graphs, knowledge graphs, interest graphs and media graphs are frequently in the (technology) news – and for a reason. The graph model represents a very flexible way of handling relationships in your data, and graph databases provide fast and efficient storage, retrieval and querying for it.
Neo4j, the most popular graph database, has proven its ability to deal with massive amounts of highly connected data in many use-cases.
During the last GraphConnect conference, TomTom and eBay’s Shuttle demonstrated the value a graph database can add to a company, for instance by providing fantastic customer experiences or enabling complex route-map editing. Neo4j is developed and supported by Neo Technology, a startup which has grown into a well-respected database company.
A short Introduction
For the newcomers, here is a short introduction to graph databases and Neo4j, apart from the theoretical glance we threw at it last year.
What is a Graph?
A graph is a generic data structure, composed of nodes (entities) connected by relationships. Sometimes, those are also called vertices and edges. In the property graph model, each node and relationship can be labeled and hold any number of properties describing it.
image via Wikipedia
What is a Graph Database?
A graph database is a database optimized for operations on connected data.
Graph databases provide high performance suitable for online operations by using dedicated storage structures for both nodes and relationships.
They don’t need to compute relationships (JOINS) at query time but store them efficiently as part of your data.
Let’s take a simple social application as an example, where users follow other users.
A user will be represented as a Node and can have a label and properties. Labels depict various roles for your nodes.
The link between these two users will be represented as a Relationship, which can also have properties and a Type to identify the nature of the relationship. Relationships add semantic meaning to your data.
Looking at the graph shows how natural it is to represent data in a graph and store it in a graph database.
Cypher, the Neo4j Graph Query Language
Querying a graph may not appear to be straightforward. To make it easy, Neo4j developed Cypher, a declarative graph query language focused on readability and expressiveness for humans such as developers, administrators and domain experts.
Being declarative, Cypher focuses on expressing what to retrieve from a graph, rather than how to retrieve it.
The query language is composed of several distinct clauses. You can read more details about them in the Neo4j manual.
Here are a few clauses used to read and update the graph:
- MATCH: Finds the “example” graph pattern you provide in the graph and returns one path per found match.
- WHERE: Filters results with predicates, much like in SQL. There are many more predicates in Cypher though, including collection operations and graph matches.
- RETURN: Returns your query result in the form you need, as scalar values, graph elements or paths, or collections or even documents.
- CREATE: Creates graph elements (nodes and relationships) with labels and properties.
- MERGE: Matches existing patterns or creates them. It’s a combination of MATCH and CREATE.
Cypher is all about patterns: it describes the visual representation you’ve already seen as textual patterns (using ASCII art).
It uses round parentheses to depict nodes (like (m:Movie) or (me:Person:Developer)) and arrows (like --> or -[:LOVES]->) for relationships.
Looking at our last graph of users, a query that retrieves Hannah Hilpert and the users following her can be written as follows:
MATCH (user:User {name:'Hannah Hilpert'})<-[:FOLLOWS]-(follower)
RETURN user, follower
Neo4j and PHP
After this quick introduction to the Neo4j graph database (more here), let’s see how we can use it from PHP.
Neo4j is installed as a database server.
An HTTP-API is accessible for manipulating the database and issuing Cypher queries.
If you want to install and run the Neo4j graph database, you can download the latest version here: http://neo4j.com/download/, extract the archive on your computer and run the ./bin/neo4j start command. Note that this only applies to *nix-based systems.
Neo4j comes with a cool visual interface, the Neo4j Browser available at http://localhost:7474.
Just try it! There are some guides to get started within the browser, but more information can be found online.
If you don’t want to install it on your machine, you can always create a free instance on GrapheneDB, a Neo4j As A Service provider.
The Neoxygen Components
Neoxygen is a set of open-source components for the Neo4j ecosystem, most of them in PHP, available on GitHub. Currently, I’m the main developer. If you are interested in contributing as well, just ping me.
NeoClient is a powerful client for the Neo4j HTTP API, with multi-database support and built-in high-availability management.
Installation and configuration
The installation is trivial: just add the neoclient dependency to your composer.json file:
{
"require": {
"neoxygen/neoclient":"~2.1"
}
}
You configure your connection when building the client:
use Neoxygen\NeoClient\ClientBuilder;
$client = ClientBuilder::create()
->addConnection('default', 'http', 'localhost', 7474)
->build();
If you created an instance on GrapheneDB, you need to configure a secure connection with credentials. This is done by passing true (to enable the auth mode) followed by your credentials to the addConnection method:
<?php
use Neoxygen\NeoClient\ClientBuilder;
$connUrl = parse_url('http://master.sb02.stations.graphenedb.com:24789/db/data/');
$user = 'master';
$pwd = 's3cr3tP@ssw0rd';
$client = ClientBuilder::create()
->addConnection('default', $connUrl['scheme'], $connUrl['host'], $connUrl['port'], true, $user, $pwd)
->build();
You now have full access to your Neo4j database with the client connecting to the HTTP API.
The library provides handy methods to access the different endpoints. However, the most frequently used method is sending a Cypher query.
Handling graph results in a raw JSON response is a bit cumbersome. That’s why the library comes with a handy result formatter that transforms the response into node and relationship objects. The formatter is disabled by default, and you can enable it by adding one line of code to your client building process:
$client = ClientBuilder::create()
->addConnection('default', 'http', 'localhost', 7474)
->setAutoFormatResponse(true)
->build();
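To get a feeling for why raw responses are cumbersome, here is a minimal sketch of what pulling nodes out of one by hand could look like. It is not the formatter that ships with NeoClient: extractNodes is a hypothetical helper, the sample payload is made up, and the sketch assumes PHP 7+ and a response from the transactional HTTP endpoint requested in the "graph" result format.

```php
<?php
// Hypothetical helper, not part of NeoClient: collects the nodes found in a
// raw response from Neo4j's transactional HTTP endpoint (graph result format).
function extractNodes(string $rawJson): array
{
    $response = json_decode($rawJson, true);
    $nodes = [];
    foreach ($response['results'] ?? [] as $result) {
        foreach ($result['data'] ?? [] as $row) {
            foreach ($row['graph']['nodes'] ?? [] as $node) {
                // Index by id so a node returned in several rows appears once.
                $nodes[$node['id']] = [
                    'labels'     => $node['labels'],
                    'properties' => $node['properties'],
                ];
            }
        }
    }
    return $nodes;
}

// Sample payload shaped like a response requested with
// "resultDataContents": ["graph"]; the values are made up for illustration.
$raw = '{"results":[{"columns":["user"],"data":[{"graph":{"nodes":[{"id":"1","labels":["User"],"properties":{"name":"Kenneth"}}],"relationships":[]}}]}],"errors":[]}';

$nodes = extractNodes($raw);
echo $nodes['1']['properties']['name'], PHP_EOL; // Kenneth
```

Three nested loops just to reach a property is exactly the kind of plumbing the result formatter is meant to spare you.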
Let’s build something cool
We’re going to build a set of User nodes and FOLLOWS relationships incrementally. Then, we’ll be able to query friend-of-a-friend information to provide friendship suggestions.
The query to create a User is the following:
CREATE (user:User {name:'Kenneth'}) RETURN user
The query is composed of five parts:
- The CREATE clause (in blue), indicating we want to create a new element.
- The identifier (in orange), used to identify your node in the query
- The label (in red), used to add the user to the User labelled group.
- The node properties (in green), which are specific to that node.
- The RETURN clause, indicating what you want to return, here the created user.
You can also try to run that query in the Neo4j Browser.
No need to wait, let’s create this user with the client :
$query = 'CREATE (user:User {name:"Kenneth"}) RETURN user';
$result = $client->sendCypherQuery($query)->getResult();
You can visualize the created node in your browser (open the starred tab and run “Get some data”), or get the graph result with the client.
$user = $result->getSingleNode();
$name = $user->getProperty('name');
We will do the same for another user, this time with query parameters. Query parameters are passed along with the query, which allows Neo4j to cache the query execution plan and makes further identical queries faster:
$query = 'CREATE (user:User {name: {name} }) RETURN user';
$parameters = array('name' => 'Maxime');
$client->sendCypherQuery($query, $parameters);
As you can see, parameters are embedded in {} and passed in an array as the second argument of the sendCypherQuery method.
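If you are curious about what travels over the wire, the sketch below builds the kind of JSON body that Neo4j's transactional HTTP endpoint (POST /db/data/transaction/commit) accepts: the statement and its parameters are sent side by side, never concatenated. This is only an illustration under the assumption of PHP 7+; buildCypherPayload is a hypothetical helper, and nothing here claims to show how NeoClient builds its requests internally.

```php
<?php
// Hypothetical helper: builds the JSON body for one Cypher statement as
// accepted by the transactional endpoint (POST /db/data/transaction/commit).
function buildCypherPayload(string $statement, array $parameters = []): string
{
    return json_encode([
        'statements' => [[
            'statement'  => $statement,
            // Cast to object so an empty parameter list encodes as {} and not [].
            'parameters' => (object) $parameters,
        ]],
    ]);
}

$payload = buildCypherPayload(
    'CREATE (user:User {name: {name} }) RETURN user',
    ['name' => 'Maxime']
);
echo $payload, PHP_EOL;
```

Because the statement text stays identical from one call to the next and only the parameters change, the database can recognise it as the same query, which is what makes plan caching possible.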
If you look at the graph now, you’ll see the two User nodes, but they feel quite alone :(
Creating relationships
In order to create the relationships between our nodes, we’ll use Cypher again.
$query = 'MATCH (user1:User {name:{name1}}), (user2:User {name:{name2}}) CREATE (user1)-[:FOLLOWS]->(user2)';
$params = ['name1' => 'Kenneth', 'name2' => 'Maxime'];
$client->sendCypherQuery($query, $params);
Some explanations:
We first match the existing users named Kenneth and Maxime (names provided as parameters), and then we create a FOLLOWS relationship between the two.
Kenneth will be the start node of the FOLLOWS relationship and Maxime the end node.
The relationship type will be FOLLOWS.
Looking at the graph again shows that the relationship has been created.
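One thing to keep in mind: CREATE always creates, so running the query above a second time would add a second FOLLOWS relationship between the same two users. Here is a small sketch of a variant using MERGE, which only creates the relationship when it is missing. buildFollowQuery is a hypothetical helper written for this post (assuming PHP 7+), not something provided by NeoClient.

```php
<?php
// Hypothetical helper: returns the Cypher query and its parameters for making
// one user follow another. MERGE only creates the relationship if it is missing.
function buildFollowQuery(string $follower, string $followed): array
{
    $query = 'MATCH (user1:User {name: {name1}}), (user2:User {name: {name2}}) '
           . 'MERGE (user1)-[:FOLLOWS]->(user2)';

    // The array keys must match the placeholder names used in the query.
    return [$query, ['name1' => $follower, 'name2' => $followed]];
}

list($query, $params) = buildFollowQuery('Kenneth', 'Maxime');
// With a configured client: $client->sendCypherQuery($query, $params);
```

Keeping the query and its parameter array in one place also makes it harder for the placeholder names and the array keys to drift apart.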
Creating a bunch of users
Manually writing all the creation statements for a set of 100 users and the relationships would be boring.
I want to introduce a very useful tool called Graphgen (one of the Neoxygen components) for generating graph data with ease.
It uses a specification that is very close to Cypher to describe the graph you want.
Here we’re going to create a set of 50 users and the corresponding FOLLOWS relationships.
Go to http://graphgen.neoxygen.io, copy and paste the following pattern in the editor area, and click on Generate:
(user:User {login: userName, firstname: firstName, lastname: lastName} *50)-[:FOLLOWS *n..n]->(user)
You can see that it automatically generates a graph with 50 users, the relationships, and realistic values for login, firstname and lastname. Impressive, no?
Let’s import this graph into our local graph database: click on Populate your database and use the default settings.
In no time, the database will be populated with the data.
If you open the Neo4j browser, and run “Get some data” again, you can see all the user nodes and their relationships.
Getting suggestions
Getting suggestions with Neo4j is simple: match one user, follow the FOLLOWS relationships to the other users, then, for each user found, find the users they follow and return those that the first user does not already follow. A suggestion also must not be the user for whom we are looking for suggestions.
In a typical application, there would be a login system and a user would only be allowed to see the users they follow. For the sake of this post, which is an introduction to Neo4j, you’ll be able to play with all the users.
Let’s write it in Cypher:
$query = 'MATCH (user:User {firstname: {firstname}})-[:FOLLOWS]->(followed)-[:FOLLOWS]->(suggestion)
WHERE user <> suggestion
AND NOT (user)-[:FOLLOWS]->(suggestion)
RETURN user, suggestion, count(*) as occurrence
ORDER BY occurrence DESC
LIMIT 10';
$params = ['firstname' => 'Francisco'];
$result = $client->sendCypherQuery($query, $params)->getResult();
$suggestions = $result->get('suggestion'); // Returns a set of nodes
If you run this query in the Neo4j Browser, you’ll get your first matched user and the suggestions.
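To make the logic of the query easier to follow, here is the same idea sketched in plain PHP on an in-memory array, with no database involved. The suggest function and the sample data are made up for illustration (assuming PHP 7+); in a real application you would let Neo4j do this work, since that is precisely what it is optimized for.

```php
<?php
// Sketch of the suggestion logic on a plain PHP array (no database involved).
// $follows maps each user to the list of users they follow.
function suggest(array $follows, string $user, int $limit = 10): array
{
    $alreadyFollowed = $follows[$user] ?? [];
    $occurrences = [];

    foreach ($alreadyFollowed as $followed) {
        foreach ($follows[$followed] ?? [] as $candidate) {
            // Skip the user themselves and anyone they already follow.
            if ($candidate === $user || in_array($candidate, $alreadyFollowed, true)) {
                continue;
            }
            // One more two-step path reaches this candidate.
            $occurrences[$candidate] = ($occurrences[$candidate] ?? 0) + 1;
        }
    }

    arsort($occurrences); // highest count first, like ORDER BY occurrence DESC
    return array_slice($occurrences, 0, $limit, true);
}

// Made-up sample data for illustration.
$follows = [
    'Francisco' => ['Anna', 'Bob'],
    'Anna'      => ['Carla', 'Bob', 'Francisco'],
    'Bob'       => ['Carla', 'Dave'],
];

print_r(suggest($follows, 'Francisco'));
```

With this sample data, Carla is reachable through both Anna and Bob, so she comes first with an occurrence of 2, followed by Dave with 1. Bob and Francisco are filtered out, mirroring the WHERE clause of the Cypher query.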
Conclusion
In this part:
- You’ve discovered graph databases and Neo4j
- You’ve learned the basics of the Cypher Query Language
- You’ve seen how to connect to and run queries on a Neo4j database with PHP
In a followup article we’ll use everything we’ve learned so far and make a real Neo4j-powered Silex PHP application.