PHP and Neo4j: Introduction to Graph Databases
For a long time, data has typically been stored in tabular form, which makes it easy to index and to read. That is changing as graph databases quickly gain popularity. Some would even call them "the future of DBMS".
New to the world of graphs and databases? Don't worry: by the end of this introductory article you will have enough theoretical knowledge to glide through the rest of the series, which covers the actual implementation.
What are Graphs?
Graphs are one of the most general data structures. A graph stores data as nodes (data blocks) joined by edges, where one node points to another. By following edges, we can get from one data block to any other block that is connected to it.
What are Graph Databases?
Technically, Graph Databases are a way of storing data in the form of nodes, edges and relationships which provide index-free adjacency.
Let's unpack this. Data is stored in the form of nodes; every node (or data block) is connected to other nodes, and each connection is called an edge. Each edge also carries a few words that describe how the two nodes are connected; this description is called a relationship. (In Neo4j's own terminology, the edge together with its description is simply called a relationship.) Since each node holds direct references to the nodes it is connected to, the database can move from a node to its neighbours without searching for them by 'index', hence the term 'index-free adjacency'.
Today, many social networking sites, Facebook among them, model their massive amounts of data as graphs.
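To make index-free adjacency concrete, here is a minimal sketch in plain Python (not Neo4j and not PHP). The `Node` class, its methods and the people in it are hypothetical names invented for illustration; a real graph database stores these references on disk, but the principle is the same: each node keeps direct references to its relationships.

```python
class Node:
    """A data block with properties and direct references to its relationships."""

    def __init__(self, **properties):
        self.properties = properties
        self.relationships = []  # outgoing relationships live on the node itself

    def relate(self, rel_type, other, **properties):
        """Create a directed, typed relationship from this node to `other`."""
        self.relationships.append((rel_type, other, properties))

    def neighbours(self, rel_type):
        """Follow outgoing relationships of one type; no global index is consulted."""
        return [other for kind, other, _ in self.relationships if kind == rel_type]


alice = Node(name="Alice")
bob = Node(name="Bob")
carol = Node(name="Carol")

alice.relate("FRIEND_OF", bob, since=2010)
bob.relate("FRIEND_OF", carol, since=2012)

# Two hops from Alice: the friends of her friends.
friends_of_friends = [
    fof.properties["name"]
    for friend in alice.neighbours("FRIEND_OF")
    for fof in friend.neighbours("FRIEND_OF")
]
print(friends_of_friends)  # ['Carol']
```

Each hop only touches the relationships stored on the node being visited, so the cost of a traversal depends on how much of the graph is walked, not on how large the whole data set is.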
Usage of Graph databases
Graph databases are useful when the data to be stored is associative, that is, when the relationships between data blocks matter a lot. Relational databases (tabular form) are less well suited to such data, especially when the relationships are more important than the data blocks themselves. Graph databases are a very intuitive and expressive way to describe data, much like sketching it on a whiteboard. They let you represent related data as it is: as a set of objects connected by a set of relationships, each with its own set of descriptive properties.
We use them when…
…dealing with connected data. For example, on a social network platform the database has to record not only personal details but also how two people are connected, because the amount of data two people can see about each other differs depending on whether they are friends or only acquaintances.
…traversal is preferred to indexing. This is the case where we 'move through' a part of the database instead of jumping directly to one record. For example, suppose approval for exporting a car from a factory requires that the car have an engine and upholstery (not really, but bear with me). At the management level, the procedure consists of finding all the cars that have both properties. If the company stores this data in a graph database, it would visit each node (car), check whether both properties are attached to it and, if not, discard it and move on.
…solving shortest-path problems. These are very common problems in which the cheapest route from Point A to Point B has to be calculated when more than one route is available. If the destinations are stored in a graph database, shortest-path algorithms can find the route with the minimum cost. (The traveling salesman problem, which asks for the cheapest tour through every destination, is a related but much harder problem on the same kind of graph.)
…the next item to be searched for depends on the previous one. For example, a global enterprise keeps a record of how a cookie is made from wheat. It tracks how wheat flows from a farmer to a truck to a factory to a mixer to an oven to packaging and finally into a supermarket. Here each stage is a node connected to the next stage by an edge, so moving from one to another is easy.
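The route-finding case in the list above can be sketched in a few lines of plain Python. This is only an illustration of the idea, using Dijkstra's algorithm over a graph kept in a dictionary; the function name `cheapest_route`, the place names and the costs are all made up, and a graph database such as Neo4j would offer its own ways of running this kind of query.

```python
import heapq


def cheapest_route(graph, start, goal):
    """Return (total_cost, path) for the cheapest route, or None if unreachable.

    `graph` maps each place to a dict of {neighbour: cost}; costs must be non-negative.
    """
    queue = [(0, start, [start])]  # (cost so far, current place, path taken)
    visited = set()
    while queue:
        cost, place, path = heapq.heappop(queue)  # always expand the cheapest option
        if place == goal:
            return cost, path
        if place in visited:
            continue
        visited.add(place)
        for neighbour, step_cost in graph.get(place, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (cost + step_cost, neighbour, path + [neighbour]))
    return None


# Hypothetical road network; the numbers are invented costs.
roads = {
    "A": {"C": 4, "D": 1},
    "C": {"B": 1},
    "D": {"C": 2, "B": 6},
}

print(cheapest_route(roads, "A", "B"))  # (4, ['A', 'D', 'C', 'B'])
```

The direct-looking options (A to C to B, or A to D to B) cost 5 and 7, while the detour through D and C costs only 4, which is why the edges need to carry weights and not just connections.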
As an example, suppose Mike has a teacher named George, whose son Ryan is Mike's friend. Drawn as a graph, this is represented by writing the relationships on the edges and the properties inside the circles (the nodes). This is very close to how the data is actually stored in a graph database. It would not be that easy to depict such a relationship with a table.
Real world use cases of graph databases
Graph databases are becoming very popular in the real world. Here are some areas where the world's leading companies have put them to use:
PageRank: Google uses a graph model to help decide the order in which search results are displayed. The pages of the World Wide Web are the nodes of a directed graph, and the hyperlinks between them are the edges. Each page's rank is divided equally among its outgoing links, so a link from a page with few outgoing links carries more weight than a link from a page with many. A page's rank is then determined by the combined weight of the links pointing to it.
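A simplified sketch of the PageRank idea, again in plain Python: every page repeatedly splits its current rank evenly over its outgoing links, and a page's new rank is the sum of the shares it receives plus a small baseline. The damping factor of 0.85 is the value commonly quoted for PageRank; the function name, the three-page web and the assumption that every page has at least one outgoing link are simplifications made for this example, not a description of Google's production system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank.

    `links` maps each page to the list of pages it links to.
    Assumes every page has at least one outgoing link and every
    link target is itself a key of `links`.
    """
    pages = list(links)
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline rank.
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page, targets in links.items():
            # A page's rank is split evenly over its outgoing links.
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web.
web = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home"],
}

ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # home
```

Here "home" ends up with the highest rank because both other pages pass their entire share to it, while "home" has to split its own rank between two links.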
Data Management: Cisco, one of the world's leading networking companies, has adopted a hierarchical management system built around the Neo4j graph database. This gives them much faster access to data than Oracle RAC. They are applying the same concept to their product hierarchy in order to serve users in real time.
Social Interconnect: Websites like Facebook, Twitter, LinkedIn, Viadeo and Glassdoor store their users' connections as relationships in a graph. Recommendations are important to their users, and relationships and connections can be managed and accessed in real time more easily than with relational databases.
Network management: Telecommunication companies like SFR, Telenor, Huawei and JustDial have shifted to graph databases to model their networks, which consist of highly interconnected plans, customers and groups. Graphs help them analyze networks and data centers, and also save them from the conventional, time-consuming process of authentication. Most importantly, with graphs the failure cases are covered as well: recovery plans are always just a node away, which saves a lot of time whenever something goes wrong.
Security and access management: Adobe's Creative Cloud uses a graph database to link authentication details and thereby grant access to content for its administrators as well as its users.
Bioinformatics: Era7 is a company that works with DNA sequencing data, i.e., it stores information on proteins, enzymes, etc. This is done with the help of Bio4j, a bioinformatics graph database system that stores information about genes, proteins and other complex, interrelated data. Bio4j is built on top of Neo4j, the world's leading graph database, and is thus very scalable and flexible.
I hope you understand the theory behind graph databases a little better now. Coming soon, we'll be taking a detailed look at how to use Neo4j, the world's leading graph DB, with PHP.
If you'd like a particular use case covered, please mention it in the comments!