PHP and Neo4j: Introduction to Graph Databases

Key Takeaways

Graph databases, which store data in nodes, edges, and relationships, are gaining popularity due to their ability to handle associative data and complex relationships more intuitively and expressively than traditional tabular databases.
Graph databases are used in scenarios where connected data, traversal, and dependency on previous data items are important. They are used by major companies like Google, Facebook, Twitter, and Cisco for tasks like page ranking, data management, social interconnectivity, and network management.
Neo4j, the world’s leading graph database, offers robust features, performance, and flexibility, making it suitable for large-scale applications. It can be used with PHP and provides a powerful query language, Cypher, designed specifically for querying graph data.

For a long time, data has typically been stored in tabular form, which makes it easy to index and read. That trend is changing as graph databases quickly gain popularity. In fact, it would not be wrong to call them “the future of DBMS”.

New to the world of graphs and databases? Don’t worry: by the end of this introductory article you will have sound theoretical knowledge of the topic, enough to glide easily through the rest of the series, which covers the actual implementation.

What are Graphs?

Graphs are among the most general data structures, and they lend themselves naturally to a visual representation. A graph stores data in the form of nodes (data blocks), where one node points to another. By following these pointers, we can move from one data block to any other that is connected to it.
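To make this concrete, here is a minimal sketch in Python (chosen purely for illustration; the node names are made up). It stores a small directed graph as an adjacency list and follows the pointers to find which nodes can be reached from a starting node.

```python
from collections import deque

# A small directed graph stored as an adjacency list:
# each node maps to the list of nodes it points to.
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
    "E": ["A"],
}

def reachable(graph, start):
    """Return the set of nodes reachable from `start` by following edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen

print(sorted(reachable(graph, "A")))  # ['A', 'B', 'C', 'D']
```

Note that E is not reachable from A, because the only edge between them points from E to A. In a directed graph, the direction of an edge matters.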

What are Graph Databases?

Technically, Graph Databases are a way of storing data in the form of nodes, edges and relationships which provide index-free adjacency.

Let’s unpack that. Data is stored in the form of nodes, and every node (or data block) is connected to other nodes; each such connection is called an edge. A few words written on each edge further define the connection between the two nodes, and this description is called a relationship. Since each node can directly look up the nodes it is connected to (they are all connected through edges, remember?), there is no need to search for a data block through an index, hence the term ‘index-free adjacency’.
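The idea of index-free adjacency can be sketched in a few lines of Python. This is only an illustration of the concept, not how Neo4j is implemented internally; the class names, relationship labels and people are all hypothetical.

```python
class Node:
    """A data block that holds direct references to its relationships."""

    def __init__(self, name):
        self.name = name
        self.relationships = []  # outgoing Relationship objects


class Relationship:
    """A labelled edge from one node to another."""

    def __init__(self, kind, start, end):
        self.kind = kind
        self.start = start
        self.end = end
        # Register the edge on its start node so it can be followed directly.
        start.relationships.append(self)


def neighbours(node, kind):
    """Follow the node's own references; no global index is consulted."""
    return [r.end.name for r in node.relationships if r.kind == kind]


alice = Node("Alice")
bob = Node("Bob")
carol = Node("Carol")
Relationship("FRIEND_OF", alice, bob)
Relationship("WORKS_WITH", alice, carol)

print(neighbours(alice, "FRIEND_OF"))  # ['Bob']
```

Finding Alice's friends only touches the edges attached to Alice, so the cost of the lookup depends on how many relationships she has rather than on the total size of the data set.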

Today, many social networking sites, Facebook among them, use graph databases to store their massive amounts of data.

Usage of Graph databases

Graph databases are useful when the data to be stored is associative, that is, when the relationship between two data blocks matters a lot. Relational (tabular) databases are less well suited to data in which relationships exist between data blocks, especially when the relationship is more important than the data blocks themselves. Graph databases are a very intuitive and expressive way to describe any form of data, much like sketching on a whiteboard. They let you represent related data as it is: a set of objects connected by a set of relationships, each with its own set of descriptive properties.

We use them when…

…dealing with connected data. For example, in a social network platform the database depends not only on personal details but also on how two people are connected, because two friends can see more of each other’s data than two people who are merely acquaintances.
…traversal is preferred to indexing. This is the case where we ‘move through’ a part of the database instead of jumping directly to one item. For example, suppose approval for exporting a car from a factory requires that the car have an engine and upholstery (not really, but bear with me). At the management level, the procedure consists of checking every car for these properties. If the company stores this data in a graph database, it would visit each node (car), check whether both properties are attached to it, and discard the node and move on if they are not.
…solving shortest-path problems and their relatives, such as the traveling salesman problem. These are very common problems in which the minimum-cost route from Point A to Point B must be calculated when more than one option is available. If the destinations are stored in a graph database, various minimum-distance algorithms can be used to find the route with the lowest cost.
…the next item to be searched for depends on the previous one. For example, consider the database of a global enterprise that records how a cookie is made from wheat. It tracks how the wheat flows from a farmer to a truck to a factory to a mixer to an oven to packaging and finally into a supermarket. Here each stage can be a node connected to the next by an edge, so moving from one stage to another is easy.
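As an illustration of the route-finding case, the following Python sketch uses Dijkstra's algorithm, one of the standard minimum-distance algorithms, to find the cheapest way from Point A to Point B. The road network and its distances are invented for the example.

```python
import heapq

# Hypothetical road network: node -> list of (neighbour, distance) pairs.
roads = {
    "A": [("C", 2), ("D", 5)],
    "C": [("D", 1), ("B", 7)],
    "D": [("B", 3)],
    "B": [],
}

def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (total_cost, path) or (None, [])."""
    queue = [(0, source, [source])]  # (cost so far, node, path taken)
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbour, weight in graph[node]:
            if neighbour not in settled:
                heapq.heappush(
                    queue, (cost + weight, neighbour, path + [neighbour])
                )
    return None, []

print(shortest_path(roads, "A", "B"))  # (6, ['A', 'C', 'D', 'B'])
```

The direct-looking route A to D to B costs 8, but going through C first costs only 6, which is the kind of result that is hard to spot without traversing the graph.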

Example

In this example, Mike has a teacher named George, whose son Ryan is Mike’s friend. In a diagram, this is represented by writing the relationships on the edges and the properties inside the circles (the nodes). This is very close to how the data is actually stored in a graph database. It would not have been that easy to depict such relationships using a table.
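The same example can be written down as data. The sketch below is only a depiction in Python, with hypothetical identifiers and relationship names; a real graph database would attach each edge to its nodes rather than scanning a list.

```python
# Nodes carry properties; the dictionary keys are hypothetical identifiers.
nodes = {
    "mike": {"name": "Mike"},
    "george": {"name": "George"},
    "ryan": {"name": "Ryan"},
}

# Each edge is (start node, relationship, end node).
edges = [
    ("mike", "HAS_TEACHER", "george"),
    ("george", "HAS_SON", "ryan"),
    ("mike", "FRIEND_OF", "ryan"),
]

def follow(start, relationship):
    """Return the names of nodes reached from `start` via `relationship`."""
    return [
        nodes[end]["name"]
        for source, rel, end in edges
        if source == start and rel == relationship
    ]

print(follow("mike", "HAS_TEACHER"))  # ['George']
print(follow("george", "HAS_SON"))    # ['Ryan']
print(follow("mike", "FRIEND_OF"))    # ['Ryan']
```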

Real world use cases of graph databases

Graph databases are becoming very popular in the real world. Here are some areas where they have found use among the world’s leading companies:

PageRank: Google uses graph concepts to decide the order in which search results are displayed. The World Wide Web is modeled as a directed graph, with pages as the nodes and the hyperlinks between them as the edges. Each page divides its own rank equally among its outgoing links, so a page ranks highly when many well-ranked pages link to it.
Data Management: Cisco, one of the world’s leading networking organizations, has recently adopted a hierarchical management system built around the graph capabilities of the Neo4j database. This gives them much faster access to data than Oracle RAC did. They are applying the same concept to their product hierarchy in order to serve users in real time.
Social Interconnect: Websites like Facebook, Twitter, LinkedIn, Viadeo and Glassdoor store their connections as relationships in graph databases. Recommendations are important to their users, and graph databases let relationships and connections be managed and accessed in real time more easily than relational databases do.
Network management: Telecommunication companies like SFR, Telenor, Huawei and JustDial have shifted to graph databases to model their networks, which consist of highly interconnected plans, customers and groups. Graphs help them analyze networks and data centers, and also save them from the conventional, time-consuming process of authentication. Most importantly, failure cases are covered as well: recovery plans are always just a node away, which saves a lot of time whenever something goes wrong.
Security and access management: Adobe’s Creative Cloud uses a graph database structure to link authentication details and thereby grant access to content for its administrators as well as its users.
Bioinformatics: Era7 is a company that works with DNA sequencing data, which involves storing information on proteins, enzymes and so on. This is done with the help of Bio4j, a bioinformatics graph database system that stores information about genes, proteins and other complex, interrelated data. Bio4j has all the features of Neo4j, the world’s leading graph database, and is thus very scalable and flexible.
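To give a feel for the PageRank use case above, here is a simplified Python sketch of the iterative calculation. It is a textbook-style approximation, not Google's production system; the three-page "web", the damping factor of 0.85 and the iteration count are assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively compute PageRank for a dict of page -> outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if not outgoing:
                # A page with no links spreads its rank over all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
            else:
                # Otherwise its rank is split equally among its links.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical three-page web: page -> pages it links to.
web = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
}

ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # home
```

The "home" page comes out on top because both other pages link to it, and "about" passes along its entire rank through a single outgoing link.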

Conclusion

I hope you understand the theory behind graph databases a little better now. Coming soon, we’ll be taking a detailed look at how to use Neo4j, the world’s leading graph DB, with PHP.

If you’d like a particular use case covered, please mention it in the comments!

Frequently Asked Questions about PHP and Neo4j: An Introduction to Graph Databases

What is the main difference between graph databases and relational databases?

Graph databases and relational databases are both types of databases used to store and manage data. However, they differ in their structure and how they handle relationships between data. A graph database, like Neo4j, uses nodes and relationships to represent data, which allows for more complex and interconnected data structures. On the other hand, a relational database uses tables to store data and relationships are established through primary and foreign keys. This makes graph databases more suitable for handling complex, interconnected data, while relational databases are better for structured, tabular data.
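The difference can be seen in a small "friends of friends" query. The Python sketch below answers the same question twice: first with tables and key-based joins (using the standard-library sqlite3 module), then with a graph-style structure in which each person points directly at their friends. The table names, columns and people are hypothetical.

```python
import sqlite3

# Relational version: people and friendships live in tables, and the
# relationship is reconstructed at query time by joining on keys.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE friendship (
        person_id INTEGER REFERENCES person(id),
        friend_id INTEGER REFERENCES person(id)
    );
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cid');
    INSERT INTO friendship VALUES (1, 2), (2, 3);
""")
rows = db.execute("""
    SELECT fof.name
    FROM person p
    JOIN friendship f1 ON f1.person_id = p.id
    JOIN friendship f2 ON f2.person_id = f1.friend_id
    JOIN person fof ON fof.id = f2.friend_id
    WHERE p.name = 'Ann'
""").fetchall()
sql_result = [name for (name,) in rows]

# Graph version: each person holds its friends directly, so the same
# question is answered by walking two hops.
friends = {"Ann": ["Ben"], "Ben": ["Cid"], "Cid": []}
graph_result = [fof for friend in friends["Ann"] for fof in friends[friend]]

print(sql_result, graph_result)  # ['Cid'] ['Cid']
```

Both approaches give the same answer. The relational query needs one more join for every extra hop, while the graph version simply follows one more reference.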

How does Neo4j handle relationships between data?

Neo4j, as a graph database, handles relationships between data through nodes and relationships. Nodes represent entities or instances, while relationships connect these nodes and provide context. Unlike in relational databases where relationships are inferred through keys, relationships in Neo4j are first-class citizens, meaning they are as important as the data itself. This allows for more efficient querying and handling of complex, interconnected data.

What are the advantages of using Neo4j over other graph databases?

Neo4j stands out among other graph databases due to its robust features and performance. It offers a flexible schema, which allows for easy adaptation to changes in data structure. It also has a powerful query language, Cypher, which is specifically designed for querying graph data. Additionally, Neo4j has strong community support and extensive documentation, making it a popular choice among developers.

Can I use PHP with Neo4j?

Yes, you can use PHP with Neo4j. There are several libraries available that allow you to interact with Neo4j from a PHP application. One of these is the Neo4j PHP Client, which provides a simple and flexible API for working with Neo4j.

How does the performance of graph databases compare to relational databases?

The performance of graph databases and relational databases can vary depending on the specific use case. In general, graph databases like Neo4j perform better when dealing with complex, interconnected data. This is because they can traverse relationships between data more efficiently than relational databases. However, for simple, tabular data, relational databases may offer better performance.

What is Cypher and how is it used in Neo4j?

Cypher is a declarative graph query language developed by Neo4j. It is designed to be intuitive and easy to use: its keywords read much like English, and its patterns are drawn in an ASCII-art style, such as (a)-[:KNOWS]->(b) for a node a that has a KNOWS relationship to a node b. With Cypher, you can query and manipulate data in Neo4j using patterns and predicates. It is a powerful tool for working with graph data and is one of the key features of Neo4j.
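As a rough illustration, the snippet below holds two Cypher statements as plain Python strings, modeled on the Mike, George and Ryan example from earlier in the article. The Person label and the relationship types are assumptions made for this sketch, and nothing is sent to a database here: actually running the statements would require a Neo4j driver and a running server.

```python
# Cypher statements held as plain strings. Sending them to a server would
# need a Neo4j driver and a running database, which this sketch leaves out.

# Create three people and the relationships between them.
CREATE_EXAMPLE = """
CREATE (mike:Person {name: 'Mike'}),
       (george:Person {name: 'George'}),
       (ryan:Person {name: 'Ryan'}),
       (mike)-[:HAS_TEACHER]->(george),
       (george)-[:HAS_SON]->(ryan),
       (mike)-[:FRIEND_OF]->(ryan)
"""

# Pattern: a person, their teacher, and the teacher's son.
# $name is a query parameter supplied separately from the query text.
FIND_TEACHERS_SON = """
MATCH (p:Person {name: $name})-[:HAS_TEACHER]->(t:Person)-[:HAS_SON]->(s:Person)
RETURN t.name AS teacher, s.name AS son
"""
parameters = {"name": "Mike"}

print(FIND_TEACHERS_SON.strip())
```

The MATCH clause describes the shape of the data being looked for, and the database works out how to find it. Passing the name as a parameter rather than pasting it into the query text keeps user input out of the statement itself.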

Is Neo4j suitable for large-scale applications?

Yes, Neo4j is suitable for large-scale applications. It is designed to handle large volumes of data and complex queries efficiently. It also offers features like ACID transactions, clustering, and high availability, which are essential for large-scale applications.

How secure is Neo4j?

Neo4j takes security seriously and offers several features to ensure the safety of your data. These include role-based access control, encryption, and auditing capabilities. However, like any software, the security of your Neo4j application also depends on how it is configured and used.

Can I migrate my existing relational database to Neo4j?

Yes, it is possible to migrate your existing relational database to Neo4j. There are tools and guides available that can help you with this process. However, it’s important to note that due to the different structures of graph and relational databases, some changes to your data model may be necessary.
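The core of such a migration is a change of data model: rows become nodes, and foreign keys become relationships. The Python sketch below shows that mapping on two hypothetical tables; it illustrates the idea only and is not one of the migration tools mentioned above.

```python
# Hypothetical rows exported from two relational tables.
customers = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Ben"},
]
orders = [
    {"id": 10, "customer_id": 1, "total": 25.0},
    {"id": 11, "customer_id": 1, "total": 40.0},
    {"id": 12, "customer_id": 2, "total": 15.0},
]

def to_graph(customers, orders):
    """Turn rows into nodes and foreign keys into relationships."""
    nodes = {}
    relationships = []
    for row in customers:
        nodes[("Customer", row["id"])] = {"name": row["name"]}
    for row in orders:
        nodes[("Order", row["id"])] = {"total": row["total"]}
        # The customer_id column becomes an explicit PLACED relationship.
        relationships.append(
            (("Customer", row["customer_id"]), "PLACED", ("Order", row["id"]))
        )
    return nodes, relationships

nodes, relationships = to_graph(customers, orders)
print(len(nodes), len(relationships))  # 5 3
```

Notice that the customer_id column disappears from the order's properties: the information it carried now lives in the relationship itself.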

What resources are available for learning more about Neo4j?

There are many resources available for learning more about Neo4j. The official Neo4j website offers extensive documentation, tutorials, and guides. There are also many books, online courses, and community forums where you can learn more about Neo4j and graph databases in general.