PHP Master | Modeling Data Relationships in MongoDB

It’s important to consider the data needs of your application right from the very start of development. But if your app will be using NoSQL and you come from a RDBMS/SQL background, then you might think looking at data in terms of NoSQL might be difficult. This article will help you by showing you how some of the basic data modeling concepts apply in the realm of NoSQL.

I’ll be using MongoDB for our discussion as it is one of the leading open-source NoSQL databases due to its simplicity, performance, scalability, and active user base. Of course the article assumes that you are aware of basic MongoDB concepts (like collections and documents). If not, I suggest you read some of the previous articles here at SitePoint to get started with MongoDB.

Key Takeaways

Understanding the type of data relationships that exist in your application is crucial to effective NoSQL data modeling. These relationships can be one-to-one, one-to-many, or many-to-many.
One approach to model these relationships in MongoDB is using embedded documents, where related pieces of information are stored within a single document. This approach enhances read performance and ensures atomicity in operations.
Another approach is using referenced documents, where a document stores references to related documents in other collections. This approach is preferred when embedded documents would be too large, when there would be a lot of duplicate data, or when modeling complex or hierarchical relationships.
The choice between using embedded or referenced documents depends on the specific requirements of your application. If the application is read-heavy and the number of related documents is not very large, embedded documents may be the best choice. Conversely, if the application is write-heavy or the number of related documents is large, using referenced documents may be more suitable.

Understanding Relationships

Relationships show how your MongoDB documents are related to each other. To understand the different ways in which we can organize our documents, let’s look at the possible relationships.

1-to-1 Relationship (1:1)

A 1:1 relationship exists when one object of an entity is related to one and only one object of another entity. For example, one user can have one and only one birth date. So if we have a document which stores user information and another document which stores birth dates, there will be a 1:1 relationship between them.

mongo-model-01

1-to-Many Relationship (1:N)

Such relationship exists when one object of an entity can be related to many objects of another entity. For example, there can be 1:N relationship between a user and his contact numbers since it is possible for one user to have more than one number.

mongo-model-02

Many-to-Many Relationship (M:N)

Such relationship exists when one object of an entity is related to many objects of another entity and vice versa. If we correlate it to the case of users and items bought by them, one user can purchase more than one items as well as one item can be purchased by more than one user.

mongo-model-03

Modeling 1-to-1 Relationships (1:1)

Consider the following example where we need to store address information for each user (for now let’s assume that there’s a single address for each user). In this case, we can design an embedded document having the following structure:

{
    "_id": ObjectId("5146bb52d8524270060001f4"),
    "user_name": "Mark Benzamin"
    "dob": "12 Jan 1991",
    "address": {
       "flat_name": "305, Sunrise Park",
       "street": "Cold Pool Street",
       "city": "Los Angeles"
    }
}

We have the address entity embedded within the user entity making all of the information present in a single document. This means we can find and retrieve everything with a single query.

<?php
// query to find user 'Mark Benzamin' and his address
$cursor = $collection->find(
    array("user_name" => "Mark Benzamin"),
    array("user_name" => 1,"address" => 1)
 );

Embedding documents is roughly analogous to de-normalization and is useful when there is a “contains” relationship between two entities. That is, one document can be stored within another, thus placing the related pieces of information within a single document. As all the information is available in one document, such approach has better read performance because a query operation within a document is less expensive for the server and we find and retrieve the related data in the same query.

In contrast, a normalized approach would call for two documents (ideally in separate collections), one to store basic user information and another to store address information. The second document would contain a user_id field indicate the user to which the address belongs.

{
    "_id": ObjectId("5146bb52d8524270060001f4"),
    "user_name": "Mark Benzamin",
    "dob": "12 Jan 1991"
}

{
    "_id": ObjectId("5146bb52d852427006000de4"),
    "user_id": ObjectId("5146bb52d8524270060001f4"),
    "flat_name": "305, Sunrise Park",
    "street": "Cold Pool Street",
    "city": "Los Angeles"
}

We now need to execute two queries to fetch the same data:

<?php
// query to find user information
$user = $collection->findOne(
    array("user_name" => "Mark Benzamin"),
    array("_id" => 1, "user_name" => 1)
);
// query to find address corresponding to user
$address = $collection->findOne(
    array("user_id" => $user["_id"]),
    array("flat_name" => 1, "street" => 1, "city" => 1)
);

The first query fetches the _id of the user which is then used in the second query to retrieve his address information.

The embedding approach makes more sense than the referencing approach in this case since we are frequently retrieving user_name and address together. What approach you should use ultimately depends on how you logically connect your entities and what data you need to retrieve from the database.

Modeling Embedded 1-to-Many Relationships (1:N)

Now let’s consider the case where one user can have multiple addresses. If all of the addresses should be retrieved along with the basic user information, it would be ideal to embed the address entities inside the user entity.

{
    "_id": ObjectId("5146bb52d8524270060001f4"),
    "user_name": "Mark Benzamin"
    "address": [
        {
            "flat_name": "305, Sunrise Park",
            "street": "Cold Pool Street",
            "city": "Los Angeles"
        },
        {
            "flat_name": "703, Sunset Park",
            "street": "Hot Fudge Street",
            "city": "Chicago"
        }
    ]
}

We’re still able to fetch all of the required information with a single query. The referencing/normalized approach would have us design three documents (one user, two addresses) and two queries to accomplish the same task.
Besides efficiency and convenience, we should use the embedded approach in instances where we need atomicity in operations. Since any updates happen within the same document, atomicity is always insured.

Modeling Referenced 1-to-Many Relationships (1: N)

Bear in mind that embedded documents can continue to grow in size over the life of an application which can badly impact write performance. There’s also a limit of 16MB on the maximum size of each document. A normalized approach is preferred if the embedded documents would be too large, when the embedding approach would result in a lot of duplicate data, or if you need to model complex or hierarchical relationships between documents.

Consider the example of maintaining posts made by a user. Let’s assume that we want the user’s name and his profile picture with every post (similar to a Facebook post where we can see a name and profile picture with every post). The denormalized approach would store the user information in each post document:

{
    "_id": ObjectId("5146bb52d8524270060001f7"),
    "post_text": "This is my demo post 1",
    "post_likes_count": 12,
    "user": {
        "user_name": "Mark Benzamin",
        "profile_pic": "markbenzamin.jpg"
    }
}
{
   "_id": ObjectId("5146bb52d8524270060001f8"),
   "post_text": "This is my demo post 2",
   "post_likes_count": 32,
    "user": {
        "user_name": "Mark Benzamin",
        "profile_pic": "markbenzamin.jpg"
    }
}

We can see this approach stores redundant information in each post document. Looking a bit forward, if the user name or profile picture was ever changed, we would have to update the appropriate field in all of the corresponding posts.
So the ideal approach would be to normalize the information and connect it via references.

{
    "_id": ObjectId("5146bb52d852427006000121"),
    "user_name": "Mark Benzamin",
    "profile_pic": "markbenzamin.jpg"
}

{
    "_id": ObjectId("5146bb52d8524270060001f7"),
    "post_text": "This is my demo post 1",
    "post_likes_count": 12,
    "user_id": ObjectId("5146bb52d852427006000121")
}
{
    "_id": ObjectId("5146bb52d8524270060001f8"),
    "post_text": "This is my demo post 2",
    "post_likes_count": 32,
    "user_id": ObjectId("5146bb52d852427006000121")
}

The user_id field in the post document contains a reference to the user document. Thus, we can fetch posts made by the user using two queries as follows:

<?php
$user = $collection->findOne(
    array("user_name" => "Mark Benzamin"),
    array("_id" => 1, "user_name" => 1, "profile_pic" => 1)
); 
$posts = $collection->find(
    array("user_id" => $user["_id"])
);

Modeling Many-to-Many Relationships (M:N)

Let’s take our previous example of storing users and the items they purchased (ideally in separate collections) and design referenced documents to illustrate the M:N relationship. Assume the collection which store documents for user information is as follows such that each document contains reference IDs for the list of items purchased by the user.

{
    "_id": "user1",
    "items_purchased": {
        "0": "item1",
        "1": "item2"
    }
}
{
    "_id": "user2",
    "items_purchased": {
        "0": "item2",
        "1": "item3"
    }
}

Similarly, assume another collection storing the documents for the available items. These documents will in turn store reference IDs for the list of users who have purchased it.

{
    "_id": "item1",
    "purchased_by": {
        "0": "user1"
    }
}
{
    "_id": "item2",
    "purchased_by": {
        "0": "user1",
        "1": "user2"
    }
}
{
    "_id": "item3",
    "purchased_by": {
        "0": "user2"
    }
}

To fetch all the items purchased by a user, we would write the following query:

<?php
// query to find items purchased by a user
$items = $collection->find(
    array("_id" => "user1"),
    array("items_purchased" => 1)
);

The above query will return the IDs of all the items purchased by user1. We can later use these to fetch the corresponding item information.

Alternatively, if we wanted to fetch the users who have purchased a specific item, we would write the following:

<?php
// query to find users who have purchased an item
$users = $collection->find(
    array("_id" => "item1"),
    array("purchased_by" => 1)
);

The above query returns the IDs of all the users who have purchased item1. We can later use these IDs to fetch the corresponding user information.

This example demonstrated M:N relationships which are very useful in some cases. However, you should keep in mind that many times such relationships can be handled using 1:N relationships along with some smart queries. This reduces the amount of data to be maintained in both documents.

Conclusion

That’s it for this article. We’ve learned about some basic modeling concepts which will surely help you head start your own data modeling: 1-to-1, 1-to-many, and many-to-many relationships, and also a little about data normalization and de-normalization. You should be able to easily apply these concepts to your own application’s modeling needs. If you have any questions or comments related to the article, feel free to share in the comments section below.

Frequently Asked Questions about Modeling Data Relationships in MongoDB

What are the different types of data relationships in MongoDB?

In MongoDB, there are three main types of data relationships: one-to-one, one-to-many, and many-to-many. A one-to-one relationship is when a single document in a collection relates to a single document in another collection. A one-to-many relationship is when a single document in a collection relates to multiple documents in another collection. A many-to-many relationship is when multiple documents in a collection relate to multiple documents in another collection. These relationships can be modeled using embedded documents or reference documents.

How do you model a one-to-many relationship in MongoDB?

A one-to-many relationship in MongoDB can be modeled in two ways: using embedded documents or reference documents. In the embedded documents approach, the related documents are stored directly within the parent document. This approach is suitable when the related documents are not frequently updated and the number of related documents is not very large. In the reference documents approach, the parent document stores references (usually the _id field) to the related documents. This approach is suitable when the related documents are frequently updated or the number of related documents is large.

How do you model a many-to-many relationship in MongoDB?

A many-to-many relationship in MongoDB can be modeled using reference documents. Each document in the collections that are part of the relationship stores references (usually the _id field) to the related documents in the other collection. This approach allows for efficient querying and updating of the related documents.

What are the advantages and disadvantages of using embedded documents in MongoDB?

The main advantage of using embedded documents in MongoDB is that it allows for efficient querying because all the related data is stored together. This can significantly improve performance for read-heavy applications. However, the main disadvantage is that it can lead to data duplication and increased storage usage, especially if the related documents are frequently updated or the number of related documents is large.

What are the advantages and disadvantages of using reference documents in MongoDB?

The main advantage of using reference documents in MongoDB is that it avoids data duplication and reduces storage usage. This can significantly improve performance for write-heavy applications. However, the main disadvantage is that it requires additional queries to retrieve the related data, which can decrease performance for read-heavy applications.

How do you choose between using embedded documents and reference documents in MongoDB?

The choice between using embedded documents and reference documents in MongoDB depends on the specific requirements of your application. If your application is read-heavy and the number of related documents is not very large, then using embedded documents may be the best choice. On the other hand, if your application is write-heavy or the number of related documents is large, then using reference documents may be the best choice.

How do you model hierarchical data in MongoDB?

Hierarchical data in MongoDB can be modeled using the parent reference pattern or the child reference pattern. In the parent reference pattern, each child document stores the _id of its parent document. This pattern is suitable for trees that are read-heavy and write-light. In the child reference pattern, each parent document stores the _ids of its child documents. This pattern is suitable for trees that are write-heavy and read-light.

How do you model polymorphic relationships in MongoDB?

Polymorphic relationships in MongoDB can be modeled using reference documents. Each document stores a reference (usually the _id field) to the related document and the type of the related document. This approach allows for efficient querying and updating of the related documents, regardless of their type.

How do you model denormalized data in MongoDB?

Denormalized data in MongoDB can be modeled using the denormalization pattern. In this pattern, some or all of the data from the related documents is duplicated in the parent document. This approach allows for efficient querying because all the related data is stored together. However, it can lead to data duplication and increased storage usage.

How do you model large hierarchical data in MongoDB?

Large hierarchical data in MongoDB can be modeled using the materialized paths pattern or the array of ancestors pattern. In the materialized paths pattern, each document stores the path from the root of the tree to the document. This pattern allows for efficient querying of the tree. In the array of ancestors pattern, each document stores the _ids of its ancestor documents. This pattern allows for efficient updating of the tree.