MongoDB Indexing, Part 2

Share this article

In part 1 of this series we had an introduction to indexing in MongoDB. we saw how to create, use, and analyze queries with indexes giving us a good foundation to build on. In this part, we’ll take a look at a few more small but important concepts, like indexing on sub-documents and embedded fields, covered queries, and index direction.

Of course, this part assumes that you know how to create an index on a and use the explain() method to analyze it. If you don’t already know how, I suggest you go back and read part one before continuing here.

We used a collection named posts in the last article. For our work here, let’s add a new field location to it to store the location from which the post was made. The field is a sub-document, and stores the city, state, and country of the user as shown below (a sub-document is a field having a document structure):

{
    "_id": ObjectId("5146bb52d852470060001f4"),
    "comments": {
        "0": "This is the first comment",
        "1": "This is the second comment"
    },
    "post_likes": 40,
    "post_tags": {
        "0": "MongoDB",
        "1": "Tutorial",
        "2": "Indexing"
    },
    "post_text": "Hello Readers!! This is my post text",
    "post_type": "private",
    "user_name": "Mark Anthony",
    "location": {
        "city": "Los Angeles",
        "state": "California",
        "country": "USA"
    }
}

Indexing on Sub-documents

Suppose we want to search posts based on where the user lives. For this, we need to create an index on the sub-document location field, which in turn indexes the sub-fields. Then we’ll be able to use the index for the following kinds of queries:

<?php
// query to find posts from the city of Los Angeles
$cursor = $collection->find(
    array(
        "location" => "Los Angeles"
    ),
    array()
);

// query to find posts from the state of California
$cursor = $collection->find(
    array(
        "location" => "California"
    ),
    array()
);

// query to find posts from the United States
$cursor = $collection->find(
    array(
        "location" => "USA"
    ),
    array()
);

We’re able to search all of the sub-fields (city, state, and country) in the sub-document using only location as the key. The query looks to see if any of the sub-fields of location meet our search criteria.

It should be noted that, similar to indexing on arrays, separate indexes are created for all the of the sub-fields internally. In this case, three indexes are created as location.city, location.state and location.country, hence such indexes should be used with care since each index occupies space in memory.

Indexing on Embedded Fields

It will happen sometimes that we won’t need indexes on all of the fields of a sub-document. If in our application we only want to find posts based on city but not state or country, we can create the index on the embedded field city.

mongo-2-1

We can now use this index in queries to find posts based on city:

<?php
// query to find posts from the city of Los Angeles
$cursor = $collection->find(
    array(
        "location.city" => "Los Angeles"
    ),
    array()
);

Index Direction (Ascending/Descending)

We’ve always provide an index direction (1 or -1) to keys when creating our indexes. I touched on this briefly in part 1, but this is actually an important discussion point I’d like like to pick up again. If we have one key in the index, direction 1 or -1 doesn’t really matter, but it comes into play when we do sorting or ranged queries with compound indexes.

Suppose we have a compound index with key field1 ascending and key field2 descending. In this case, the indexing table may look like this:

mongo-2-2

A query with sorting on field1 ascending and field2 ascending will travel rows in this order: 1, 2, 3, 4, 5, 6, 7, 8, 9. A query with field1 ascending and field2 descending will travel: 3, 2, 1, 6, 5, 4, 9, 8, 7. Such out-of-order jumps in the search tree can end up being costly to query performance.

Of course index structure above is represented as a table just for the purposes of understanding. Remember, MongoDB uses tree structures internally; each element is stored as a node of a tree. The elements closer to each other would be under the same branches, and hence easily approachable. If a query has to retrieve multiple records in a sorted manner, it would be logically correct to place the elements near each other in the tree for faster retrieval in comparison to the case where the query has to jump from one node to another far node to grab the elements.

If you are looking to sort on field1:1,field2:1, then index {field1:1, field2:1} would be faster than {field1:1, field2:-1} or {field1:-1, field2:1}.

Covered Queries

As per MongoDB’s documentation, a covered query is the one in which:

  • all fields used in the query are part of an index used in the query, and
  • all the fields returned in the results are in the same index

Since all the fields are covered in the index itself, MongoDB can match the query condition as well as return the result fields using the same index without looking inside the documents. Since indexes are stored in RAM or sequentially located on disk, such access is a lot faster.

Consider we have a compound index defined on the post_type and user_name fields. This index covers the following query:

<?php
// query to find posts with type public and get only user_name in result
$cursor = $collection->find(
    array(
        "post_type" => "public",
    ),
    array(
        "user_name" => 1,
        "_id" => 0
    )
);

We’ve explicitly excluded the _id field from the result to take advantage of the covered query. As you may already know, all queries return the _id field by default. As per the second condition for covered queries, all the fields returned in the result should be included in the index. We don’t have _id in our compound index on post_type and user_name, so we have to exclude this field from the result.

To check if the query is covered, we can look to the indexOnly field in the result of the explain() method. A true value of indicates that ours was a covered query.

mongo-2-3

It’s important to know that an index can’t cover a query if:

  • any of the indexed fields is an array (e.g. post_tags), or
  • any of the indexed fields are fields in sub-documents (e.g. location.city)

Thus, it’s always a good practice to check your query index usage with explain().

Removing Indexes

To check the current index size for a database, we can use the totalIndexSize() method which returns the index size in bytes.

mongo-2-4

We just have to ensure that we have enough RAM available to accommodate indexes as well as the data that MongoDB manages and uses regularly.

To delete an existing index, and thus free up resources, we use the dropIndex() method.

mongo2-5

Conclusion

That’s all for this part and also the series. We’ve touched on a lot of important topics to get you up to speed with indexing in MongoDB.

Analyzing your indexes to make sure they are doing well is an on-going process as your application grows and your data changes, so if you have any kind questions or comments about the article, feel free to post in the comments below.

Image via Fotolia

Ashish TrivediAshish Trivedi
View Author

Ashish is a Computer Science Engineer and works for one of the Big Four firms as a Technology Analyst. Prior to this, he co-founded an educational start-up. In his free time, Ashish read and write blogs, watches animation cartoons, movies or discovery channel. You can find his recent blogs on http://ashishtrivediblog.wordpress.com/

Advanced
Share this article
Read Next
Get the freshest news and resources for developers, designers and digital creators in your inbox each week