In part 1 of this series we had an introduction to indexing in MongoDB. we saw how to create, use, and analyze queries with indexes giving us a good foundation to build on. In this part, we’ll take a look at a few more small but important concepts, like indexing on sub-documents and embedded fields, covered queries, and index direction.
Of course, this part assumes that you know how to create an index on a and use the explain()
method to analyze it. If you don’t already know how, I suggest you go back and read part one before continuing here.
We used a collection named posts
in the last article. For our work here, let’s add a new field location
to it to store the location from which the post was made. The field is a sub-document, and stores the city, state, and country of the user as shown below (a sub-document is a field having a document structure):
{
"_id": ObjectId("5146bb52d852470060001f4"),
"comments": {
"0": "This is the first comment",
"1": "This is the second comment"
},
"post_likes": 40,
"post_tags": {
"0": "MongoDB",
"1": "Tutorial",
"2": "Indexing"
},
"post_text": "Hello Readers!! This is my post text",
"post_type": "private",
"user_name": "Mark Anthony",
"location": {
"city": "Los Angeles",
"state": "California",
"country": "USA"
}
}
Indexing on Sub-documents
Suppose we want to search posts based on where the user lives. For this, we need to create an index on the sub-document location
field, which in turn indexes the sub-fields. Then we’ll be able to use the index for the following kinds of queries:
<?php
// query to find posts from the city of Los Angeles
$cursor = $collection->find(
array(
"location" => "Los Angeles"
),
array()
);
// query to find posts from the state of California
$cursor = $collection->find(
array(
"location" => "California"
),
array()
);
// query to find posts from the United States
$cursor = $collection->find(
array(
"location" => "USA"
),
array()
);
We’re able to search all of the sub-fields (city
, state
, and country
) in the sub-document using only location
as the key. The query looks to see if any of the sub-fields of location
meet our search criteria.
It should be noted that, similar to indexing on arrays, separate indexes are created for all the of the sub-fields internally. In this case, three indexes are created as location.city
, location.state
and location.country
, hence such indexes should be used with care since each index occupies space in memory.
Indexing on Embedded Fields
It will happen sometimes that we won’t need indexes on all of the fields of a sub-document. If in our application we only want to find posts based on city but not state or country, we can create the index on the embedded field city
.
We can now use this index in queries to find posts based on city:
<?php
// query to find posts from the city of Los Angeles
$cursor = $collection->find(
array(
"location.city" => "Los Angeles"
),
array()
);
Index Direction (Ascending/Descending)
We’ve always provide an index direction (1 or -1) to keys when creating our indexes. I touched on this briefly in part 1, but this is actually an important discussion point I’d like like to pick up again. If we have one key in the index, direction 1 or -1 doesn’t really matter, but it comes into play when we do sorting or ranged queries with compound indexes.
Suppose we have a compound index with key field1
ascending and key field2
descending. In this case, the indexing table may look like this:
A query with sorting on field1
ascending and field2
ascending will travel rows in this order: 1, 2, 3, 4, 5, 6, 7, 8, 9. A query with field1
ascending and field2
descending will travel: 3, 2, 1, 6, 5, 4, 9, 8, 7. Such out-of-order jumps in the search tree can end up being costly to query performance.
Of course index structure above is represented as a table just for the purposes of understanding. Remember, MongoDB uses tree structures internally; each element is stored as a node of a tree. The elements closer to each other would be under the same branches, and hence easily approachable. If a query has to retrieve multiple records in a sorted manner, it would be logically correct to place the elements near each other in the tree for faster retrieval in comparison to the case where the query has to jump from one node to another far node to grab the elements.
If you are looking to sort on field1:1,field2:1
, then index {field1:1, field2:1}
would be faster than {field1:1, field2:-1}
or {field1:-1, field2:1}
.
Covered Queries
As per MongoDB’s documentation, a covered query is the one in which:
- all fields used in the query are part of an index used in the query, and
- all the fields returned in the results are in the same index
Since all the fields are covered in the index itself, MongoDB can match the query condition as well as return the result fields using the same index without looking inside the documents. Since indexes are stored in RAM or sequentially located on disk, such access is a lot faster.
Consider we have a compound index defined on the post_type
and user_name
fields. This index covers the following query:
<?php
// query to find posts with type public and get only user_name in result
$cursor = $collection->find(
array(
"post_type" => "public",
),
array(
"user_name" => 1,
"_id" => 0
)
);
We’ve explicitly excluded the _id
field from the result to take advantage of the covered query. As you may already know, all queries return the _id
field by default. As per the second condition for covered queries, all the fields returned in the result should be included in the index. We don’t have _id
in our compound index on post_type
and user_name
, so we have to exclude this field from the result.
To check if the query is covered, we can look to the indexOnly
field in the result of the explain()
method. A true value of indicates that ours was a covered query.
It’s important to know that an index can’t cover a query if:
- any of the indexed fields is an array (e.g.
post_tags
), or - any of the indexed fields are fields in sub-documents (e.g.
location.city
)
Thus, it’s always a good practice to check your query index usage with explain()
.
Removing Indexes
To check the current index size for a database, we can use the totalIndexSize()
method which returns the index size in bytes.
We just have to ensure that we have enough RAM available to accommodate indexes as well as the data that MongoDB manages and uses regularly.
To delete an existing index, and thus free up resources, we use the dropIndex()
method.
Conclusion
That’s all for this part and also the series. We’ve touched on a lot of important topics to get you up to speed with indexing in MongoDB.
Analyzing your indexes to make sure they are doing well is an on-going process as your application grows and your data changes, so if you have any kind questions or comments about the article, feel free to post in the comments below.
Image via Fotolia
Ashish is a Computer Science Engineer and works for one of the Big Four firms as a Technology Analyst. Prior to this, he co-founded an educational start-up. In his free time, Ashish read and write blogs, watches animation cartoons, movies or discovery channel. You can find his recent blogs on http://ashishtrivediblog.wordpress.com/