MongoDB Revisited

In my previous article Introduction to MongoDB I discussed installing Mongo, its PHP extension, and how to perform simple insert and find operations. Of course there are many, many more features than what I mentioned so I wanted to write another article to show you some of them. In this article you’ll learn about cursors, additional query filters, and running queries on arrays and embedded documents.

Cursors

First let’s talk about cursors in MongoDB. In the earlier article you saw an example of a find operation like the one below, which selects all the documents found in a collection matching the passed criteria:

<?php
$cursor = $collection->find(array("author" => "shreef"));

What I only briefly mentioned at the time was that the find() method returns a MongoCursor instance, not a list of the actual documents found. Nothing is requested from MongoDB until you ask the cursor for a result.

Mongo’s cursor has two life stages. The first stage is the “pre-query stage.” At this point, the cursor hasn’t tried to execute the query and you have a chance to add more details and constraints. For example, if you want to specify the maximum number of documents to be returned, you can use the cursor’s limit() method.

<?php
$cursor = $collection->find(array("author" => "shreef"));
$cursor = $cursor->limit(5);

Oftentimes you’ll see the method invocations chained together like so:

<?php
$cursor = $collection->find(array("author" => "shreef"))->limit(5);

The cursor actually performs the query and moves to its second stage, the “post-query stage,” once you try to read the results from the cursor either by calling the next() method directly or by iterating the cursor:

<?php
foreach ($cursor as $doc) {
    // do something
}
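
The two life stages can be imitated without a database. The sketch below uses a hypothetical ToyCursor class (my own stand-in wrapped around a plain array, not part of the Mongo extension) that records limit() during the pre-query stage and only "runs" once iteration starts:

```php
<?php
// Hypothetical stand-in for MongoCursor, for illustration only.
// It wraps a plain array instead of talking to a MongoDB server.
class ToyCursor implements IteratorAggregate
{
    private $docs;
    private $limit = 0;          // 0 means "no limit"
    private $executed = false;   // false while in the pre-query stage

    public function __construct(array $docs)
    {
        $this->docs = $docs;
    }

    // Pre-query stage: remember the constraint, run nothing yet.
    public function limit($count)
    {
        if ($this->executed) {
            throw new LogicException("Cannot modify the cursor after the query has run");
        }
        $this->limit = $count;
        return $this; // returning $this is what allows chaining
    }

    public function hasExecuted()
    {
        return $this->executed;
    }

    // Post-query stage: the "query" runs only when iteration begins.
    public function getIterator(): Iterator
    {
        $this->executed = true;
        $docs = $this->limit > 0
            ? array_slice($this->docs, 0, $this->limit)
            : $this->docs;
        return new ArrayIterator($docs);
    }
}

$cursor = (new ToyCursor(array("a", "b", "c")))->limit(2);
$before = $cursor->hasExecuted();      // false: nothing has run yet

$seen = array();
foreach ($cursor as $doc) {            // iteration triggers the "query"
    $seen[] = $doc;
}
$after = $cursor->hasExecuted();       // true
```

Building the cursor and chaining limit() costs nothing; the work happens on the first pass through foreach.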

Also worth mentioning is that not all of the documents matching the criteria of your query will be returned at the same time. If the total size of the results is large, you probably wouldn’t want to load all that data into memory anyway. MongoDB limits how much it sends back at once to 4-16MB worth of results. When you’ve finished iterating through the first batch of results, the cursor will transparently retrieve the next batch of documents. All of this happens in the background for you so you don’t have to worry about it while writing your code, but it’s worth mentioning so you know what is actually happening.

By default, MongoDB will keep the cursor alive on the server until either you’ve finished reading all the results assigned to it, or it has sat idle for 10 minutes. Two MongoCursor methods deal with timeouts, and they are easy to confuse. The timeout() method sets how long the client waits for a response from the server (in milliseconds); passing -1 makes it wait indefinitely. The immortal() method disables the server-side idle timeout, but then you’ll have to iterate over all of the results or else the cursor will live forever and exhaust the resources of the server.

Query Operators

Queries in MongoDB are simple to grasp after seeing only a few examples. You’ve seen that you can send the query as an array containing the values to be matched and you can fine-tune your matching criteria using the special $-operators supported by MongoDB. I’d like to show you the operators used for logical comparisons now, but first there’s one important note: always remember to use single quotes with the $-operators or escape them. You can probably guess why.

$lt, $lte, $gt, $gte

The $lt, $lte, $gt, and $gte operators are equivalent to <, <=, >, and >=. To find all the documents in a blog collection with a number of views greater than or equal to 50,000, you would construct a query like this:

<?php
$collection->find(array("views" => array('$gte' => 50000)));

views is the name of the field that should contain a value greater than or equal to 50,000.
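
The earlier note about single quotes is easy to verify with plain PHP. In this small sketch the variable $gte is a hypothetical leftover that happens to share the operator's name; the three arrays differ only in how the key is quoted:

```php
<?php
// A stray variable that happens to share the operator's name.
$gte = "oops";

$single  = array('$gte' => 50000);   // key is the literal string $gte
$double  = array("$gte" => 50000);   // key is interpolated and becomes "oops"
$escaped = array("\$gte" => 50000);  // escaping keeps the literal string $gte

$keys = array(key($single), key($double), key($escaped));
// $keys is now array('$gte', 'oops', '$gte')
```

If no such variable exists, PHP complains about an undefined variable and the key becomes an empty string, so the query silently stops meaning what you intended.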

$and, $or, and $nor

Sometimes you’ll want to make sure a value fulfills more than one condition, or at least one of several conditions. The $and and $or operators are used to provide Boolean conditions, the same as you are already used to. If you want to find all blog posts with a number of views greater than or equal to 50,000 authored by either “Shreef” or “Timothy”, you would write a query like this:

<?php
$collection->find(array(
    "views" => array('$gte' => 50000),
    '$or' => array(
        array("author" => "Shreef"),
        array("author" => "Timothy"))));

The $nor operator is used similarly, but ensures that none of the conditions are met.
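
To make the difference between $or and $nor concrete, here is a toy matcher written in plain PHP. It is only a sketch of the semantics under my own assumptions: toyMatches() and toyFindTitles() are hypothetical helpers, the sample posts are invented, and the real matching happens inside the MongoDB server.

```php
<?php
// Toy matcher for illustration; real matching happens inside the MongoDB server.
// It understands only plain equality, '$gte', '$or', and '$nor'.
function toyMatches(array $doc, array $query)
{
    foreach ($query as $key => $cond) {
        if ($key === '$or' || $key === '$nor') {
            // Count how many of the sub-queries this document satisfies.
            $hits = 0;
            foreach ($cond as $subQuery) {
                if (toyMatches($doc, $subQuery)) {
                    $hits++;
                }
            }
            // '$or' needs at least one hit, '$nor' needs none.
            if (($key === '$or' && $hits === 0) || ($key === '$nor' && $hits > 0)) {
                return false;
            }
            continue;
        }

        $value = array_key_exists($key, $doc) ? $doc[$key] : null;
        if (is_array($cond) && array_key_exists('$gte', $cond)) {
            if ($value === null || $value < $cond['$gte']) {
                return false;
            }
        } elseif ($value !== $cond) {
            return false;
        }
    }
    return true; // every top-level condition held (an implicit AND)
}

function toyFindTitles(array $docs, array $query)
{
    $titles = array();
    foreach ($docs as $doc) {
        if (toyMatches($doc, $query)) {
            $titles[] = $doc["title"];
        }
    }
    return $titles;
}

$posts = array(
    array("title" => "A", "author" => "Shreef",  "views" => 70000),
    array("title" => "B", "author" => "Timothy", "views" => 10000),
    array("title" => "C", "author" => "Sophia",  "views" => 90000),
);

$authors = array(
    array("author" => "Shreef"),
    array("author" => "Timothy"),
);

$popularByEither = toyFindTitles($posts, array(
    "views" => array('$gte' => 50000),
    '$or'   => $authors));   // array("A")
$popularByNeither = toyFindTitles($posts, array(
    "views" => array('$gte' => 50000),
    '$nor'  => $authors));   // array("C")
```

Post B is excluded from both results because it fails the views condition, which is combined with the Boolean operator as an implicit AND.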

$in and $nin (not in)

The $in operator is useful when you want to pass a list of values, any one of which may match the field you are checking. The $nin operator does the opposite, checking that the field doesn’t match any of the values. This can oftentimes be more readable than using the previously mentioned Boolean operators when you’re doing a simple query.

<?php
$collection->find(array(
    "author" => array('$in' => array("Shreef", "Timothy"))));

Queries on Arrays

The previous examples demonstrated the ability to create and query fields containing a single value, but MongoDB supports array values as well. To provide a list of tags that organizes the blog posts, for example, you can simply specify them as an array.

<?php
$collection->insert(array(
    "title"  => "More Mongo",
    "author" => "Shreef",
    "tags"   => array("php", "mongodb")));

Now to find documents tagged with “php” you can do the following:

<?php
$collection->find(array("tags" => "php"));

Querying an array is the same as querying a field with a single value, and any array that lists “php” as one of its tag values will match. You can also use all the previously mentioned $-operators with arrays, plus the $all operator, which lets you check that an array contains all of the values passed.

<?php
$collection->find(array(
    "tags" => array('$all' => array("php", "mongodb"))));

Now with $all, this query will only match documents with the tags “php” and “mongodb”. Having just one of these values won’t be enough to match.
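
The difference between a plain value and $all can be sketched in a few lines of plain PHP. toyArrayFieldMatches() is a hypothetical helper of mine that mimics the two rules described above; it is not how the server is implemented.

```php
<?php
// Toy version of how a condition is checked against an array field.
// Real matching happens inside the MongoDB server.
function toyArrayFieldMatches(array $fieldValues, $cond)
{
    if (is_array($cond) && array_key_exists('$all', $cond)) {
        // Every requested value must be present in the field.
        return count(array_diff($cond['$all'], $fieldValues)) === 0;
    }
    // A plain value matches if it is any one of the array's elements.
    return in_array($cond, $fieldValues, true);
}

$both    = array("php", "mongodb");
$phpOnly = array("php");
$wanted  = array('$all' => array("php", "mongodb"));

$plainOnBoth = toyArrayFieldMatches($both, "php");       // true
$allOnBoth   = toyArrayFieldMatches($both, $wanted);     // true
$allOnOne    = toyArrayFieldMatches($phpOnly, $wanted);  // false
```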

Queries on Embedded Documents

Embedding documents is one of the things that you’ll probably deal with a lot if you’re using MongoDB for any serious application. For example, it might be logical to embed all of the comments on a post inside the same blog post document. Let’s assume Sophia added a new comment; you might update your blog document by pushing her comment to the comments array like so:

<?php
$postId = "xxx";
$collection->update(
    array("_id" => new MongoId($postId)),
    array('$push' => array(
        "comments" => array(
            "author"  => "Sophia",
            "content" => "hi..."))));

As you saw, I used an operator called $push which, as you can probably guess from its name, pushes a new item onto an array. Performance-wise this approach is better than loading the entire document from the database, modifying it, and then writing it back to the database.

Now when you want to retrieve all of the posts that Sophia has commented on, you can query for them like this:

<?php
$collection->find(array("comments.author" => "Sophia"));

You can write field names like this since MongoDB supports dot notation. Dot notation lets you write the names of fields as if they were object properties; by writing “comments.author” I can reference the value of the author field in each document embedded in the comments array.
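
A query on “comments.author” matches whole posts, not individual comments, so the matching comments still have to be picked out of each returned document. The sketch below imitates both steps in plain PHP; toyDotMatches() is a hypothetical helper and the second comment is invented sample data.

```php
<?php
// Toy check for a two-part dotted path such as "comments.author".
function toyDotMatches(array $doc, $path, $expected)
{
    list($arrayField, $subField) = explode(".", $path, 2);
    if (!isset($doc[$arrayField]) || !is_array($doc[$arrayField])) {
        return false;
    }
    foreach ($doc[$arrayField] as $embedded) {
        if (is_array($embedded)
            && array_key_exists($subField, $embedded)
            && $embedded[$subField] === $expected) {
            return true; // one matching embedded document is enough
        }
    }
    return false;
}

$post = array(
    "title"    => "More Mongo",
    "comments" => array(
        array("author" => "Sophia",  "content" => "hi..."),
        array("author" => "Timothy", "content" => "nice post"),
    ),
);

$matches = toyDotMatches($post, "comments.author", "Sophia"); // true

// The whole post comes back, so pick out Sophia's comments afterwards.
$sophias = array_values(array_filter($post["comments"], function ($comment) {
    return $comment["author"] === "Sophia";
}));
```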

The sort() and skip() Methods

I’ve already mentioned the limit() method which accepts a count of documents to return when you do a query. MongoCursor offers other methods that you’ll undoubtedly find useful, such as sort() and skip().

The sort() method is like the ORDER BY clause in SQL – you provide a number of fields to be used for sorting the results and specify how each is sorted. 1 represents ascending and -1 represents descending.

<?php
$collection
    ->find()
    ->sort(array("createdAt" => -1 , "author" => 1));

This will sort the matching documents by their creation date in descending order first, then by their author in ascending order.

The skip() method passes over the provided number of documents that match the query. For example:

<?php
$collection
    ->find(array("author" => "Shreef"))
    ->sort(array("createdAt" => -1))
    ->limit(5)
    ->skip(10);

The query searches for all documents authored by Shreef, ordered by their creation time, and then skips the first 10 documents that would have otherwise been returned to instead return the next 5 documents only.
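
Notice that skip() is applied before limit() even though it appears later in the chain: as far as I know, MongoDB sorts first, then skips, then limits, regardless of the order of the calls. The same pipeline can be imitated on a plain PHP array; the twenty sample posts here are invented for the illustration.

```php
<?php
// Twenty sample posts whose createdAt values are simply 1 through 20.
$posts = array();
foreach (range(1, 20) as $n) {
    $posts[] = array("author" => "Shreef", "createdAt" => $n);
}

// sort(array("createdAt" => -1)): newest first.
usort($posts, function ($a, $b) {
    return $b["createdAt"] <=> $a["createdAt"];
});

// skip(10) then limit(5): drop ten documents, keep the next five.
$page = array_slice($posts, 10, 5);

$createdAt = array();
foreach ($page as $post) {
    $createdAt[] = $post["createdAt"];
}
// $createdAt is now array(10, 9, 8, 7, 6)
```

This is the familiar paging pattern: with a page size of 5, skipping 10 lands on the third page of results.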

Sorting documents leads to a very important point: indexing in MongoDB is just as important as in an RDBMS such as MySQL.

Indexes

Running queries without indexes doesn’t make much sense in any database. You have to create indexes on the fields that you’ll be referencing in your queries, including those you’ll use for sorting. You can create an index in MongoDB using the ensureIndex() method. The method accepts a list of fields as the first argument and an optional list of options as the second. Here’s an example:

<?php
$collection->ensureIndex(
    array("author" => 1), 
    array("name" => "idx_author"));

This will create an ascending index using the author field, and I optionally chose to name the index “idx_author”. You can create an index with multiple fields by adding the names of the fields as keys to the array passed in the first argument and setting their values to either 1 or -1. Using 1 means you want the indexing of the field to be ascending, while using -1 means you want it to be descending.

Other options you may need are unique and dropDups. You can create an index that ensures a field is unique across all documents in the collection by setting unique to true. If you also set dropDups to true, MongoDB will keep only one document from each set of duplicates and drop the rest.

<?php
$collection->ensureIndex(
    array("title" => 1), 
    array("unique" => true, "dropDups" => true));

MongoDB tries to guess the best index to use when executing your query, but sometimes it fails to choose the right one. You use the hint() method to tell it about the fields to use.

<?php
$collection
    ->find(array("author" => "shreef"))
    ->hint(array("author" => 1));

In this example I told Mongo to use the index consisting of the author field that’s sorted in ascending order. You must pass the same criteria you used to create the index and in the same order. If there is no index consisting of the passed fields in the same order, an exception will be thrown.

You might be wondering why you can’t just use the name of the index instead. Actually, this is supported by Mongo, but it looks like it wasn’t implemented in the PHP API. The hint() method in the PHP API only accepts an array. Let’s hope this will be fixed soon!

Summary

MongoDB is getting better with every release and there are still many more features that I didn’t mention here. The PHP manual gives you some information, but it’s best to read the MongoDB documentation to learn about the latest and greatest features. This article is full of things that you can try, like using $-operators, querying embedded documents, and sorting and skipping results. Feel free to tinker with them and leave your questions and your findings in the comments.
