Using SPL Iterators, Part 1

This entry is part 1 of 2 in the series Using SPL Iterators

When I first came across the term iteration and saw the overwhelming list of classes related to it in the SPL, I was taken aback. It seemed maybe iteration was too complex for me to grasp. I soon realized it was just a fancy word for something we programmers do all the time.

If you use PHP, you’ve most likely used arrays. And if you’ve used arrays, then you’ve almost certainly looped through their elements. Look through any code and you’ll very likely find a foreach loop. Yes, iteration is just the process of traversing a list of values. An iterator, then, is an object that traverses a list, be it an array, a directory listing, or even a database result set.

In the first part of this two-part series I’ll introduce you to iteration and how you can take advantage of some of the built-in classes from the Standard PHP Library (SPL). SPL comes with a large number of iterators, and using them in your code can make your code more efficient and in most cases, more readable.

Why and When to Use SPL Iterators

As you will see, iterating iterator objects is basically the same as iterating arrays, and so many people wonder if it wouldn’t be easier to just stick with arrays in the first place. However, the real benefit of iterators shows when traversing a large amount of data or anything more complex than a simple array.

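The foreach loop does not duplicate the contents of an array passed to it: PHP uses copy-on-write, so the data is only copied if the array is modified during the loop, although in some situations the array’s internal structure is copied. The advantage of iterators for large amounts of data lies elsewhere. An SPL iterator encapsulates the list and exposes one element at a time, so the full data set does not have to be held in an array at all, which can make iterators more memory-efficient.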

When creating data providers, iterators are a great construct as they allow you to lazy load your data. Lazy loading here is simply retrieving the required data only if and when it is needed. You can also manipulate (filter, transform etc) the data you are working on before giving it to the user.
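As a rough illustration of lazy evaluation and filtering, the sketch below chains two built-in SPL iterators, CallbackFilterIterator (available from PHP 5.4) and LimitIterator, over an ArrayIterator. The data set, the $examined counter, and the even-number filter are made up for the example; a real data provider would more likely wrap a file or a result set than an array built with range().

```php
<?php
// Hypothetical data set: the integers 1..1000
$source = new ArrayIterator(range(1, 1000));

// Counts how many elements the filter actually looks at
$examined = 0;

// Keep only even numbers; the callback runs on demand, one element at a time
$even = new CallbackFilterIterator($source, function ($value) use (&$examined) {
    $examined++;
    return $value % 2 === 0;
});

// Stop after the first three matches
$firstThree = new LimitIterator($even, 0, 3);

$result = [];
foreach ($firstThree as $value) {
    $result[] = $value;
}

echo implode(", ", $result) . "<br>";             // 2, 4, 6
echo "elements examined: " . $examined . "<br>";
```

Only a handful of the 1,000 elements pass through the callback, because each iterator asks the one it wraps for the next value only when the loop needs it.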

Whether to use iterators is always at your discretion, however. Iterators have numerous benefits, but in some cases (as with small arrays) they add unwanted overhead. Your preferred style and their suitability in the given situation are both factors you should consider.

Iterating Arrays

The first iterator I’d like to introduce you to is ArrayIterator. Its constructor accepts an array as a parameter, and the class provides methods that can be used to iterate through it. Here’s an example:

<?php
// an array (using PHP 5.4's new shorthand notation)
$arr = ["sitepoint", "phpmaster", "buildmobile", "rubysource",
    "designfestival", "cloudspring"];

// create a new ArrayIterator and pass in the array
$iter = new ArrayIterator($arr);

// loop through the object
foreach ($iter as $key => $value) {
    echo $key . ": " . $value . "<br>";
}

The output of the above code is:

0: sitepoint
1: phpmaster
2: buildmobile
3: rubysource
4: designfestival
5: cloudspring

Usually, however, instead of using ArrayIterator directly you will use ArrayObject, a class that allows you to work with objects as if they were arrays in certain contexts. ArrayObject automatically creates an ArrayIterator for you when you use it in a foreach loop or call ArrayObject::getIterator() directly.

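Please note that while ArrayObject and ArrayIterator behave like arrays in this context, they are still objects; passing them to built-in array functions like sort() and array_keys() will fail dismally. Both classes provide their own sorting methods, such as asort() and ksort(), and getArrayCopy() returns a plain array when you need one.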
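Here is a minimal sketch of working with ArrayObject under those constraints. The list of sites is reused from the example above; the variable names are only for illustration.

```php
<?php
$sites = new ArrayObject(["sitepoint", "phpmaster", "buildmobile"]);

// foreach calls getIterator() behind the scenes; calling it yourself
// returns an ArrayIterator
$iter = $sites->getIterator();
echo get_class($iter) . "<br>";        // ArrayIterator

// count() works because ArrayObject implements Countable
$total = count($sites);

// sort by value using the object's own method (keys are preserved)
$sites->asort();

// convert to a plain array before using built-in array functions
$sorted = array_values($sites->getArrayCopy());
echo implode(", ", $sorted) . "<br>";  // buildmobile, phpmaster, sitepoint
```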

The use of ArrayIterator is straightforward, but limited to single-dimensional arrays. Sometimes you’ll have a multidimensional array and you’ll want to iterate through the nested arrays recursively. In this case you can use RecursiveArrayIterator.

One common scenario is to nest foreach loops or to create a recursive function which checks all items of a multidimensional array. For example:

<?php
// a multidimensional array
$arr = [
    ["sitepoint", "phpmaster"],
    ["buildmobile", "rubysource"],
    ["designfestival", "cloudspring"],
    "not an array"
];

// loop through the object
foreach ($arr as $key => $value) {
    // check for arrays
    if (is_array($value)) {
        foreach ($value as $k => $v) {
            echo $k . ": " . $v . "<br>";
        }
    }
    else {
        echo $key . ": " . $value . "<br>";
    }
}

The output of the above code is:

0: sitepoint
1: phpmaster
0: buildmobile
1: rubysource
0: designfestival
1: cloudspring
3: not an array

A more elegant approach makes use of RecursiveArrayIterator.

<?php
// ... ($arr is the multidimensional array from the previous example)
$iter = new RecursiveArrayIterator($arr);

// loop through the object
// we need to create a RecursiveIteratorIterator instance
foreach(new RecursiveIteratorIterator($iter) as $key => $value) {
    echo $key . ": " . $value . "<br>";
}

The output is the same as the previous example.

Note that you need to create an instance of RecursiveIteratorIterator and pass it the RecursiveArrayIterator object here; otherwise all you would get are the entries of the root array (and, depending on your settings, a ton of notices when the nested arrays are echoed).
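To make the division of labour concrete, here is a sketch that walks a RecursiveArrayIterator by hand and compares the result with what RecursiveIteratorIterator produces. The collectLeaves() helper is hypothetical, written only for this comparison, and the array is a shortened version of the one above.

```php
<?php
$arr = [
    ["sitepoint", "phpmaster"],
    ["buildmobile", "rubysource"],
    "not an array"
];

// Hypothetical helper: walk a RecursiveIterator by hand and collect the leaves
function collectLeaves(RecursiveIterator $iter)
{
    $leaves = [];
    for ($iter->rewind(); $iter->valid(); $iter->next()) {
        if ($iter->hasChildren()) {
            // getChildren() returns another iterator for the nested array
            $leaves = array_merge($leaves, collectLeaves($iter->getChildren()));
        } else {
            $leaves[] = $iter->current();
        }
    }
    return $leaves;
}

$manual = collectLeaves(new RecursiveArrayIterator($arr));

// The decorator produces the same leaves without the helper
$flat = iterator_to_array(
    new RecursiveIteratorIterator(new RecursiveArrayIterator($arr)),
    false // discard keys, since the nested arrays reuse 0 and 1
);

echo implode(", ", $manual) . "<br>";
```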

RecursiveArrayIterator is the right starting point for multidimensional arrays because it can tell you whether the current entry has children (hasChildren()) and give you an iterator over them (getChildren()), but it leaves the descending up to you. RecursiveIteratorIterator is a decorator that does this for you. It takes the RecursiveArrayIterator, iterates over it, and descends into any entry that has children (and so on). Essentially, it “flattens” the RecursiveArrayIterator. You can call RecursiveIteratorIterator::getDepth() to keep track of the current depth of iteration. Be careful with RecursiveArrayIterator and RecursiveIteratorIterator if your array contains objects that you want returned as values: by default objects are treated as having children, and their public properties are iterated instead. Passing the RecursiveArrayIterator::CHILD_ARRAYS_ONLY flag to the RecursiveArrayIterator constructor restricts recursion to arrays.
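The following sketch shows getDepth() in use and the difference that objects make. The stdClass value and the "origin" key are hypothetical, chosen only to show the behaviour.

```php
<?php
$arr = [["sitepoint", "phpmaster"], "not an array"];

// getDepth() reports how deeply the current element is nested (root level is 0)
$rii = new RecursiveIteratorIterator(new RecursiveArrayIterator($arr));
$lines = [];
foreach ($rii as $key => $value) {
    $lines[] = $rii->getDepth() . " / " . $key . ": " . $value;
}
// $lines: "1 / 0: sitepoint", "1 / 1: phpmaster", "0 / 1: not an array"

// A hypothetical value object stored inside the array
$point = new stdClass();
$point->x = 1;
$point->y = 2;
$data = ["origin" => $point];

// Default behaviour: the object's public properties are iterated
$default = iterator_to_array(
    new RecursiveIteratorIterator(new RecursiveArrayIterator($data))
);
// $default: ["x" => 1, "y" => 2]

// With CHILD_ARRAYS_ONLY the object is returned as a single value
$kept = iterator_to_array(
    new RecursiveIteratorIterator(
        new RecursiveArrayIterator($data, RecursiveArrayIterator::CHILD_ARRAYS_ONLY)
    )
);
// $kept: ["origin" => $point]
```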

Iterating Directory Listings

You will undoubtedly need to traverse a directory and its files at some point, and there are various ways of accomplishing this with PHP’s built-in functions, such as scandir() or glob(). But you can also use DirectoryIterator. Even in its simplest form DirectoryIterator is quite powerful, and it can also be subclassed and enhanced.

Here’s an example of iterating a directory with DirectoryIterator:

<?php
// create new DirectoryIterator object
$dir = new DirectoryIterator("/my/directory/path");

// loop through the directory listing
foreach ($dir as $item) {
    echo $item . "<br>";
}

The output obviously will depend on the path you specify and what the directory’s contents are. For instance:

.
..
api
index.php
lib
workspace

Don’t forget that with DirectoryIterator, as well as many of the other SPL iterators, you have the added benefit of using exceptions to handle any errors.

<?php
try {
    $dir = new DirectoryIterator("/non/existent/path");
    foreach ($dir as $item) {
        echo $item . "<br>";
    }
}
catch (Exception $e) {
    echo get_class($e) . ": " . $e->getMessage();
}

When the path does not exist, the output resembles:

UnexpectedValueException: DirectoryIterator::__construct(/non/existent/path,/non/existent/path): The system cannot find the file specified. (code: 2)

With a host of other methods like DirectoryIterator::isDot(), DirectoryIterator::getType() and DirectoryIterator::getSize(), pretty much all of your basic directory information needs are covered. You can even combine DirectoryIterator with FilterIterator or RegexIterator to return files matching specific criteria. For example:

<?php
class FileExtensionFilter extends FilterIterator
{
    // whitelist of file extensions
    protected $ext = ["php", "txt"];

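    // accept() is abstract in FilterIterator and must be implemented here;
    // it returns true for entries that should be kept
    public function accept() {
        // current() is the entry supplied by the wrapped DirectoryIterator
        return in_array($this->current()->getExtension(), $this->ext);
    }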
}

// create a new iterator that only yields .php and .txt entries
$dir = new FileExtensionFilter(new DirectoryIterator("./"));
// ...
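RegexIterator can be combined with a directory listing in a similar way. The sketch below swaps in FilesystemIterator for DirectoryIterator so that the file name can be used as the key, and matches the pattern against that key. The temporary directory, the file names, and the pattern are hypothetical.

```php
<?php
// Throwaway directory with three files
$path = sys_get_temp_dir() . "/spl_regex_" . uniqid();
mkdir($path);
foreach (["a.php", "b.txt", "c.jpg"] as $name) {
    touch($path . "/" . $name);
}

// FilesystemIterator keyed by file name, with "." and ".." skipped
$files = new FilesystemIterator(
    $path,
    FilesystemIterator::KEY_AS_FILENAME
        | FilesystemIterator::CURRENT_AS_FILEINFO
        | FilesystemIterator::SKIP_DOTS
);

// Match the pattern against the key (the file name) instead of the value
$matches = new RegexIterator(
    $files,
    '/\.(php|txt)$/i',
    RegexIterator::MATCH,
    RegexIterator::USE_KEY
);

$names = [];
foreach ($matches as $name => $file) {
    $names[] = $name;
}
sort($names);
echo implode(", ", $names) . "<br>"; // a.php, b.txt

// release the directory handle, then clean up
unset($matches, $files, $file);
foreach (["a.php", "b.txt", "c.jpg"] as $name) {
    unlink($path . "/" . $name);
}
rmdir($path);
```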

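SPL also provides RecursiveDirectoryIterator, which can be used in the same way as RecursiveArrayIterator. A function that traverses directories recursively will usually be littered with conditional checks for valid directories and files, and RecursiveDirectoryIterator can do much of that work for you, resulting in cleaner code. There is one caveat, however. In its default mode (RecursiveIteratorIterator::LEAVES_ONLY), RecursiveIteratorIterator yields only leaf entries, which in practice means files, and not the directories themselves; a directory that contains subdirectories but no files therefore contributes no files to the result (much like how Git behaves). Pass RecursiveIteratorIterator::SELF_FIRST or RecursiveIteratorIterator::CHILD_FIRST as the second constructor argument if you want directories included.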

<?php
// create new RecursiveDirectoryIterator object
$iter = new RecursiveDirectoryIterator("/my/directory/path");

// loop through the directory listing
// we need to create a RecursiveIteratorIterator instance
foreach (new RecursiveIteratorIterator($iter) as $item) {
    echo $item . "<br>";
}

My output resembles:

./api/.htaccess
./api/index.php
./index.php
...
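The effect of the iteration mode can be sketched with a small throwaway tree. The directory names and the listTree() helper are hypothetical; FilesystemIterator::SKIP_DOTS is passed so that the . and .. entries do not show up in the listing.

```php
<?php
// Throwaway tree: one file inside "api" and an empty directory
$root = sys_get_temp_dir() . "/spl_tree_" . uniqid();
mkdir($root . "/api", 0777, true);
mkdir($root . "/empty");
touch($root . "/api/index.php");

// Hypothetical helper: list paths relative to $root for a given mode
function listTree($root, $mode)
{
    $dirIter = new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS);
    $paths = [];
    foreach (new RecursiveIteratorIterator($dirIter, $mode) as $item) {
        // strip the root prefix and normalise separators for display
        $relative = substr($item->getPathname(), strlen($root) + 1);
        $paths[] = str_replace(DIRECTORY_SEPARATOR, "/", $relative);
    }
    sort($paths);
    return $paths;
}

$leavesOnly = listTree($root, RecursiveIteratorIterator::LEAVES_ONLY);
$selfFirst  = listTree($root, RecursiveIteratorIterator::SELF_FIRST);

echo implode(", ", $leavesOnly) . "<br>"; // api/index.php
echo implode(", ", $selfFirst) . "<br>";  // api, api/index.php, empty

// CHILD_FIRST visits a directory's contents before the directory itself,
// which is the order needed to delete the tree
$cleanup = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
    RecursiveIteratorIterator::CHILD_FIRST
);
foreach ($cleanup as $item) {
    if ($item->isDir()) {
        rmdir($item->getPathname());
    } else {
        unlink($item->getPathname());
    }
}
unset($cleanup, $item);
rmdir($root);
```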

Summary

Hopefully you now realize that iteration isn’t a complex beast like I first thought, and that it’s something we do every day as programmers. In this article I’ve introduced iteration and some of the classes that SPL provides to make iterating easier and more robust. Of course I’ve only dealt with a very small sampling of the available classes; SPL provides many, many more and I urge you to take a look at them.

SPL is a “standard” library. Sometimes you may find the classes too general and they may not always do what you need. In such cases you can easily extend the classes to add your own functionality or tweak existing functionality as needed. In the next part of this series I’ll show you how to use SPL interfaces to make your very own custom classes that can be traversed and accessed like arrays.

  • Patrick

    Great article – informative, clear, and well written, unlike some of the articles on this site. Looking forward to part 2, but I’d be interested to see a performance analysis comparing the performance of regular foreach iteration to the use of these iterators. Obviously these iterators incur extra overhead because they involve instantiating additional objects, and it’d be nice to see if that is offset by the memory savings you make by avoiding foreach loops. That information would be helpful when making decisions about where to effectively deploy iterators to reduce system load.

    • http://careers.stackoverflow.com/frostymarvelous Stefan Froelich

      In my research, I came across a few benchmarks, though they weren’t too good.
      The difference between the two methods is sort of blurry. It will be a challenge (at least for me) to make those benchmarks. I just might do that for another article.

  • Paul

    Awesome, thank you. I’m looking forward to the next in the series….

  • http://php.net/spl Peter C

    > The foreach loop makes a copy of any array passed to it.

    This is not true at all, foreach will not copy the array. Of course, the usual rules apply if you modify the array from within the loop (i.e. the copy-on-write behaviour of all variables kicks in).

    > Usually, however, you will use ArrayObject, …

    You don’t mention why that class is “usually” used in preference to using ArrayIterator directly. A sentence or two on nested loops over ArrayIterator would have been nice.

    > Be careful with RecursiveArrayIterator and RecursiveIteratorIterator though if you want to return objects; objects are treated as Iterable and will therefore be iterated.

    That doesn’t do a very good job of explaining that the (Recursive)ArrayIterator will iterate over an object’s properties too, regardless of the name of the iterator.

    > Iterating Directory Listings

    It would have been preferable to use the FilesystemIterator class, rather than the quirky DirectoryIterator which behaves differently to the usual expectations from an iterator; e.g. the “current” value is always the DirectoryIterator itself.

    > There is one caveat, however. RecursiveDirectoryIterator does not return empty directories

    Only because you instruct the RecursiveIteratorIterator to only iterate over “leaf” nodes (put simply, files). The loop will include (all) directories if you use one of the RecursiveIteratorIterator::SELF_FIRST/CHILD_FIRST flags.

    • http://careers.stackoverflow.com/frostymarvelous Stefan Froelich

      Thanks for such a comprehensive analysis. As we all know, we are not perfect in our knowledge and it’s really great when someone points out errors.

      >> The foreach loop makes a copy of any array passed to it.
      This is not true at all, foreach will not copy the array. Of course, the usual rules apply if you modify the array from within the loop (i.e. the copy-on-write behaviour of all variables kicks in).

      You are partially right here. Some more info at http://nikic.github.com/2011/11/11/PHP-Internals-When-does-foreach-copy.html

      >> Usually, however, you will use ArrayObject, …
      >You don’t mention the why that class is “usually” used, in preference to using ArrayIterator directly. A sentence or two on nested loops over ArrayIterator would have been nice.
      >> Be careful with RecursiveArrayIterator and RecursiveIteratorIterator though if you want to return objects; objects are treated as Iterable and will therefore be iterated.
      >That doesn’t do a very good job of explaining that the (Recursive)ArrayIterator will iterate over an object’s properties too, regardless of the name of the iterator.

      True again. I’ll make a note of these points and make appropriate improvements.

      For your other points, the article is just an introduction, and I didn’t want to go too deep into the technicalities.

      I should say I appreciate your criticism and suggestions. This is the only way we as a community can grow in our knowledge. Kudos.

      • http://zimzat.com/ Zimzat

        Reading through the rules of when an array is copied using foreach, it’s worth noting the distinction of what is and isn’t being copied. The contents of the array are never copied unless modified, whereas the structure of the array is. What is being copied is the few bytes for each array element’s container and a few more bytes for an internal reference to the original array contents value (a 3MB string would still only be a few bytes copied in this case). It would take an array of hundreds of thousands of elements to make a significant impact on the memory usage.

        Which is another thing worth mentioning when it comes to iterators: They often exchange decreased memory usage for increased CPU usage. This is most notably true of the Iterator interface because it makes four calls into PHP code (rather than native code) for each element it goes over (rewind/next, valid, current, key). They’re relatively insignificant on smaller data sets, but the time adds up when going over the larger ones.

        I really like iterators for many of the native usages, like iterating directories (I’ve surprised an interviewer by using them). They’re also useful in many of the ways you’ve outlined, like lazy loading and applying logic against data sets before returning the next valid value.

  • uestla

    Hi, thanks for the article.

    I think there’s a typo in the paragraph after the first example:

    ‘or call ArrayObject::getIterator() directly’