Memory Performance Boosts with Generators and Nikic/Iter

By Christopher Pitt

Arrays, and by extension iteration, are fundamental parts of any application. And like the complexity of our applications, how we use them should evolve as we gain access to new tools.

New tools, like generators, for instance. First came arrays. Then we gained the ability to define our own array-like things (called iterators). But since PHP 5.5, we can rapidly create iterator-like structures called generators.

Generators look like functions, but we can use them as iterators. They give us a simple syntax for what are essentially interruptible, repeatable functions. They’re wonderful!
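To make "interruptible" a little more concrete, here is a minimal sketch. The function name countTo is made up for this illustration. Calling it returns a Generator object straight away, and the body only runs (pausing at each yield) as values are requested:

```php
<?php

// Hypothetical example: a generator that counts from 1 to $limit.
function countTo($limit) {
    for ($i = 1; $i <= $limit; $i++) {
        // execution pauses here until the consumer asks for the next value
        yield $i;
    }
}

// calling the function returns a Generator object; the body has not run yet
$numbers = countTo(3);

$seen = [];

foreach ($numbers as $number) {
    $seen[] = $number;
}

print join(", ", $seen); // "1, 2, 3"
```

Only one value exists at a time inside the loop, which is where the memory savings discussed later come from.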

And we’re going to look at a few areas in which we can use them. We’re also going to discover a few problems to be aware of when using them. Finally, we’ll study a brilliant library, created by the talented Nikita Popov.

You can find the example code at https://github.com/sitepoint-editors/generators-and-iter.

The Problems

Imagine you have lots of relational data, and you want to do some eager loading. Perhaps the data is comma-separated, and you need to load each data type, and knit them together.

You could start with something as simple as:

function readCSV($file) {
    $rows = [];

    $handle = fopen($file, "r");

    while (!feof($handle)) {
        $rows[] = fgetcsv($handle);
    }

    fclose($handle);

    return $rows;
}

$authors = array_filter(
    readCSV("authors.csv")
);

$categories = array_filter(
    readCSV("categories.csv")
);

$posts = array_filter(
    readCSV("posts.csv")
);

Then you’d probably try to connect related elements through iteration or higher-order functions:

function filterByColumn($array, $column, $value) {
    return array_filter(
        $array, function($item) use ($column, $value) {
            return $item[$column] == $value;
        }
    );
}

$authors = array_map(function($author) use ($posts) {
    $author["posts"] = filterByColumn(
        $posts, 1, $author[0]
    );

    // make other changes to $author

    return $author;
}, $authors);

$categories = array_map(function($category) use ($posts) {
    $category["posts"] = filterByColumn(
        $posts, 2, $category[0]
    );

    // make other changes to $category

    return $category;
}, $categories);

$posts = array_map(function($post) use ($authors, $categories) {
    foreach ($authors as $author) {
        if ($author[0] == $post[1]) {
            $post["author"] = $author;
            break;
        }
    }

    foreach ($categories as $category) {
        if ($category[0] == $post[2]) {
            $post["category"] = $category;
            break;
        }
    }

    // make other changes to $post

    return $post;
}, $posts);

Seems ok, right? Well, what happens when we have huge CSV files to parse? Let’s profile the memory usage a bit…

function formatBytes($bytes, $precision = 2) {
    $kilobyte = 1024;
    $megabyte = 1024 * 1024;

    if ($bytes >= 0 && $bytes < $kilobyte) {
        return $bytes . " B";
    }

    if ($bytes >= $kilobyte && $bytes < $megabyte) {
        return round($bytes / $kilobyte, $precision) . " KB";
    }

    return round($bytes / $megabyte, $precision) . " MB";
}

print "memory:" . formatBytes(memory_get_peak_usage());

The example code includes generate.php, which you can use to make these CSV files…

If you have large CSV files, this code should show just how much memory it takes to link these arrays together. It’s at least the size of the files you read, because PHP has to hold every row in memory at once.

Generators to the Rescue!

One way you could improve this would be to use generators. If you’re unfamiliar with them, now is a good time to learn more.

Generators will allow you to load tiny amounts of the total data at once. There’s not much you need to do to use generators:

function readCSVGenerator($file) {
    $handle = fopen($file, "r");

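    // fgetcsv returns false at the end of the file, so stop there
    // instead of yielding false as if it were a row
    while (($row = fgetcsv($handle)) !== false) {
        yield $row;
    }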

    fclose($handle);
}

If you loop over the CSV data, you’ll notice an immediate drop in the amount of memory you need at once:

foreach (readCSVGenerator("posts.csv") as $post) {
    // do something with $post
}

print "memory:" . formatBytes(memory_get_peak_usage());

If you were seeing megabytes of memory used before, you’ll see kilobytes now. That’s a huge improvement, but it doesn’t come without its share of problems.

For a start, array_filter and array_map don’t work with generators. You’ll have to find other tools to handle that kind of data. Here’s one you can try!

composer require nikic/iter
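Before leaning on the library, it may help to see what a lazy map and filter look like when written by hand. This is only a sketch: lazyMap, lazyFilter and numbers are names invented for this illustration, not the library's API, and the library's own functions cover far more cases.

```php
<?php

// Hypothetical helpers, written by hand to illustrate lazy evaluation.
function lazyMap(callable $function, $iterable) {
    foreach ($iterable as $key => $value) {
        yield $key => $function($value);
    }
}

function lazyFilter(callable $predicate, $iterable) {
    foreach ($iterable as $key => $value) {
        if ($predicate($value)) {
            yield $key => $value;
        }
    }
}

// a small generator standing in for a large data source
function numbers() {
    yield 1;
    yield 2;
    yield 3;
    yield 4;
}

$evenDoubled = lazyMap(
    function($number) {
        return $number * 2;
    },
    lazyFilter(
        function($number) {
            return $number % 2 == 0;
        },
        numbers()
    )
);

// nothing has been computed yet; values are produced as we iterate
$result = iterator_to_array($evenDoubled, false);

print join(", ", $result); // "4, 8"
```

Each value flows through the filter and then the map one at a time, so no intermediate array is ever built.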

This library introduces a few functions that work with iterators and generators. So how could you still get all this related data, without keeping any of it in memory?

function getAuthors() {
    $authors = readCSVGenerator("authors.csv");

    foreach ($authors as $author) {
        yield formatAuthor($author);
    }
}

function formatAuthor($author) {
    $author["posts"] = getPostsForAuthor($author);

    // make other changes to $author

    return $author;
}

function getPostsForAuthor($author) {
    $posts = readCSVGenerator("posts.csv");

    foreach ($posts as $post) {
        if ($post[1] == $author[0]) {
            yield formatPost($post);
        }
    }
}

function formatPost($post) {
    foreach (getAuthors() as $author) {
        if ($post[1] == $author[0]) {
            $post["author"] = $author;
            break;
        }
    }

    foreach (getCategories() as $category) {
        if ($post[2] == $category[0]) {
            $post["category"] = $category;
            break;
        }
    }

    // make other changes to $post

    return $post;
}

function getCategories() {
    $categories = readCSVGenerator("categories.csv");

    foreach ($categories as $category) {
        yield formatCategory($category);
    }
}

function formatCategory($category) {
    $category["posts"] = getPostsForCategory($category);

    // make other changes to $category

    return $category;
}

function getPostsForCategory($category) {
    $posts = readCSVGenerator("posts.csv");

    foreach ($posts as $post) {
        if ($post[2] == $category[0]) {
            yield formatPost($post);
        }
    }
}

// testing this out...

foreach (getAuthors() as $author) {
    foreach ($author["posts"] as $post) {
        var_dump($post["author"]);
        break 2;
    }
}

This could be less verbose:

function filterGenerator($generator, $column, $value) {
    return iter\filter(
        function($item) use ($column, $value) {
            return $item[$column] == $value;
        },
        $generator
    );
}

function getAuthors() {
    return iter\map(
        "formatAuthor",
        readCSVGenerator("authors.csv")
    );
}

function formatAuthor($author) {
    $author["posts"] = getPostsForAuthor($author);

    // make other changes to $author

    return $author;
}

function getPostsForAuthor($author) {
    return iter\map(
        "formatPost",
        filterGenerator(
            readCSVGenerator("posts.csv"), 1, $author[0]
        )
    );
}

function formatPost($post) {
    foreach (getAuthors() as $author) {
        if ($post[1] == $author[0]) {
            $post["author"] = $author;
            break;
        }
    }

    foreach (getCategories() as $category) {
        if ($post[2] == $category[0]) {
            $post["category"] = $category;
            break;
        }
    }

    // make other changes to $post

    return $post;
}

function getCategories() {
    return iter\map(
        "formatCategory",
        readCSVGenerator("categories.csv")
    );
}

function formatCategory($category) {
    $category["posts"] = getPostsForCategory($category);

    // make other changes to $category

    return $category;
}

function getPostsForCategory($category) {
    return iter\map(
        "formatPost",
        filterGenerator(
            readCSVGenerator("posts.csv"), 2, $category[0]
        )
    );
}

It’s a bit wasteful to re-read each data source, every time. Consider keeping smaller related data (like authors and categories) in memory…
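One way that suggestion could look, as a rough sketch: index the small data sets by id once, then stream the large one and look relations up in memory. The helper names indexById and attachRelations are invented here, the sample rows stand in for the CSV files, and the column layout (id first, then author id and category id on posts) is assumed from the examples above.

```php
<?php

// Hypothetical helper: build an array keyed by the first column (the id).
function indexById($rows) {
    $index = [];

    foreach ($rows as $row) {
        $index[$row[0]] = $row;
    }

    return $index;
}

// Hypothetical helper: stream posts one at a time, attaching the author
// and category from the in-memory indexes instead of re-reading files.
function attachRelations($posts, $authors, $categories) {
    foreach ($posts as $post) {
        $post["author"] = isset($authors[$post[1]]) ? $authors[$post[1]] : null;
        $post["category"] = isset($categories[$post[2]]) ? $categories[$post[2]] : null;

        yield $post;
    }
}

// sample rows standing in for authors.csv and categories.csv
$authors = indexById([[1, "Ann"], [2, "Bob"]]);
$categories = indexById([[10, "PHP"], [20, "JavaScript"]]);

// sample generator standing in for readCSVGenerator("posts.csv")
function samplePosts() {
    // assumed columns: id, author id, category id, title
    yield [100, 2, 10, "Generators"];
    yield [101, 1, 20, "Promises"];
}

$titles = [];

foreach (attachRelations(samplePosts(), $authors, $categories) as $post) {
    $titles[] = $post[3] . " by " . $post["author"][1] . " in " . $post["category"][1];
}

print join("\n", $titles);
```

The posts are still read lazily, but each lookup is now a single array access rather than a fresh pass over a file.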

Other Fun Things

That’s just the tip of the iceberg when it comes to Nikic’s library! Ever wanted to flatten an array (or iterator/generator)?

$array = iter\toArray(
    iter\flatten(
        [1, 2, [3, 4, 5], 6, 7]
    )
);

print join(", ", $array); // "1, 2, 3, 4, 5, 6, 7"

You can return slices of iterable variables, using functions like slice and take:

$array = iter\toArray(
    iter\slice(
        [-3, -2, -1, 0, 1, 2, 3],
        2, 4
    )
);

print join(", ", $array); // "-1, 0, 1, 2"
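Taking the first few items is especially handy with generators that never end. The sketch below is a hand-written stand-in to show the idea, not the library's code; takeFirst and naturalNumbers are names made up for this example.

```php
<?php

// Hypothetical stand-in for a "take" helper, not the library's own code.
function takeFirst($count, $iterable) {
    if ($count <= 0) {
        return;
    }

    foreach ($iterable as $value) {
        yield $value;

        // stop asking the source for values once we have enough
        if (--$count <= 0) {
            return;
        }
    }
}

// an endless generator: safe only because the consumer stops asking
function naturalNumbers() {
    $number = 1;

    while (true) {
        yield $number++;
    }
}

$firstFive = iterator_to_array(takeFirst(5, naturalNumbers()), false);

print join(", ", $firstFive); // "1, 2, 3, 4, 5"
```

Because the source generator is only advanced on demand, the infinite loop inside naturalNumbers never runs away.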

As you work more with generators, you may come to find that you can’t always reuse them. Consider the following example:

$mapper = iter\map(
    function($item) {
        return $item * 2;
    },
    [1, 2, 3]
);

print join(", ", iter\toArray($mapper));
print join(", ", iter\toArray($mapper));

If you try to run that code, you’ll see an exception saying: “Cannot traverse an already closed generator”. Each iterator function in this library has a rewindable counterpart:

$mapper = iter\rewindable\map(
    function($item) {
        return $item * 2;
    },
    [1, 2, 3]
);

You can traverse this mapped iterator many times. You can even make your own generators rewindable:

$rewindable = iter\makeRewindable(function($max = 13) {
    $older = 0;
    $newer = 1;

    do {
        $number = $newer + $older;

        $older = $newer;
        $newer = $number;

        yield $number;
    }
    while($number < $max);
});

print join(", ", iter\toArray($rewindable()));

What you get from this is a reusable generator!
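If you're wondering how something like this can work, one plausible approach is to remember how the generator was created and build a fresh one for every traversal. The class below is a sketch under that assumption, with a made-up name (FreshGenerator); it is not the library's implementation. It also reproduces the "already closed" error using plain PHP:

```php
<?php

// Hypothetical class: it stores a factory and creates a new generator
// each time something starts iterating over it.
class FreshGenerator implements IteratorAggregate {
    private $factory;

    public function __construct(callable $factory) {
        $this->factory = $factory;
    }

    public function getIterator(): Traversable {
        return call_user_func($this->factory);
    }
}

function doubled() {
    foreach ([1, 2, 3] as $item) {
        yield $item * 2;
    }
}

// a plain generator can only be traversed once
$plain = doubled();
$first = iterator_to_array($plain, false);

try {
    iterator_to_array($plain, false);
    $message = "no exception";
} catch (Exception $exception) {
    $message = $exception->getMessage();
}

print $message . "\n";

// the wrapper can be traversed as often as we like
$reusable = new FreshGenerator("doubled");

print join(", ", iterator_to_array($reusable, false)) . "\n"; // "2, 4, 6"
print join(", ", iterator_to_array($reusable, false)) . "\n"; // "2, 4, 6"
```

The trade-off is that the underlying work is repeated on every traversal, so this suits cheap or deterministic sources better than expensive ones.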

Conclusion

For every looping thing you need to think about, generators may be an option. They can even be useful for other things, too. And where the language falls short, Nikic’s library steps in with higher-order functions aplenty.

Are you using generators yet? Would you like to see more examples on how to implement them in your own apps to gain some performance upgrades? Let us know!

Meet the author
Christopher is a writer and coder, working at SilverStripe. He usually works on application architecture, though sometimes you'll find him building compilers or robots.
  • Radek Dvořák

    Hi Christopher,

    thanks for the article. I have always wondered in which situations generators are useful. I have to admit I still do not get it. Loading (huge) data in chunks has already been there before generators arrived in PHP. We can use for/while loops. Furthermore when I looked at the code couple of things caught my eye.

    1) Why should I want to nest multiple iterations – eg. array_filter in array_map? In the example with posts and categories every post is visited for each and every category (in the worst case). Why not index all the data in a sensible way and access them by their index? I think that would be faster and easier to read (subjective).

    2) Please fix the memory units. Consumed memory is measured neither in bits, kilobits or even milibits :-)

    • Daniel Jurkovic

      Generators make your code smarter, cleaner and more memory efficient when you need oneway iteration over computed collections or memory consuming datasets. For example try to implement an iterator for prime numbers. Using generators you end up writing less, more memory efficient code.

    • Frode

      To me it seems there will be several situations where I will use generators so make my code more readable. I’ve never seen a situation where I could not solve the problem without generators.

      The first thing I will use generators for, is my api client. Many apis use paged results, where I only get for example ten rows per request. With generators, I can hide the logic of fetching the next page inside the generator. There are several places in my application which now will become much simpler.

  • http://www.christoph-rumpel.com/ Christoph Rumpel

    Thx Christopher’
    Little typo : …memory if takes… (It)
