Ruby
Article
By Benjamin Tan Wei Hao

Implementing Lazy Enumerables in Ruby

By Benjamin Tan Wei Hao

This post is an excerpt from The Ruby Closures Book, written by Benjamin himself. If you like this post (and you will), tell us what you enjoyed at the end to be entered into a draw for a free copy! Now…read on…

rubyclosures

I’ve always been fascinated by Ruby’s lazy enumeration, a feature that was introduced in Ruby 2.0. In this article, we take a deep dive into lazy enumeration, learning how Ruby implements this interesting programming technique by making your own lazy enumerable.

What exactly is lazy? Is Ruby trying to slack off? Being lazy refers to the style of evaluation. To let this sink in, consider the opposite of lazy evaluation: eager evaluation.

There really isn’t much to talk about regarding eager evaluation, as it’s the standard way Ruby works. But sometimes, being eager is a bad thing. For instance, what do you think this evaluates to:

irb> 1.upto(Float::INFINITY).map { |x| x * x }.take(10)

You might expect the result to be [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]. Unfortunately, the Ruby interpreter doesn’t know when to stop. The problem here is 1.upto(Float::INFINITY). 1.upto(Float::INFINITY) represents an infinite sequence. How does it look like in the console?

irb> 1.upto(Float::INFINITY)
=> #<Enumerator: 1:upto(Infinity)>

No surprise here, an enumerator is returned. Let’s try to force values out from the enumerator using Enumerator#to_a:

irb> 1.upto(5).to_a
=> [1, 2, 3, 4, 5]

To infinity and beyond!:

irb> 1.upto(Float::INFINITY).to_a

You shouldn’t be surprised by now that this will lead to an infinite loop. Notice that Enumerator#to_a is useful to “force” values out of an enumerator. In fact, Enumerator#to_a is aliased to Enumerator#force. This is useful, as you will see later on, when you want to know all the values produced by an enumerator.

Bringing on the Lazy!

How do we get values from an infinite list then? Introducing Enumerable#lazy:

irb> 1.upto(Float::INFINITY).lazy.map { |x| x * x }
=> #<Enumerator::Lazy: #<Enumerator::Lazy: #<Enumerator: 1:upto(Infinity)>>:map>

It appears that the 1.upto(Float::INFINITY) enumerator has been made lazy by wrapping it with an Enumerator::Lazy. This lazy enumerator itself wraps around a lazy version of map. Let’s try the very first expression again, but this time with Enumerable#lazy:

irb> 1.upto(Float::INFINITY).lazy.map { |x| x * x }.take(10)
=> #<Enumerator::Lazy: #<Enumerator::Lazy: #<Enumerator::Lazy: #<Enumerator: 1:upto(Infinity)>>:map>:take(10)>

Ah, Enumerable#take is also wrapped up! How do we get all the values? Enumerable#to_a to the rescue:

irb> 1.upto(Float::INFINITY).lazy.map { |x| x * x }.take(10).to_a
=> [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

How does this magic work? We are about to find out. We will implement our own version of lazy enumerables, albeit a minimal version. Along the way, you will also learn interesting aspects of Ruby’s enumerators.

Building the Skeleton

The first order of things is to decide where our implementation should live. To do that, we just have to find out where Enumerator::Lazy lives. If we head over to the official documentation, there’s a clue:

So, Enumerator is the parent of Lazy. In code:

class Lazy < Enumerator
end

Instead of re-opening Ruby classes like that (I get involuntary twitches), for our little exercise we are going to invent another name. A quick trip to the thesaurus yields something similar to Lazy. Introducing, Lax!:

class Lax < Enumerator
end

External VS Internal Iteration

What does inheriting from Enumerator buy us? To answer that question, we need a refresher on what an Enumerator is. From the docs:

A class which allows both internal and external iteration.

What is the difference between the two flavors of iteration? The key lies with who controls the iteration.

For internal iteration, it is the Array object (or any Enumerable) that controls the iteration. In fact, that’s how you normally interact with Enumerables.

External iteration on the other hand, is controlled by some other object wrapped around an Enumerable.

Why would you want external iterators in the first place? Well, sometimes you do not want to iterate through all of the elements in one go. You want to be able to tell the enumerable, “give me exactly one now, and when I need the next one, I will ask again”. External iterators give you that ability.

Creating an Enumerator from an Enumerable

I mentioned that an Enumerator wraps an Enumerable. We can see that in code:

irb> e = Enumerator.new([1,2,3])
irb> e.next
=> 1
irb> e.next
=> 2
irb> e.next
=> 3
irb> e.next
StopIteration: iteration reached an end
      from (irb):7:in `next'

When we wrap an array with an enumerator, we can call Enumerator#next multiple times to retrieve the next value. When there are no more values left, the StopIteration exception is raised.

Notice that Ruby complains about using Object#to_enum instead, or using it with a block. Let’s use a block, because that is more interesting:

e = Enumerator.new do |yielder|
  [1,2,3].each do |val|
    yielder << val
  end
end

Let’s see the code in action in irb:

irb(main):008:0> e = Enumerator.new do |yielder|
irb(main):009:1*   [1,2,3].each do |val|
irb(main):010:2*     yielder << val
irb(main):011:2>   end
irb(main):012:1> end
=> #<Enumerator: #<Enumerator::Generator:0x007fb9798e0668>:each>

As usual, we can call Enumerator#next:

irb(main):013:0> e.next
=> 1
irb(main):014:0> e.next
=> 2
irb(main):015:0> e.next
=> 3
irb(main):016:0> e.next
StopIteration: iteration reached an end
        from (irb):16:in `next'
        from (irb):16
        from /Users/benjamintan/.rbenv/versions/2.2.0/bin/irb:11:in `<main>'

Let’s look at the code again, because there’s more than meets the eye. There are a few questions that come up:

e = Enumerator.new do |yielder|
  [1,2,3].each do |val|
    yielder << val
  end
end
  1. What is this yielder object that is passed into the block?
  2. What does yielder do?
  3. Most importantly, how is it possible that simply wrapping an Enumerable enables the retrieval of the elements one by one?

We can answer the first two questions. The yielder object is passed into the block when you create an Enumerator object using with a block.

The purpose of the yielder object is to store the instructions for the next yield. Not the value, but the instructions. That’s what yielder does. The is aliased to the yield method, but it has nothing to do with the yield keyword. Yes, it is confusing.

When Enumerator#next is called, it returns the next value and returns. This suggests that the yielder object must remember some form of state. How is this achieved? The return value from the previous code listing gives us a clue:

#<Enumerator: #<Enumerator::Generator:0x007fb9798e0668>:each>

This tells us that an Enumerator object contains another object called Enumerator ::Generator.

--ADVERTISEMENT--

Generators, Fibers, Oh My!

Let’s take a detour and explore Enumerator::Generator. Generators are able to convert an internal iterator, such as [1,2,3], into an external one. It is the secret sauce which allows the one-by-one retrieval of the elements.

Here is how generators work:

  1. First, it computes some result
  2. This result is handed back to the caller
  3. In addition, it also saves the state of the computation so that the caller can resume that computation to generate the next result

One way to do that is to use a little known Ruby construct called a fiber. The Fiber class is perfect of converting an internal iterator to an external one.

I’ve not used fibers much myself, but nonetheless, they are pretty fun to explore. Let’s do a quick run through of the basics.

A fiber is created with Fiber.new, along with a block that represents the computation. Let’s look at this example:

f = Fiber.new do
  x = 0
  loop do
    Fiber.yield x
          x += 1
  end
end

We create a fiber, and give it a block of computation. This block contains a loop that doesn’t end. What is this Fiber.yield? That is the secret sauce! We will get to that in a second.

When you create a fiber like in the above example, the block isn’t executed immediately. So, how do you execute the block? With Fiber#resume. Observe:

irb> f = Fiber.new do
irb*   x = 0
irb>   loop do
irb*     Fiber.yield x
irb>     x += 1
irb>   end
irb> end
=> #<Fiber:0x007fb979023e58>
irb> f.resume
=> 0
irb> f.resume
=> 1
irb> f.resume
=> 2

Now back to the secret sauce. What you have just created is an infinite number generator. The reason the loop doesn’t run indefinitely is because of the behavior of Fiber.yield. First of all, this has no relation to the yield keyword, as you know it.

The code executes Fiber.yield x when Fiber#resume is called, the value is returned to the caller, and control is given back to the caller. Until the next time Fiber#resume is called, then x is incremented, the loop goes for another round, executes Fiber.yield x again, and gives control back to the caller.

Implementing Lax, Our Very Own Enumerator::Lazy

Now that we understand how generators work, let’s head back to the Enumerator::Lax implementation. We don’t t have to use Enumerator::Generator directly, since that is taken care for us by the yielder object.

Let’s think about how the client would use our code. In fact, we will try to mimic the real implementation as much as possible. Here’s an example:

1.upto(Float::INFINITY).lax
  .map { |x| x*x }
  .map { |x| x+1 }
  .take(5)
  .to_a

This should get us [2, 5, 10, 17, 26].

Here’s where the sleight of hand comes in. When lax is called, an instance of the Lax enumerator is returned. When map is called, this method is called on a Lax instance, not the map defined on the enumerable.

To enable something like 1.upto(Float::INFINITY).lax and [1,2,3].lax and return a new Lax instance, we have to add a new method into the Enumerable module:

module Enumerable
  def lax
    Lax.new(self)
  end
end

class Lax < Enumerator
end

self is the actual enumerable. It is passed in as an argument to the Lax constructor. If you run the code now, you wouldn’t get any errors, but you will still get the infinite loop. That’s because all we did was just providing an extra level of indirection without doing anything interesting.

We know that we need to populate the yielder, so let’s do that:

class Lax < Enumerator
  def initialize(receiver)
    super() do |yielder|
      receiver.each do |val|
        yielder << val
      end
    end
  end
end

Even with this tiny piece of code, we can already iterate an infinite collection, one by one:

irb> e = 1.upto(Float::INFINITY).lax
=> #<Lax: #<Enumerator::Generator:0x007f8324155c30>:each>
irb> e.next
=> 1
irb> e.next
=> 2
irb> e.next
=> 3

Implementing Lazy map

Now, we need to implement method chaining on methods such as map and take, for example. Because Enumerable::Lax returns a Lax instance, this means we need to define Lax versions of map and take. Each invocation of map and take in turn return yet another Lax instance.

How would the map method in Lax look like? For starters:

def map(&block)
end

We also know that we need to return a new Lax instance. This time, self is going to be a another Lax instance.

def map(&block)
  Lax.new(self)
end

What we really want is to populate the yielder object with elements that are modified by the map method. That means that we somehow must pass in two things into the Lax instance in map: the yielder object and the element to be mapped. We can pass these via a block argument like so:

def map(&block)
  Lax.new(self) do |yielder, val|
    yielder << block.call(val)
  end
end

What does this do? It looks like the block of map is invoked with val, and that element is then passed into the yielder. That’s not completely accurate though. It is more like the instructions on how to treat a mapped value are stored inside the yielder.

Since Lax can receive a block, we need to modify the constructor to handle this case:

class Lax < Enumerator
  def initialize(receiver)
    super() do |yielder|
      receiver.each do |val|        
        if block_given? 
          yield(yielder, val)
        else
          yielder << val
        end
      end
    end
  end
end

When there’s a block supplied, as in the case of the Lax version of map, we then yield the block, passing in the yielder object and the element val.

Does it work? Let’s find out!

p 1.upto(Float::INFINITY).lax
        .map { |x| x*x }
  .map { |x| x+1 }
  .first(5)

Enumerable#first(5) returns the first 5 elements of the enumerable, which gives you [2, 5, 10, 17, 26].

Implementing Lazy take

Now that we have seen how map is implemented, let’s try implementing take. As its name suggests, Enumerable#take(n) returns the first n elements from the Enum. The lazy version will return a Lax instance wrapped with Enumerable#take.

class Lax < Enumerator
  # initialize and map goes here 

  def take(n)
    taken = 0
    Lax.new(self) do |yielder, val|
      if taken < n
        yielder << val
        taken += 1 
      else
        raise StopIteration
      end
    end
  end
end

The logic for take should be easy enough to understand. The interesting thing here is how take signals that the iteration has ended – it raises a StopIteration exception.

Don’t get mad, because that is exactly how Ruby implements it. We can handle that exception inside the constructor, and simply do nothing when we detect a StopIteration exception:

class Lax < Enumerator

  def initialize(receiver)
    super() do |yielder|
      begin
        receiver.each do |val|
          if block_given? 
            yield(yielder, val)
          else
            yielder << val
          end
        end
      rescue StopIteration
      end
    end
  end

  # map and take goes here ...
end

Before we take the code for another spin, here is what you should have:

module Enumerable
  def lax
    Lax.new(self)
  end
end

class Lax < Enumerator

  def initialize(receiver)
    super() do |yielder|
      begin
        receiver.each do |val|
          if block_given? 
            yield(yielder, val)
          else
            yielder << val
          end
        end
      rescue StopIteration
      end
    end
  end

  def map(&block)
    Lax.new(self) do |yielder, val|
      yielder << block.call(val)
    end
  end

  def take(n)
    taken = 0
    Lax.new(self) do |yielder, val|
      if taken < n
        yielder << val
        taken += 1 
      else
        raise StopIteration
      end
    end
  end

end

With that in place, let’s try out the code:

p 1.upto(Float::INFINITY).lax
  .map { |x| x*x }
  .map { |x| x+1 }
  .take(10)
  .to_a

If you get [2, 5, 10, 17, 26], then it worked!

Summary

Lazy enumerables seem magical when you first encounter them. In fact, even when you start unravelling the pieces which make up lazy enumerables, you will undoubtedly find interesting bits of Ruby that you might not have otherwise encountered.

Before writing this chapter, I’ve never encountered Enumerator::Yielder, Enumerator::Generator and Fiber. Thankfully, the Rubinius implementation of Enumerable has been a great help. Learning about how the behavior of Fiber.yield was the “AHA!” moment for me.

  • Really nice! I’m still learning Ruby, but sounds really good!
    I watch Ruby Tapas from Avdi Grimm and you should make a guest episode to show some of these features. Talk to him!

  • jamlee

    Ruby is amazing.

    • ggsp

      @webboy89860:disqus Congrats! You get the Ruby Closures Book. Can you send me your email address pls?

  • Wow this is quite interesting, I’ve tried these examples from article and making any loop form 1 to Float::INFINITY without lazy goes to nowhere… – but small loops from 1 to 1000 takes moment, so we don’t need any lazy loops – I love ruby :)

  • peter marien

    Never thought that something as powerfull as lazy could be so simple, well explained ! Thanks for again making me clear Ruby is the right choice.

  • Thom Parkin

    No need to enter my name in the drawing for a FREE copy of “The RUBY CLOSURES book”. After reading this outstanding article I am going to buy a copy of the book immediately.
    Well written, clear and direct. Fantastic job, Benjamin.
    Thank you!

  • Robert John

    Another reminder of how much more is out there to learn. Great read.

  • Juan Pastás

    Thanks for the article Benjamin. I enjoyed learning those concepts. I have a lot of questions.

    What are the advantages of using Enumerators vs only loops? creating other methods to apply to work with collection? chaining? which others? which are applications of a lazy enumerator that can not be easily achieved with only loops? or in other words when is needed to write 1.upto(Float::INFINITY)?

  • Andrey

    In the last example you’ve probably meant .take(5), not .take(10). Awesome article, just awesome!

Recommended
Sponsors
Get the latest in Ruby, once a week, for free.