Implementing Lazy Enumerables in Ruby

This post is an excerpt from The Ruby Closures Book, written by Benjamin himself. If you like this post (and you will), tell us what you enjoyed at the end to be entered into a draw for a free copy! Now…read on…

I’ve always been fascinated by Ruby’s lazy enumeration, a feature that was introduced in Ruby 2.0. In this article, we take a deep dive into lazy enumeration, learning how Ruby implements this interesting programming technique by making your own lazy enumerable.

What exactly is lazy? Is Ruby trying to slack off? Being lazy refers to the style of evaluation. To let this sink in, consider the opposite of lazy evaluation: eager evaluation.

There really isn’t much to talk about regarding eager evaluation, as it’s the standard way Ruby works. But sometimes, being eager is a bad thing. For instance, what do you think this evaluates to:

irb> 1.upto(Float::INFINITY).map { |x| x * x }.take(10)

You might expect the result to be [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]. Unfortunately, the Ruby interpreter doesn’t know when to stop. The problem here is 1.upto(Float::INFINITY). 1.upto(Float::INFINITY) represents an infinite sequence. How does it look like in the console?

irb> 1.upto(Float::INFINITY)
=> #<Enumerator: 1:upto(Infinity)>

No surprise here, an enumerator is returned. Let’s try to force values out from the enumerator using Enumerator#to_a:

irb> 1.upto(5).to_a
=> [1, 2, 3, 4, 5]

To infinity and beyond!:

irb> 1.upto(Float::INFINITY).to_a

You shouldn’t be surprised by now that this will lead to an infinite loop. Notice that Enumerator#to_a is useful to “force” values out of an enumerator. In fact, Enumerator#to_a is aliased to Enumerator#force. This is useful, as you will see later on, when you want to know all the values produced by an enumerator.

Bringing on the Lazy!

How do we get values from an infinite list then? Introducing Enumerable#lazy:

irb> 1.upto(Float::INFINITY).lazy.map { |x| x * x }
=> #<Enumerator::Lazy: #<Enumerator::Lazy: #<Enumerator: 1:upto(Infinity)>>:map>

It appears that the 1.upto(Float::INFINITY) enumerator has been made lazy by wrapping it with an Enumerator::Lazy. This lazy enumerator itself wraps around a lazy version of map. Let’s try the very first expression again, but this time with Enumerable#lazy:

irb> 1.upto(Float::INFINITY).lazy.map { |x| x * x }.take(10)
=> #<Enumerator::Lazy: #<Enumerator::Lazy: #<Enumerator::Lazy: #<Enumerator: 1:upto(Infinity)>>:map>:take(10)>

Ah, Enumerable#take is also wrapped up! How do we get all the values? Enumerable#to_a to the rescue:

irb> 1.upto(Float::INFINITY).lazy.map { |x| x * x }.take(10).to_a
=> [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

How does this magic work? We are about to find out. We will implement our own version of lazy enumerables, albeit a minimal version. Along the way, you will also learn interesting aspects of Ruby’s enumerators.

Building the Skeleton

The first order of things is to decide where our implementation should live. To do that, we just have to find out where Enumerator::Lazy lives. If we head over to the official documentation, there’s a clue:

So, Enumerator is the parent of Lazy. In code:

class Lazy < Enumerator
end

Instead of re-opening Ruby classes like that (I get involuntary twitches), for our little exercise we are going to invent another name. A quick trip to the thesaurus yields something similar to Lazy. Introducing, Lax!:

class Lax < Enumerator
end

External VS Internal Iteration

What does inheriting from Enumerator buy us? To answer that question, we need a refresher on what an Enumerator is. From the docs:

A class which allows both internal and external iteration.

What is the difference between the two flavors of iteration? The key lies with who controls the iteration.

For internal iteration, it is the Array object (or any Enumerable) that controls the iteration. In fact, that’s how you normally interact with Enumerables.

External iteration on the other hand, is controlled by some other object wrapped around an Enumerable.

Why would you want external iterators in the first place? Well, sometimes you do not want to iterate through all of the elements in one go. You want to be able to tell the enumerable, “give me exactly one now, and when I need the next one, I will ask again”. External iterators give you that ability.

Creating an Enumerator from an Enumerable

I mentioned that an Enumerator wraps an Enumerable. We can see that in code:

irb> e = Enumerator.new([1,2,3])
irb> e.next
=> 1
irb> e.next
=> 2
irb> e.next
=> 3
irb> e.next
StopIteration: iteration reached an end
      from (irb):7:in `next'

When we wrap an array with an enumerator, we can call Enumerator#next multiple times to retrieve the next value. When there are no more values left, the StopIteration exception is raised.

Notice that Ruby complains about using Object#to_enum instead, or using it with a block. Let’s use a block, because that is more interesting:

e = Enumerator.new do |yielder|
  [1,2,3].each do |val|
    yielder << val
  end
end

Let’s see the code in action in irb:

irb(main):008:0> e = Enumerator.new do |yielder|
irb(main):009:1*   [1,2,3].each do |val|
irb(main):010:2*     yielder << val
irb(main):011:2>   end
irb(main):012:1> end
=> #<Enumerator: #<Enumerator::Generator:0x007fb9798e0668>:each>

As usual, we can call Enumerator#next:

irb(main):013:0> e.next
=> 1
irb(main):014:0> e.next
=> 2
irb(main):015:0> e.next
=> 3
irb(main):016:0> e.next
StopIteration: iteration reached an end
        from (irb):16:in `next'
        from (irb):16
        from /Users/benjamintan/.rbenv/versions/2.2.0/bin/irb:11:in `<main>'

Let’s look at the code again, because there’s more than meets the eye. There are a few questions that come up:

e = Enumerator.new do |yielder|
  [1,2,3].each do |val|
    yielder << val
  end
end

What is this yielder object that is passed into the block?
What does yielder < < val do?
Most importantly, how is it possible that simply wrapping an Enumerable enables the retrieval of the elements one by one?

We can answer the first two questions. The yielder object is passed into the block when you create an Enumerator object using with a block.

The purpose of the yielder object is to store the instructions for the next yield. Not the value, but the instructions. That’s what yielder < < val does. The < < is aliased to the yield method, but it has nothing to do with the yield keyword. Yes, it is confusing.

When Enumerator#next is called, it returns the next value and returns. This suggests that the yielder object must remember some form of state. How is this achieved? The return value from the previous code listing gives us a clue:

#<Enumerator: #<Enumerator::Generator:0x007fb9798e0668>:each>

This tells us that an Enumerator object contains another object called Enumerator ::Generator.

Generators, Fibers, Oh My!

Let’s take a detour and explore Enumerator::Generator. Generators are able to convert an internal iterator, such as [1,2,3], into an external one. It is the secret sauce which allows the one-by-one retrieval of the elements.

Here is how generators work:

First, it computes some result
This result is handed back to the caller
In addition, it also saves the state of the computation so that the caller can resume that computation to generate the next result

One way to do that is to use a little known Ruby construct called a fiber. The Fiber class is perfect of converting an internal iterator to an external one.

I’ve not used fibers much myself, but nonetheless, they are pretty fun to explore. Let’s do a quick run through of the basics.

A fiber is created with Fiber.new, along with a block that represents the computation. Let’s look at this example:

f = Fiber.new do
  x = 0
  loop do
    Fiber.yield x
          x += 1
  end
end

We create a fiber, and give it a block of computation. This block contains a loop that doesn’t end. What is this Fiber.yield? That is the secret sauce! We will get to that in a second.

When you create a fiber like in the above example, the block isn’t executed immediately. So, how do you execute the block? With Fiber#resume. Observe:

irb> f = Fiber.new do
irb*   x = 0
irb>   loop do
irb*     Fiber.yield x
irb>     x += 1
irb>   end
irb> end
=> #<Fiber:0x007fb979023e58>
irb> f.resume
=> 0
irb> f.resume
=> 1
irb> f.resume
=> 2

Now back to the secret sauce. What you have just created is an infinite number generator. The reason the loop doesn’t run indefinitely is because of the behavior of Fiber.yield. First of all, this has no relation to the yield keyword, as you know it.

The code executes Fiber.yield x when Fiber#resume is called, the value is returned to the caller, and control is given back to the caller. Until the next time Fiber#resume is called, then x is incremented, the loop goes for another round, executes Fiber.yield x again, and gives control back to the caller.

Implementing `Lax`, Our Very Own `Enumerator::Lazy`

Now that we understand how generators work, let’s head back to the Enumerator::Lax implementation. We don’t t have to use Enumerator::Generator directly, since that is taken care for us by the yielder object.

Let’s think about how the client would use our code. In fact, we will try to mimic the real implementation as much as possible. Here’s an example:

1.upto(Float::INFINITY).lax
  .map { |x| x*x }
  .map { |x| x+1 }
  .take(5)
  .to_a

This should get us [2, 5, 10, 17, 26].

Here’s where the sleight of hand comes in. When lax is called, an instance of the Lax enumerator is returned. When map is called, this method is called on a Lax instance, not the map defined on the enumerable.

To enable something like 1.upto(Float::INFINITY).lax and [1,2,3].lax and return a new Lax instance, we have to add a new method into the Enumerable module:

module Enumerable
  def lax
    Lax.new(self)
  end
end

class Lax < Enumerator
end

self is the actual enumerable. It is passed in as an argument to the Lax constructor. If you run the code now, you wouldn’t get any errors, but you will still get the infinite loop. That’s because all we did was just providing an extra level of indirection without doing anything interesting.

We know that we need to populate the yielder, so let’s do that:

class Lax < Enumerator
  def initialize(receiver)
    super() do |yielder|
      receiver.each do |val|
        yielder << val
      end
    end
  end
end

Even with this tiny piece of code, we can already iterate an infinite collection, one by one:

irb> e = 1.upto(Float::INFINITY).lax
=> #<Lax: #<Enumerator::Generator:0x007f8324155c30>:each>
irb> e.next
=> 1
irb> e.next
=> 2
irb> e.next
=> 3

Implementing Lazy `map`

Now, we need to implement method chaining on methods such as map and take, for example. Because Enumerable::Lax returns a Lax instance, this means we need to define Lax versions of map and take. Each invocation of map and take in turn return yet another Lax instance.

How would the map method in Lax look like? For starters:

def map(&block)
end

We also know that we need to return a new Lax instance. This time, self is going to be a another Lax instance.

def map(&block)
  Lax.new(self)
end

What we really want is to populate the yielder object with elements that are modified by the map method. That means that we somehow must pass in two things into the Lax instance in map: the yielder object and the element to be mapped. We can pass these via a block argument like so:

def map(&block)
  Lax.new(self) do |yielder, val|
    yielder << block.call(val)
  end
end

What does this do? It looks like the block of map is invoked with val, and that element is then passed into the yielder. That’s not completely accurate though. It is more like the instructions on how to treat a mapped value are stored inside the yielder.

Since Lax can receive a block, we need to modify the constructor to handle this case:

class Lax < Enumerator
  def initialize(receiver)
    super() do |yielder|
      receiver.each do |val|        
        if block_given? 
          yield(yielder, val)
        else
          yielder << val
        end
      end
    end
  end
end

When there’s a block supplied, as in the case of the Lax version of map, we then yield the block, passing in the yielder object and the element val.

Does it work? Let’s find out!

p 1.upto(Float::INFINITY).lax
        .map { |x| x*x }
  .map { |x| x+1 }
  .first(5)

Enumerable#first(5) returns the first 5 elements of the enumerable, which gives you [2, 5, 10, 17, 26].

Implementing Lazy `take`

Now that we have seen how map is implemented, let’s try implementing take. As its name suggests, Enumerable#take(n) returns the first n elements from the Enum. The lazy version will return a Lax instance wrapped with Enumerable#take.

class Lax < Enumerator
  # initialize and map goes here 

  def take(n)
    taken = 0
    Lax.new(self) do |yielder, val|
      if taken < n
        yielder << val
        taken += 1 
      else
        raise StopIteration
      end
    end
  end
end

The logic for take should be easy enough to understand. The interesting thing here is how take signals that the iteration has ended – it raises a StopIteration exception.

Don’t get mad, because that is exactly how Ruby implements it. We can handle that exception inside the constructor, and simply do nothing when we detect a StopIteration exception:

class Lax < Enumerator

  def initialize(receiver)
    super() do |yielder|
      begin
        receiver.each do |val|
          if block_given? 
            yield(yielder, val)
          else
            yielder << val
          end
        end
      rescue StopIteration
      end
    end
  end

  # map and take goes here ...
end

Before we take the code for another spin, here is what you should have:

module Enumerable
  def lax
    Lax.new(self)
  end
end

class Lax < Enumerator

  def initialize(receiver)
    super() do |yielder|
      begin
        receiver.each do |val|
          if block_given? 
            yield(yielder, val)
          else
            yielder << val
          end
        end
      rescue StopIteration
      end
    end
  end

  def map(&block)
    Lax.new(self) do |yielder, val|
      yielder << block.call(val)
    end
  end

  def take(n)
    taken = 0
    Lax.new(self) do |yielder, val|
      if taken < n
        yielder << val
        taken += 1 
      else
        raise StopIteration
      end
    end
  end

end

With that in place, let’s try out the code:

p 1.upto(Float::INFINITY).lax
  .map { |x| x*x }
  .map { |x| x+1 }
  .take(10)
  .to_a

If you get [2, 5, 10, 17, 26], then it worked!

Summary

Lazy enumerables seem magical when you first encounter them. In fact, even when you start unravelling the pieces which make up lazy enumerables, you will undoubtedly find interesting bits of Ruby that you might not have otherwise encountered.

Before writing this chapter, I’ve never encountered Enumerator::Yielder, Enumerator::Generator and Fiber. Thankfully, the Rubinius implementation of Enumerable has been a great help. Learning about how the behavior of Fiber.yield was the “AHA!” moment for me.

Frequently Asked Questions about Implementing Lazy Enumerables in Ruby

What is the main advantage of using lazy enumerables in Ruby?

The primary advantage of using lazy enumerables in Ruby is that they allow for the efficient handling of large data sets. Lazy enumerables defer computation until it’s absolutely necessary, which can significantly improve performance when dealing with large collections. This is because they only process elements that are needed at any given time, rather than processing the entire collection upfront. This can be particularly beneficial in applications that need to handle large amounts of data or perform complex computations.

How does the ‘lazy’ method work in Ruby?

The ‘lazy’ method in Ruby is used to create a lazy version of an enumerable. When called on an enumerable, it returns a ‘Lazy’ object. This object has most of the same methods as the original enumerable, but these methods are now lazy. That means they don’t immediately process all the elements of the enumerable. Instead, they return a new ‘Lazy’ object that will apply the operation when needed. This allows for efficient chaining of operations, as each operation is only applied to an element when it’s actually needed.

Can you provide an example of using a lazy enumerable in Ruby?

Sure, here’s a simple example of using a lazy enumerable in Ruby:

range = 1..Float::INFINITY
lazy_range = range.lazy

lazy_range.select { |num| num.odd? }.first(10)

In this example, we first create an infinite range. We then convert this range to a lazy enumerable using the ‘lazy’ method. Finally, we select the first 10 odd numbers from this lazy enumerable. Without the use of lazy enumerables, this code would not be possible, as it would attempt to process an infinite number of elements.

Are there any downsides to using lazy enumerables in Ruby?

While lazy enumerables can be very efficient for large data sets, they can also be slower for small data sets. This is because the overhead of setting up the lazy computation can outweigh the benefits for small collections. Therefore, it’s important to use lazy enumerables judiciously, and only when they provide a clear benefit.

How does lazy evaluation work with methods like ‘map’ and ‘select’?

When methods like ‘map’ and ‘select’ are called on a lazy enumerable, they don’t immediately process all the elements. Instead, they return a new lazy enumerable that knows how to apply these operations when needed. This allows for efficient chaining of operations, as each operation is only applied to an element when it’s actually needed.

Can I use lazy enumerables with any enumerable in Ruby?

Yes, you can use lazy enumerables with any enumerable in Ruby. This includes arrays, ranges, hashes, and any other object that includes the Enumerable module. To create a lazy enumerable, simply call the ‘lazy’ method on the enumerable.

How can I force a lazy enumerable to evaluate its elements?

You can force a lazy enumerable to evaluate its elements by calling the ‘force’ method on it. This will return a regular array with all the elements of the lazy enumerable. However, be careful when using this method, as it can lead to performance issues if the lazy enumerable contains a large number of elements.

Can I use lazy enumerables in Ruby on Rails?

Yes, you can use lazy enumerables in Ruby on Rails. However, keep in mind that Rails already provides many efficient ways to handle large collections, such as batch processing methods and the ‘find_each’ method for ActiveRecord relations. Therefore, while you can use lazy enumerables in Rails, they may not always be the best solution.

How does garbage collection work with lazy enumerables in Ruby?

Lazy enumerables in Ruby can help reduce memory usage and improve garbage collection. This is because they only hold onto the elements that are currently being processed, rather than the entire collection. Once an element has been processed, it can be garbage collected if there are no other references to it.

Can I create my own lazy enumerable methods in Ruby?

Yes, you can create your own lazy enumerable methods in Ruby. To do this, you would need to define a method that returns a new lazy enumerable. This new lazy enumerable should know how to apply the operation defined by your method when its elements are evaluated.