- Key Takeaways
- Bringing on the Lazy!
- Building the Skeleton
- External VS Internal Iteration
- Creating an Enumerator from an Enumerable
- Generators, Fibers, Oh My!
- Implementing Lax, Our Very Own Enumerator::Lazy
- Implementing Lazy map
- Implementing Lazy take
- Summary
- Frequently Asked Questions about Implementing Lazy Enumerables in Ruby
Key Takeaways
- Lazy enumerables in Ruby allow for efficient handling of large data sets by deferring computation until necessary, which improves performance especially with infinite sequences.
- Ruby’s `Enumerable#lazy` method transforms an enumerable into a lazy version, enabling operations like `map` and `take` to be performed without immediate full evaluation.
- Implementing custom lazy enumerables, such as the `Lax` class, involves using generators and fibers to manage state and control flow, allowing element-by-element processing.
- The `Lax` class provides methods like `map` and `take` which return new lazy instances, enabling method chaining and efficient data processing without evaluating all elements immediately.
- Using lazy enumerables can sometimes introduce overhead that outweighs benefits for smaller data sets, so their use should be judiciously considered based on the context.
This post is an excerpt from The Ruby Closures Book, written by Benjamin himself. If you like this post (and you will), tell us what you enjoyed at the end to be entered into a draw for a free copy! Now…read on…
I’ve always been fascinated by Ruby’s lazy enumeration, a feature that was introduced in Ruby 2.0. In this article, we take a deep dive into lazy enumeration, learning how Ruby implements this interesting programming technique by making your own lazy enumerable.
What exactly is lazy? Is Ruby trying to slack off? Being lazy refers to the style of evaluation. To let this sink in, consider the opposite of lazy evaluation: eager evaluation.
There really isn’t much to talk about regarding eager evaluation, as it’s the standard way Ruby works. But sometimes, being eager is a bad thing. For instance, what do you think this evaluates to:
irb> 1.upto(Float::INFINITY).map { |x| x * x }.take(10)
You might expect the result to be [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
. Unfortunately, the Ruby interpreter doesn’t know when to stop. The problem here is 1.upto(Float::INFINITY)
. 1.upto(Float::INFINITY)
represents an infinite sequence. How does it look like in the console?
irb> 1.upto(Float::INFINITY)
=> #<Enumerator: 1:upto(Infinity)>
No surprise here, an enumerator is returned. Let’s try to force values out from the enumerator using Enumerator#to_a
:
irb> 1.upto(5).to_a
=> [1, 2, 3, 4, 5]
To infinity and beyond!:
irb> 1.upto(Float::INFINITY).to_a
You shouldn’t be surprised by now that this will lead to an infinite loop. Notice that Enumerator#to_a
is useful to “force” values out of an enumerator. In fact, Enumerator#to_a
is aliased to Enumerator#force
. This is useful, as you will see later on, when you want to know all the values produced by an enumerator.
Bringing on the Lazy!
How do we get values from an infinite list then? Introducing Enumerable#lazy
:
irb> 1.upto(Float::INFINITY).lazy.map { |x| x * x }
=> #<Enumerator::Lazy: #<Enumerator::Lazy: #<Enumerator: 1:upto(Infinity)>>:map>
It appears that the 1.upto(Float::INFINITY)
enumerator has been made lazy by wrapping it with an Enumerator::Lazy
. This lazy enumerator itself wraps around a lazy version of map. Let’s try the very first expression again, but this time with Enumerable#lazy
:
irb> 1.upto(Float::INFINITY).lazy.map { |x| x * x }.take(10)
=> #<Enumerator::Lazy: #<Enumerator::Lazy: #<Enumerator::Lazy: #<Enumerator: 1:upto(Infinity)>>:map>:take(10)>
Ah, Enumerable#take
is also wrapped up! How do we get all the values? Enumerable#to_a
to the rescue:
irb> 1.upto(Float::INFINITY).lazy.map { |x| x * x }.take(10).to_a
=> [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
How does this magic work? We are about to find out. We will implement our own version of lazy enumerables, albeit a minimal version. Along the way, you will also learn interesting aspects of Ruby’s enumerators.
Building the Skeleton
The first order of things is to decide where our implementation should live. To do that, we just have to find out where Enumerator::Lazy
lives. If we head over to the official documentation, there’s a clue:
So, Enumerator
is the parent of Lazy
. In code:
class Lazy < Enumerator
end
Instead of re-opening Ruby classes like that (I get involuntary twitches), for our little exercise we are going to invent another name. A quick trip to the thesaurus yields something similar to Lazy
. Introducing, Lax
!:
class Lax < Enumerator
end
External VS Internal Iteration
What does inheriting from Enumerator
buy us? To answer that question, we need a refresher on what an Enumerator is. From the docs:
A class which allows both internal and external iteration.
What is the difference between the two flavors of iteration? The key lies with who controls the iteration.
For internal iteration, it is the Array
object (or any Enumerable
) that controls the iteration. In fact, that’s how you normally interact with Enumerable
s.
External iteration on the other hand, is controlled by some other object wrapped around an Enumerable
.
Why would you want external iterators in the first place? Well, sometimes you do not want to iterate through all of the elements in one go. You want to be able to tell the enumerable, “give me exactly one now, and when I need the next one, I will ask again”. External iterators give you that ability.
Creating an Enumerator from an Enumerable
I mentioned that an Enumerator
wraps an Enumerable
. We can see that in code:
irb> e = Enumerator.new([1,2,3])
irb> e.next
=> 1
irb> e.next
=> 2
irb> e.next
=> 3
irb> e.next
StopIteration: iteration reached an end
from (irb):7:in `next'
When we wrap an array with an enumerator, we can call Enumerator#next
multiple times to retrieve the next value. When there are no more values left, the StopIteration
exception is raised.
Notice that Ruby complains about using Object#to_enum
instead, or using it with a block. Let’s use a block, because that is more interesting:
e = Enumerator.new do |yielder|
[1,2,3].each do |val|
yielder << val
end
end
Let’s see the code in action in irb
:
irb(main):008:0> e = Enumerator.new do |yielder|
irb(main):009:1* [1,2,3].each do |val|
irb(main):010:2* yielder << val
irb(main):011:2> end
irb(main):012:1> end
=> #<Enumerator: #<Enumerator::Generator:0x007fb9798e0668>:each>
As usual, we can call Enumerator#next
:
irb(main):013:0> e.next
=> 1
irb(main):014:0> e.next
=> 2
irb(main):015:0> e.next
=> 3
irb(main):016:0> e.next
StopIteration: iteration reached an end
from (irb):16:in `next'
from (irb):16
from /Users/benjamintan/.rbenv/versions/2.2.0/bin/irb:11:in `<main>'
Let’s look at the code again, because there’s more than meets the eye. There are a few questions that come up:
e = Enumerator.new do |yielder|
[1,2,3].each do |val|
yielder << val
end
end
- What is this
yielder
object that is passed into the block? - What does
yielder < < val
do? - Most importantly, how is it possible that simply wrapping an
Enumerable
enables the retrieval of the elements one by one?
We can answer the first two questions. The yielder
object is passed into the block when you create an Enumerator
object using with a block.
The purpose of the yielder
object is to store the instructions for the next yield. Not the value, but the instructions. That’s what yielder < < val
does. The < <
is aliased to the yield
method, but it has nothing to do with the yield
keyword. Yes, it is confusing.
When Enumerator#next
is called, it returns the next value and returns. This suggests that the yielder
object must remember some form of state. How is this achieved? The return value from the previous code listing gives us a clue:
#<Enumerator: #<Enumerator::Generator:0x007fb9798e0668>:each>
This tells us that an Enumerator
object contains another object called Enumerator ::Generator
.
Generators, Fibers, Oh My!
Let’s take a detour and explore Enumerator::Generator
. Generators are able to convert an internal iterator, such as [1,2,3]
, into an external one. It is the secret sauce which allows the one-by-one retrieval of the elements.
Here is how generators work:
- First, it computes some result
- This result is handed back to the caller
- In addition, it also saves the state of the computation so that the caller can resume that computation to generate the next result
One way to do that is to use a little known Ruby construct called a fiber. The Fiber
class is perfect of converting an internal iterator to an external one.
I’ve not used fibers much myself, but nonetheless, they are pretty fun to explore. Let’s do a quick run through of the basics.
A fiber is created with Fiber.new
, along with a block that represents the computation. Let’s look at this example:
f = Fiber.new do
x = 0
loop do
Fiber.yield x
x += 1
end
end
We create a fiber, and give it a block of computation. This block contains a loop that doesn’t end. What is this Fiber.yield
? That is the secret sauce! We will get to that in a second.
When you create a fiber like in the above example, the block isn’t executed immediately. So, how do you execute the block? With Fiber#resume
. Observe:
irb> f = Fiber.new do
irb* x = 0
irb> loop do
irb* Fiber.yield x
irb> x += 1
irb> end
irb> end
=> #<Fiber:0x007fb979023e58>
irb> f.resume
=> 0
irb> f.resume
=> 1
irb> f.resume
=> 2
Now back to the secret sauce. What you have just created is an infinite number generator. The reason the loop doesn’t run indefinitely is because of the behavior of Fiber.yield
. First of all, this has no relation to the yield
keyword, as you know it.
The code executes Fiber.yield x
when Fiber#resume
is called, the value is returned to the caller, and control is given back to the caller. Until the next time Fiber#resume
is called, then x
is incremented, the loop goes for another round, executes Fiber.yield x
again, and gives control back to the caller.
Implementing Lax
, Our Very Own Enumerator::Lazy
Now that we understand how generators work, let’s head back to the Enumerator::Lax
implementation. We don’t t have to use Enumerator::Generator
directly, since that is taken care for us by the yielder
object.
Let’s think about how the client would use our code. In fact, we will try to mimic the real implementation as much as possible. Here’s an example:
1.upto(Float::INFINITY).lax
.map { |x| x*x }
.map { |x| x+1 }
.take(5)
.to_a
This should get us [2, 5, 10, 17, 26]
.
Here’s where the sleight of hand comes in. When lax
is called, an instance of the Lax
enumerator is returned. When map
is called, this method is called on a Lax
instance, not the map
defined on the enumerable.
To enable something like 1.upto(Float::INFINITY).lax
and [1,2,3].lax
and return a new Lax
instance, we have to add a new method into the Enumerable
module:
module Enumerable
def lax
Lax.new(self)
end
end
class Lax < Enumerator
end
self
is the actual enumerable. It is passed in as an argument to the Lax
constructor. If you run the code now, you wouldn’t get any errors, but you will still get the infinite loop. That’s because all we did was just providing an extra level of indirection without doing anything interesting.
We know that we need to populate the yielder
, so let’s do that:
class Lax < Enumerator
def initialize(receiver)
super() do |yielder|
receiver.each do |val|
yielder << val
end
end
end
end
Even with this tiny piece of code, we can already iterate an infinite collection, one by one:
irb> e = 1.upto(Float::INFINITY).lax
=> #<Lax: #<Enumerator::Generator:0x007f8324155c30>:each>
irb> e.next
=> 1
irb> e.next
=> 2
irb> e.next
=> 3
Implementing Lazy map
Now, we need to implement method chaining on methods such as map
and take
, for example. Because Enumerable::Lax
returns a Lax
instance, this means we need to define Lax
versions of map
and take
. Each invocation of map
and take
in turn return yet another Lax
instance.
How would the map method in Lax
look like? For starters:
def map(&block)
end
We also know that we need to return a new Lax
instance. This time, self
is going to be a another Lax
instance.
def map(&block)
Lax.new(self)
end
What we really want is to populate the yielder object with elements that are modified by the map
method. That means that we somehow must pass in two things into the Lax
instance in map
: the yielder
object and the element to be mapped. We can pass these via a block argument like so:
def map(&block)
Lax.new(self) do |yielder, val|
yielder << block.call(val)
end
end
What does this do? It looks like the block of map
is invoked with val
, and that element is then passed into the yielder. That’s not completely accurate though. It is more like the instructions on how to treat a mapped value are stored inside the yielder.
Since Lax
can receive a block, we need to modify the constructor to handle this case:
class Lax < Enumerator
def initialize(receiver)
super() do |yielder|
receiver.each do |val|
if block_given?
yield(yielder, val)
else
yielder << val
end
end
end
end
end
When there’s a block supplied, as in the case of the Lax
version of map
, we then yield the block, passing in the yielder
object and the element val
.
Does it work? Let’s find out!
p 1.upto(Float::INFINITY).lax
.map { |x| x*x }
.map { |x| x+1 }
.first(5)
Enumerable#first(5)
returns the first 5 elements of the enumerable, which gives you [2, 5, 10, 17, 26]
.
Implementing Lazy take
Now that we have seen how map
is implemented, let’s try implementing take
. As its name suggests, Enumerable#take(n)
returns the first n
elements from the Enum. The lazy version will return a Lax
instance wrapped with Enumerable#take
.
class Lax < Enumerator
# initialize and map goes here
def take(n)
taken = 0
Lax.new(self) do |yielder, val|
if taken < n
yielder << val
taken += 1
else
raise StopIteration
end
end
end
end
The logic for take
should be easy enough to understand. The interesting thing here is how take
signals that the iteration has ended – it raises a StopIteration
exception.
Don’t get mad, because that is exactly how Ruby implements it. We can handle that exception inside the constructor, and simply do nothing when we detect a StopIteration
exception:
class Lax < Enumerator
def initialize(receiver)
super() do |yielder|
begin
receiver.each do |val|
if block_given?
yield(yielder, val)
else
yielder << val
end
end
rescue StopIteration
end
end
end
# map and take goes here ...
end
Before we take the code for another spin, here is what you should have:
module Enumerable
def lax
Lax.new(self)
end
end
class Lax < Enumerator
def initialize(receiver)
super() do |yielder|
begin
receiver.each do |val|
if block_given?
yield(yielder, val)
else
yielder << val
end
end
rescue StopIteration
end
end
end
def map(&block)
Lax.new(self) do |yielder, val|
yielder << block.call(val)
end
end
def take(n)
taken = 0
Lax.new(self) do |yielder, val|
if taken < n
yielder << val
taken += 1
else
raise StopIteration
end
end
end
end
With that in place, let’s try out the code:
p 1.upto(Float::INFINITY).lax
.map { |x| x*x }
.map { |x| x+1 }
.take(10)
.to_a
If you get [2, 5, 10, 17, 26]
, then it worked!
Summary
Lazy enumerables seem magical when you first encounter them. In fact, even when you start unravelling the pieces which make up lazy enumerables, you will undoubtedly find interesting bits of Ruby that you might not have otherwise encountered.
Before writing this chapter, I’ve never encountered Enumerator::Yielder
, Enumerator::Generator
and Fiber
. Thankfully, the Rubinius implementation of Enumerable has been a great help. Learning about how the behavior of Fiber.yield
was the “AHA!” moment for me.
Frequently Asked Questions about Implementing Lazy Enumerables in Ruby
What is the main advantage of using lazy enumerables in Ruby?
The primary advantage of using lazy enumerables in Ruby is that they allow for the efficient handling of large data sets. Lazy enumerables defer computation until it’s absolutely necessary, which can significantly improve performance when dealing with large collections. This is because they only process elements that are needed at any given time, rather than processing the entire collection upfront. This can be particularly beneficial in applications that need to handle large amounts of data or perform complex computations.
How does the ‘lazy’ method work in Ruby?
The ‘lazy’ method in Ruby is used to create a lazy version of an enumerable. When called on an enumerable, it returns a ‘Lazy’ object. This object has most of the same methods as the original enumerable, but these methods are now lazy. That means they don’t immediately process all the elements of the enumerable. Instead, they return a new ‘Lazy’ object that will apply the operation when needed. This allows for efficient chaining of operations, as each operation is only applied to an element when it’s actually needed.
Can you provide an example of using a lazy enumerable in Ruby?
Sure, here’s a simple example of using a lazy enumerable in Ruby:range = 1..Float::INFINITY
lazy_range = range.lazy
lazy_range.select { |num| num.odd? }.first(10)
In this example, we first create an infinite range. We then convert this range to a lazy enumerable using the ‘lazy’ method. Finally, we select the first 10 odd numbers from this lazy enumerable. Without the use of lazy enumerables, this code would not be possible, as it would attempt to process an infinite number of elements.
Are there any downsides to using lazy enumerables in Ruby?
While lazy enumerables can be very efficient for large data sets, they can also be slower for small data sets. This is because the overhead of setting up the lazy computation can outweigh the benefits for small collections. Therefore, it’s important to use lazy enumerables judiciously, and only when they provide a clear benefit.
How does lazy evaluation work with methods like ‘map’ and ‘select’?
When methods like ‘map’ and ‘select’ are called on a lazy enumerable, they don’t immediately process all the elements. Instead, they return a new lazy enumerable that knows how to apply these operations when needed. This allows for efficient chaining of operations, as each operation is only applied to an element when it’s actually needed.
Can I use lazy enumerables with any enumerable in Ruby?
Yes, you can use lazy enumerables with any enumerable in Ruby. This includes arrays, ranges, hashes, and any other object that includes the Enumerable module. To create a lazy enumerable, simply call the ‘lazy’ method on the enumerable.
How can I force a lazy enumerable to evaluate its elements?
You can force a lazy enumerable to evaluate its elements by calling the ‘force’ method on it. This will return a regular array with all the elements of the lazy enumerable. However, be careful when using this method, as it can lead to performance issues if the lazy enumerable contains a large number of elements.
Can I use lazy enumerables in Ruby on Rails?
Yes, you can use lazy enumerables in Ruby on Rails. However, keep in mind that Rails already provides many efficient ways to handle large collections, such as batch processing methods and the ‘find_each’ method for ActiveRecord relations. Therefore, while you can use lazy enumerables in Rails, they may not always be the best solution.
How does garbage collection work with lazy enumerables in Ruby?
Lazy enumerables in Ruby can help reduce memory usage and improve garbage collection. This is because they only hold onto the elements that are currently being processed, rather than the entire collection. Once an element has been processed, it can be garbage collected if there are no other references to it.
Can I create my own lazy enumerable methods in Ruby?
Yes, you can create your own lazy enumerable methods in Ruby. To do this, you would need to define a method that returns a new lazy enumerable. This new lazy enumerable should know how to apply the operation defined by your method when its elements are evaluated.
Benjamin is a Software Engineer at EasyMile, Singapore where he spends most of his time wrangling data pipelines and automating all the things. He is the author of The Little Elixir and OTP Guidebook and Mastering Ruby Closures Book. Deathly afraid of being irrelevant, is always trying to catch up on his ever-growing reading list. He blogs, codes and tweets.