A Guide to Ruby Collections III: Enumerable and Enumerator
In this article, I’m going to cover the Enumerable module and the Enumerator class. To get the most out of Ruby’s collections, you will need to understand how they work and what they give you. Enumerable in particular contributes heavily to Ruby’s terseness and flexibility. In fact, many new Rubyists rely on it without even knowing it.
Enumerable
Enumerable is possible in Ruby thanks to the notion of “mixins.” In many languages, code sharing is only possible through inheritance, and inheritance is generally limited to one parent class. In addition to classes, Ruby has modules, which are containers for methods and constants. If you want to write code once and share it among a bunch of unrelated classes, you can put it in a module to be “mixed in” to those classes. Note that this is different from sharing an interface, as with classic Java interfaces, where only method signatures are shared.
You can find out which modules a class includes with Module#included_modules. Array includes Enumerable, for example, while String (as of Ruby 1.9) does not.
>> Array.included_modules
=> [Enumerable, PP::ObjectMixin, Kernel]
>> String.included_modules
=> [Comparable, PP::ObjectMixin, Kernel]
In order for a class to use Enumerable, it must define an #each method. This method should yield each item in the collection to a block. Remember the Colors class we made earlier? Let’s add Enumerable to it to give it magic iteration powers.
>> class Colors
>> include Enumerable
>> def each
>> yield "red"
>> yield "green"
>> yield "blue"
>> end
>> end
>> c = Colors.new
>> c.map { |i| i.reverse }
=> ["der", "neerg", "eulb"]
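A common refinement, sketched here as an assumption rather than the article’s own code, is to make #each return an Enumerator when it is called without a block, the way Array#each does. Object#to_enum builds that Enumerator. This standalone Colors class (with a hypothetical COLORS constant) shows that Enumerable methods such as #map, #sort, and #include? all come from that one #each.

```ruby
# A standalone Colors collection that mixes in Enumerable.
class Colors
  include Enumerable

  COLORS = %w[red green blue].freeze

  # Enumerable only requires #each. Returning an Enumerator when no block
  # is given mirrors the behavior of Array#each.
  def each
    return to_enum(:each) unless block_given?

    COLORS.each { |color| yield color }
    self
  end
end

colors = Colors.new
p colors.map(&:reverse)     # => ["der", "neerg", "eulb"]
p colors.sort               # => ["blue", "green", "red"]
p colors.include?("green")  # => true
p colors.each.next          # => "red"
```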
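You can see every method Enumerable provides by checking the output of Module#instance_methods. The exact list depends on your Ruby version (the output below predates Ruby 2.0); newer releases add methods such as #lazy, #sum, and #tally.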
>> Enumerable.instance_methods.sort
=> [:all?, :any?, :chunk, :collect, :collect_concat, :count, :cycle,
:detect, :drop, :drop_while, :each_cons, :each_entry, :each_slice,
:each_with_index, :each_with_object, :entries, :find, :find_all,
:find_index, :first, :flat_map, :grep, :group_by, :include?, :inject,
:map, :max, :max_by, :member?, :min, :min_by, :minmax, :minmax_by,
:none?, :one?, :partition, :reduce, :reject, :reverse_each, :select,
:slice_before, :sort, :sort_by, :take, :take_while, :to_a, :zip]
Searching
Enumerable provides several methods for filtering collections. The ones you will probably see most often are the “-ect” family: #select, #detect, and #reject.
>> [1,2,3,4,5].select { |i| i > 3 }
=> [4,5]
>> [1,2,3,4,5].detect { |i| i > 3 }
=> 4
>> [1,2,3,4,5].reject { |i| i > 3 }
=> [1,2,3]
The #find_all and #find methods are aliases for #select and #detect, respectively, so they perform the same operations.
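A quick side-by-side sketch of these filters, including the aliases and #partition (also in the method list above), which returns the selected and rejected elements together. The variable name nums is just for illustration.

```ruby
nums = [1, 2, 3, 4, 5]

# select (alias find_all) keeps every element the block accepts.
p nums.select { |i| i > 3 }     # => [4, 5]
p nums.find_all { |i| i > 3 }   # => [4, 5]

# detect (alias find) returns the first match, or nil if nothing matches.
p nums.detect { |i| i > 3 }     # => 4
p nums.find { |i| i > 10 }      # => nil

# reject keeps the elements the block refuses.
p nums.reject { |i| i > 3 }     # => [1, 2, 3]

# partition returns the selected and rejected elements in one pass.
p nums.partition { |i| i > 3 }  # => [[4, 5], [1, 2, 3]]
```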
Enumerable#grep provides the ability to perform a general search. It’s a shortcut for #select combined with the === operator.
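Threequals (===), also called the case equality operator, is a strange but rather useful operator. It isn’t used for establishing equality in an absolute sense. Instead, it checks whether the value on the right fits the pattern described by the object on the left: a range, a regular expression, a class, and so on.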
>> (1..3) === 2
=> true
>> (1..3) === 4
=> false
>> ('a'..'c') === 'b'
=> true
>> /el/ === "Hello World"
=> true
>> Object === Array
=> true
When using ===, the more general object (e.g. a Range or Regexp) goes on the left side of the operator, and the specific object goes on the right. It does not work the other way around, because threequals is generally overridden in one direction only. Range#=== knows what to do with integers, but Integer#=== (Fixnum#=== in older Rubies) doesn’t know how to handle ranges.
>> 2 === (1..3)
=> false
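This one-way behavior is what Ruby’s case statement relies on: each `when` value is tested as `when_value === subject`, with the general object on the left. A short sketch; the method name classify and its return strings are hypothetical.

```ruby
# case/when calls === on each "when" value, passing the subject as the argument,
# e.g. (1..3) === value, /ello/ === value, Array === value.
def classify(value)
  case value
  when 1..3   then "small number"
  when /ello/ then "greeting-ish string"
  when Array  then "an array"
  else "something else"
  end
end

p classify(2)              # => "small number"
p classify("Hello World")  # => "greeting-ish string"
p classify([1, 2])         # => "an array"
p classify(4)              # => "something else"
```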
If you wanted to do a very general search with #select, you could use threequals.
>> [:hello_world, "Jello World", 3].select { |i| /.ello..orld/ === i }
=> [:hello_world, "Jello World"]
Enumerable#grep is a wrapper for this.
>> [6, 14, 28, 47, 12].grep(5..15)
=> [6, 14, 12]
>> [0.3, "three", :three, "thirty-three"].grep(/three/)
=> ["three", :three, "thirty-three"]
Look at what happens if you try a #map operation on a heterogeneous Array.
>> ['a', 1, 2, 'b'].map(&:upcase)
=> NoMethodError: undefined method `upcase' for 1:Fixnum
We can use #grep to filter the collection down to the elements that will accept our method, then pass the method as a block so it is applied only to those matches.
>> ['a', 1, 2, 'b'].grep(String, &:upcase)
=> ["A", "B"]
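Since #grep works with any object that defines ===, the same call can filter by class, by range, or by regular expression. The sketch below uses a made-up mixed array and also shows the #select form that #grep wraps. #grep_v, the inverse filter, is only available in Ruby 2.3 and later.

```ruby
mixed = ['a', 1, 2, 'b', :c, 3.5]

p mixed.grep(String)            # => ["a", "b"]
p mixed.grep(Numeric)           # => [1, 2, 3.5]
p mixed.grep(1..2)              # => [1, 2]
p mixed.grep(String, &:upcase)  # => ["A", "B"]

# The select + === equivalent of grep(String, &:upcase):
p mixed.select { |x| String === x }.map(&:upcase)  # => ["A", "B"]

# Ruby 2.3+: grep_v keeps the elements that do NOT match.
p mixed.grep_v(String)          # => [1, 2, :c, 3.5]
```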
Sorting
Let’s say you have an array of numbers represented as both integers and strings. If you attempt to use #sort, it will fail.
>> [1, "5", 2, "7", "3"].sort
ArgumentError: comparison of Fixnum with String failed
Since strings are not guaranteed to be numbers with quotes around them, Ruby does not try to implicitly convert the strings into suitable numbers before trying to sort. We can fix this by telling Ruby how to sort the elements.
>> [1, "5", 2, "7", "3"].sort_by { |a| a.to_i }
=> [1, 2, "3", "5", "7"]
You can also use the slightly shorter Symbol#to_proc version.
>> [1, "5", 2, "7", "3"].sort_by(&:to_i)
=> [1, 2, "3", "5", "7"]
Now, what if you are sorting something that isn’t a number or isn’t easily converted into one? The #sort method is not magic. It relies on the result of the combined comparison operator method (#<=>, often called the “spaceship” operator).
The #<=> method returns -1, 0, or 1, depending on whether the receiver is less than, equal to, or greater than the argument:
>> 1 <=> 2
=> -1
>> 2 <=> 1
=> 1
>> 1 <=> 1
=> 0
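#sort also accepts a block that plays the role of #<=>, which makes reversing the order a one-character change. The sketch also shows why the mixed-array sort earlier failed: #<=> returns nil when two values cannot be compared.

```ruby
# The block must behave like <=>: negative, zero, or positive.
p [3, 1, 2].sort { |a, b| a <=> b }  # => [1, 2, 3]
p [3, 1, 2].sort { |a, b| b <=> a }  # => [3, 2, 1] (descending)

# Strings compare character by character; uppercase sorts before lowercase.
p ["b", "a", "C"].sort               # => ["C", "a", "b"]

# <=> returns nil for values it cannot compare, and sort raises
# ArgumentError when it gets nil back.
p(1 <=> "1")                         # => nil
```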
If you want to make an arbitrary class sortable, you just need to define #<=> on it. Go ahead and make a new class and see what happens when you try to sort.
>> class McGuffin
>> attr_accessor :absurdity
>> def initialize(a)
>> @absurdity = a
>> end
>> end
>> m1 = McGuffin.new(0.1)
>> m2 = McGuffin.new(2.3)
>> [m1, m2].sort
=> ArgumentError: comparison of McGuffin with McGuffin failed
It produces an error because Ruby does not know how to compare members of this new class. This is easily fixed by defining #<=>.
>> class McGuffin
>> def <=>(other)
>> @absurdity <=> other.absurdity
>> end
>> end
>> [m1, m2].sort
=> [#<McGuffin:0x00000000de0238 @absurdity=0.1>,
#<McGuffin:0x00000000de50f8 @absurdity=2.3>]
Note that #<=> is not a substitute for the other comparison operators. You will need to define those separately if you want them, or include the Comparable module, which derives <, <=, ==, >, >=, and between? from #<=>.
>> m1 > m2
=> NoMethodError: undefined method `>'...
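Here is a sketch of the Comparable route, using a self-contained version of the McGuffin class. The `is_a?` guard is an addition of mine, not from the original class: it makes #<=> return nil (meaning “not comparable”) instead of raising when handed an unrelated object.

```ruby
class McGuffin
  include Comparable  # derives <, <=, ==, >, >=, and between? from <=>
  attr_accessor :absurdity

  def initialize(a)
    @absurdity = a
  end

  def <=>(other)
    return nil unless other.is_a?(McGuffin)  # not comparable

    @absurdity <=> other.absurdity
  end
end

m1 = McGuffin.new(0.1)
m2 = McGuffin.new(2.3)
p m1 < m2                          # => true
p m1 > m2                          # => false
p m1.between?(m1, m2)              # => true
p [m2, m1].sort.map(&:absurdity)   # => [0.1, 2.3]
```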
Any? and All?
Enumerable#any? returns true if its block is true for any element in the collection. Enumerable#all? returns true if its block is true for every element.
>> [2,4,6,8].all?(&:even?)
=> true
>> [2,4,5,8].any? { |i| i % 2 == 0 }
=> true
>> [2,4,5,8].all? { |i| i % 2 == 0 }
=> false
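The same family includes #none? and #one? (both in the method list above). Called without a block, all four test the truthiness of the elements themselves. A short sketch, including the empty-collection edge cases:

```ruby
nums = [2, 4, 5, 8]

p nums.none?(&:zero?)      # => true
p nums.one?(&:odd?)        # => true (only 5 is odd)
p nums.any? { |i| i > 7 }  # => true

# Without a block, each element's own truthiness is tested.
p [nil, false, 1].any?     # => true
p [nil, false].none?       # => true
p [1, "a", :b].all?        # => true

# Edge cases: all? is true for an empty collection, any? is false.
p [].all?                  # => true
p [].any?                  # => false
```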
Who Needs Excel?
Enumerable#reduce takes a collection and reduces it down to a single value. It applies an operation to each element, maintaining a running “total.”
For example, #reduce can be used to sum a collection.
>> [1,2,3].reduce(:+)
=> 6
Ruby does not ship with a factorial method. However, thanks to #reduce, it’s easy to slap together a beautiful hack.
>> class Integer
>> def factorial
>> (1..self).reduce(:*) || 1
>> end
>> end
>> 6.factorial
=> 720
>> 0.factorial
=> 1
Enumerable#reduce is notoriously hard to understand. So far, I have kept things simple by letting #reduce’s accumulator operate out of sight. But now let’s bring it out into the open.
>> [1,2,3].reduce(0) { |sum, element| sum + element }
=> 6
Whoa, whoa, whoa. What is all that stuff I just added? What does the argument passed to #reduce mean?
Compare these three #reduce calls.
[1,2,3].reduce do |accumulator, current|
puts "Accumulator: #{accumulator}, Current: #{current}"
accumulator + current
end
Accumulator: 1, Current: 2
Accumulator: 3, Current: 3
=> 6
[1,2,3].reduce(0) do |accumulator, current|
puts "Accumulator: #{accumulator}, Current: #{current}"
accumulator + current
end
Accumulator: 0, Current: 1
Accumulator: 1, Current: 2
Accumulator: 3, Current: 3
=> 6
[1,2,3].reduce(1) do |accumulator, current|
puts "Accumulator: #{accumulator}, Current: #{current}"
accumulator + current
end
Accumulator: 1, Current: 1
Accumulator: 2, Current: 2
Accumulator: 4, Current: 3
=> 7
I think what is most confusing in this case is the difference between passing 0 and nothing at all.
- #reduce: the accumulator starts out as the first element, and current starts out as the second element.
- #reduce(x): the accumulator starts out as x, and current starts out as the first element.
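The initial value also matters for edge cases. With no initial value, #reduce on an empty collection returns nil, which is why the factorial hack above needs its `|| 1` fallback. The sketch below uses a hypothetical top-level factorial method (not the Integer#factorial from above) to show that passing the initial value together with a symbol removes the need for the fallback.

```ruby
# No initial value + empty collection => nil.
p [].reduce(:+)     # => nil
p [].reduce(0, :+)  # => 0

# Hypothetical helper: initial value 1 plus the :* symbol.
def factorial(n)
  (1..n).reduce(1, :*)
end

p factorial(6)      # => 720
p factorial(0)      # => 1

# With one element and no initial value, the block is never called.
p [42].reduce { |_acc, _x| raise "block should not run" }  # => 42
```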
How is this even useful? The example that ruby-doc.org uses is finding the longest word in a collection.
>> words = %w{cool bobsled Jamaican}
>> longest = words.reduce do |memo, word|
>> memo.length > word.length ? memo : word
>> end
=> "Jamaican"
#reduce is an alias for the older #inject, and you can use either one in modern Ruby. However, I personally prefer #reduce because I find the idea of “injecting” a collection down to a single element confusing.
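Because the two names are aliases, they accept the same arguments and return the same results. For the longest-word task specifically, #max_by (also in the method list above) expresses the intent more directly; the sketch reuses the words array from the ruby-doc.org example.

```ruby
words = %w{cool bobsled Jamaican}

# inject and reduce are interchangeable.
p words.inject { |memo, word| memo.length > word.length ? memo : word }  # => "Jamaican"
p [1, 2, 3].inject(:+) == [1, 2, 3].reduce(:+)  # => true

# max_by picks the element with the largest block result.
p words.max_by(&:length)  # => "Jamaican"
```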
Infinite Iteration
It’s often useful to iterate through a collection an arbitrary number of times, or even infinitely. A naive way to do this would be to keep track of a counter index and reset it every time it reaches the size of the collection minus 1 (since Ruby collections are zero-indexed).
A better solution is to use an incrementing counter and take it modulo (%) the size of the collection to get each index. Let’s say you have three items, and you need to perform 10 iteration steps.
>> arr = ["first", "middle", "last"]
>> 10.times { |i| puts arr[i % arr.size] }
first
middle
last
first
middle
last
first
middle
last
first
Ruby provides a slightly cleaner way of doing this with Enumerable#cycle.
>> arr.cycle(2) { |i| puts i }
first
middle
last
first
middle
last
Passing an argument to #cycle makes it iterate through the entire collection that many times. If no argument is passed, it iterates forever (an infinite loop, unless you break out of the block).
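A small sketch that checks the two approaches against each other, reusing arr from above. It shows that cycle(2) produces two full passes (six elements), not two elements. It also leans on one fact: called without a block, #cycle returns an Enumerator, and #first(10) caps the otherwise infinite cycle at ten elements.

```ruby
arr = ["first", "middle", "last"]

# Modulo approach: exactly 10 elements, wrapping around.
by_modulo = 10.times.map { |i| arr[i % arr.size] }

# cycle(2) means two full passes: 2 * 3 = 6 elements.
twice = []
arr.cycle(2) { |item| twice << item }

p twice.size                          # => 6
p by_modulo.last                      # => "first"
p arr.cycle.first(10) == by_modulo    # => true
```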
There are a couple of problems with cycling this way:
- The argument to #cycle specifies the number of times to cycle through the whole collection, not the number of elements to cycle through.
- If you want to cycle infinitely, all of your relevant logic must go inside #cycle’s block, because the code will never leave it.
Thankfully, both of these problems can be solved by using an Enumerator object.
Enumerator
If #cycle is not passed a block, it returns an Enumerator.
>> cycle_enum = arr.cycle
=> #<Enumerator: ["first", "middle", "last"]:cycle>
Now, Enumerator#next can be used to retrieve the next element as many times as necessary.
>> cycle_enum.next
=> "first"
>> cycle_enum.next
=> "middle"
>> cycle_enum.next
=> "last"
>> cycle_enum.next
=> "first"
This works because the Enumerator specifically came from #cycle. Watch what happens when a regular #each Enumerator is used.
>> each_enum = arr.each
>> each_enum.next
=> "first"
>> each_enum.next
=> "middle"
>> each_enum.next
=> "last"
>> each_enum.next
=> StopIteration: iteration reached an end
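StopIteration doesn’t have to be handled by hand: Kernel#loop rescues it and simply ends the loop. The sketch below also uses Enumerator#peek (look at the next element without advancing) and Enumerator#rewind (start over).

```ruby
arr = ["first", "middle", "last"]
each_enum = arr.each

p each_enum.peek  # => "first" (does not advance)
p each_enum.next  # => "first"

# loop rescues StopIteration, so this ends cleanly after "last".
collected = []
loop { collected << each_enum.next }
p collected       # => ["middle", "last"]

# rewind resets the Enumerator to the beginning.
each_enum.rewind
p each_enum.next  # => "first"
```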
Note: you need to hold on to the Enumerator. Calling #next directly on a fresh #cycle (or #each) call every time doesn’t advance anything.
>> arr.cycle.next
=> "first"
>> arr.cycle.next
=> "first"
>> arr.cycle.next
=> "first"
This is because iterator methods return a fresh Enumerator every time.
>> arr.cycle.object_id == arr.cycle.object_id
=> false
A good way to describe Enumerator objects is that they contain the information about how to iterate through a collection. For example, a #cycle Enumerator knows how to iterate through a collection one or more times, while a #reverse_each Enumerator knows how to iterate through a collection backwards.
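You can also write that “how to iterate” information yourself with Enumerator.new, whose block receives a yielder object to push values into. The powers-of-two sequence below is just an illustrative example; because values are produced on demand, the sequence can be infinite and still be consumed with #next or #take.

```ruby
# An Enumerator defined by a block; y is the yielder.
powers_of_two = Enumerator.new do |y|
  n = 1
  loop do
    y << n  # hand the next value to whoever is iterating
    n *= 2
  end
end

p powers_of_two.next     # => 1
p powers_of_two.next     # => 2
p powers_of_two.take(5)  # => [1, 2, 4, 8, 16]
```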
How is this useful?
Well, let’s say you want to cycle through a collection backwards. You would just use #reverse_cycle, right?
>> [:first, :middle, :last].reverse_cycle(2) { |i| puts i }
NoMethodError: undefined method `reverse_cycle'...
Crap! There’s no #reverse_cycle in Enumerable! We told Boss Lady that we would be cycling backwards by this afternoon. And with the economy and all…
But wait. Perhaps not all hope is lost. What about taking #reverse_each…and then calling #cycle on that?
>> [:first, :middle, :last].reverse_each.cycle(2) { |i| puts i }
last
middle
first
last
middle
first
Chaining: that’s what you can do with Enumerator. Want to cycle through a collection backwards and place the results in an Array? Just add #map to the chain:
>> [:first, :middle, :last].reverse_each.cycle(2).map { |i| i }
=> [:last, :middle, :first, :last, :middle, :first]
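A few more chains in the same spirit. Enumerator#with_index adds a counter (here starting at 1) to whatever the Enumerator is doing, and #each_slice groups elements. The lazy example needs Ruby 2.0 or later; it lets #map run over an infinite range because only the requested elements are computed.

```ruby
items = [:first, :middle, :last]

# map without a block returns an Enumerator; with_index(1) numbers from 1.
p items.map.with_index(1) { |item, i| "#{i}. #{item}" }
# => ["1. first", "2. middle", "3. last"]

# Cycle backwards twice, then group into pairs.
p items.reverse_each.cycle(2).each_slice(2).map { |pair| pair.join("-") }
# => ["last-middle", "first-last", "middle-first"]

# Ruby 2.0+: lazy evaluation over an infinite range.
p (1..Float::INFINITY).lazy.map { |i| i * 2 }.first(3)
# => [2, 4, 6]
```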
Conclusion
That covers Enumerable and Enumerator. In the next (and final) installment in my series on collections, I’ll cover some of my favorite tips and tricks.