A Guide to Ruby Collections III: Enumerable and Enumerator


In this article, I’m going to cover the Enumerable module and the Enumerator class. In order to get maximum use out of Ruby’s collections, you will need to understand how they work and what they give you. In particular, Enumerable by itself contributes heavily towards Ruby’s terseness and flexibility. In fact, many new Rubyists use it under the hood without even knowing it.

Key Takeaways

Enumerable Module Overview: Ruby’s Enumerable module uses mixins to enhance classes with collection capabilities, requiring only an `#each` method to yield elements to a block, thereby providing a vast array of methods for traversal, searching, and sorting.
Power of Enumerator: The Enumerator class in Ruby facilitates the creation of external iterators, allowing for controlled traversal over collections without modifying the original object structure.
Method Highlights: Methods like `#map`, `#select`, `#reduce`, and others under Enumerable provide powerful tools for data manipulation and querying, enabling concise and expressive collection operations.
Infinite Iteration Techniques: Ruby supports infinite collection iteration through the `#cycle` method in Enumerable, which can be controlled via an Enumerator for more complex iteration patterns without altering the collection.
Practical Use Cases and Examples: The guide provides practical examples, such as using `#sort_by` for sorting heterogeneous arrays or `#reduce` for aggregating values, illustrating the flexibility and utility of Enumerable and Enumerator in Ruby.

Enumerable

Enumerable is possible in Ruby thanks to the notion of “mixins.” In many languages, code is shared mainly through inheritance, and inheritance is often limited to one parent class. In addition to classes, Ruby has modules, which are containers for methods and constants. If you want to write code once and share it between a bunch of unrelated classes, you can put it in a module to be “mixed in” with your classes. Note that this is different from sharing an interface, as in Java, where only method signatures are shared.

It’s possible to find out which modules a class mixes in, and so whether it uses Enumerable, with Module#included_modules.



>> Array.included_modules

=> [Enumerable, PP::ObjectMixin, Kernel]

>> String.included_modules

=> [Comparable, PP::ObjectMixin, Kernel]

In order for a class to use Enumerable, it must define an #each method. This method should yield each item in the collection to a block. Remember the Colors class we made earlier? Let’s add Enumerable to it to give it magic iteration powers.



>> class Colors

>>   include Enumerable
>>   def each

>>     yield "red"

>>     yield "green"

>>     yield "blue"

>>   end

>> end
>> c = Colors.new

>> c.map { |i| i.reverse }

=> ["der", "neerg", "eulb"]
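The same recipe works for a class that wraps real data. Here is a minimal sketch, assuming a hypothetical Playlist class (the name and the track titles are made up for illustration). Returning an Enumerator from #each when no block is given is optional, but it mirrors what the built-in collections do.

```ruby
# Hypothetical example class; the name and data are illustrative only.
class Playlist
  include Enumerable

  def initialize(*tracks)
    @tracks = tracks
  end

  # Yield each track in order. With no block, hand back an Enumerator,
  # which is how Array#each and friends behave.
  def each
    return to_enum(:each) unless block_given?

    @tracks.each { |track| yield track }
    self
  end
end

playlist = Playlist.new("Intro", "Anthem", "Outro")

playlist.select { |t| t.length > 5 }  # => ["Anthem"]
playlist.include?("Outro")            # => true
playlist.sort                         # => ["Anthem", "Intro", "Outro"]
playlist.each_with_index.to_a         # => [["Intro", 0], ["Anthem", 1], ["Outro", 2]]
playlist.each.next                    # => "Intro"
```

Only #each was written by hand; #select, #include?, #sort and #each_with_index all come from Enumerable.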

You can see all the methods Enumerable provides by checking the output of #instance_methods.



>> Enumerable.instance_methods.sort

=> [:all?, :any?, :chunk, :collect, :collect_concat, :count, :cycle,

    :detect, :drop, :drop_while, :each_cons, :each_entry, :each_slice,

    :each_with_index, :each_with_object, :entries, :find, :find_all,

    :find_index, :first, :flat_map, :grep, :group_by, :include?, :inject,

    :map, :max, :max_by, :member?, :min, :min_by, :minmax, :minmax_by,

    :none?, :one?, :partition, :reduce, :reject, :reverse_each, :select,

    :slice_before, :sort, :sort_by, :take, :take_while, :to_a, :zip]

Searching

Enumerable provides several methods for filtering your collections. The ones you will probably see most often are the “ect” family of methods.



>> [1,2,3,4,5].select { |i| i > 3 }

=> [4,5]
>> [1,2,3,4,5].detect { |i| i > 3 }

=> 4
>> [1,2,3,4,5].reject { |i| i > 3 }

=> [1,2,3]

In Enumerable, #find_all behaves the same as #select, and #find behaves the same as #detect.

Enumerable#grep provides the ability to perform a general search. It’s a shortcut for #select and the === operator.

Threequals (===) is a strange but rather useful operator. It isn’t used for establishing equality in an absolute sense, but instead a more general one.



>> (1..3) === 2

=> true
>> (1..3) === 4

=> false
>> ('a'..'c') === 'b'

=> true
>> /el/ === "Hello World"

=> true
>> Object === Array

=> true

When using ===, the more general object (e.g. Range, Regexp) goes on the left side of the operator, and the specific object goes on the right. It does not work the other way around, because threequals is generally overridden one-way. Range#=== knows what to do with Fixnums, but Fixnum#=== doesn’t know how to handle Ranges.



>> 2 === (1..3)

=> false
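Threequals is also what a case statement uses behind the scenes: each when clause is tested as `pattern === value`, with the pattern on the left. The sketch below assumes a hypothetical helper named describe, written only for this illustration.

```ruby
# describe is a hypothetical helper for this example.
def describe(value)
  case value
  when 1..3    then "a small number"       # (1..3) === value
  when Integer then "some other integer"   # Integer === value
  when /ello/  then "a greeting, roughly"  # /ello/ === value
  else              "something else"
  end
end

describe(2)              # => "a small number"
describe(40)             # => "some other integer"
describe("Hello World")  # => "a greeting, roughly"
describe(nil)            # => "something else"
```

The clauses are tried top to bottom, so the more specific pattern (the Range) has to come before the more general one (Integer).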

If you wanted to do a very general search with select, you could use threequals.



>> [:hello_world, "Jello World", 3].select { |i| /.ello..orld/ === i }

=> [:hello_world, "Jello World"]

Enumerable#grep is a wrapper for this.



>> [6, 14, 28, 47, 12].grep(5..15)

=> [6, 14, 12]
>> [0.3, "three", :three, "thirty-three"].grep /three/

=> ["three", :three, "thirty-three"]

Look at what would happen if you tried a #map operation on a heterogeneous Array.



>> ['a', 1, 2, 'b'].map(&:upcase)

=> NoMethodError: undefined method `upcase' for 1:Fixnum

We can use #grep to filter the collection for elements that will accept our method.



>> ['a', 1, 2, 'b'].grep(String, &:upcase)

=> ["A", "B"]
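Because #grep relies on ===, anything that defines a sensible === can act as the pattern. A small sketch, assuming a lambda named even that I made up for the example: Proc#=== calls the proc with the element, so a lambda works as a reusable filter. The #grep_v method (the inverse of #grep) needs Ruby 2.3 or later.

```ruby
# even is a hypothetical lambda; Proc#=== invokes it with each element.
even = ->(n) { n.even? }

[1, 2, 3, 4].grep(even)                  # => [2, 4]
[1, 2, 3, 4].grep(even) { |n| n * 10 }   # => [20, 40]

# grep_v keeps the elements that do NOT match (Ruby 2.3+).
['a', 1, 2, 'b'].grep_v(String)          # => [1, 2]
```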

Sorting

Let’s say you have an array of numbers represented as both integers and strings. If you attempt to use #sort, it will fail.



>> [1, "5", 2, "7", "3"].sort

ArgumentError: comparison of Fixnum with String failed

Since strings are not guaranteed to be numbers with quotes around them, Ruby does not try to implicitly convert the strings into suitable numbers before trying to sort. We can fix this by telling Ruby how to sort the elements.



>> [1, "5", 2, "7", "3"].sort_by { |a| a.to_i }

=> [1, 2, "3", "5", "7"]

You can also use the slightly shorter version.



>> [1, "5", 2, "7", "3"].sort_by(&:to_i)

=> [1, 2, "3", "5", "7"]
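The block given to #sort_by can return anything that responds to #<=>, including an Array, which gives you tie-breaking for free because Arrays compare element by element. A short sketch with made-up data:

```ruby
words = %w[pear fig banana kiwi apple]

# Sort by length first, then alphabetically among words of equal length.
words.sort_by { |w| [w.length, w] }
# => ["fig", "kiwi", "pear", "apple", "banana"]

# Negating the numeric key sorts longest-first; ties stay alphabetical.
words.sort_by { |w| [-w.length, w] }
# => ["banana", "apple", "kiwi", "pear", "fig"]

# min_by and max_by use the same kind of key without sorting everything.
words.min_by(&:length)  # => "fig"
words.max_by(&:length)  # => "banana"
```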

Now, what if you are sorting something that isn’t a number or isn’t easily converted into one? The #sort method is not magic. It checks the result of the combined comparison operator method (#<=>).

The #<=> method works like this:



>> 1 <=> 2

=> -1
>> 2 <=> 1

=> 1
>> 1 <=> 1

=> 0

If you want to make an arbitrary class sortable, you just need to define #<=> on it. Go ahead and make a new class and see what happens when you try to sort it without one.



>> class McGuffin

>>   attr_accessor :absurdity

>>   def initialize(a)

>>     @absurdity = a

>>   end

>> end
>> m1 = McGuffin.new(0.1)

>> m2 = McGuffin.new(2.3)

>> [m1, m2].sort

=> ArgumentError: comparison of McGuffin with McGuffin failed

It produces an error because Ruby does not know how to compare members of this new class. This is easily fixed by defining #<=>.



>> class McGuffin

>>   def <=>(other)

>>     @absurdity <=> other.absurdity

>>   end

>> end
>> [m1, m2].sort

=> [#<McGuffin:0x00000000de0238 @absurdity=0.1>,
    #<McGuffin:0x00000000de50f8 @absurdity=2.3>]

Note that #<=> is not a substitute for the other comparison operators. You will need to define those separately, if you want them.



>> m1 > m2

=> NoMethodError: undefined method `>'...
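If you do want the full set of comparison operators, you don’t have to write each one by hand. Ruby’s Comparable module is another mixin: it builds #<, #<=, #==, #>=, #> and #between? on top of #<=>. Here is a minimal sketch, assuming a hypothetical Gadget class so that it stands apart from the McGuffin session above.

```ruby
# Gadget is a hypothetical class for this example.
class Gadget
  include Comparable

  attr_reader :weight

  def initialize(weight)
    @weight = weight
  end

  # Comparable derives the other comparison operators from this method.
  def <=>(other)
    weight <=> other.weight
  end
end

light = Gadget.new(0.1)
heavy = Gadget.new(2.3)

light < heavy                           # => true
heavy >= light                          # => true
light == Gadget.new(0.1)                # => true
Gadget.new(1.0).between?(light, heavy)  # => true
[heavy, light].min.weight               # => 0.1
```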

Any? and All?

Enumerable#any? returns true if its block is true for any element in the collection. Enumerable#all? returns true if its block is true for every element.



>> [2,4,6,8].all?(&:even?)

=> true 
>> [2,4,5,8].any? { |i| i % 2 == 0 }

=> true
>> [2,4,5,8].all? { |i| i % 2 == 0 }

=> false
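Two relatives, #none? and #one?, round out the family, and the behavior on empty collections is worth seeing once. A quick sketch:

```ruby
[1, 3, 5].none?(&:even?)  # => true  (no element is even)
[1, 2, 5].one?(&:even?)   # => true  (exactly one element is even)

# Edge cases: an empty collection satisfies all? but not any?.
[].all?(&:even?)          # => true
[].any?(&:even?)          # => false

# Without a block, the elements themselves are tested for truthiness.
[nil, false].any?         # => false
[nil, 1].any?             # => true
```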

Who Needs Excel?

Enumerable#reduce takes a collection and reduces it down to a single element. It applies an operation to each element, maintaining a running “total.”

For example, #reduce can be used to sum a collection.



>> [1,2,3].reduce(:+)

=> 6

Ruby does not ship with a factorial method. However, thanks to #reduce it’s easy to slap together a beautiful hack.



>> class Integer

>>   def factorial

>>     (1..self).reduce(:*) || 1

>>   end

>> end
>> 6.factorial

=> 720

>> 0.factorial

=> 1
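The `|| 1` is there because #reduce returns nil when the collection is empty and no starting value is given. One alternative, sketched below with a hypothetical standalone factorial method rather than a monkey patch, is to pass the starting value explicitly together with the operator symbol.

```ruby
# A standalone sketch; factorial here is a hypothetical top-level helper.
def factorial(n)
  # 1 is the starting value, so an empty range (n = 0) still yields 1.
  (1..n).reduce(1, :*)
end

factorial(6)      # => 720
factorial(0)      # => 1

# The same idea explains these two results:
[].reduce(:+)     # => nil
[].reduce(0, :+)  # => 0

# For plain sums, Ruby 2.4 and later also provide #sum.
[1, 2, 3].sum     # => 6
```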

Enumerable#reduce is notoriously hard to understand. So far, I have kept things simple by letting #reduce’s accumulator operate out of sight. But now let’s bring it out into the open.



>> [1,2,3].reduce(0) { |sum, element| sum + element }

=> 6

Whoa, whoa, whoa. What is all that stuff I just added? What does the argument passed to #reduce mean?

Compare these three #reduce calls.



[1,2,3].reduce do |accumulator, current|
  puts "Accumulator: #{accumulator}, Current: #{current}"
  accumulator + current
end
Accumulator: 1, Current: 2
Accumulator: 3, Current: 3
=> 6

[1,2,3].reduce(0) do |accumulator, current|
  puts "Accumulator: #{accumulator}, Current: #{current}"
  accumulator + current
end
Accumulator: 0, Current: 1
Accumulator: 1, Current: 2
Accumulator: 3, Current: 3
=> 6

[1,2,3].reduce(1) do |accumulator, current|
  puts "Accumulator: #{accumulator}, Current: #{current}"
  accumulator + current
end
Accumulator: 1, Current: 1
Accumulator: 2, Current: 2
Accumulator: 4, Current: 3
=> 7

I think what is most confusing in this case is the difference between passing 0 and nothing at all.

#reduce – Current starts out as second element. Accumulator starts out as first element.
#reduce(x) – Current starts out as first element. Accumulator starts out as x.
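If it helps, the two rules above can be written out as a hand-rolled method. This is only a teaching sketch, assuming a hypothetical helper called simple_reduce; it is not how Ruby implements #reduce internally, and it ignores the symbol form such as reduce(:+).

```ruby
# simple_reduce is a hypothetical helper that mimics the block form of #reduce.
def simple_reduce(collection, *initial)
  items = collection.to_a.dup

  # No starting value: the first element seeds the accumulator,
  # so the block first runs with the second element as "current".
  accumulator = initial.empty? ? items.shift : initial.first

  # Each block result becomes the accumulator for the next element.
  items.each { |current| accumulator = yield(accumulator, current) }
  accumulator
end

simple_reduce([1, 2, 3]) { |acc, cur| acc + cur }     # => 6
simple_reduce([1, 2, 3], 0) { |acc, cur| acc + cur }  # => 6
simple_reduce([1, 2, 3], 1) { |acc, cur| acc + cur }  # => 7
simple_reduce([]) { |acc, cur| acc + cur }            # => nil
```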

How is this even useful? The example that ruby-doc.org uses is finding the longest word in a collection.



>> words = %w{cool bobsled Jamaican}

>> longest = words.reduce do |memo, word|

>>   memo.length > word.length ? memo : word

>> end
=> "Jamaican"
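The accumulator doesn’t have to be a number or an element of the collection, either. A sketch with made-up data: counting words by starting from an empty Hash. The one catch is that the block must return the accumulator, which is why #each_with_object is often the tidier choice for this kind of job.

```ruby
words = %w[cool bobsled cool Jamaican bobsled cool]

# Start from a Hash whose missing keys default to 0.
counts = words.reduce(Hash.new(0)) do |tally, word|
  tally[word] += 1
  tally  # the block's return value becomes the next accumulator
end

counts["cool"]      # => 3
counts["bobsled"]   # => 2
counts["Jamaican"]  # => 1

# each_with_object passes the same object every time, so no explicit return.
same = words.each_with_object(Hash.new(0)) { |word, tally| tally[word] += 1 }
same == counts      # => true
```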

Originally, #reduce was known as #inject, and you can use either one in modern Ruby. However, I personally prefer #reduce because I find the idea of “injecting” a collection down to a single element confusing.

Infinite Iteration

It’s often useful to iterate through a collection an arbitrary number of times, or even infinitely. A naive way to do this would be to keep track of a counter index and reset it every time it hits the size of the collection – 1 (when collections are zero-indexed like in Ruby).

A better solution is to use an incrementing counter and take it modulo (%) the size of the collection to get each index. Let’s say you have a collection of three elements, and you need to perform 10 iteration steps.



>> arr = ["first", "middle", "last"]

>> 10.times { |i| puts arr[i % arr.size] }

first

middle

last

first

middle

last

first

middle

last

first

Ruby provides a slightly cleaner way of doing this with Enumerable#cycle.



>> arr.cycle(2) { |i| puts i }

first

middle

last

first

middle

last

Passing an argument to cycle will completely iterate through the collection that many times. If no argument is passed, it iterates infinitely (producing an infinite loop).

There are a couple of problems with cycling this way:

The argument to #cycle specifies the number of times to cycle, not the number of elements to cycle through.
If you want to cycle infinitely, all of your relevant logic must go inside #cycle’s block, because the code will never leave it.

Thankfully, both of these problems can be solved by using an Enumerator object.

Enumerator

If #cycle is not passed a block, it will return an Enumerator.



>> cycle_enum = arr.cycle

=> #<Enumerator: ["first", "middle", "last"]:cycle>

Now, Enumerator#next can be used to retrieve the next element as many times as necessary.



>> cycle_enum.next

=> "first"

>> cycle_enum.next

=> "middle"

>> cycle_enum.next

=> "last"

>> cycle_enum.next

=> "first"
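This takes care of both problems with block-style cycling: you can ask for a number of elements rather than a number of passes, and your own code stays in control of the loop. A short sketch, redefining arr so that it runs on its own:

```ruby
arr = ["first", "middle", "last"]

# Take a fixed number of elements from the endless cycle.
arr.cycle.first(4)
# => ["first", "middle", "last", "first"]

# Or pull elements one at a time from inside your own loop.
cycle_enum = arr.cycle
labels = 5.times.map { cycle_enum.next }
# => ["first", "middle", "last", "first", "middle"]
```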

This works because the Enumerator specifically came from #cycle. Watch what happens when a regular #each Enumerator is used.



>> each_enum = arr.each

>> each_enum.next

=> "first"

>> each_enum.next

=> "middle"

>> each_enum.next

=> "last"

>> each_enum.next

=> StopIteration: iteration reached an end
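StopIteration is less of a nuisance than it looks, because Kernel#loop rescues it and simply exits. Here is a sketch of that idiom, along with Enumerator#peek (look at the next element without consuming it) and Enumerator#rewind (start over); the seen array is just a name I picked for the example.

```ruby
arr = ["first", "middle", "last"]
each_enum = arr.each

seen = []
loop do
  # After "last", #next raises StopIteration; loop rescues it and ends.
  seen << each_enum.next
end
seen            # => ["first", "middle", "last"]

each_enum.rewind
each_enum.peek  # => "first"  (does not advance)
each_enum.next  # => "first"
each_enum.next  # => "middle"
```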

Note: You can’t just call #next directly on the result of #cycle or #each every time you need an element.



>> arr.cycle.next

=> "first"

>> arr.cycle.next

=> "first"

>> arr.cycle.next

=> "first"

This is because iterator methods return a fresh Enumerator every time.



>> arr.cycle.object_id == arr.cycle.object_id

=> false

A good way to describe Enumerator objects is that they contain the information about how to iterate through a collection. For example, a #cycle Enumerator knows how to iterate through a collection one or more times, while a #reverse_each Enumerator knows how to iterate through a collection backwards.

How is this useful?

Well, let’s say you want to cycle through a collection backwards. You would just use #reverse_cycle, right?



>> [:first, :middle, :last].reverse_cycle(2) { |i| puts i }

NoMethodError: undefined method `reverse_cycle'...

Crap! There’s no #reverse_cycle in Enumerable! We told Boss Lady that we would be cycling backwards by this afternoon. And with the economy and all…

But wait. Perhaps not all hope is lost. What about taking #reverse_each…and then calling #cycle on that?



>> [:first, :middle, :last].reverse_each.cycle(2) { |i| puts i }

last

middle

first

last

middle

first

Chaining: that’s what you can do with Enumerator. Want to cycle through a collection backwards and place the results in an Array? Just add #map to the chain:



>> [:first, :middle, :last].reverse_each.cycle(2).map { |i| i }

=> [:last, :middle, :first, :last, :middle, :first]
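The same chaining trick is how you get an index into methods that don’t normally provide one. A brief sketch using Enumerator#with_index and a few block-less Enumerable calls:

```ruby
names = %w[first middle last]

# each_with_index without a block returns an Enumerator that map can consume.
names.each_with_index.map { |name, i| "#{i}: #{name}" }
# => ["0: first", "1: middle", "2: last"]

# with_index accepts a starting offset.
names.map.with_index(1) { |name, i| "#{i}. #{name}" }
# => ["1. first", "2. middle", "3. last"]

# Chains can combine direction and indexing.
names.reverse_each.with_index.to_a
# => [["last", 0], ["middle", 1], ["first", 2]]
```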

Conclusion

That covers Enumerator and Enumerable. In the next (and final) installment in my series on collections, I’ll cover some of my favorite tips and tricks.

Frequently Asked Questions (FAQs) about Ruby Collections III: Enumerable and Enumerator

What is the difference between Enumerable and Enumerator in Ruby?

Enumerable is a module and Enumerator is a class; both deal with traversing collections. Enumerable is a mixin that provides collection classes with several traversal and searching methods, and the ability to sort. It relies on the class having a method called each which yields successive members of the collection. Enumerator, on the other hand, is a way to create an external iterator. It can be used to iterate over a collection without exposing the underlying object.

How do I use the Enumerable module in Ruby?

To use the Enumerable module in Ruby, you need to include it in your class and define an each method. The each method should yield each element of the collection in turn. Once you’ve done this, you can use any of the Enumerable methods on instances of your class.

Can you provide an example of using the Enumerator class in Ruby?

Sure, here’s an example of using the Enumerator class. Let’s say you have an array of numbers and you want to create an enumerator that yields each number squared:

numbers = [1, 2, 3, 4, 5]
squares = Enumerator.new do |yielder|
  numbers.each do |number|
    yielder.yield number ** 2
  end
end
squares.each { |square| puts square }

This will output the squares of each number in the array.
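Enumerator.new does not need a backing array at all; the block can generate values on demand, even forever. A minimal sketch, assuming a Fibonacci generator that I named fib for the example. Enumerator#lazy (Ruby 2.0 and later) lets you filter an endless sequence without trying to compute all of it first.

```ruby
# fib is a hypothetical name; the block produces values only when asked.
fib = Enumerator.new do |yielder|
  a, b = 0, 1
  loop do
    yielder << a
    a, b = b, a + b
  end
end

fib.take(10)
# => [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# lazy defers the select so that first(5) can stop the endless sequence.
fib.lazy.select(&:even?).first(5)
# => [0, 2, 8, 34, 144]
```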

What is the purpose of the Enumerable#map method in Ruby?

The Enumerable#map method is used to transform each element in a collection. It returns a new array containing the results of the transformation. For example, if you have an array of numbers and you want to square each number, you could use the map method like this:

numbers = [1, 2, 3, 4, 5]
squares = numbers.map { |number| number ** 2 }

This will return a new array containing the squares of each number.

How does the Enumerable#reduce method work in Ruby?

The Enumerable#reduce method combines all elements in a collection by applying a binary operation. It takes an optional initial value and a block (or a symbol naming a method). The block is called for each element in the collection, and the result of the block becomes the accumulator for the next iteration. For example, if you want to sum all the numbers in an array, you could use the reduce method like this:

numbers = [1, 2, 3, 4, 5]
sum = numbers.reduce(0) { |total, number| total + number }

This will return the sum of all the numbers in the array.

What is the difference between Enumerable#select and Enumerable#reject in Ruby?

Enumerable#select and Enumerable#reject are both methods used to filter elements in a collection. The select method returns a new array containing all elements for which the block returns a truthy value. The reject method, on the other hand, returns a new array containing all elements for which the block returns a falsy value. For example, if you have an array of numbers and you want to select only the even numbers, you could use the select method like this:

numbers = [1, 2, 3, 4, 5]
evens = numbers.select { |number| number.even? }

And if you want to reject the even numbers, you could use the reject method like this:

numbers = [1, 2, 3, 4, 5]
odds = numbers.reject { |number| number.even? }

How do I sort a collection using the Enumerable#sort method in Ruby?

The Enumerable#sort method is used to sort a collection. It returns a new array containing the elements of the original collection in sorted order. By default, it sorts in ascending order, but you can provide a block to specify a different order. For example, if you have an array of numbers and you want to sort them in descending order, you could use the sort method like this:

numbers = [5, 3, 2, 1, 4]
sorted = numbers.sort { |a, b| b <=> a }

This will return a new array containing the numbers in descending order.

What is the purpose of the Enumerable#count method in Ruby?

The Enumerable#count method is used to count the number of elements in a collection. If no argument or block is given, it returns the number of elements in the collection. If an argument is given, it returns the number of elements equal to the argument. If a block is given, it returns the number of elements for which the block returns a truthy value. For example, if you have an array of numbers and you want to count the number of even numbers, you could use the count method like this:

numbers = [1, 2, 3, 4, 5]
evens_count = numbers.count { |number| number.even? }

This will return the number of even numbers in the array.

How do I use the Enumerable#find method in Ruby?

The Enumerable#find method is used to find the first element in a collection for which the block returns a truthy value. If no such element is found, it returns nil. For example, if you have an array of numbers and you want to find the first even number, you could use the find method like this:

numbers = [1, 2, 3, 4, 5]
first_even = numbers.find { |number| number.even? }

This will return the first even number in the array.

What is the difference between #each and Enumerable#each_with_index in Ruby?

#each and #each_with_index are both used to iterate over a collection. #each is defined by the collection class itself (Enumerable requires it rather than providing it) and yields each element in the collection to the block. Enumerable#each_with_index, on the other hand, yields each element along with its index to the block. For example, if you have an array of numbers and you want to print each number along with its index, you could use the each_with_index method like this:

numbers = [1, 2, 3, 4, 5]
numbers.each_with_index { |number, index| puts "#{index}: #{number}" }

This will print each number along with its index.