A Guide to Ruby Collections III: Enumerable and Enumerator

This entry is part 3 of 4 in the series A Guide to Ruby Collections

In this article, I’m going to cover the Enumerable module and the Enumerator class. In order to get maximum use out of Ruby’s collections, you will need to understand how they work and what they give you. In particular, Enumerable by itself contributes heavily towards Ruby’s terseness and flexibility. In fact, many new Rubyists use it under the hood without even knowing it.

Enumerable

Enumerable is possible in Ruby due to the notion of “mixins.” In many languages, code is shared between classes mainly through inheritance, and inheritance is often limited to one parent class. In addition to classes, Ruby has modules, which are containers for methods and constants. If you want to write code once and share it among a bunch of unrelated classes, you can put it in a module to be “mixed in” to your classes. Note that this is different from sharing an interface, as in Java, where only method signatures are shared.
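To make that concrete, here is a minimal sketch of a mixin. The Greeter module and the Robot and Parrot classes are names I made up for illustration; they are not part of Ruby.

```ruby
# Greeter is a hypothetical module used only for this illustration.
module Greeter
  # Relies on the including class to provide a #name method.
  def greet
    "Hello, I am #{name}"
  end
end

# Two unrelated classes share one implementation by including the module.
class Robot
  include Greeter

  def name
    "a robot"
  end
end

class Parrot
  include Greeter

  def name
    "a parrot"
  end
end

puts Robot.new.greet   # Hello, I am a robot
puts Parrot.new.greet  # Hello, I am a parrot
```

Enumerable works the same way, except that the method it expects from the including class is #each rather than #name.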

It’s possible to find out which classes use Enumerable with Module#included_modules.

>> Array.included_modules
=> [Enumerable, PP::ObjectMixin, Kernel] 
>> String.included_modules
=> [Comparable, PP::ObjectMixin, Kernel]

In order for a class to use Enumerable, it must define an #each method. This method should yield each item in the collection to a block. Remember the Colors class we made earlier? Let’s add Enumerable to it to give it magic iteration powers.

>> class Colors
>>   include Enumerable

>>   def each
>>     yield "red"
>>     yield "green"
>>     yield "blue"
>>   end
>> end

>> c = Colors.new
>> c.map { |i| i.reverse }
=> ["der", "neerg", "eulb"]

You can see all of the methods that Enumerable provides by checking the output of #instance_methods.

>> Enumerable.instance_methods.sort
=> [:all?, :any?, :chunk, :collect, :collect_concat, :count, :cycle,
    :detect, :drop, :drop_while, :each_cons, :each_entry, :each_slice,
    :each_with_index, :each_with_object, :entries, :find, :find_all, 
    :find_index, :first, :flat_map, :grep, :group_by, :include?, :inject, 
    :map, :max, :max_by, :member?, :min, :min_by, :minmax, :minmax_by, 
    :none?, :one?, :partition, :reduce, :reject, :reverse_each, :select, 
    :slice_before, :sort, :sort_by, :take, :take_while, :to_a, :zip]
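As a quick sanity check, here is a sketch showing a handful of these methods working on the Colors class. The class is declared again only so that the snippet runs on its own.

```ruby
# Colors is re-declared here so the snippet is self-contained.
class Colors
  include Enumerable

  def each
    yield "red"
    yield "green"
    yield "blue"
  end
end

c = Colors.new

c.sort                         # => ["blue", "green", "red"]
c.include?("green")            # => true
c.first(2)                     # => ["red", "green"]
c.select { |s| s.length > 3 }  # => ["green", "blue"]
c.min_by(&:length)             # => "red"
c.each_with_index.to_a         # => [["red", 0], ["green", 1], ["blue", 2]]
```

None of these methods were written in Colors. They all come from Enumerable and are built on the single #each method.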

Searching

Enumerable provides several methods for filtering your collections. The ones you will probably see most often are the “ect” family of methods.

>> [1,2,3,4,5].select { |i| i > 3 }
=> [4,5]

>> [1,2,3,4,5].detect { |i| i > 3 }
=> 4

>> [1,2,3,4,5].reject { |i| i > 3 }
=> [1,2,3]

The #find_all and #find methods perform the same operations as #select and #detect, respectively.
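A short sketch of the alternative names, along with what each one returns when nothing matches. I have also thrown in #partition, which appears in the method list above and returns the #select and #reject results in one pass.

```ruby
numbers = [1, 2, 3, 4, 5]

numbers.find_all { |i| i > 3 }   # => [4, 5]
numbers.find { |i| i > 3 }       # => 4

# With no match, the "all" flavour gives an empty Array,
# while the single-element flavour gives nil.
numbers.find_all { |i| i > 10 }  # => []
numbers.find { |i| i > 10 }      # => nil

# partition returns [matching, non_matching].
numbers.partition { |i| i > 3 }  # => [[4, 5], [1, 2, 3]]
```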

Enumerable#grep provides the ability to perform a general search. It’s a shortcut for #select and the === operator.

Threequals (===) is a strange but rather useful operator. It isn’t used for establishing equality in an absolute sense, but instead a more general one.

>> (1..3) === 2
=> true

>> (1..3) === 4
=> false

>> ('a'..'c') === 'b'
=> true

>> /el/ === "Hello World"
=> true

>> Object === Array
=> true

When using ===, the more general object (e.g. a Range or Regexp) goes on the left side of the operator, and the specific object goes on the right. It does not work the other way around, because threequals is generally overridden one-way. Range#=== knows what to do with Fixnums, but Fixnum#=== doesn’t know how to handle Ranges. (Fixnum was folded into Integer in Ruby 2.4, so newer versions will say Integer instead.)

>> 2 === (1..3)
=> false
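The place you are most likely to meet threequals without noticing is a case statement, which tests each when clause with ===. Here is a small sketch; the classify method is a hypothetical helper I wrote just for this example.

```ruby
# classify is a made-up helper. Each `when` clause is evaluated as
# `pattern === value`, with the general object on the left.
def classify(value)
  case value
  when 1..3    then "small number"   # Range#===
  when Integer then "other integer"  # Module#===
  when /ello/  then "greeting"       # Regexp#===
  else              "unknown"
  end
end

classify(2)        # => "small number"
classify(40)       # => "other integer"
classify("Hello")  # => "greeting"
classify(:foo)     # => "unknown"
```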

If you wanted to do a very general search with select, you could use threequals.

>> [:hello_world, "Jello World", 3].select { |i| /.ello..orld/ === i }
=> [:hello_world, "Jello World"]

Enumerable#grep is a wrapper for this.

>> [6, 14, 28, 47, 12].grep(5..15)
=> [6, 14, 12]

>> [0.3, "three", :three, "thirty-three"].grep /three/
=> ["three", :three, "thirty-three"]

Look at what would happen if you tried a #map operation on a heterogeneous Array.

>> ['a', 1, 2, 'b'].map(&:upcase)
=> NoMethodError: undefined method `upcase' for 1:Fixnum

We can use #grep to filter the collection for elements that will accept our method.

>> ['a', 1, 2, 'b'].grep(String, &:upcase)
=> ["A", "B"]

Sorting

Let’s say you have an array of numbers represented as both integers and strings. If you attempt to use #sort, it will fail.

>> [1, "5", 2, "7", "3"].sort
ArgumentError: comparison of Fixnum with String failed

Since strings are not guaranteed to be numbers with quotes around them, Ruby does not try to implicitly convert the strings into suitable numbers before trying to sort. We can fix this by telling Ruby how to sort the elements.

>> [1, "5", 2, "7", "3"].sort_by { |a| a.to_i }
=> [1, 2, "3", "5", "7"]

You can also use the slightly shorter version.

>> [1, "5", 2, "7", "3"].sort_by(&:to_i)
=> [1, 2, "3", "5", "7"]

Now, what if you are sorting something that isn’t a number or isn’t easily converted into one? The #sort method is not magic. It checks the result of the combined comparison operator method (#<=>).

The #<=> method works like this:

>> 1 <=> 2
=> -1

>> 2 <=> 1
=> 1

>> 1 <=> 1
=> 0
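To show how #sort leans on this, here is a sketch that passes the comparison explicitly as a block. The block must return -1, 0, or 1, which is exactly what #<=> produces.

```ruby
numbers = [3, 1, 2]

numbers.sort                     # => [1, 2, 3]
numbers.sort { |a, b| a <=> b }  # => [1, 2, 3], same as the default
numbers.sort { |a, b| b <=> a }  # => [3, 2, 1], operands swapped

# Any comparison will do, for example comparing by length.
words = %w[pear fig banana]
words.sort { |a, b| a.length <=> b.length }  # => ["fig", "pear", "banana"]

# When two objects cannot be compared, <=> returns nil. That is what
# was behind the ArgumentError for the mixed Array earlier.
1 <=> "5"  # => nil
```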

If you want to make an arbitrary class sortable, you just need to define #<=> for it. Go ahead and make a new class and see what happens when you try to sort.

>> class McGuffin
>>   attr_accessor :absurdity 
>>   def initialize(a)
>>     @absurdity = a
>>   end
>> end

>> m1 = McGuffin.new(0.1)
>> m2 = McGuffin.new(2.3)
>> [m1, m2].sort
=> ArgumentError: comparison of McGuffin with McGuffin failed

It produces an error because Ruby does not know how to compare members of this new class. This is easily fixed by defining #<=>.

>> class McGuffin
>>   def <=>(other)
>>     @absurdity <=> other.absurdity
>>   end
>> end

>> [m1, m2].sort
=> [#<McGuffin:0x00000000de0238 @absurdity=0.1>, 
    #<McGuffin:0x00000000de50f8 @absurdity=2.3>]

Note that defining #<=> does not by itself give you the other comparison operators. If you want them, you will need to define them separately or include the Comparable module, which builds them on top of #<=>.

>> m1 > m2
=> NoMethodError: undefined method `>'...
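Rather than writing each operator by hand, you can mix in Ruby’s Comparable module, which derives them from #<=>. The sketch below declares McGuffin again from scratch so that it runs on its own.

```ruby
class McGuffin
  include Comparable  # supplies <, <=, ==, >=, > and between?

  attr_accessor :absurdity

  def initialize(a)
    @absurdity = a
  end

  # Comparable builds every other operator on top of this method.
  def <=>(other)
    absurdity <=> other.absurdity
  end
end

m1 = McGuffin.new(0.1)
m2 = McGuffin.new(2.3)

m1 < m2                  # => true
m1 > m2                  # => false
m1 == McGuffin.new(0.1)  # => true
m1.between?(m1, m2)      # => true
[m2, m1].min.absurdity   # => 0.1
```

One thing to be aware of: with Comparable included, == is also defined in terms of #<=>, so two distinct McGuffins with the same absurdity count as equal.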

Any? and All?

Enumerable#any? returns true if its block is true for any element in the collection. Enumerable#all? returns true if its block is true for every element.

>> [2,4,6,8].all?(&:even?)
=> true 

>> [2,4,5,8].any? { |i| i % 2 == 0 }
=> true

>> [2,4,5,8].all? { |i| i % 2 == 0 }
=> false
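Two close relatives from the method list above are #none? and #one?. The sketch below also shows how #all? and #any? behave on an empty collection, which is easy to get wrong.

```ruby
numbers = [2, 4, 5, 8]

numbers.none? { |i| i > 10 }  # => true, no element matches
numbers.one?(&:odd?)          # => true, exactly one element (5) matches

# On an empty collection, all? is true and any? is false.
[].all?(&:even?)              # => true
[].any?(&:even?)              # => false
```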

Who Needs Excel?

Enumerable#reduce takes a collection and reduces it down to a single element. It applies an operation to each element, maintaining a running “total.”

For example, #reduce can be used to sum a collection.

>> [1,2,3].reduce(:+)
=> 6

Ruby does not ship with a factorial method. However, thanks to #reduce it’s easy to slap together a beautiful hack.

>> class Integer
>>   def factorial
>>     (1..self).reduce(:*) || 1
>>   end
>> end

>> 6.factorial
=> 720
>> 0.factorial
=> 1

Enumerable#reduce is notoriously hard to understand. So far, I have kept things simple by letting #reduce’s accumulator operate out of sight. But now let’s bring it out into the open.

>> [1,2,3].reduce(0) { |sum, element| sum + element }
=> 6

Whoa, whoa, whoa. What is all that stuff I just added? What does the argument passed to #reduce mean?

Compare these three #reduce calls.

[1,2,3].reduce do |accumulator, current|
  puts "Accumulator: #{accumulator}, Current: #{current}"
  accumulator + current
end

Accumulator: 1, Current: 2
Accumulator: 3, Current: 3

=> 6

[1,2,3].reduce(0) do |accumulator, current|
  puts "Accumulator: #{accumulator}, Current: #{current}"
  accumulator + current
end

Accumulator: 0, Current: 1
Accumulator: 1, Current: 2
Accumulator: 3, Current: 3

=> 6

[1,2,3].reduce(1) do |accumulator, current|
  puts "Accumulator: #{accumulator}, Current: #{current}"
  accumulator + current
end

Accumulator: 1, Current: 1
Accumulator: 2, Current: 2
Accumulator: 4, Current: 3

=> 7

I think what is most confusing in this case is the difference between passing 0 and nothing at all.

  • #reduce – Current starts out as second element. Accumulator starts out as first element.

  • #reduce(x) – Current starts out as first element. Accumulator starts out as x.
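If it helps, here is a rough sketch of those two rules written out by hand. The my_reduce method and the NO_INITIAL marker are hypothetical names of my own; this is not how Ruby implements #reduce, only an illustration of where the accumulator starts.

```ruby
# NO_INITIAL is a made-up marker meaning "no starting value was passed".
NO_INITIAL = Object.new

# my_reduce is a hypothetical stand-in for Enumerable#reduce with a block.
def my_reduce(collection, initial = NO_INITIAL)
  items = collection.to_a

  if initial.equal?(NO_INITIAL)
    # No argument: the first element seeds the accumulator,
    # and iteration starts at the second element.
    accumulator = items.first
    rest = items.drop(1)
  else
    # Argument given: it seeds the accumulator,
    # and iteration starts at the first element.
    accumulator = initial
    rest = items
  end

  rest.each { |current| accumulator = yield(accumulator, current) }
  accumulator
end

my_reduce([1, 2, 3]) { |acc, cur| acc + cur }     # => 6
my_reduce([1, 2, 3], 0) { |acc, cur| acc + cur }  # => 6
my_reduce([1, 2, 3], 1) { |acc, cur| acc + cur }  # => 7
```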

How is this even useful? The example that ruby-doc.org uses is finding the longest word in a collection.

>> words = %w{cool bobsled Jamaican}
>> longest = words.reduce do |memo, word|
>>   memo.length > word.length ? memo : word
>> end

=> "Jamaican"

Originally, #reduce was known as #inject, and you can use either one in modern Ruby. However, I personally prefer #reduce because I find the idea of “injecting” a collection down to a single element confusing.

Infinite Iteration

It’s often useful to iterate through a collection an arbitrary number of times, or even infinitely. A naive way to do this would be to keep track of a counter index and reset it every time it hits the size of the collection – 1 (when collections are zero-indexed like in Ruby).

A better solution is to use an incrementing counter and take it modulo (%) the size of the collection to get each index. Let’s say you have 3 product IDs, and you need to perform 10 iteration steps.

>> arr = ["first", "middle", "last"]
>> 10.times { |i| puts arr[i % arr.size] }
first
middle
last
first
middle
last
first
middle
last
first

Ruby provides a slightly cleaner way of doing this with Enumerable#cycle.

>> arr.cycle(2) { |i| puts i }
first
middle
last
first
middle
last

Passing an argument to cycle will completely iterate through the collection that many times. If no argument is passed, it iterates infinitely (producing an infinite loop).

There are a couple of problems with cycling this way:

  • The argument to #cycle specifies the number of times to cycle, not the number of elements to cycle through
  • If you want to cycle infinitely, all of your relevant logic must go inside #cycle’s block, because the code will never leave

Thankfully, both of these problems can be solved by using an Enumerator object.

Enumerator

If #cycle is not passed a block, it will return an Enumerator.

>> cycle_enum = arr.cycle
=> #<Enumerator: ["first", "middle", "last"]:cycle>

Now, Enumerator#next can be used to retrieve the next element as many times as necessary.

>> cycle_enum.next
=> "first" 
>> cycle_enum.next
=> "middle" 
>> cycle_enum.next
=> "last" 
>> cycle_enum.next
=> "first"

This works because the Enumerator specifically came from #cycle. Watch what happens when a regular #each Enumerator is used.

>> each_enum = arr.each
>> each_enum.next
=> "first"
>> each_enum.next
=> "middle"
>> each_enum.next
=> "last"
>> each_enum.next
=> StopIteration: iteration reached an end
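You rarely need to rescue StopIteration yourself. Kernel#loop rescues it for you, and Enumerator#rewind puts the Enumerator back at the start. A small sketch:

```ruby
arr = ["first", "middle", "last"]
each_enum = arr.each

seen = []
# Kernel#loop rescues StopIteration, so the loop simply ends
# when the Enumerator runs out of elements.
loop do
  seen << each_enum.next
end
seen  # => ["first", "middle", "last"]

# Enumerator#rewind resets the position to the beginning.
each_enum.rewind
each_enum.next  # => "first"
```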

Note: You can’t just call #next directly on the result of #cycle or #each every time you want the next element.

>> arr.cycle.next
=> "first"
>> arr.cycle.next
=> "first"
>> arr.cycle.next
=> "first"

This is because iterator methods return a fresh Enumerator every time.

>> arr.cycle.object_id == arr.cycle.object_id
=> false
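Coming back to the two problems with #cycle listed earlier, here is a rough sketch of how holding on to a single Enumerator addresses both. The tasks in the second half are made-up data for illustration.

```ruby
arr = ["first", "middle", "last"]

# Problem 1: asking for a number of elements rather than a number of
# full cycles. Enumerable#first(n) stops after n elements, so it is
# safe to call on an endless cycle.
arr.cycle.first(5)
# => ["first", "middle", "last", "first", "middle"]

# Problem 2: keeping your logic outside the block. The Enumerator
# remembers its position between calls to #next, so the surrounding
# code decides when to advance it.
rotation = arr.cycle
assignments = %w[a b c d].map { |task| [task, rotation.next] }
# => [["a", "first"], ["b", "middle"], ["c", "last"], ["d", "first"]]
```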

A good way to describe Enumerator objects is that they contain the information about how to iterate through a collection. For example, a #cycle Enumerator knows how to iterate through a collection one or more times, while a #reverse_each Enumerator knows how to iterate through a collection backwards.

How is this useful?

Well, let’s say you want to cycle through a collection backwards. You would just use #reverse_cycle, right?

>> [:first, :middle, :last].reverse_cycle(2) { |i| puts i }
NoMethodError: undefined method `reverse_cycle'...

Crap! There’s no #reverse_cycle in Enumerable! We told Boss Lady that we would be cycling backwards by this afternoon. And with the economy and all…

But wait. Perhaps not all hope is lost. What about taking #reverse_each…and then calling #cycle on that?

>> [:first, :middle, :last].reverse_each.cycle(2) { |i| puts i } 
last
middle
first
last
middle
first

Chaining: that’s what you can do with Enumerator. Want to cycle through a collection backwards and place the results in an Array? Just add #map to the chain:

>> [:first, :middle, :last].reverse_each.cycle(2).map { |i| i }
=> [:last, :middle, :first, :last, :middle, :first]
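Your own classes can join in, too. A common convention is for #each to return an Enumerator when it is called without a block. The sketch below applies that convention to the Colors class from earlier, using Object#to_enum. Treat it as one reasonable way to do it, not the only one.

```ruby
class Colors
  include Enumerable

  def each
    # Without a block, hand back an Enumerator instead of yielding.
    return to_enum(:each) unless block_given?

    yield "red"
    yield "green"
    yield "blue"
  end
end

c = Colors.new

# External iteration with #next now works.
e = c.each
e.next  # => "red"
e.next  # => "green"

# So does chaining.
c.reverse_each.cycle(2).to_a
# => ["blue", "green", "red", "blue", "green", "red"]

c.each.with_index(1).map { |color, i| "#{i}: #{color}" }
# => ["1: red", "2: green", "3: blue"]
```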

Conclusion

That covers Enumerable and Enumerator. In the next (and final) installment in my series on collections, I’ll cover some of my favorite tips and tricks.

  • Joe

    One quick note: if you’ve defined <=>, you can just include Comparable to get all the other comparison operators…