Ruby
Article
By Fred Heath

Algorithmic Fun with Ruby Hashes

By Fred Heath

Futuristic Technology

All programmers use them. You might call them Maps, Dictionaries, Hash Maps, or Associative Arrays. In the Ruby world, we call them Hashes and they’re pretty awesome. A Hash is, simply put, a collection of key-value pairs. It is similar to an Array, except that it is indexed via arbitrary keys of any object type instead of an incremental integer. If you take a look at the Hash API, you’ll see that it offers an impressive range of functionality. In this article, we won’t be looking at the Hash API as such, but rather at using Hashes to model, abstract, and solve common logic problems.

Truth Tables and Capturing the Algorithm

Many people use Hashes purely as simple data lookup tables or as named parameters to methods. This is all fine, but we can do much more by leveraging Hash’s inherent flexibility and powerful features to implement logic and complex algorithms. Let’s look at an interviewer’s favorite, the FizzBuzz test. It goes like this:

“Write a program that prints the numbers from 1 to 100. For multiples of 3, print “Fizz” instead of the number and for multiples of 5, print “Buzz”. For numbers which are multiples of both 3 and 5, print “FizzBuzz”.”

We have two conditions here which affect the outcome: (n % 3 == 0) and (n % 5 == 0). As soon as we grok that, solving the problem is straightforward:

def conditional(n)
  if (n % 3 == 0) && (n % 5 != 0)
    'Fizz'
  elsif (n % 3 != 0) && (n % 5 == 0)
    'Buzz'
  elsif (n % 3 == 0) && (n % 5 == 0)
    'FizzBuzz'
  else n
  end
end

To run FizzBuzz for a sequence of 1 to 100 we do:

(1..100).each {|n| puts conditional(n)}

That was a piece of cake! But where do Hashes come into it? Well, we can capture this conditional logic as a Hash. To get a clue on how to do that, write down the truth table for FizzBuzz:

(n % 3 == 0) (n % 5 == 0) outcome
true false ‘Fizz’
false true ‘Buzz’
true true ‘FizzBuzz’
false false number

Observe that the outcome column consists of discrete, non-nil values for any given input. That would make a good key for our Hash. Let’s rewrite our truth table to suit a Hash:

Key Value
‘Fizz’ (n % 3 == 0) && (n % 5 != 0)
‘Buzz’ (n % 3 != 0) && (n % 5 == 0)
‘FizzBuzz’ (n % 3 == 0) && (n % 5 == 0)
number (n % 3 != 0) && (n % 5 != 0)

Look Ma, No If Statements!

Now, it’s easy to model our Hash on the truth table:

def declarative(n)
  h = {
    'Fizz' => (n % 3 == 0) && (n % 5 != 0),
    'Buzz' => (n % 3 != 0) && (n % 5 == 0),
    'FizzBuzz' => (n % 3 == 0) && (n % 5 == 0),
    n => (n % 3 != 0) && (n % 5 != 0)
  }
  h.key(true)
end

Here, we’re using a Hash to implement some declarative-style programming. We’re not telling our app how to calculate the outcome for each number, we just declare what conditions determine the keys and let the data structure do the lifting. Here’s the sweet thing: as the two conditions used to determine the keys are mutually exclusive, there will only be one true value in the Hash for any number input. This means that the correct key for the input will have the value true and all others will be false, so the method returns the only key with a value of true for the given input.

puts declarative(2) #=> 2
puts declarative(3) #=> Fizz
puts declarative(5) #=> Buzz
puts declarative(7) #=> 7
puts declarative(15)#=> FizzBuzz

To run FizzBuzz declaratively for a sequence of 1 to 100:

(1..100).each {|n| puts declarative(n)}

I don’t know about you, but this way appeals to me much more than the multi-conditional approach. I find it more elegant and tidy than the if..elsif way. Unfortunately, elegance comes at a price. As you can see from the benchmarks, the declarative hash approach is much, much slower than the imperative conditional statement.

Can we reach a happy compromise between elegance and performance? Why, yes we can!

Memoization

Memoization is simply the technique of storing computed or retrieved data for future use. Also known as caching to you and I. Hashes are pretty good for storing things. Let’s have another go at FizzBuzz, memoization-style:

memoization ||= Hash.new do |hash,key|
  hash[key] = case
  when (key % 3 == 0) && (key % 5 != 0)
    'Fizz'
  when (key % 3 != 0) && (key % 5 == 0)
    'Buzz'
  when (key % 3 == 0) && (key % 5 == 0)
    'FizzBuzz'
  else key
  end
end

We are creating a Hash that is initialized to the right FizzBuzz value for each numerical key passed to it. Run it for a range up to 100:

(1..100).each  { |n| memoization[n] }

The memoization hash has been created and has mapped each key from 1 to 100 to its FizzBuzz value. Next time we want the FizzBuzz value of a number in this range, a simple Hash#fetch will provide it.

Memoization has some very practical applications, other than solving FizzBuzz-style problems. We all use the web, so here’s how we could implement some raw and rugged response caching using a Hash:

require 'net/http'
http = Hash.new{|hash,key| hash[key] = Net::HTTP.get_response(URI(key)).body }
puts http['http://www.google.com'] # make actual request
puts http['http://www.google.com'] # get cached value

Now, let’s take a look at performance. Here are the benchmarks (on my machine – yours will obviously differ) when running each of our FizzBuzz solutions twice for a range of 1..1000000.

user system total real
conditional 0.230000 0.000000 0.230000
conditional 0.240000 0.000000 0.240000
declarative 0.990000 0.000000 0.990000
declarative 0.980000 0.000000 0.980000
memoization 1.040000 0.020000 1.060000
memoization 0.370000 0.000000 0.370000

The first time the memoization code is run, it’s very slow. Ruby has to build up the Hash, as well as execute the conditional logic. But the 2nd time, and each time thereafter, its execution is dramatically quicker. That’s because the Hash already contains all values for that range, so it just has to look them up whenever we ask it instead of computing them all over again.

You may notice that, although quick for subsequent runs, the memoization technique is still not quite as fast as the if..elsif approach. This is because we don’t get the full benefit of caching by applying it to simple, linear algorithms like FizzBuzz. To make the most of memoization, we need to use it on more complex, polynomial or exponential-order algorithmic logic.

--ADVERTISEMENT--

Recursive Hashes

Recursion is a typical domain for fast growth algorithms. Let’s look at a classic example of recursion: the Factorial problem. The usual recursive implementation goes like this:

def factorial(x)
  x == 1 ? 1 : x * factorial(x-1)
end

We can memoize it with a hash:

factorial_hash ||= Hash.new do |hash,key|
  hash[key] =  key * hash[key-1]
end
# initialize special cases
fib_hash[1] = 1; fib_hash[2] = 2

Here, we are initializing a Hash in such a way as to force the initialization code to be executed n times for an input of n. The first time we run, say, factorial_hash[5] the code will run 5 times, creating a Hash for the keys 2-5. Then, each time we need to get the factorial value of a number up to 5, the Hash will just do a simple lookup and fetch the right value.

Run each technique twice to get the factorial of 3000:

factorial(3000) 
factorial(3000) 
factorial_hash[3000]  
factorial_hash[3000]

Here are the timings:

user real
factorial 0.005829
factorial 0.006082
factorial_hash 0.011202
factorial_hash 0.000002

As before, the first time the hash runs takes longer than the non-hashed equivalent, but subsequent runs are now blazingly fast.

Let’s try something even more complex: the Fibonacci sequence. Here’s the standard approach:

def fibonacci(n)
  n <= 2 ? 1 : fibonacci(n - 1) + fibonacci(n - 2)
end

and here’s our caching Hash approach:

fib_hash ||= Hash.new do |hash,key|
  hash[key] = hash[key-1] + hash[key-2]
end
# initialize special cases
fib_hash[1] = 1; fib_hash[2] = 1

Running each approach twice to get the Fibonacci value at 35 gives the following timings:

fibonacci(35) 
fibonacci(35) 
fib_hash[35]  
fib_hash[35]
user real
fibonacci 0.987518
fibonacci 0.981804
fibonacci_hash 0.000041
fibonacci_hash 0.000002

Now even the first run of the Hash approach is significantly faster than the non-cached recursive method. Subsequent runs put the non-cached method to shame! The more complex the problem, the greater the benefits obtained with memoization.

But every silver lining has a cloud. Using Hashes recursively adds more data onto our app’s Stack Frame, which means that recursive Hash approach has much lower threshold for Stack Overflow type errors (that’s the error, not the website). So, for algorithms with a high range of input values, recursive Hashes are probably not the best solution.

Conclusion

Hashes are more than just a way to pass options to methods. Used properly, they can help make our code more elegant, faster or sometimes both at the same time. However, we need to be aware of their shortcomings and know how and when to use them effectively.

Code presented in this article can be found in the following gists:

  • buzzfizz

    If I’d have wanted to use Hashes to solve fizzbuzz, I’d have done it like this:

    “`
    @fizzbuzz = { 3 => ‘Fizz’, 6 => ‘Fizz’, 9 => ‘Fizz’, 12 => ‘Fizz’, 5 => ‘Buzz’, 10 => ‘Buzz’, 0 => ‘FizzBuzz’ }

    def fizzbuzz(n)
    @fizzbuzz.fetch(n % 15, n)
    end
    “`

    • Fred@Bootstrap

      Very nice!

  • disqustla

    Hello, thank you, very good. A small correction: you define factorial_hash but you initialize fib_hash.

    • Fred@Bootstrap

      well spotted, thank you! I’ll correct it asap.

  • olla ollu

    Beautiful..

  • Doga Fincan

    I see a small error in the article: you’ve used the Fibonacci special cases in your factorial code. Other than that: really good article.

Recommended
Sponsors
Get the latest in Ruby, once a week, for free.