Algorithmic Fun with Ruby Hashes
All programmers use them. You might call them Maps, Dictionaries, Hash Maps, or Associative Arrays. In the Ruby world, we call them Hashes and they’re pretty awesome. A Hash is, simply put, a collection of key-value pairs. It is similar to an Array, except that it is indexed via arbitrary keys of any object type instead of an incremental integer. If you take a look at the Hash API, you’ll see that it offers an impressive range of functionality. In this article, we won’t be looking at the Hash API as such, but rather at using Hashes to model, abstract, and solve common logic problems.
Truth Tables and Capturing the Algorithm
Many people use Hashes purely as simple data lookup tables or as named parameters to methods. This is all fine, but we can do much more by leveraging Hash’s inherent flexibility and powerful features to implement logic and complex algorithms. Let’s look at an interviewer’s favorite, the FizzBuzz test. It goes like this:
“Write a program that prints the numbers from 1 to 100. For multiples of 3, print “Fizz” instead of the number and for multiples of 5, print “Buzz”. For numbers which are multiples of both 3 and 5, print “FizzBuzz”.”
We have two conditions here which affect the outcome: (n % 3 == 0
) and (n % 5 == 0
). As soon as we grok that, solving the problem is straightforward:
def conditional(n)
if (n % 3 == 0) && (n % 5 != 0)
'Fizz'
elsif (n % 3 != 0) && (n % 5 == 0)
'Buzz'
elsif (n % 3 == 0) && (n % 5 == 0)
'FizzBuzz'
else n
end
end
To run FizzBuzz for a sequence of 1 to 100 we do:
(1..100).each {|n| puts conditional(n)}
That was a piece of cake! But where do Hashes come into it? Well, we can capture this conditional logic as a Hash. To get a clue on how to do that, write down the truth table for FizzBuzz:
(n % 3 == 0) | (n % 5 == 0) | outcome |
---|---|---|
true | false | ‘Fizz’ |
false | true | ‘Buzz’ |
true | true | ‘FizzBuzz’ |
false | false | number |
Observe that the outcome column consists of discrete, non-nil values for any given input. That would make a good key for our Hash. Let’s rewrite our truth table to suit a Hash:
Key | Value |
---|---|
‘Fizz’ | (n % 3 == 0) && (n % 5 != 0) |
‘Buzz’ | (n % 3 != 0) && (n % 5 == 0) |
‘FizzBuzz’ | (n % 3 == 0) && (n % 5 == 0) |
number | (n % 3 != 0) && (n % 5 != 0) |
Look Ma, No If Statements!
Now, it’s easy to model our Hash on the truth table:
def declarative(n)
h = {
'Fizz' => (n % 3 == 0) && (n % 5 != 0),
'Buzz' => (n % 3 != 0) && (n % 5 == 0),
'FizzBuzz' => (n % 3 == 0) && (n % 5 == 0),
n => (n % 3 != 0) && (n % 5 != 0)
}
h.key(true)
end
Here, we’re using a Hash to implement some declarative-style programming. We’re not telling our app how to calculate the outcome for each number, we just declare what conditions determine the keys and let the data structure do the lifting. Here’s the sweet thing: as the two conditions used to determine the keys are mutually exclusive, there will only be one true value in the Hash for any number input. This means that the correct key for the input will have the value true
and all others will be false
, so the method returns the only key with a value of true
for the given input.
puts declarative(2) #=> 2
puts declarative(3) #=> Fizz
puts declarative(5) #=> Buzz
puts declarative(7) #=> 7
puts declarative(15)#=> FizzBuzz
To run FizzBuzz declaratively for a sequence of 1 to 100:
(1..100).each {|n| puts declarative(n)}
I don’t know about you, but this way appeals to me much more than the multi-conditional approach. I find it more elegant and tidy than the if..elsif
way. Unfortunately, elegance comes at a price. As you can see from the benchmarks, the declarative hash approach is much, much slower than the imperative conditional statement.
Can we reach a happy compromise between elegance and performance? Why, yes we can!
Memoization
Memoization is simply the technique of storing computed or retrieved data for future use. Also known as caching to you and I. Hashes are pretty good for storing things. Let’s have another go at FizzBuzz, memoization-style:
memoization ||= Hash.new do |hash,key|
hash[key] = case
when (key % 3 == 0) && (key % 5 != 0)
'Fizz'
when (key % 3 != 0) && (key % 5 == 0)
'Buzz'
when (key % 3 == 0) && (key % 5 == 0)
'FizzBuzz'
else key
end
end
We are creating a Hash that is initialized to the right FizzBuzz value for each numerical key passed to it. Run it for a range up to 100:
(1..100).each { |n| memoization[n] }
The memoization hash has been created and has mapped each key from 1 to 100 to its FizzBuzz value. Next time we want the FizzBuzz value of a number in this range, a simple Hash#fetch
will provide it.
Memoization has some very practical applications, other than solving FizzBuzz-style problems. We all use the web, so here’s how we could implement some raw and rugged response caching using a Hash:
require 'net/http'
http = Hash.new{|hash,key| hash[key] = Net::HTTP.get_response(URI(key)).body }
puts http['http://www.google.com'] # make actual request
puts http['http://www.google.com'] # get cached value
Now, let’s take a look at performance. Here are the benchmarks (on my machine – yours will obviously differ) when running each of our FizzBuzz solutions twice for a range of 1..1000000.
user | system | total | real |
---|---|---|---|
conditional | 0.230000 | 0.000000 | 0.230000 |
conditional | 0.240000 | 0.000000 | 0.240000 |
declarative | 0.990000 | 0.000000 | 0.990000 |
declarative | 0.980000 | 0.000000 | 0.980000 |
memoization | 1.040000 | 0.020000 | 1.060000 |
memoization | 0.370000 | 0.000000 | 0.370000 |
The first time the memoization code is run, it’s very slow. Ruby has to build up the Hash, as well as execute the conditional logic. But the 2nd time, and each time thereafter, its execution is dramatically quicker. That’s because the Hash already contains all values for that range, so it just has to look them up whenever we ask it instead of computing them all over again.
You may notice that, although quick for subsequent runs, the memoization technique is still not quite as fast as the if..elsif
approach. This is because we don’t get the full benefit of caching by applying it to simple, linear algorithms like FizzBuzz. To make the most of memoization, we need to use it on more complex, polynomial or exponential-order algorithmic logic.
Recursive Hashes
Recursion is a typical domain for fast growth algorithms. Let’s look at a classic example of recursion: the Factorial problem. The usual recursive implementation goes like this:
def factorial(x)
x == 1 ? 1 : x * factorial(x-1)
end
We can memoize it with a hash:
factorial_hash ||= Hash.new do |hash,key|
hash[key] = key * hash[key-1]
end
# initialize special cases
fib_hash[1] = 1; fib_hash[2] = 2
Here, we are initializing a Hash in such a way as to force the initialization code to be executed n times for an input of n. The first time we run, say, factorial_hash[5]
the code will run 5 times, creating a Hash for the keys 2-5. Then, each time we need to get the factorial value of a number up to 5, the Hash will just do a simple lookup and fetch the right value.
Run each technique twice to get the factorial of 3000:
factorial(3000)
factorial(3000)
factorial_hash[3000]
factorial_hash[3000]
Here are the timings:
user | real |
---|---|
factorial | 0.005829 |
factorial | 0.006082 |
factorial_hash | 0.011202 |
factorial_hash | 0.000002 |
As before, the first time the hash runs takes longer than the non-hashed equivalent, but subsequent runs are now blazingly fast.
Let’s try something even more complex: the Fibonacci sequence. Here’s the standard approach:
def fibonacci(n)
n <= 2 ? 1 : fibonacci(n - 1) + fibonacci(n - 2)
end
and here’s our caching Hash approach:
fib_hash ||= Hash.new do |hash,key|
hash[key] = hash[key-1] + hash[key-2]
end
# initialize special cases
fib_hash[1] = 1; fib_hash[2] = 1
Running each approach twice to get the Fibonacci value at 35 gives the following timings:
fibonacci(35)
fibonacci(35)
fib_hash[35]
fib_hash[35]
user | real |
---|---|
fibonacci | 0.987518 |
fibonacci | 0.981804 |
fibonacci_hash | 0.000041 |
fibonacci_hash | 0.000002 |
Now even the first run of the Hash approach is significantly faster than the non-cached recursive method. Subsequent runs put the non-cached method to shame! The more complex the problem, the greater the benefits obtained with memoization.
But every silver lining has a cloud. Using Hashes recursively adds more data onto our app’s Stack Frame, which means that recursive Hash approach has much lower threshold for Stack Overflow type errors (that’s the error, not the website). So, for algorithms with a high range of input values, recursive Hashes are probably not the best solution.
Conclusion
Hashes are more than just a way to pass options to methods. Used properly, they can help make our code more elegant, faster or sometimes both at the same time. However, we need to be aware of their shortcomings and know how and when to use them effectively.
Code presented in this article can be found in the following gists: