# Algorithmic Fun with Ruby Hashes

All programmers use them. You might call them Maps, Dictionaries, Hash Maps, or Associative Arrays. In the Ruby world, we call them *Hashes* and they’re pretty awesome. A Hash is, simply put, a collection of key-value pairs. It is similar to an Array, except that it is indexed via arbitrary keys of any object type instead of an incremental integer. If you take a look at the Hash API, you’ll see that it offers an impressive range of functionality. In this article, we won’t be looking at the Hash API as such, but rather at using Hashes to model, abstract, and solve common logic problems.

## Truth Tables and Capturing the Algorithm

Many people use Hashes purely as simple data lookup tables or as named parameters to methods. This is all fine, but we can do much more by leveraging Hash’s inherent flexibility and powerful features to implement logic and complex algorithms. Let’s look at an interviewer’s favorite, the FizzBuzz test. It goes like this:

“Write a program that prints the numbers from 1 to 100. For multiples of 3, print “Fizz” instead of the number and for multiples of 5, print “Buzz”. For numbers which are multiples of both 3 and 5, print “FizzBuzz”.”

We have two conditions here which affect the outcome: (`n % 3 == 0`

) and (`n % 5 == 0`

). As soon as we grok that, solving the problem is straightforward:

```
def conditional(n)
if (n % 3 == 0) && (n % 5 != 0)
'Fizz'
elsif (n % 3 != 0) && (n % 5 == 0)
'Buzz'
elsif (n % 3 == 0) && (n % 5 == 0)
'FizzBuzz'
else n
end
end
```

To run FizzBuzz for a sequence of 1 to 100 we do:

`(1..100).each {|n| puts conditional(n)}`

That was a piece of cake! But where do Hashes come into it? Well, we can capture this conditional logic as a Hash. To get a clue on how to do that, write down the truth table for FizzBuzz:

(n % 3 == 0) | (n % 5 == 0) | outcome |
---|---|---|

true | false | ‘Fizz’ |

false | true | ‘Buzz’ |

true | true | ‘FizzBuzz’ |

false | false | number |

Observe that the *outcome* column consists of discrete, non-nil values for any given input. That would make a good key for our Hash. Let’s rewrite our truth table to suit a Hash:

Key | Value |
---|---|

‘Fizz’ | (n % 3 == 0) && (n % 5 != 0) |

‘Buzz’ | (n % 3 != 0) && (n % 5 == 0) |

‘FizzBuzz’ | (n % 3 == 0) && (n % 5 == 0) |

number | (n % 3 != 0) && (n % 5 != 0) |

### Look Ma, No If Statements!

Now, it’s easy to model our Hash on the truth table:

```
def declarative(n)
h = {
'Fizz' => (n % 3 == 0) && (n % 5 != 0),
'Buzz' => (n % 3 != 0) && (n % 5 == 0),
'FizzBuzz' => (n % 3 == 0) && (n % 5 == 0),
n => (n % 3 != 0) && (n % 5 != 0)
}
h.key(true)
end
```

Here, we’re using a Hash to implement some declarative-style programming. We’re not telling our app how to calculate the outcome for each number, we just declare what conditions determine the keys and let the data structure do the lifting. Here’s the sweet thing: as the two conditions used to determine the keys are mutually exclusive, there will only be one *true* value in the Hash for any number input. This means that the correct key for the input will have the value `true`

and all others will be `false`

, so the method returns the only key with a value of `true`

for the given input.

```
puts declarative(2) #=> 2
puts declarative(3) #=> Fizz
puts declarative(5) #=> Buzz
puts declarative(7) #=> 7
puts declarative(15)#=> FizzBuzz
```

To run FizzBuzz declaratively for a sequence of 1 to 100:

`(1..100).each {|n| puts declarative(n)}`

I don’t know about you, but this way appeals to me much more than the multi-conditional approach. I find it more elegant and tidy than the `if..elsif`

way. Unfortunately, elegance comes at a price. As you can see from the benchmarks, the declarative hash approach is much, much slower than the imperative conditional statement.

Can we reach a happy compromise between elegance and performance? Why, yes we can!

## Memoization

Memoization is simply the technique of storing computed or retrieved data for future use. Also known as *caching* to you and I. Hashes are pretty good for storing things. Let’s have another go at FizzBuzz, memoization-style:

```
memoization ||= Hash.new do |hash,key|
hash[key] = case
when (key % 3 == 0) && (key % 5 != 0)
'Fizz'
when (key % 3 != 0) && (key % 5 == 0)
'Buzz'
when (key % 3 == 0) && (key % 5 == 0)
'FizzBuzz'
else key
end
end
```

We are creating a Hash that is initialized to the right FizzBuzz value for each numerical key passed to it. Run it for a range up to 100:

`(1..100).each { |n| memoization[n] }`

The memoization hash has been created and has mapped each key from 1 to 100 to its FizzBuzz value. Next time we want the FizzBuzz value of a number in this range, a simple `Hash#fetch`

will provide it.

Memoization has some very practical applications, other than solving FizzBuzz-style problems. We all use the web, so here’s how we could implement some raw and rugged response caching using a Hash:

```
require 'net/http'
http = Hash.new{|hash,key| hash[key] = Net::HTTP.get_response(URI(key)).body }
puts http['http://www.google.com'] # make actual request
puts http['http://www.google.com'] # get cached value
```

Now, let’s take a look at performance. Here are the benchmarks (on my machine – yours will obviously differ) when running each of our FizzBuzz solutions twice for a range of 1..1*000*000.

user | system | total | real |
---|---|---|---|

conditional | 0.230000 | 0.000000 | 0.230000 |

conditional | 0.240000 | 0.000000 | 0.240000 |

declarative | 0.990000 | 0.000000 | 0.990000 |

declarative | 0.980000 | 0.000000 | 0.980000 |

memoization | 1.040000 | 0.020000 | 1.060000 |

memoization | 0.370000 | 0.000000 | 0.370000 |

The first time the memoization code is run, it’s very slow. Ruby has to build up the Hash, as well as execute the conditional logic. But the *2nd time*, and each time thereafter, its execution is dramatically quicker. That’s because the Hash already contains all values for that range, so it just has to look them up whenever we ask it instead of computing them all over again.

You may notice that, although quick for subsequent runs, the memoization technique is still not quite as fast as the `if..elsif`

approach. This is because we don’t get the full benefit of caching by applying it to simple, linear algorithms like FizzBuzz. To make the most of memoization, we need to use it on more complex, polynomial or exponential-order algorithmic logic.

## Recursive Hashes

Recursion is a typical domain for fast growth algorithms. Let’s look at a classic example of recursion: the *Factorial* problem. The usual recursive implementation goes like this:

```
def factorial(x)
x == 1 ? 1 : x * factorial(x-1)
end
```

We can memoize it with a hash:

```
factorial_hash ||= Hash.new do |hash,key|
hash[key] = key * hash[key-1]
end
# initialize special cases
fib_hash[1] = 1; fib_hash[2] = 2
```

Here, we are initializing a Hash in such a way as to force the initialization code to be executed *n* times for an input of *n*. The first time we run, say, `factorial_hash[5]`

the code will run 5 times, creating a Hash for the keys 2-5. Then, each time we need to get the factorial value of a number up to 5, the Hash will just do a simple lookup and fetch the right value.

Run each technique twice to get the factorial of 3000:

```
factorial(3000)
factorial(3000)
factorial_hash[3000]
factorial_hash[3000]
```

Here are the timings:

user | real |
---|---|

factorial | 0.005829 |

factorial | 0.006082 |

factorial_hash | 0.011202 |

factorial_hash | 0.000002 |

As before, the first time the hash runs takes longer than the non-hashed equivalent, but subsequent runs are now blazingly fast.

Let’s try something even more complex: the *Fibonacci* sequence. Here’s the standard approach:

```
def fibonacci(n)
n <= 2 ? 1 : fibonacci(n - 1) + fibonacci(n - 2)
end
```

and here’s our caching Hash approach:

```
fib_hash ||= Hash.new do |hash,key|
hash[key] = hash[key-1] + hash[key-2]
end
# initialize special cases
fib_hash[1] = 1; fib_hash[2] = 1
```

Running each approach twice to get the Fibonacci value at 35 gives the following timings:

```
fibonacci(35)
fibonacci(35)
fib_hash[35]
fib_hash[35]
```

user | real |
---|---|

fibonacci | 0.987518 |

fibonacci | 0.981804 |

fibonacci_hash | 0.000041 |

fibonacci_hash | 0.000002 |

Now even the first run of the Hash approach is significantly faster than the non-cached recursive method. Subsequent runs put the non-cached method to shame! The more complex the problem, the greater the benefits obtained with memoization.

But every silver lining has a cloud. Using Hashes recursively adds more data onto our app’s Stack Frame, which means that recursive Hash approach has much lower threshold for Stack Overflow type errors (that’s the *error*, *not* the website). So, for algorithms with a high range of input values, recursive Hashes are probably not the best solution.

## Conclusion

Hashes are more than just a way to pass options to methods. Used properly, they can help make our code more elegant, faster or sometimes both at the same time. However, we need to be aware of their shortcomings and know how and when to use them effectively.

Code presented in this article can be found in the following gists: