This is the second part of a two-part series on functional programming in Ruby. Previously we explored immutable values; now we’ll look at the other side of functional programming: pure and composable functions.
Pure Functions
A pure function is a function whose return value is determined only by its input values, without observable side effects. This is how functions in math work: Math.cos(x) will, for the same value of x, always return the same result. Computing it does not change x. It does not write to log files, make network requests, ask for user input, or change program state. It’s a coffee grinder: beans go in, powder comes out, end of story.
When a function performs any other “action”, apart from calculating its return value, the function is impure. It follows that a function which calls an impure function is impure as well. Impurity is contagious.
A given invocation of a pure function can always be replaced by its result. There’s no difference between Math.cos(Math::PI) and -1.0; we can always replace the first with the second. This property is called referential transparency.
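To make the substitution test concrete, here is a minimal sketch with two hypothetical methods: square is pure, while logged_square returns the same values but also appends to a shared CALL_LOG array.

```ruby
# Pure: the result depends only on x.
def square(x)
  x * x
end

# Impure: same return value, but it also changes shared state.
CALL_LOG = []
def logged_square(x)
  CALL_LOG << x # side effect visible outside the method
  x * x
end

# Referentially transparent: each call can be swapped for its value.
square(3) + square(3) == 9 + 9 # => true

# Not referentially transparent: this also returns 18, but replacing the
# calls with 9 would silently drop the two entries added to CALL_LOG.
logged_square(3) + logged_square(3) # => 18
CALL_LOG                            # => [3, 3]
```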
Keep State Local
A pure function can only access what you pass it, so it’s easy to see its dependencies. We don’t always write functions like this. When a function accesses some other program state, such as an instance or global variable, it is no longer pure.
Take global variables as an example. These are typically considered a bad idea, and for good reason. When parts of a program communicate through globals, that communication becomes invisible. There are dependencies that, on the surface, are hard to spot. They cause maintenance nightmares. The programmer needs to mentally keep track of how things are related and orchestrate everything just right. Small changes in one place can cause seemingly unrelated code to fail.
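As a sketch of the difference, assuming a hypothetical tax calculation: the first version silently depends on a global, while the second receives everything it needs as arguments.

```ruby
# Impure: the result depends on a global that any code can reassign.
$tax_percent = 20
def price_with_tax(amount)
  amount + amount * $tax_percent / 100
end

# Pure: the tax rate is an explicit argument, so the dependency is visible.
def price_with_tax_pure(amount, tax_percent)
  amount + amount * tax_percent / 100
end

price_with_tax(200)          # => 240, but only while $tax_percent is 20
price_with_tax_pure(200, 20) # => 240, always
```

Any code anywhere can reassign $tax_percent and change what price_with_tax(200) returns; the pure version’s answer can only change if its caller passes different arguments.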
Pure Methods
In Ruby we don’t usually talk about functions. Instead, we have objects with methods, but the difference is small. When you call a method on an object, it’s as if the object is passed to the function as the argument self. It’s another value the function can rely on to compute its result.
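Ruby can make this explicit. The sketch below (UnboundMethod#bind_call needs Ruby 2.7 or later) pulls upcase out of String as an unbound method and calls it with the receiver passed in as self, which gives the same result as the ordinary method call.

```ruby
str = "ukulele"

# The usual method call, with str as the implicit receiver...
via_method = str.upcase

# ...is equivalent to calling the method as a function that takes
# the receiver explicitly as its `self` argument.
upcase_fn = String.instance_method(:upcase)
via_function = upcase_fn.bind_call(str)

via_method == via_function # => true
```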
Take the upcase method of String:
str = "ukulele"
str.upcase # => "UKULELE"
str # => "ukulele"
It turns a string into uppercase, but the original string remains untouched. upcase didn’t do anything else, such as write to a log file or read mouse input. upcase is pure. The same can’t be said of upcase!:
str = "ukulele"
str.upcase! # => "UKULELE"
str # => "UKULELE"
The bang in the name is Ruby’s convention for flagging the “dangerous” variant of a method, in this case one that modifies its receiver in place. After you call it, the original string is gone, replaced by the new version. upcase! is not pure.
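Why this matters: the mutation is visible through every reference to the same object. A small sketch, with hypothetical variable names:

```ruby
greeting = "ukulele".dup # .dup guarantees a mutable (unfrozen) string
same_string = greeting   # a second name for the same String object

shout = same_string.upcase # pure: returns a new String
greeting                   # => "ukulele"

same_string.upcase!        # destructive: mutates the shared object
greeting                   # => "UKULELE"
```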
Benefits
Pure functions go hand in hand with immutable values (see the previous article). Together they lead to declarative programs, describing how inputs relate to outputs without spelling out the steps to get from A to B. This can simplify systems considerably, and in the face of concurrency, referential transparency is a godsend.
Reproducible Results
When functions are pure and values are easy to inspect and create, then every function call can be reproduced in isolation. The impact this has on testing and debugging is hard to overstate.
To write a test, you declare the values that will act as arguments, pass them to the function, and verify the output. There is no context to set up, no current account, request, or user. There are no side effects to mock or stub. Instantiate a representative set of inputs, and validate the outputs. Testing doesn’t get more straightforward than this.
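A minimal, dependency-free sketch of this style, assuming a hypothetical pure function full_name. In a real project you would likely express the same table of cases with Minitest or RSpec; plain Ruby is used here only to keep the example self-contained.

```ruby
# Hypothetical pure function under test.
def full_name(first, last)
  [first, last].reject { |part| part.nil? || part.empty? }.join(" ")
end

# Each case is just input values and the expected output: nothing to stub.
CASES = [
  [["Ada", "Lovelace"], "Ada Lovelace"],
  [["Ada", nil],        "Ada"],
  [["", "Lovelace"],    "Lovelace"]
]

CASES.each do |args, expected|
  actual = full_name(*args)
  unless actual == expected
    raise "full_name#{args.inspect}: expected #{expected.inspect}, got #{actual.inspect}"
  end
end
puts "all #{CASES.size} cases passed"
```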
Parallelization
Calls to a pure function can always be run in parallel: distribute the input values over a number of threads, and collect the results. Here’s a naive version of a parallel map method:
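module Enumerable
  # Naive parallel map: one slice per core, each mapped in its own thread.
  def pmap(cores = 4, &block)
    slice_size = [(count.to_f / cores).ceil, 1].max # each_slice needs a size >= 1
    threads = each_slice(slice_size).map do |slice|
      Thread.new { slice.map(&block) }
    end
    # Thread#value waits for each thread; flat_map keeps results in input order.
    threads.flat_map(&:value)
  end
end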
Now let’s simulate some expensive computation:
def report_time
  t = Time.now
  yield
  puts Time.now - t
end
report_time {
  100.times.map { |x| sleep(0.1); x * x }
}
# 10.014289725
report_time {
  100.times.pmap { |x| sleep(0.1); x * x }
}
# 2.504685127
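The version with #map took 10 seconds to complete; the parallel version took only 2.5 seconds. (On MRI this speedup relies on sleep releasing the global VM lock; threads running CPU-bound Ruby code do not execute in parallel there, so genuinely CPU-heavy work calls for separate processes or, on Ruby 3, Ractors.) But we can only swap out #map for #pmap if we know the function being called is pure.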
Memoization
Because pure functions are referentially transparent, we only need to compute their output once for given inputs. Caching and reusing the result of a computation is called memoization, and can only be done safely with pure functions.
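A minimal sketch of memoization, assuming a hypothetical helper memoize that wraps any callable in a cache keyed by its argument list. Because a cache hit skips the body entirely, it is only correct for pure functions: any side effect in the body would happen once instead of on every call.

```ruby
# Wrap a callable in a cache keyed by its arguments.
def memoize(fn)
  cache = {}
  ->(*args) { cache.fetch(args) { cache[args] = fn.call(*args) } }
end

slow_square = ->(x) { sleep(0.1); x * x } # stands in for an expensive pure function
fast_square = memoize(slow_square)

fast_square.call(12) # first call: runs the body (sleeps), returns 144
fast_square.call(12) # second call: served from the cache, returns 144
```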
Laziness
A variation on the same theme: we only ever need to compute the result of a pure function once, but what if we can avoid the computation entirely? Invoking a pure function means you specify a dependency: this output value depends on these input values. But what if you never use the output value? Because the function cannot cause side effects, it does not matter whether it is called or not. Hence a smart system can be lazy and optimize the call away.
Some languages, like Haskell, use lazy evaluation by default: only the values needed to achieve side effects are computed, and the rest is never evaluated. Ruby uses strict evaluation: each expression is completely evaluated before its result can be used in another expression. This is unfortunate, but with some imagination we can build our own opt-in laziness.
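class Lazy < BasicObject
  def initialize(&blk)
    @blk = blk
    @resolved = false
  end
  # Forward every other method call to the computed value.
  def method_missing(name, *args, **kwargs, &blk)
    _resolve.__send__(name, *args, **kwargs, &blk)
  end
  def respond_to?(name, include_all = false)
    _resolve.respond_to?(name, include_all)
  end
  # Run the block at most once. An explicit flag (rather than ||=)
  # also caches results that are nil or false.
  def _resolve
    unless @resolved
      @value = @blk.call
      @resolved = true
    end
    @value
  end
end
def lazy(&blk)
  Lazy.new(&blk)
end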
Now we can wrap potentially costly computations in lazy {}:
def mul(a, b, c)
  a * b
end
a = lazy { sleep(0.5); 5 }
b = lazy { sleep(0.5); 7 }
c = lazy { sleep(3); 9 }
mul(a, b, c)
# => 35
The calls to sleep simulate a CPU-intensive task. The final result pops up after about a second: computing c would take 3 seconds, but because its value is never used, we never incur that cost.
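For contrast, without lazy, Ruby’s strict evaluation computes every argument before the method body runs, even one the method ignores. The sketch below uses two hypothetical helpers, first_of and traced, to record which argument expressions were evaluated. The last line shows that Ruby does ship built-in opt-in laziness, but only for enumerators, via Enumerable#lazy.

```ruby
# Hypothetical method that ignores its second argument.
def first_of(a, _b)
  a
end

# Records that an argument expression was evaluated, then returns its value.
def traced(log, name, value)
  log << name
  value
end

evaluated = []
result = first_of(traced(evaluated, :a, 1), traced(evaluated, :b, 2))

result    # => 1
evaluated # => [:a, :b] (the unused argument was still computed)

# Built-in laziness for collections: only the first three elements are computed.
(1..Float::INFINITY).lazy.map { |x| x * 2 }.first(3) # => [2, 4, 6]
```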
Refactoring to Functional
There is a catch, though. Much of what our programs do (interacting with databases, serving network requests, writing to log files) is inherently side-effectful. Our programs are processes that deal with inputs and generate outputs over time; they are not mathematical functions. There are ways to get the best of both worlds, however.
One fruitful approach is to separate the pure, functional, value-based core of your application from an outer, imperative shell. Take a command line application that needs to parse command line arguments:
require 'optparse'
def parse_cli_options
  opts = OptionParser.new do |opts|
    opts.banner = 'cli_tool [options] infile outfile'
    opts.on('--version', 'Print version') do
      $stderr.puts VERSION
      exit 0
    end.on('--help', 'Display help') do
      $stderr.puts opts
      exit 0
    end
  end
  opts.parse!(ARGV)
  if ARGV.length != 2
    $stderr.puts "Wrong number of arguments"
    $stderr.puts opts
    exit 1
  end
  opts
end
This is about as far away from a pure function as you can get. It does all of the following:
- Write directly to $stderr
- Call Kernel.exit
- Rely on the global ARGV
- Alter ARGV
How would you go about writing tests for such a monstrosity? It’s close to impossible. To make it a pure function, we need to ask ourselves what needs to go in and what should come out. As input, this function clearly needs access to the command line arguments. As output, it needs to tell us:
- Was the parsing successful?
- If not, what’s the error message?
- What exit code should the process use?
require 'optparse'
def parse_cli_options(argv)
  opts = OptionParser.new do |opts|
    opts.banner = 'cli_tool [options] infile outfile'
    opts.on('--version', 'Print version') do
      return { message: VERSION }
    end.on('--help', 'Display help') do
      return { message: opts.to_s }
    end
  end
  # OptionParser#parse works on a copy of argv, leaving the caller's array untouched.
  filenames = opts.parse(argv)
  if filenames.length != 2
    return { message: ["Wrong number of arguments!", opts].join("\n"), exit_code: 1 }
  end
  { filenames: filenames }
end
Now we have a pure function that’s very easy to test, and we can wrap it in an “imperative shell”:
def run
  result = parse_cli_options(ARGV)
  perform(*result[:filenames]) if result.key?(:filenames)
  $stderr.puts result[:message] if result.key?(:message)
  Kernel.exit(result.fetch(:exit_code, 0))
end
Keeping the core strictly functional is necessary, since a single impure function would contaminate any function that calls it. Notice how we turned some side effects, such as exiting the process, into an intermediate value representing that side effect. You can turn almost anything into a value this way, even error conditions or database operations, and reap the benefits of functional programming.
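A minimal sketch of turning side effects into values, with hypothetical names throughout: signup_effects is the pure core and only describes what should happen as a list of [type, payload] pairs, and perform_effects is the imperative shell that carries them out. An Array stands in for the database and $stdout for the logger; a real application would plug in its own adapters.

```ruby
# Pure core: describes the effects as data and performs none of them.
def signup_effects(email)
  if email.include?("@")
    [[:save_user, { email: email }], [:log, "signed up #{email}"]]
  else
    [[:log, "rejected #{email.inspect}"]]
  end
end

# Imperative shell: the only place where effects actually happen.
def perform_effects(effects, db:, logger:)
  effects.each do |type, payload|
    case type
    when :save_user then db << payload
    when :log then logger.puts(payload)
    else raise ArgumentError, "unknown effect: #{type.inspect}"
    end
  end
end

db = [] # stand-in for a real database
perform_effects(signup_effects("ada@example.com"), db: db, logger: $stdout)
db # => [{ email: "ada@example.com" }]
```

The pure core can now be tested by comparing arrays, just like parse_cli_options above, while the shell stays small enough to check by eye.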
Functional programming is a big subject, and one that not all Rubyists understand. After these two articles, you should have a good foundation for making your own code more functional. Try it out, and see where the journey leads you.