Functional Programming: Pure Functions

Arne Brasseur

This is the second part of a two-part series on functional programming in Ruby. The first part explored immutable values; now we’ll look at the other side of functional programming: pure and composable functions.

Pure Functions

A pure function is a function where the return value is only determined by its input values, without observable side effects. This is how functions in math work: Math.cos(x) will, for the same value of x, always return the same result. Computing it does not change x. It does not write to log files, do network requests, ask for user input, or change program state. It’s a coffee grinder: beans go in, powder comes out, end of story.

When a function performs any other “action”, apart from calculating its return value, the function is impure. It follows that a function which calls an impure function is impure as well. Impurity is contagious.

A given invocation of a pure function can always be replaced by its result. There’s no difference between Math.cos(Math::PI) and -1; we can always replace the first with the second. This property is called referential transparency.
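
To make the substitution idea concrete, here is a small sketch. The names circle_area and counted_area are made up for illustration; the second one is deliberately impure so the contrast is visible.

```ruby
# Pure: the result depends only on the argument.
def circle_area(radius)
  Math::PI * radius**2
end

# Impure: reads and changes state outside the function.
$calls = 0
def counted_area(radius)
  $calls += 1
  Math::PI * radius**2 + $calls
end

# A pure call can be replaced by its result without changing the program.
area = circle_area(2)
puts circle_area(2) + circle_area(2) == area + area   # true

# Not so for the impure version: each call returns something different.
puts counted_area(2) == counted_area(2)               # false
```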

Keep State Local

A pure function can only access what you pass it, so it’s easy to see its dependencies. We don’t always write functions like this. When a function accesses some other program state, such as an instance or global variable, it is no longer pure.

Take global variables as an example. These are typically considered a bad idea, and for good reason. When parts of a program start interacting through globals, it makes their communication invisible. There are dependencies that, on the surface, are hard to spot. They cause maintenance nightmares. The programmer needs to mentally keep track of how things are related and orchestrate everything just right. Small changes in one place can cause seemingly unrelated code to fail.
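
As a rough illustration of such an invisible dependency, compare a function that reads a global with one that takes the same value as an argument. The $tax_rate global and both function names are hypothetical.

```ruby
# Hidden dependency: the result changes whenever someone reassigns $tax_rate.
$tax_rate = 0.21
def price_with_tax_global(net)
  net * (1 + $tax_rate)
end

# Explicit dependency: everything the function needs is in its signature.
def price_with_tax(net, tax_rate)
  net * (1 + tax_rate)
end

before = price_with_tax_global(100)
$tax_rate = 0.06                      # some unrelated code changes the global
after = price_with_tax_global(100)
puts before == after                  # false: same call, different result

puts price_with_tax(100, 0.21) == price_with_tax(100, 0.21)   # true
```

Nothing in the call price_with_tax_global(100) hints that its result depends on code elsewhere; the second version makes that dependency part of the signature.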

Pure Methods

In Ruby we don’t usually talk about functions. Instead, we have objects with methods, but the difference is small. When you call a method on an object, it’s as if the object is passed to the function as the argument self. It’s another value the function can rely on to compute its result.

Take the upcase method of String:

str = "ukulele"
str.upcase # => "UKULELE"
str        # => "ukulele"

It turns a string into uppercase, but the original string remains untouched. upcase didn’t do anything else, such as write to a log file or read mouse input. upcase is pure. The same can’t be said of upcase!

str = "ukulele"
str.upcase! # => "UKULELE"
str         # => "UKULELE"

Ruby adds the bang to signal that this function is destructive. After you call it, the original string is gone, replaced by the new version. upcase! is not pure.

Benefits

Pure functions go hand in hand with immutable values (see the previous article). Together they lead to declarative programs, describing how inputs relate to outputs, without spelling out the steps to get from A to B. This can simplify systems and, in the face of concurrency, referential transparency is a godsend.

Reproducible Results

When functions are pure and values are easy to inspect and create, then every function call can be reproduced in isolation. The impact this has on testing and debugging is hard to overstate.

To write a test, you declare the values that will act as arguments, pass them to the function, and verify the output. There is no context to set up, no current account, request, or user. There are no side effects to mock or stub. Instantiate a representative set of inputs, and validate the outputs. Testing doesn’t get more straightforward than this.
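
A framework-free sketch of what that looks like in practice. The slugify function is a made-up example of a pure function; with a test library the checks would become assertions, but the shape stays the same: inputs on one side, expected outputs on the other.

```ruby
# Hypothetical pure function: turns a title into a URL slug.
def slugify(title)
  title.downcase.strip.gsub(/[^a-z0-9]+/, '-').gsub(/\A-+|-+\z/, '')
end

# Each test case is just an input and the output we expect for it.
cases = {
  'Pure Functions'      => 'pure-functions',
  '  Keep State Local ' => 'keep-state-local',
  'Ruby, FP & You!'     => 'ruby-fp-you'
}

cases.each do |input, expected|
  actual = slugify(input)
  unless actual == expected
    raise "#{input.inspect}: expected #{expected.inspect}, got #{actual.inspect}"
  end
end
puts 'all cases passed'
```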

Parallelization

Pure functions can always be parallelized. Distribute the input values over a number of threads, and collect the results. Here’s a naive version of a parallel map method:

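module Enumerable
  # Naive parallel map: split the items into one slice per thread,
  # map each slice in its own thread, then join the results in order.
  def pmap(cores = 4, &block)
    slice_size = [(count.to_f / cores).ceil, 1].max
    each_slice(slice_size).map do |slice|
      # Each thread builds its own array, so no state is shared between threads.
      Thread.new { slice.map(&block) }
    end.flat_map(&:value)
  end
end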

Now let’s simulate some expensive computation:

def report_time
  t = Time.now
  yield
  puts Time.now-t
end

report_time {
  100.times.map {|x| sleep(0.1); x*x }
}
# 10.014289725

report_time {
  100.times.pmap {|x| sleep(0.1); x*x }
}
# 2.504685127

The version with #map took 10 seconds to complete; the parallel version took only 2.5 seconds. But we can only swap out #map for #pmap if we know the function called is pure.

Memoization

Because pure functions are referentially transparent, we only need to compute their output once for given inputs. Caching and reusing the result of a computation is called memoization, and can only be done safely with pure functions.
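
A minimal sketch of the idea, assuming a callable that takes a single argument. The memoize helper and the lambdas are illustrative names, not part of any library.

```ruby
# Wrap a pure, single-argument callable so each distinct input is computed once.
def memoize(fn)
  cache = {}
  lambda do |arg|
    # fetch only runs the block on a cache miss, so nil and false results are cached too.
    cache.fetch(arg) { cache[arg] = fn.call(arg) }
  end
end

calls = 0
slow_square = lambda do |x|
  calls += 1        # bookkeeping for this demo only, to count real computations
  x * x
end

fast_square = memoize(slow_square)
fast_square.call(12)   # computed
fast_square.call(12)   # served from the cache
puts calls             # 1
```

If slow_square had side effects that callers relied on, the second call would silently skip them, which is why this is only safe for pure functions.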

Laziness

A variation on the same theme. We only ever need to compute the result of a pure function once, but what if we can avoid the computation entirely? Invoking a pure function means you specify a dependency: this output value depends on these input values. But what if you never use the output value? Because the function cannot cause side effects, it does not matter whether it is called or not. Hence a smart system can be lazy and optimize the call away.

Some languages, like Haskell, are completely built on lazy evaluation. Only values that are needed to achieve side effects are computed; the rest is ignored. Ruby’s evaluation strategy is called strict evaluation: each expression is completely evaluated before its result can be used in another expression. This is unfortunate, but with some imagination we can build our own opt-in laziness.

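class Lazy < BasicObject
  def initialize(&blk)
    @blk = blk
    @resolved = false
  end

  # Forward every message to the computed value, forcing it on first use.
  def method_missing(name, *args, &blk)
    _resolve.send(name, *args, &blk)
  end

  def respond_to?(name, include_all = false)
    _resolve.respond_to?(name, include_all)
  end

  # Run the block at most once, even when it returns nil or false.
  def _resolve
    unless @resolved
      @value = @blk.call
      @resolved = true
    end
    @value
  end
end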

def lazy(&blk)
  Lazy.new(&blk)
end

Now we can wrap potentially costly computations in lazy {}:

def mul(a, b, c)
  a * b
end

a = lazy { sleep(0.5) ; 5 }
b = lazy { sleep(0.5) ; 7 }
c = lazy { sleep(3)   ; 9 }

mul(a, b, c)
# => 35

The calls to sleep simulate some CPU-intensive task. The final result pops up after about a second. Even though it would take 3 seconds to compute c, because the value is never used we don’t have to incur that cost.
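
For collections, Ruby also ships a built-in form of opt-in laziness, Enumerator::Lazy. The following sketch is an aside rather than part of the approach above: it describes a pipeline over an infinite range, and only the elements needed for the final result are ever computed.

```ruby
# An infinite, lazily evaluated sequence of squares: nothing is computed yet.
squares = (1..Float::INFINITY).lazy.map { |x| x * x }

# first(3) pulls just enough elements through the pipeline to find three matches.
first_even = squares.select(&:even?).first(3)
p first_even   # [4, 16, 36]
```

As with the Lazy wrapper, this is only predictable when the blocks involved are pure, because there is no guarantee about if or when they run.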

Refactoring to Functional

There is a catch though. Much of what our programs do (interacting with databases, serving network requests, writing to log files) is inherently side-effectful. Our programs are processes that deal with inputs and generate outputs over time; they are not mathematical functions. There are ways to get the best of both worlds, however.

One fruitful approach is to separate the pure, functional, value based core of your application from an outer, imperative shell. Take a command line application that needs to parse command line arguments:

def parse_cli_options
  opts = OptionParser.new do |opts|
    opts.banner = 'cli_tool [options] infile outfile'
    opts.on('--version', 'Print version') do |name|
      $stderr.puts VERSION
      exit 0
    end.on('--help', 'Display help') do
      $stderr.puts opts
      exit 0
    end
  end

  opts.parse!(ARGV)
  if ARGV.length != 2
    $stderr.puts "Wrong number of arguments"
    $stderr.puts opts
    exit 1
  end

  opts
end

This is about as far away from a pure function as you can get. It does all of the following.

  • Write directly to $stderr
  • Call Kernel.exit
  • Rely on the global ARGV
  • Alter ARGV

How would you go about writing tests for such a monstrosity? It’s close to impossible. To make it a pure function, we need to ask ourselves what needs to go in and what should come out. As input, this function clearly needs access to the command line arguments. As output, it needs to tell us:

  • Was the parsing successful?
  • If not, what’s the error message?
  • What exit code should the process use?

require 'optparse'

def parse_cli_options(argv)
  opts = OptionParser.new do |opts|
    opts.banner = 'cli_tool [options] infile outfile'
    # `return` inside these blocks returns from parse_cli_options itself.
    opts.on('--version', 'Print version') do
      return { message: VERSION }
    end.on('--help', 'Display help') do
      return { message: opts.to_s }
    end
  end

  # parse (without the bang) leaves argv untouched and returns the non-option arguments.
  filenames = opts.parse(argv)
  if filenames.length != 2
    return {
      message: ["Wrong number of arguments!", opts.to_s].join("\n"),
      exit_code: 1
    }
  end
  { filenames: filenames }
end

Now we have a pure function that’s very easy to test, and we can wrap it in an “imperative shell”.

def run
  result = parse_cli_options(ARGV)
  perform(*result[:filenames])  if result.key?(:filenames)
  $stderr.puts result[:message] if result.key?(:message)
  Kernel.exit(result.fetch(:exit_code, 0))
end

Keeping the core strictly functional is necessary, since a single impure function would contaminate any function that calls it. Notice how we turned some side effects, such as exiting the process, into an intermediate value representing that side effect. You can valuefy anything this way, even error conditions or database operations, reaping the benefits of functional programming.
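
The same move works for error conditions. In this sketch, which uses made-up names (plan_withdrawal, withdraw) and a plain Hash as the result value, the pure core only decides what should happen, and a thin shell carries it out.

```ruby
# Hypothetical pure core: decide what should happen, but don't do it.
def plan_withdrawal(balance, amount)
  if amount <= 0
    { ok: false, error: 'Amount must be positive' }
  elsif amount > balance
    { ok: false, error: 'Insufficient funds' }
  else
    { ok: true, new_balance: balance - amount, log: "Withdrew #{amount}" }
  end
end

# Imperative shell: the only place where side effects happen.
def withdraw(balance, amount, out: $stdout, err: $stderr)
  result = plan_withdrawal(balance, amount)
  if result[:ok]
    out.puts result[:log]
    result[:new_balance]
  else
    err.puts result[:error]
    balance
  end
end
```

All the branching logic lives in plan_withdrawal, which can be tested by comparing hashes; withdraw is small enough that there is little left in it to get wrong.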

Functional programming is a big subject, and one that not all Rubyists understand. After these two articles, you should have a good foundation for making your own code more functional. Try it out, and see where the journey leads you.
