Responsibility Centric Design Using TDD

Responsibility

In this article, we’ll take a fresh look at tackling an external sort problem by applying a responsibility centric technique. Each responsibility in this particular problem gets encapsulated in a way that makes it easy to do test driven development and leaves us with code that is flexible and maintainable.

Hopefully, by the end of this article you will learn a radical approach at tackling exciting problems.

External Sort

External sort is a fascinating problem, as you are limited by the amount of memory available to do a sort on an input file.

This Wikipedia article states the crux of the problem:

External sorting is a term for a class of sorting algorithms that can handle massive amounts of data. External sorting is required when the data being sorted do not fit into the main memory of a computing device (usually RAM) and instead they must reside in the slower external memory (usually a hard drive).

I am limiting the problem to a randomly generated input file of size M and an array size of N in which N * N = M to keeps things simple. There is also a need to check our work, so we’ll write a check utility to make sure the sort was successful.

Responsibility Centric Design

The basic approach in this design pattern is to break the problem down into contracts which will then become our public APIs. Each contract will encapsulate a separate role in the solution. The key here is to focus on what each encapsulation does as opposed to how it does it.

This Practicing Ruby article details out the gist of this approach:

In a responsibility-centric design, systems are divided by the collection of behaviors that they must implement. The goal of this division is to formulate the description of the behavior of the system in terms of multiple interacting processes, with each process playing a separate role.

I break the problem down into a writer, reader, and queuer. The Writer and Reader will give me a general API to read or write into persistent storage. Since I have to break the large main input file into a smaller cache, Queuer will serve as the access point for each individual file for external sort. An interesting observation here is that roles tend to be named after verbs since they are designed to go out and do something. At this time, notice how each encapsulation is not concerned with how this is done.

So, let’s break it down:

module FIO
  class Writer
    def open(file)
    end
    def append(n)
    end
    def close
    end
  end

  class Reader
    def open(file)
    end
    def read
    end
    def close
    end
  end

  class Queuer
    def open(file)
    end
    def [](index)
    end
    def close
    end
  end
end

Test driven development

Now that we have the contracts completed, we can start thinking about writing tests. For test driven development, I would like to focus your attention on how we would conceptually solve this problem. The key here is to focus on each contract and what the proper behavior should be. Fortunately Ruby, being a dynamic language, makes this very easy so we don’t have to spend any time mocking and stubbing classes by writing up interface code.

So let’s talk about what we need.

To generate the main randomly generated input file, we need some kind of Export utility. For example:

class Export
  def initialize(writer)
    @writer = writer
  end
  def out(max, file_name)
  end
end

To do the external sort, we need a Sort utility:

class Sort
  def initialize(reader, writer, queuer)
    @reader = reader
    @writer = writer
    @queuer = queuer
  end
  def external(max, chache_size, file_in, file_out)
  end
end

Finally, to do the check at the end, we need a Check utility:

class Check
  def initialize(reader)
  end
  def is_sorted(max, file_out)
  end
end

That should do it.

The relationship between these utility objects and our contracts resemble that of a client / server model. Here the server gets injected as a dependency. The client code will then call the public API or contract to fulfill requests.

As an analogy, data that flows from our contracts is like the water service we get in our house. The utility objects are like the necessary plumbing in our home that fulfill the service. One thing to keep in mind is, in a responsibility centric approach, data flows through the system as opposed to being put in a single monolithic place.

Notice, at this point in the development process, we are not concerned with any implementation details. Our unit tests will be designed to guide the creative process.

Unit tests

For the purposes of this article, I am using mocha to help me write the tests.

In the tests, we’ll define M = 9 and N = 3,. Finally, a test filename like in for input and out for output should do.

Let’s conceptualize our export utility:

class ExportTest < Test::Unit::TestCase
  def test_export_out
    # Mock dependency and add expectations
    writer = mock()
    writer.expects(:open).with('in').at_least_once
    writer.expects(:append).with(anything).times(9)
    writer.expects(:close).at_least_once
    # Call utility
    Export.new(writer).out(9, 'in')
  end
end

To successfully pass this test we can do:

def out(max, file_name)
  @writer.open(file_name)
  max.times { @writer.append(Random.rand(10)) }
  @writer.close
end

The same pattern can be applied to the check utility:

class CheckTest < Test::Unit::TestCase
  def test_check_is_sorted
    reader = mock()
    reader.expects(:open).with('out').at_least_once
    reader.expects(:read).times(9).returns(0)
    reader.expects(:close).at_least_once
    target = Check.new(reader).is_sorted(9, 'out')
    assert target
  end

To successfully pass this test we can do:

def is_sorted(max, file_out)
  return true if max <= 1
  sorted = true
  @reader.open(file_out)
  a = @reader.read
  (max - 1).times do
    b = @reader.read
    sorted = false if a > b
    a = b
  end
  @reader.close
  sorted
end

For our final test, we need to implement external sort. The first thing is to break the input file into sorted chunks. That means we are reading from our input file 9 times. We’ll also be writing to separate chunk files a total of 9 times. Finally, we will write the final result into the output file a total of 9 times, which gives us a complete total of 18.

There are nuances with our Queuer that are accounted for in this test. Each index of the queuer object will give me access to an individual chunk. I am using the queuer to provide the data for sorting until it runs out. In the test case, that will be a total of 9 plus 3 since it needs to tell me when it no longer has anything.

So here is the test:

class SortTest < Test::Unit::TestCase
  def test_sort_external
    writer = mock()
    reader = mock()
    queuer = mock()
    reader.expects(:open).with('in').at_least_once
    reader.expects(:read).times(9)
    reader.expects(:close).at_least_once
    writer.expects(:open).with('out').at_least_once
    writer.expects(:open).with('in1').at_least_once
    writer.expects(:open).with('in2').at_least_once
    writer.expects(:open).with('in3').at_least_once
    writer.expects(:append).with(anything).times(18)
    writer.expects(:close).at_least_once
    queuer.expects(:open).with(3, 'in').at_least_once
    queuer.expects(:[]).with(anything).times(12).returns(0)
    queuer.expects(:close).at_least_once
    Sort.new(reader, writer, queuer).external(9, 3, 'in', 'out')
  end
end

To make the test pass, we can do:

def external(max, cache_size, file_in, file_out)
  capacity = Array.new(cache_size)
  # Break file_in into chunks
  @reader.open(file_in)
  max.times do |index|
    capacity[index % cache_size] = @reader.read
    if is_end_of_chunk(index, max, cache_size)
      capacity.sort!
      # Use naming convention, e.g., file_in1
      @writer.open(file_in + ((index + 1) / cache_size).to_s)
      cache_size.times { |i| @writer.append(capacity[i]) }
      @writer.close
    end
  end
  @reader.close
  # Merge chunks onto file_out
  @writer.open(file_out)
  @queuer.open(cache_size, file_in)
  cache_size.times { |i| capacity[i] = @queuer[i] }
  max.times do
    lowest = min_index(capacity)
    @writer.append(capacity[lowest])
    capacity[lowest] = @queuer[lowest]
  end
  @queuer.close
  @writer.close
end

private
  def is_end_of_chunk(index, max, size)
    (!index.zero? && ((index + 1) % size).zero?) || max == 1
  end
  def min_index(arr)
    index = 0
    (1..arr.length - 1).each do |i|
      next if arr[i].nil?
      index = i if arr[index].nil? || arr[index] > arr[i]
    end
    index
  end

When we run these tests, nothing is actually getting written into your file system. Again, the point here is to conceptually test our code to ensure proper behavior. Unit testing is the art of testing isolated code without crossing functional boundaries, like writing to a file or sending a network request.

Conclusion

Test driven development brings a lot of challenges in thinking about solutions from a very abstracted view. However, responsibility centric solutions benefit from already doing a lot of the heavy lifting for you by separating concerns and enforcing well designed layers of abstraction.

I omitted implementation details as an exercise for the reader. If you are interested, I have posted the rest of the code on GitHub.

Happy hacking!