Golden Master: Discovering Abstractions


What started as a quest to tame some unruly views turned into a long and arduous slog through controllers, helper methods, duplication, and, finally, confusion.

The current state of affairs is that we have a model object that encapsulates all the logic that used to live in the views and controllers. We’ve extracted methods and renamed variables, finding some measure of understanding along the way. The code is no longer scary or intimidating, but it’s still not anything to boast about.

module Stats
  class RunningTimeData
    include Rails.application.routes.url_helpers

    attr_reader :title, :dates, :key, :today

    def initialize(title, timestamps, key, now)
      @title = title
      @dates = timestamps.map(&:to_date)
      @key = key
      @today = now.to_date
    end

    def y_legend
      I18n.t('stats.running_time_legend.actions')
    end

    def y2_legend
      I18n.t('stats.running_time_legend.percentage')
    end

    def x_legend
      I18n.t('stats.running_time_legend.weeks')
    end

    def values
      datapoints_per_week_in_chart.join(",")
    end

    def links
      url_labels.join(",")
    end

    def values_2
      cumulative_percentages.join(",")
    end

    def x_labels
      time_labels.join(",")
    end

    def y_max
      # Add one so that y_max is never zero for people who have no
      # actions completed yet. OpenFlashChart cannot handle y_max=0.
      1 + datapoints_per_week_in_chart.max + datapoints_per_week_in_chart.max/10
    end

    private

    def url_labels
      Array.new(total_weeks_in_chart) { |i| url(i, key) } << url(total_weeks_in_chart, "#{key}_end")
    end

    def url(index, id)
      options = {
        :controller => 'stats',
        :action => 'show_selected_actions_from_chart',
        :index => index,
        :id => id,
        :only_path => true
      }
      url_for(options)
    end

    def time_labels
      labels = Array.new(total_weeks_in_chart) { |i| "#{i}-#{i+1}" }
      labels[0] = "< 1"
      labels[total_weeks_in_chart] = "> #{total_weeks_in_chart}"
      labels
    end

    def total_weeks_in_chart
      [52, total_weeks_of_data].min
    end

    def total_weeks_of_data
      weeks_since(dates.last)
    end

    def weeks_since(date)
      (today - date).to_i / 7
    end

    def cumulative_percentages
      running_total = 0
      percentages_per_week.map {|count| running_total += count}
    end

    def percentages_per_week
      datapoints_per_week_in_chart.map(&percentage)
    end

    def percentage
      Proc.new {|count| (count * 100.0 / dates.count)}
    end

    def datapoints_per_week_in_chart
      frequencies = Array.new(total_weeks_in_chart) {|i|
        datapoints_per_week[i].to_i
      }
      frequencies << datapoints_per_week.inject(:+) - frequencies.inject(:+)
    end

    def datapoints_per_week
      frequencies = Array.new(total_weeks_of_data + 1, 0)
      dates.each {|date|
        frequencies[weeks_since(date)] += 1
      }
      frequencies
    end
  end
end

Shades of Good

Steve Freeman and Nat Pryce, authors of Growing Object-Oriented Software, Guided by Tests (GOOS), talk about four desirable characteristics of object-oriented code:

Loosely coupled
Highly cohesive
Easily composable
Context independent

The Stats::RunningTimeData object fails on all four counts.

Loosely Coupled

This code depends on the I18n library.

I18n.t('stats.running_time_legend.percentage')

Worse, though, you can’t instantiate it without Rails!

include Rails.application.routes.url_helpers

Imagine wanting to use the same functionality in a Sinatra application or a stand-alone email application that delivers reports on a regular basis. It wouldn’t work in its current form, because you’d need to drag Rails along.

Highly Cohesive

The code is certainly not cohesive.

It has some methods that are all about describing a chart: x_legend, y_legend, values. It also has a bunch of methods that have nothing to do with charts. They’re more about weeks: weeks_since(date), total_weeks_of_data, datapoints_per_week.

Cohesiveness is often related to the Single Responsibility Principle (SRP). Does this object do too many disparate things? Does it have many different reasons to change?

It seems like there are at least two different things here: weekly statistics and chart configuration.

Easily Composable

This is a big hulking object. You can use it or you can leave it, but you’re not going to combine it with other objects to make something useful.

Context Independent

This class can only be used in the context of the two specific charts that it was made to handle. If you wanted to email a plain text report of the same data in addition to showing the chart, you would be plain out of luck.

Make It So

The methods we need to extract out of the Stats::RunningTimeData model are:

datapoints per week
percentages per week
cumulative percentages per week

Since per week repeats, it seems reasonable to make that part of the class.

But what if you want monthly statistics too?

Yeah, that’s a fair question. We might. But right now we don’t, and we don’t know whether we ever will.

Let’s pretend that we aren’t going to need it. Later, if we see duplication, then perhaps there will be an obvious abstraction. For now it’s easier to deal with what is actually here than with some nebulous and hypothetical future.

Introducing `WeeklyHistogram`

We’ve been relying heavily on the Golden Master tests. Now it’s time to start fresh:

mkdir test/models/stats
touch test/models/stats/weekly_histogram_test.rb

One of the nicest things about creating a completely context-independent, loosely coupled class is that you don’t have to wait for the entire framework to load when running the tests.

Up until now, we’ve had to wait 3 seconds for every test run, even though we’ve been executing a single file. The lockdown test goes through the database, controllers, views—everything.

3 seconds. That’s long enough to check Twitter.

In the new tests, all we need is date from the Standard Library, minitest, and the (as yet hypothetical) weekly_histogram class.

require 'date'
require 'minitest/autorun'
require './app/models/stats/weekly_histogram'

class StatsWeeklyHistogramTest < Minitest::Test
end

This runs in less than 300 ms. It feels almost instantaneous.

Implementing Histogram

In the spirit of programming by wishful thinking, here’s the API that I’d like to have:

histogram.counts
histogram.percentages
histogram.cumulative_percentages

The refactoring we did in the previous two articles uncovered some interesting characteristics about the stats:

There is a cutoff: A maximum number of weeks to include as independent statistics.
If the cutoff is older than the oldest datapoint, then you don’t fill in empty weeks up to the cutoff.
If you have more weeks’ worth of data than the cutoff, then everything in the overflow will be combined into a single bucket.
If we have no data for a given week in the middle of our dataset, it will correctly be counted as 0.
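
These rules can be sketched in a few lines of plain Ruby. This is only an illustration, not the implementation used later in the article: the helper name bucket_counts is hypothetical, and it assumes a non-empty list of dates.

```ruby
require 'date'

# Hypothetical helper for illustration; `bucket_counts` is not from the article.
# Assumes `dates` is non-empty.
def bucket_counts(dates, cutoff, today)
  weeks = dates.map { |date| (today - date).to_i / 7 } # whole weeks elapsed per date
  length = [cutoff, weeks.max].min                     # don't pad past the oldest datapoint
  counts = Array.new(length) { |i| weeks.count(i) }    # weeks with no data count as 0
  counts << weeks.count { |week| week >= length }      # everything older goes in one bucket
end

today = Date.new(2014, 7, 1)
dates = [
  Date.new(2014, 6, 30),
  Date.new(2014, 6, 21),
  Date.new(2014, 6, 1),
  Date.new(2014, 3, 1),
]

p bucket_counts(dates, 3, today)         # => [1, 1, 0, 2]
p bucket_counts(dates, 52, today).length # 17 weekly buckets plus one overflow bucket
```

With a cutoff of 3, the two oldest dates collapse into the final bucket. With a cutoff of 52, the chart stops at the oldest datapoint instead of padding out to a full year.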

We’re going to need two tests to cover this: one for fewer weeks than the cutoff, and one for more weeks than the cutoff. Each test should include a week with no data, to make sure we don’t accidentally lose that behavior.

Here’s the test suite that I ended up with covering the entire API of the new object:

def test_fewer_weeks_than_cutoff
  today = Date.new(2014, 7, 1)
  cutoff = 10 # weeks
  dates = [
    Date.new(2014, 6, 30),
    Date.new(2014, 6, 27),
    Date.new(2014, 6, 23),
    Date.new(2014, 6, 23),
    Date.new(2014, 6, 22),
    # no data for the third week back (14 to 20 days ago)
    Date.new(2014, 6, 9),
  ]
  histogram = Stats::WeeklyHistogram.new(dates, cutoff, today)
  assert_equal [2, 3, 0, 1], histogram.counts
end

def test_more_weeks_than_cutoff
  today = Date.new(2014, 7, 1)
  cutoff = 3 # weeks
  dates = [
    Date.new(2014, 6, 30),
    Date.new(2014, 6, 21),
    Date.new(2014, 6, 20),
    Date.new(2014, 6, 1), # June
    Date.new(2014, 5, 1), # May
    Date.new(2014, 4, 1), # April
    Date.new(2014, 3, 1), # March
  ]
  histogram = Stats::WeeklyHistogram.new(dates, cutoff, today)
  assert_equal [1, 2, 0, 4], histogram.counts
end

def test_percentages
  today = Date.new(2014, 7, 1)
  dates = [
    Date.new(2014, 6, 30),
    Date.new(2014, 6, 19),
    Date.new(2014, 6, 18),
    Date.new(2014, 6, 7),
  ]
  histogram = Stats::WeeklyHistogram.new(dates, 4, today)
  assert_equal [25.0, 50.0, 0.0, 25.0], histogram.percentages
end

def test_cumulative_percentages
  today = Date.new(2014, 7, 1)
  dates = [
    Date.new(2014, 6, 30),
    Date.new(2014, 6, 19),
    Date.new(2014, 6, 18),
    Date.new(2014, 6, 7),
  ]
  histogram = Stats::WeeklyHistogram.new(dates, 4, today)
  assert_equal [25.0, 75.0, 75.0, 100.0], histogram.cumulative_percentages
end

We can crib the implementation directly from the Stats::RunningTimeData class to make the tests pass. Purists might want to re-implement everything from scratch here. I don’t really care one way or the other, as long as the result is readable and does what it needs to.

module Stats
  class WeeklyHistogram
    attr_reader :dates, :cutoff, :today
    def initialize(dates, cutoff, today=Date.today)
      @dates = dates
      @cutoff = cutoff
      @today = today
    end

    def counts
      frequencies = Array.new(length) {|i|
        datapoints_per_week[i].to_i
      }
      frequencies << datapoints_per_week.inject(:+) - frequencies.inject(:+)
    end

    def percentages
      counts.map(&percentage)
    end

    def cumulative_percentages
      running_total = 0
      percentages.map {|count| running_total += count}
    end

    def length
      [cutoff, total_weeks_of_data].min
    end

    private

    def percentage
      Proc.new {|count| (count * 100.0 / dates.count)}
    end

    def weeks_since(date)
      (today - date).to_i / 7
    end

    def total_weeks_of_data
      weeks_since(dates.last)
    end

    def datapoints_per_week
      frequencies = Array.new(total_weeks_of_data + 1, 0)
      dates.each {|date|
        frequencies[weeks_since(date)] += 1
      }
      frequencies
    end
  end
end
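
One thing worth noticing in the code above: total_weeks_of_data reads dates.last, so the class appears to assume that the dates arrive newest first, as they do in the tests. If that ordering can’t be guaranteed, one option is to look at the earliest date instead. The sketch below uses hypothetical stand-alone versions of the two private helpers, purely for illustration. It is not part of the article’s implementation.

```ruby
require 'date'

# Hypothetical stand-alone versions of the private helpers, for illustration only.
def weeks_since(date, today)
  (today - date).to_i / 7
end

# Uses the earliest date rather than the last element,
# so the order of the input no longer matters.
def total_weeks_of_data(dates, today)
  weeks_since(dates.min, today)
end

today = Date.new(2014, 7, 1)
newest_first = [Date.new(2014, 6, 30), Date.new(2014, 6, 9)]
oldest_first = newest_first.reverse

p total_weeks_of_data(newest_first, today) # => 3
p total_weeks_of_data(oldest_first, today) # => 3
```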

Is This Better?

Yep, it sure is.

It is loosely coupled: All it needs is the collection of dates to operate on.

It is highly cohesive: Everything in the class cares about those dates.

It is easily composable: The code can be used to create text-only reports in a stand-alone email program, or to generate stats for a Sinatra application or a command-line application.

It is context independent: It is not important where those dates come from. The context in the current application is TODOs. Unfinished TODOs, to be specific. But that’s incidental. Now we can use this in wildly different contexts: Cupcake sales. Zombie attacks. Celebrity sightings.
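
As a rough sketch of what that composability could look like, here is a plain-text report that needs nothing beyond the histogram’s public API. The method name text_report and the FakeHistogram stand-in are hypothetical, there only so the example runs on its own. In the application you would pass in a Stats::WeeklyHistogram.

```ruby
# Hypothetical plain-text formatter; `text_report` is an illustrative name.
# It works with any object that responds to #counts and #percentages.
# Note that the final row is the overflow bucket, not a single week.
def text_report(histogram)
  rows = histogram.counts.zip(histogram.percentages)
  rows.each_with_index.map { |(count, percent), week|
    format("week %2d: %3d (%5.1f%%)", week, count, percent)
  }.join("\n")
end

# Stand-in for Stats::WeeklyHistogram so that the sketch is self-contained.
FakeHistogram = Struct.new(:counts, :percentages)
histogram = FakeHistogram.new([1, 2, 0, 1], [25.0, 50.0, 0.0, 25.0])

puts text_report(histogram)
```

No Rails, no I18n, no chart library: the report only cares that it gets counts and percentages.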

The Wall at Your Back

Over and over again, we make decisions. That’s our job, and most of the time we’re wrong. We can’t know the future. Our codebases reflect our guesses, our assumptions, and most of all, our mistakes.

Golden Master tests give us a way to change our mind when all the original reasons are long gone. When we no longer remember why we made the choices we did. When we’re no longer certain how the code does what it does. When we’re not even sure what the code does anymore.

It’s a path out when there is no obvious way forward, or when forward might actually be backwards, round-about, and convoluted.

The Golden Master technique is not a tool for design or architecture or beauty. Golden makes it sound so fancy, so noble.

It isn’t.

It’s dirty, temporary, and utterly practical.