Golden Master Testing: Refactoring for Understanding

Katrina Owen
Share

Robot of the metal parts on a dark grungy background

Sometimes the human body engages in a seemingly pointless activity called “futile cycling”, where two metabolic pathways run simultaneously in opposite directions and have no overall effect other than to dissipate energy in the form of heat (wikipedia).

The refactorings performed in the previous two articles can seem a lot like futile cycling. We have nearly 100 lines of tangled code, but it was a lot of work for no apparent overall effect.

The current state of affairs can be observed in the model layer, isolated more or less where it belongs in the model layer:

module Stats
  class RunningTimeData
    include Rails.application.routes.url_helpers

    SECONDS_PER_DAY = 86400;

    attr_reader :title, :actions_running_time, :key, :today,
                :y_legend, :y2_legend, :x_legend, :values,
                :values_2, :links, :x_labels, :y_max

    def initialize(title, actions_running_time, key, today)
      @title = title
      @actions_running_time = actions_running_time
      @key = key
      @today = today
    end

    def compute
      # convert to array and fill in non-existing weeks with 0
      @max_weeks = difference_in_weeks(@today, @actions_running_time.last.created_at)
      @actions_running_per_week_array = convert_to_weeks_from_today_array(@actions_running_time, @max_weeks+1, :created_at)

      # cut off chart at 52 weeks = one year
      @count = [52, @max_weeks].min

      # convert to new array to hold max @cut_off elems + 1 for sum of actions after @cut_off
      @actions_running_time_array = cut_off_array_with_sum(@actions_running_per_week_array, @count)
      @max_actions = @actions_running_time_array.max

      # get percentage done cumulative
      @cumulative_percent_done = convert_to_cumulative_array(@actions_running_time_array, @actions_running_time.count )

      @url_labels = Array.new(@count){ |i| url_for(:controller => 'stats', :action => 'show_selected_actions_from_chart', :index => i, :id=> @key, :only_path => true) }
      @url_labels[@count]=url_for(:controller => 'stats', :action => 'show_selected_actions_from_chart', :index => @count, :id=> "#{@key}_end", :only_path => true)

      @time_labels         = Array.new(@count){ |i| "#{i}-#{i+1}" }
      @time_labels[0] = "< 1"
      @time_labels[@count] = "> #{@count}"

      # Normalize all the variable names
      @y_legend = I18n.t('stats.running_time_legend.actions')
      @y2_legend = I18n.t('stats.running_time_legend.percentage')
      @x_legend = I18n.t('stats.running_time_legend.weeks')
      @values = @actions_running_time_array.join(",")
      @links = @url_labels.join(",")
      @values_2 = @cumulative_percent_done.join(",")
      @x_labels = @time_labels.join(",")
      # add one to @max for people who have no actions completed yet.
      # OpenFlashChart cannot handle y_max=0
      @y_max = 1+@max_actions+@max_actions/10
    end

    # assumes date1 > date2
    def difference_in_weeks(date1, date2)
      return difference_in_days(date1, date2) / 7
    end

    # assumes date1 > date2
    def difference_in_days(date1, date2)
      return ((date1.utc.at_midnight-date2.utc.at_midnight)/SECONDS_PER_DAY).to_i
    end

    def convert_to_weeks_from_today_array(records, array_size, date_method_on_todo)
      return convert_to_array(records, array_size) { |r| [difference_in_weeks(@today, r.send(date_method_on_todo))]}
    end

    # uses the supplied block to determine array of indexes in hash
    # the block should return an array of indexes each is added to the hash and summed
    def convert_to_array(records, upper_bound)
      a = Array.new(upper_bound, 0)
      records.each { |r| (yield r).each { |i| a[i] += 1 if a[i] } }
      a
    end

    # returns a new array containing all elems of array up to cut_off and
    # adds the sum of the rest of array to the last elem
    def cut_off_array_with_sum(array, cut_off)
      # +1 to hold sum of rest
      a = Array.new(cut_off+1){|i| array[i]||0}
      # add rest of array to last elem
      a[cut_off] += array.inject(:+) - a.inject(:+)
      return a
    end

    def convert_to_cumulative_array(array, max)
      # calculate fractions
      a = Array.new(array.size){|i| array[i]*100.0/max}
      # make cumulative
      1.upto(array.size-1){ |i| a[i] += a[i-1] }
      return a
    end
  end
end

Intimidating, isn’t it?

Shoveling code from one place to another isn’t difficult. With a golden master test asserting that all good, nothing changed, it’s not even particularly scary, merely tedious. But we can no longer simply shovel. The only way forward is to tease the code apart.

To do that, we need to understand it, and we’re going to understand it by teasing it apart.

Here’s the tentative plan: Rename stuff and ask a lot of questions.

What is this? What does it represent? What does the data-structure look like? How is it used? Why do we care? Why is the sky blue? Are we there yet?

Take it From The Top

The real meat of the problem starts in compute. Here’s the first line:

@max_weeks = difference_in_weeks(@today, @actions_running_time.last.created_at)

The only thing I can confidently say that I understand is @today.

This is preceded by a comment, convert to array and fill in non-existing weeks with 0, but what does it even mean? Convert what to an array? An array of what?

Well, duh, the @actions_running_time, of course. Whatever that is.

It turns out that @actions_running_time is a collection of unfinished TODOs. We could name it @unfinished_todos, but it’s unclear whether or not this makes sense within this context.

Shaving Off Unnecessary Context

Tracing through the code reveals that we only ever send created_at to the TODOs.

TODOs are at the core of the domain. The Todo model is 420 lines long, it has a lots of data and lots of relations. Yet here we don’t care about any of that. It doesn’t matter to this code that the TODOs are unfinished.

To be honest, it probably doesn’t even matter that they’re TODOs.

In other words, unfinished_todos is not the name we are looking for, because neither unfinished nor todo is a meaningful distinction in this case. We just care about the timestamp.

Ideally we shouldn’t even pass the Active Record objects into Stats::RunningTimeData.

The controllers can do a map(&:created_at) before sending the data into the Stats::RunningTimeData.

Making the change is straightforward–no gotchas, no surprises.

Several nice things happen. We can forget about the complexity of unfinished todos. There’s less context, less to understand, and the Stats object becomes easier to reason about. It also makes it easy to name the collection. What is it? timestamps!

A side benefit is that we can now easily construct simple automated tests that exercise various parts of the Stats object directly.

def test_stuff
  today = Time.utc(2014, 1, 2, 3, 4, 5)
  timestamps = [
    # ...
  ]
  stats = Stats::RunningTimeData.new("title", timestamps, "key", today)
  # assert_equal "whatever", stats.something
end

This isn’t a unit test. It’s more like ephemeral characterization test whose sole purpose is to explore some output.

Take it From the Top (Again)

I’m still curious what @max_weeks is about. Extracting it into a method allows us to poke at it with a test, seeing what the return value is.

Trying a few different values shows that max_weeks is a count of how many weeks worth of unfinished todos we have. Or, if you will, it’s how many weeks old the oldest unfinished todo is.

A better name might be total_weeks_of_data.

def total_weeks_of_data
  difference_in_weeks(today, timestamps.last)
end

The name difference_in_weeks seems self-explanatory in hindsight, but the method itself tickles my spidey sense.

# assumes date1 > date2
def difference_in_weeks(date1, date2)
  return difference_in_days(date1, date2) / 7
end

The date1 value corresponds to today, and date2 is the oldest timestamp. These are all timestamps, but they’re named after dates.

Some judicious renaming leaves us with now instead of today, and the following method definition:

# assumes timestamp1 > timestamp2
def difference_in_weeks(timestamp1, timestamp2)
  difference_in_days(timestamp1, timestamp2) / 7
end

But there’s more.

timestamp1, which gets passed as an argument, is always now, which is available to the whole class. There’s no reason to pass it as an argument.

def difference_in_weeks(timestamp)
  difference_in_days(now, timestamp) / 7
end

This is getting weird. A diff with one argument makes no sense. It’s the sound of one hand clapping.

A better name would be weeks_since(timestamp).

It seems likely that we can make similar changes to difference_in_days().

# assumes date1 > date2
def difference_in_days(date1, date2)
  return ((date1.utc.at_midnight-date2.utc.at_midnight)/SECONDS_PER_DAY).to_i
end

Huh. at_midnight. This is oddly recursive. The variables are named like dates, but they’re really timestamps, but what the method really cares about is dates.

In Ruby, the 1 in date + 1 is worth one day, whereas the 1 in timestamp + 1 is worth one second.

Coercing the values in initialize and renaming them again! gets rid of all the seconds-wrangling, vastly simplifying the method:

def days_since(date)
  (today - date).to_i
end

The days_since method is used only in one place, so we may as well inline it:

def weeks_since(date)
  (today - date).to_i / 7
end

As a bonus, the SECONDS_PER_DAY just became superfluous.

Are We There Yet?

Well, no. We haven’t even looked at line 2 of the compute method yet, but there’s already a significant improvement.

Here’s what we started with:

class RunningTimeData
  # ...
  SECONDS_PER_DAY = 86400;

  def compute
    @max_weeks = difference_in_weeks(@today, @actions_running_time.last.created_at)
    # 35 more lines
  end

  # assumes date1 > date2
  def difference_in_weeks(date1, date2)
    return difference_in_days(date1, date2) / 7
  end

  # assumes date1 > date2
  def difference_in_days(date1, date2)
    return ((date1.utc.at_midnight-date2.utc.at_midnight)/SECONDS_PER_DAY).to_i
  end
  # ...
end

And after following the the rabbit hole all the way down, we ended up with:

class RunningTimeData
  # ...
  def compute
    # 35 lines
  end

  def total_weeks_of_data
    weeks_since(dates.last)
  end

  def weeks_since(date)
    (today - date).to_i / 7
  end
  # ...
end

The refactoring techniques remove parameter and inline method helped strip away cruft and confusion, allowing concepts to emerge. Refactoring using extract method, rename method, and rename variable encoded the exposed concepts into the codebase itself. (Note: You can find a catalog of refactoring techniques here.)

We started with a a muddy idea named actions running time. We simplified the code, and the concept of timestamps appeared. Once the fog of Time and Date had cleared, the central concept was simply dates.

This simplification allowed us to easily create ephemeral characterization tests, a powerful tool for exploring meaning where it has been obscured by complexity.

There’s still no real sense of what the entire Stats::RunningTimeData is about yet. For that we will need to follow a few more rabbit trails that will lead to discovering more concepts. Stay tuned for the next installment, where clarity increases by leaps and bounds.