Golden Master Testing: More Refactoring, More Understanding
After a tedious slog through views and controllers, we finally reached the part of refactoring that becomes an adventure: Grasping at straws. Second guessing everything. Obsessively running the tests. Nurturing the delicate hope that our new-found understanding is, indeed, insight and not just another failed assumption.
In the previous article, we dissected the first line of the compute
method, tracing that line back carefully and meticulously. We discovered that the charts are concerned with dates, most likely on some sort of weekly basis, for the past year.
Next we will untangle the code that processes the raw data into the correct format for the chart to consume.
Once Upon a Time
module Stats
class RunningTimeData
include Rails.application.routes.url_helpers
attr_reader :title, :dates, :key, :today,
:y_legend, :y2_legend, :x_legend, :values,
:links, :values_2, :x_labels, :y_max
def initialize(title, timestamps, key, now)
@title = title
@dates = timestamps.map(&:to_date)
@key = key
@today = now.to_date
end
def compute
@actions_running_per_week_array = convert_to_weeks_from_today_array
# cut off chart at 52 weeks = one year
@count = [52, total_weeks_of_data].min
# convert to new array to hold max @cut_off elems + 1 for sum of actions after @cut_off
@actions_running_time_array = cut_off_array_with_sum(@actions_running_per_week_array, @count)
@max_actions = @actions_running_time_array.max
# get percentage done cumulative
@cumulative_percent_done = convert_to_cumulative_array(@actions_running_time_array, dates.count )
@url_labels = Array.new(@count){ |i| url_for(:controller => 'stats', :action => 'show_selected_actions_from_chart', :index => i, :id=> @key, :only_path => true) }
@url_labels[@count]=url_for(:controller => 'stats', :action => 'show_selected_actions_from_chart', :index => @count, :id=> "#{@key}_end", :only_path => true)
@time_labels = Array.new(@count){ |i| "#{i}-#{i+1}" }
@time_labels[0] = "< 1"
@time_labels[@count] = "> #{@count}"
# Normalize all the variable names
@y_legend = I18n.t('stats.running_time_legend.actions')
@y2_legend = I18n.t('stats.running_time_legend.percentage')
@x_legend = I18n.t('stats.running_time_legend.weeks')
@values = @actions_running_time_array.join(",")
@links = @url_labels.join(",")
@values_2 = @cumulative_percent_done.join(",")
@x_labels = @time_labels.join(",")
# add one to @max for people who have no actions completed yet.
# OpenFlashChart cannot handle y_max=0
@y_max = 1+@max_actions+@max_actions/10
end
def total_weeks_of_data
weeks_since(dates.last)
end
def weeks_since(date)
days_since(date) / 7
end
def days_since(date)
(today - date).to_i
end
def convert_to_weeks_from_today_array
convert_to_array(dates, total_weeks_of_data+1) {|date|
[weeks_since(date)]
}
end
# uses the supplied block to determine array of indexes in hash
# the block should return an array of indexes each is added to the hash and summed
def convert_to_array(records, upper_bound)
a = Array.new(upper_bound, 0)
records.each { |r| (yield r).each { |i| a[i] += 1 if a[i] } }
a
end
# returns a new array containing all elems of array up to cut_off and
# adds the sum of the rest of array to the last elem
def cut_off_array_with_sum(array, cut_off)
# +1 to hold sum of rest
a = Array.new(cut_off+1){|i| array[i]||0}
# add rest of array to last elem
a[cut_off] += array.inject(:+) - a.inject(:+)
return a
end
def convert_to_cumulative_array(array, max)
# calculate fractions
a = Array.new(array.size){|i| array[i]*100.0/max}
# make cumulative
1.upto(array.size-1){ |i| a[i] += a[i-1] }
return a
end
end
end
And so the story begins, with a variable named @actions_running_per_week_array
.
Actions running is something that we’ve seen before, and it can be translated into unfinished TODOs.
The thing is, we don’t care that they’re TODOs. We extracted the created_at
timestamps from the TODOs, coerced them into dates, and that’s all the chart is interested in. Simple datapoints.
It also seems unnecessary to encode the type name into the name.
def datapoints_per_week
convert_to_weeks_from_today_array(dates, total_weeks_of_data+1)
end
convert_to_weeks_from_today_array
takes two arguments, both of which are available in the entire class. There is no need to pass either of them. Taking them off leaves us with a spurious abstraction:
def datapoints_per_week
convert_to_weeks_from_today_array
end
Inlining the method results in a vague sense of déjà vu.
def datapoints_per_week
convert_to_array(dates, total_weeks_of_data + 1) {|date|
[weeks_since(date)]
}
end
The entire method body is a call to another method, which takes arguments that are available to the whole class.
Inline it!
Step back for a moment and think about the methods that we just inlined. It started out like this:
def datapoints_per_week
convert_to_weeks_from_today_array
end
def convert_to_weeks_from_today_array
convert_to_array
end
def convert_to_array
# several lines of complicated logic
end
If a method definition contains a single line, and that line is just another method call, then the two method names better be worth it.
Contrast the above with the following method, which we defined in the course of the previous refactoring.
def total_weeks_of_data
weeks_since(dates.last)
end
In the latter example both names are meaningful. They’re at different levels of abstraction, and together they tell a good story.
Discovering Datapoints
Let’s poke at datapoints_per_week
with our test_stuff
test, to figure out how the data is structured.
def test_stuff
now = Time.utc(2014, 1, 2, 3, 4, 5)
timestamps = [
now - 5.minutes,
now - 1.day,
now - 7.days,
now - 8.days,
now - 8.days,
now - 30.days,
]
stats = Stats::RunningTimeData.new("title", timestamps, "key", now)
assert_equal "something", stats.datapoints_per_week
end
The spread of data here goes from just a few minutes ago, to a month ago. I’m particularly interested in figuring out what happens in a week without any data.
The failure looks like this:
Expected: "something"
Actual: [2, 3, 0, 1]
The index of the array represents the number of weeks ago, and the value is the number of datapoints that occurred that week.
Here’s how the array of datapoints is produced:
def datapoints_per_week
a = Array.new(total_weeks_of_data+1, 0)
dates.each {|date|
[weeks_since(date)].each {|i|
a[i] += 1 if a[i]
}
}
a
end
Shield your eyes, children, we’ve got nested iteration!
The inner loop iterates over an array of one element. That seems a bit pointless. All we need is an index:
dates.each {|date|
i = weeks_since(date)
a[i] += 1 if a[i]
}
There is no way in which weeks_since
will ever return nil
or false
, so we can delete the postfix if
. While we’re at it, let’s rename a
to frequencies
.
def datapoints_per_week
frequencies = Array.new(total_weeks_of_data + 1, 0)
dates.each {|date|
frequencies[weeks_since(date)] += 1
}
frequencies
end
The datapoints have been clustered by week. Next it looks like we’re going to truncate it.
# cut off chart at 52 weeks = one year
@count = [52, total_weeks_of_data].min
The comment suggests that we only want to show data back to a certain date. That kind of makes sense, but then you might wonder why the SQL query didn’t simply limit the data based on some cutoff date.
Perhaps there’s more to it.
If we look at how @count
is used, it gets even more confusing.
# convert to new array to hold max @cut_off elems + 1
# for sum of actions after @cut_off
@actions_running_time_array = cut_off_array_with_sum(datapoints_per_week, @count)
Assuming that @cut_off
actually means @count
, this comment contradicts the previous one.
Either it’s cutting the data off at a year, or it’s summing up the dates outside the cutoff as a single entry. I don’t see how it could be both.
We need to poke at it to see what’s actually going on.
def test_stuff
chart = Stats::RunningTimeData.new("title", [], "key", Time.now)
per_week = [1, 0, 3, 4, 5]
assert_equal [], chart.cut_off_array_with_sum(per_week, 1)
assert_equal [], chart.cut_off_array_with_sum(per_week, 2)
assert_equal [], chart.cut_off_array_with_sum(per_week, 3)
assert_equal [], chart.cut_off_array_with_sum(per_week, 4)
assert_equal [], chart.cut_off_array_with_sum(per_week, 5)
assert_equal [], chart.cut_off_array_with_sum(per_week, 6)
end
One at a time, the failures fill in the blanks.
# datapoints_per_week = [1, 0, 3, 4, 5]
[1, 12] # cut off at 1
[1, 0, 12] # cut off at 2
[1, 0, 3, 9] # cut off at 3
[1, 0, 3, 4, 5] # cut off at 4
[1, 0, 3, 4, 5, 0] # cut off at 5
[1, 0, 3, 4, 5, 0, 0] # cut off at 6
The cutoff method doesn’t discard data, it batches all the overflow into a single slot. Moreover, it also fills in any empty slots within the desired time period with 0.
@count
seems like an overly generic name. What it really tells us, is exactly how many weeks of data will be displayed in the chart, regardless of how many weeks of data are available in the raw data.
Renaming count
to total_weeks_in_chart
, and actions_running_time_array
to datapoints_per_week_in_chart
tells a slightly more coherent story. This is verbose, but until we have teased out all of the concepts, it’s going to be hard to pick a better name.
def datapoints_per_week_in_chart
cut_off_array_with_sum(datapoints_per_week, total_weeks_in_chart)
end
Notice the familiar pattern: The arguments are globally available, and the method definition consists of a single call to another method.
Inlining cut_off_array_with_sum
gives us:
def datapoints_per_week_in_chart
a = Array.new(total_weeks_in_chart + 1) { |i| datapoints_per_week[i]||0 }
a[total_weeks_in_chart] += datapoints_per_week.inject(:+) - a.inject(:+)
a
end
This looks a bit scary, but can be simplified somewhat:
def datapoints_per_week_in_chart
frequencies = Array.new(total_weeks_in_chart) {|i|
datapoints_per_week[i].to_i
}
frequencies << datapoints_per_week.inject(:+) - frequencies.inject(:+)
end
Moving on, the code is introducing a completely new term: cumulative_percent_done
.
I’m not really sure that done is the word that we are looking for. We’re dealing with unfinished TODOs, not completed TODOs. And even so, we don’t care.
Let’s rename it to cumulative_percentages
:
def cumulative_percentages
convert_to_cumulative_array(datapoints_per_week, dates.count)
end
As usual, we can inline the contained method:
def cumulative_percentages
a = Array.new(datapoints_per_week_in_chart.size) {|i|
datapoints_per_week_in_chart[i]*100.0/dates.count
}
1.upto(timestamp_counts.size-1) {|i| a[i] += a[i-1]}
a
end
That looks more than a little frightning. It’s doing a lot.
It would be helpful to name the concepts, so let’s extract some methods.
The first part is creating an array of percentages by week.
def percentages_per_week
Array.new(datapoints_per_week_in_chart.size) {|i|
datapoints_per_week_in_chart[i]*100.0/dates.count
}
end
Think about what this is doing.
- It is creating a new array that is the same length as an existing array, and
- It is deriving each value in the array from the corresponding value in the original array.
Another word for this is map.
def percentages_per_week
datapoints_per_week_in_chart.map {|count|
count * 100.0 / dates.count
}
end
This method is doing quite a bit, as well. Naming the block cleans it up considerably.
def percentages_per_week
datapoints_per_week_in_chart.map(&percentage)
end
def percentage
Proc.new {|count| count * 100.0 / dates.count}
end
The fully refactored cumulative_percentages
logic looks like this:
def cumulative_percentages
running_total = 0
percentages_per_week.map {|percentage|
running_total += percentage
}
end
That’s not nearly as frightening as it seemed just a moment ago.
Cleanup
The rest of compute
can be blown apart with simple extractions. The resulting code can be seen below.
It is even longer than the original, but the concepts, originally hidden behind cryptic names at odd levels of abstraction, have now been brought to the surface. Are the names right? Probably not. This was not a refactoring to create a good arrangement of the code, it was a refactoring to discover what is there.
Inline, inline, extract, inline, extract, extract.
This is the rhythm of refactoring when codebase has not yet found the right abstractions.
The result of this refactoring is understanding, not good code.
For that, we’re going to take one final pass at this. We will ask the question if this were two things, what would they be? and out of the ashes if this refactoring, a tiny, cohesive abstraction will appear.
module Stats
class RunningTimeData
include Rails.application.routes.url_helpers
attr_reader :title, :dates, :key, :today
def initialize(title, timestamps, key, now)
@title = title
@dates = timestamps.map(&:to_date)
@key = key
@today = now.to_date
end
def compute
# FIXME: delete call from controllers
end
def y_legend
I18n.t('stats.running_time_legend.actions')
end
def y2_legend
I18n.t('stats.running_time_legend.percentage')
end
def x_legend
I18n.t('stats.running_time_legend.weeks')
end
def values
datapoints_per_week_in_chart.join(",")
end
def links
url_labels.join(",")
end
def values_2
cumulative_percentages.join(",")
end
def x_labels
time_labels.join(",")
end
def y_max
# add one to @max for people who have no actions completed yet.
# OpenFlashChart cannot handle y_max=0
1 + datapoints_per_week_in_chart.max + datapoints_per_week_in_chart.max/10
end
private
def url_labels
urls = Array.new(total_weeks_in_chart) {|i|
url(i, key)
}
urls << url(total_weeks_in_chart, "#{key}_end")
end
def url(index, id)
options = {
:controller => 'stats',
:action => 'show_selected_actions_from_chart',
:index => index,
:id=> id,
:only_path => true
}
url_for(options)
end
def time_labels
labels = Array.new(total_weeks_in_chart) {|i|
"#{i}-#{i+1}"
}
labels[0] = "< 1"
labels[total_weeks_in_chart] = "> #{total_weeks_in_chart}"
labels
end
def total_weeks_in_chart
[52, total_weeks_of_data].min
end
def total_weeks_of_data
weeks_since(dates.last)
end
def weeks_since(date)
(today - date).to_i / 7
end
def cumulative_percentages
running_total = 0
percentages_per_week.map {|count| running_total += count}
end
def percentages_per_week
datapoints_per_week_in_chart.map(&percentage)
end
def percentage
Proc.new {|count| (count * 100.0 / dates.count)}
end
def datapoints_per_week_in_chart
frequencies = Array.new(total_weeks_in_chart) {|i|
datapoints_per_week[i].to_i
}
frequencies << datapoints_per_week.inject(:+) - frequencies.inject(:+)
end
def datapoints_per_week
frequencies = Array.new(total_weeks_of_data + 1, 0)
dates.each {|date|
frequencies[weeks_since(date)] += 1
}
frequencies
end
end
end