Golden Master Testing: More Refactoring, More Understanding

    Katrina Owen
    Share

    Robot of the metal parts on a dark grungy background

    After a tedious slog through views and controllers, we finally reached the part of refactoring that becomes an adventure: Grasping at straws. Second guessing everything. Obsessively running the tests. Nurturing the delicate hope that our new-found understanding is, indeed, insight and not just another failed assumption.

    In the previous article, we dissected the first line of the compute method, tracing that line back carefully and meticulously. We discovered that the charts are concerned with dates, most likely on some sort of weekly basis, for the past year.

    Next we will untangle the code that processes the raw data into the correct format for the chart to consume.

    Once Upon a Time

    module Stats
      class RunningTimeData
        include Rails.application.routes.url_helpers
    
        attr_reader :title, :dates, :key, :today,
                    :y_legend, :y2_legend, :x_legend, :values,
                    :links, :values_2, :x_labels, :y_max
    
        def initialize(title, timestamps, key, now)
          @title = title
          @dates = timestamps.map(&:to_date)
          @key = key
          @today = now.to_date
        end
    
        def compute
          @actions_running_per_week_array = convert_to_weeks_from_today_array
    
          # cut off chart at 52 weeks = one year
          @count = [52, total_weeks_of_data].min
    
          # convert to new array to hold max @cut_off elems + 1 for sum of actions after @cut_off
          @actions_running_time_array = cut_off_array_with_sum(@actions_running_per_week_array, @count)
          @max_actions = @actions_running_time_array.max
    
          # get percentage done cumulative
          @cumulative_percent_done = convert_to_cumulative_array(@actions_running_time_array, dates.count )
    
          @url_labels = Array.new(@count){ |i| url_for(:controller => 'stats', :action => 'show_selected_actions_from_chart', :index => i, :id=> @key, :only_path => true) }
          @url_labels[@count]=url_for(:controller => 'stats', :action => 'show_selected_actions_from_chart', :index => @count, :id=> "#{@key}_end", :only_path => true)
    
          @time_labels         = Array.new(@count){ |i| "#{i}-#{i+1}" }
          @time_labels[0] = "< 1"
          @time_labels[@count] = "> #{@count}"
    
          # Normalize all the variable names
          @y_legend = I18n.t('stats.running_time_legend.actions')
          @y2_legend = I18n.t('stats.running_time_legend.percentage')
          @x_legend = I18n.t('stats.running_time_legend.weeks')
          @values = @actions_running_time_array.join(",")
          @links = @url_labels.join(",")
          @values_2 = @cumulative_percent_done.join(",")
          @x_labels = @time_labels.join(",")
          # add one to @max for people who have no actions completed yet.
          # OpenFlashChart cannot handle y_max=0
          @y_max = 1+@max_actions+@max_actions/10
        end
    
        def total_weeks_of_data
          weeks_since(dates.last)
        end
    
        def weeks_since(date)
          days_since(date) / 7
        end
    
        def days_since(date)
          (today - date).to_i
        end
    
        def convert_to_weeks_from_today_array
          convert_to_array(dates, total_weeks_of_data+1) {|date|
            [weeks_since(date)]
          }
        end
    
        # uses the supplied block to determine array of indexes in hash
        # the block should return an array of indexes each is added to the hash and summed
        def convert_to_array(records, upper_bound)
          a = Array.new(upper_bound, 0)
          records.each { |r| (yield r).each { |i| a[i] += 1 if a[i] } }
          a
        end
    
        # returns a new array containing all elems of array up to cut_off and
        # adds the sum of the rest of array to the last elem
        def cut_off_array_with_sum(array, cut_off)
          # +1 to hold sum of rest
          a = Array.new(cut_off+1){|i| array[i]||0}
          # add rest of array to last elem
          a[cut_off] += array.inject(:+) - a.inject(:+)
          return a
        end
    
        def convert_to_cumulative_array(array, max)
          # calculate fractions
          a = Array.new(array.size){|i| array[i]*100.0/max}
          # make cumulative
          1.upto(array.size-1){ |i| a[i] += a[i-1] }
          return a
        end
      end
    end

    And so the story begins, with a variable named @actions_running_per_week_array.

    Actions running is something that we’ve seen before, and it can be translated into unfinished TODOs.

    The thing is, we don’t care that they’re TODOs. We extracted the created_at timestamps from the TODOs, coerced them into dates, and that’s all the chart is interested in. Simple datapoints.

    It also seems unnecessary to encode the type name into the name.

    def datapoints_per_week
      convert_to_weeks_from_today_array(dates, total_weeks_of_data+1)
    end

    convert_to_weeks_from_today_array takes two arguments, both of which are available in the entire class. There is no need to pass either of them. Taking them off leaves us with a spurious abstraction:

    def datapoints_per_week
      convert_to_weeks_from_today_array
    end

    Inlining the method results in a vague sense of déjà vu.

    def datapoints_per_week
      convert_to_array(dates, total_weeks_of_data + 1) {|date|
        [weeks_since(date)]
      }
    end

    The entire method body is a call to another method, which takes arguments that are available to the whole class.

    Inline it!

    Step back for a moment and think about the methods that we just inlined. It started out like this:

    def datapoints_per_week
      convert_to_weeks_from_today_array
    end
    
    def convert_to_weeks_from_today_array
      convert_to_array
    end
    
    def convert_to_array
      # several lines of complicated logic
    end

    If a method definition contains a single line, and that line is just another method call, then the two method names better be worth it.

    Contrast the above with the following method, which we defined in the course of the previous refactoring.

    def total_weeks_of_data
      weeks_since(dates.last)
    end

    In the latter example both names are meaningful. They’re at different levels of abstraction, and together they tell a good story.

    Discovering Datapoints

    Let’s poke at datapoints_per_week with our test_stuff test, to figure out how the data is structured.

    def test_stuff
      now = Time.utc(2014, 1, 2, 3, 4, 5)
      timestamps = [
        now - 5.minutes,
        now - 1.day,
        now - 7.days,
        now - 8.days,
        now - 8.days,
        now - 30.days,
      ]
      stats = Stats::RunningTimeData.new("title", timestamps, "key", now)
      assert_equal "something", stats.datapoints_per_week
    end

    The spread of data here goes from just a few minutes ago, to a month ago. I’m particularly interested in figuring out what happens in a week without any data.

    The failure looks like this:

    Expected: "something"
      Actual: [2, 3, 0, 1]

    The index of the array represents the number of weeks ago, and the value is the number of datapoints that occurred that week.

    Here’s how the array of datapoints is produced:

    def datapoints_per_week
      a = Array.new(total_weeks_of_data+1, 0)
      dates.each {|date|
        [weeks_since(date)].each {|i|
          a[i] += 1 if a[i]
        }
      }
      a
    end

    Shield your eyes, children, we’ve got nested iteration!

    The inner loop iterates over an array of one element. That seems a bit pointless. All we need is an index:

    dates.each {|date|
      i = weeks_since(date)
      a[i] += 1 if a[i]
    }

    There is no way in which weeks_since will ever return nil or false, so we can delete the postfix if. While we’re at it, let’s rename a to frequencies.

    def datapoints_per_week
      frequencies = Array.new(total_weeks_of_data + 1, 0)
      dates.each {|date|
        frequencies[weeks_since(date)] += 1
      }
      frequencies
    end

    The datapoints have been clustered by week. Next it looks like we’re going to truncate it.

    # cut off chart at 52 weeks = one year
    @count = [52, total_weeks_of_data].min

    The comment suggests that we only want to show data back to a certain date. That kind of makes sense, but then you might wonder why the SQL query didn’t simply limit the data based on some cutoff date.

    Perhaps there’s more to it.

    If we look at how @count is used, it gets even more confusing.

    # convert to new array to hold max @cut_off elems + 1
    # for sum of actions after @cut_off
    @actions_running_time_array = cut_off_array_with_sum(datapoints_per_week, @count)

    Assuming that @cut_off actually means @count, this comment contradicts the previous one.

    Either it’s cutting the data off at a year, or it’s summing up the dates outside the cutoff as a single entry. I don’t see how it could be both.

    We need to poke at it to see what’s actually going on.

    def test_stuff
      chart = Stats::RunningTimeData.new("title", [], "key", Time.now)
      per_week = [1, 0, 3, 4, 5]
      assert_equal [], chart.cut_off_array_with_sum(per_week, 1)
      assert_equal [], chart.cut_off_array_with_sum(per_week, 2)
      assert_equal [], chart.cut_off_array_with_sum(per_week, 3)
      assert_equal [], chart.cut_off_array_with_sum(per_week, 4)
      assert_equal [], chart.cut_off_array_with_sum(per_week, 5)
      assert_equal [], chart.cut_off_array_with_sum(per_week, 6)
    end

    One at a time, the failures fill in the blanks.

    # datapoints_per_week = [1, 0, 3, 4, 5]
    [1, 12] # cut off at 1
    [1, 0, 12] # cut off at 2
    [1, 0, 3, 9] # cut off at 3
    [1, 0, 3, 4, 5] # cut off at 4
    [1, 0, 3, 4, 5, 0] # cut off at 5
    [1, 0, 3, 4, 5, 0, 0] # cut off at 6

    The cutoff method doesn’t discard data, it batches all the overflow into a single slot. Moreover, it also fills in any empty slots within the desired time period with 0.

    @count seems like an overly generic name. What it really tells us, is exactly how many weeks of data will be displayed in the chart, regardless of how many weeks of data are available in the raw data.

    Renaming count to total_weeks_in_chart, and actions_running_time_array to datapoints_per_week_in_chart tells a slightly more coherent story. This is verbose, but until we have teased out all of the concepts, it’s going to be hard to pick a better name.

    def datapoints_per_week_in_chart
      cut_off_array_with_sum(datapoints_per_week, total_weeks_in_chart)
    end

    Notice the familiar pattern: The arguments are globally available, and the method definition consists of a single call to another method.

    Inlining cut_off_array_with_sum gives us:

    def datapoints_per_week_in_chart
      a = Array.new(total_weeks_in_chart + 1) { |i| datapoints_per_week[i]||0 }
      a[total_weeks_in_chart] += datapoints_per_week.inject(:+) - a.inject(:+)
      a
    end

    This looks a bit scary, but can be simplified somewhat:

    def datapoints_per_week_in_chart
      frequencies = Array.new(total_weeks_in_chart) {|i|
        datapoints_per_week[i].to_i
      }
      frequencies << datapoints_per_week.inject(:+) - frequencies.inject(:+)
    end

    Moving on, the code is introducing a completely new term: cumulative_percent_done.

    I’m not really sure that done is the word that we are looking for. We’re dealing with unfinished TODOs, not completed TODOs. And even so, we don’t care.

    Let’s rename it to cumulative_percentages:

    def cumulative_percentages
      convert_to_cumulative_array(datapoints_per_week, dates.count)
    end

    As usual, we can inline the contained method:

    def cumulative_percentages
      a = Array.new(datapoints_per_week_in_chart.size) {|i|
        datapoints_per_week_in_chart[i]*100.0/dates.count
      }
      1.upto(timestamp_counts.size-1) {|i| a[i] += a[i-1]}
      a
    end

    That looks more than a little frightning. It’s doing a lot.

    It would be helpful to name the concepts, so let’s extract some methods.

    The first part is creating an array of percentages by week.

    def percentages_per_week
      Array.new(datapoints_per_week_in_chart.size) {|i|
        datapoints_per_week_in_chart[i]*100.0/dates.count
      }
    end

    Think about what this is doing.

    1. It is creating a new array that is the same length as an existing array, and
    2. It is deriving each value in the array from the corresponding value in the original array.

    Another word for this is map.

    def percentages_per_week
      datapoints_per_week_in_chart.map {|count|
        count * 100.0 / dates.count
      }
    end

    This method is doing quite a bit, as well. Naming the block cleans it up considerably.

    def percentages_per_week
      datapoints_per_week_in_chart.map(&percentage)
    end
    
    def percentage
      Proc.new {|count| count * 100.0 / dates.count}
    end

    The fully refactored cumulative_percentages logic looks like this:

    def cumulative_percentages
      running_total = 0
      percentages_per_week.map {|percentage|
        running_total += percentage
      }
    end

    That’s not nearly as frightening as it seemed just a moment ago.

    Cleanup

    The rest of compute can be blown apart with simple extractions. The resulting code can be seen below.

    It is even longer than the original, but the concepts, originally hidden behind cryptic names at odd levels of abstraction, have now been brought to the surface. Are the names right? Probably not. This was not a refactoring to create a good arrangement of the code, it was a refactoring to discover what is there.

    Inline, inline, extract, inline, extract, extract.

    This is the rhythm of refactoring when codebase has not yet found the right abstractions.

    The result of this refactoring is understanding, not good code.

    For that, we’re going to take one final pass at this. We will ask the question if this were two things, what would they be? and out of the ashes if this refactoring, a tiny, cohesive abstraction will appear.


    module Stats
      class RunningTimeData
        include Rails.application.routes.url_helpers
    
        attr_reader :title, :dates, :key, :today
    
        def initialize(title, timestamps, key, now)
          @title = title
          @dates = timestamps.map(&:to_date)
          @key = key
          @today = now.to_date
        end
    
        def compute
          # FIXME: delete call from controllers
        end
    
        def y_legend
          I18n.t('stats.running_time_legend.actions')
        end
    
        def y2_legend
          I18n.t('stats.running_time_legend.percentage')
        end
    
        def x_legend
          I18n.t('stats.running_time_legend.weeks')
        end
    
        def values
          datapoints_per_week_in_chart.join(",")
        end
    
        def links
          url_labels.join(",")
        end
    
        def values_2
          cumulative_percentages.join(",")
        end
    
        def x_labels
          time_labels.join(",")
        end
    
        def y_max
          # add one to @max for people who have no actions completed yet.
          # OpenFlashChart cannot handle y_max=0
          1 + datapoints_per_week_in_chart.max + datapoints_per_week_in_chart.max/10
        end
    
        private
    
        def url_labels
          urls = Array.new(total_weeks_in_chart) {|i|
            url(i, key)
          }
          urls << url(total_weeks_in_chart, "#{key}_end")
        end
    
        def url(index, id)
          options = {
            :controller => 'stats',
            :action => 'show_selected_actions_from_chart',
            :index => index,
            :id=> id,
            :only_path => true
          }
          url_for(options)
        end
    
        def time_labels
          labels = Array.new(total_weeks_in_chart) {|i|
            "#{i}-#{i+1}"
          }
          labels[0] = "< 1"
          labels[total_weeks_in_chart] = "> #{total_weeks_in_chart}"
          labels
        end
    
        def total_weeks_in_chart
          [52, total_weeks_of_data].min
        end
    
        def total_weeks_of_data
          weeks_since(dates.last)
        end
    
        def weeks_since(date)
          (today - date).to_i / 7
        end
    
        def cumulative_percentages
          running_total = 0
          percentages_per_week.map {|count| running_total += count}
        end
    
        def percentages_per_week
          datapoints_per_week_in_chart.map(&percentage)
        end
    
        def percentage
          Proc.new {|count| (count * 100.0 / dates.count)}
        end
    
        def datapoints_per_week_in_chart
          frequencies = Array.new(total_weeks_in_chart) {|i|
            datapoints_per_week[i].to_i
          }
          frequencies << datapoints_per_week.inject(:+) - frequencies.inject(:+)
        end
    
        def datapoints_per_week
          frequencies = Array.new(total_weeks_of_data + 1, 0)
          dates.each {|date|
            frequencies[weeks_since(date)] += 1
          }
          frequencies
        end
      end
    end
    CSS Master, 3rd Edition