PHP
Article
By Tobias Schlitt

Cleaning up Code: Is Refactoring for Aesthetics worth It?

By Tobias Schlitt

Most development teams want to get their codebase into a better, more maintainable state. But what definition of better should be chosen? In many cases, it is not necessary to dig deep into Domain Driven Design (DDD) to achieve this goal. Sometimes, it’s even counter productive. But one of the most basic collections of principles can help each team a lot already: Clean Code.

Vector image of sweeping broom

The Goal

The Clean Code book by Robert C. Martin summarizes many simple and advanced improvements to get better, understandable, and therefore more maintainable code. Let’s take the following code snippet as an example:

public function getJobs($date)
{
    $ret = array();
    $res = $this->db->query(
        'SELECT * FROM jobs WHERE year=' . date('Y', $date) . ' AND month='
            . date('m', $date)
    );
    $res = $res->fetchAll();
    foreach ($res as $data) {
        $j = new Job(/* $data['...'] */);
        $j->rev = $j->time * $j->euro;
        $ret[] = $j;
    }
    return $ret;
}

The purpose of this already simple method seems to be to return jobs for a certain date. While this intention is clear, reading through the method and understanding how it performs its job is not that easy. Once done with it the algorithm appears pretty easy: first it fetches some data from a database, then it prepares job objects, calculates some additional data and returns the objects.

But the time spent to understand this simple code snippet is wasted. As you, a developer, analyze your and other people’s everyday work, you will notice that you spend much more time reading and understanding code than actually writing it.

A low hanging fruit would be to just rename the local variables to improve readability:

public function getJobs($date)
{
    $jobs = array();
    $statement = $this->db->query(
        'SELECT * FROM jobs WHERE year=' . date('Y', $date) . ' AND month='
            . date('m', $date)
    );
    $rows = $statement->fetchAll();
    foreach ($rows as $row) {
        $job = new Job(/* $row['...'] */);
        $job->rev = $job->time * $job->euro;
        $jobs[] = $job;
    }
    return $jobs;
}

The variable previously known as $ret was renamed to $jobs. This name now clearly indicates what we can expect from it: a list ob jobs. Furthermore, this cannot be easily confused with $ret which was renamed into $statement and $rows. We now understand what’s going on much better.

Another step towards improving readability would be to break the code up into logical pieces that belong together and make it read more like a book, or newspaper, where paragraphs are information groups:

public function getJobs($date)
{
    $statement = $this->db->query(
        'SELECT * FROM jobs WHERE year=' . date('Y', $date) . ' AND month='
            . date('m', $date)
    );
    $rows = $res->fetchAll();

    $jobs = array();
    foreach ($rows as $row) {
        $job = new Job(/* $row['...'] */);
        $job->rev = $job->time * $job->euro;
        $jobs[] = $job;
    }
    return $jobs;
}

There are two paragraphs now. In the first one, there is a number of rows fetched from the database. The second paragraph converts these rows into objects. This minor relocation of the declaration of $jobs and the delimiting blank line did not only ease the reading flow of the method, but also prepared a possible subsequent refactoring step of splitting the tasks into private methods with the intention of revealing names.

This is just the tip of the iceberg that Clean Code offers.

--ADVERTISEMENT--

The Methodology

As can be seen above, the refactoring steps are really minimalistic – baby steps. This is not just for the purpose of presentation but should be the way whenever you refactor. The recommended approach is to commit after each and every step:

  1. Choose 1 single refactoring step
  2. Execute the selected step
  3. Verify the code still works
  4. Commit current state and goto 1

Most importantly, this procedure gives you a safety line so you can easily jump one step back. You should do this whenever you feel like you might run in the wrong direction. If you need to start debugging – don’t. It is way too much effort to try to fix a refactoring instead of reverting it. Chances are good that you’ll lose your way in debugging and it will eat up more time than starting over would. If you go with baby steps, a reversion of the latest step would take 5 minutes instead.

Furthermore, this approach helps you stay focused and open minded about what the eventual goal of your refactoring is. Code is complex and our brain often has a biased idea of what the optimal outcome of this complex structure is. By keeping your mind open to adjusting the goal during the process, you allow it to find better solutions along the way.

The process also allows you to stop refactoring at any point in time without losing progress. An urgent bug to fix? A colleague asks for sparring? Lunch? No problem, you can either just merge the working state you have or leave it in a branch to pick it up later again.

Still, there is one draw-back with frequent commits during refactoring: so many small commits might pollute your version control history. If you are using Git, you can mitigate this by squashing the commits of a refactoring into a single one before pushing upstream.

The Preconditions

Taking Clean Code as a goal and applying baby steps are good building blocks for sustainable refactoring, but there is more to take into account.

First of all, it is not a good idea to perform a refactoring just because you can or because you think the current code looks ugly. Changing the code is always a risk and you should take no risk without good reason. One good reason would e.g. be that you often need to work on a single piece of code which is hard to grasp and where changes cascade throughout your system. But there are many other reasons like frequent bugs, unforeseen business requirements, etc. It is important that you analyze the reasons carefully, and not just blindly run into refactoring.

Secondly, it’s important to have at least some degree of automated tests that cover the most important business cases where the code being refactored is used. Legacy systems are typically hard to test with unit or integration test methods. Therefore, it’s recommended not to try to apply these methods through a brute force approach. Instead, we found it quite useful to apply tests through the front-end – in our case using Mink and PHPUnit.

But, as with so many decisions in software development, these are only rules of thumb and every project has its own requirements and priorities. Join me in my Clean Code workshop at the WebSummerCamp 2016 to learn more. Bring a companion, too – there’s a dedicated program for significant others, too, so bring company and make it a work vacation!

  • Thanks for sharing the code!

  • Thank you for sharing. This is very helpful and informative.

  • Henrik Blunck

    I beg to differ. Programmers should have learned how important it is to supply documentation with source code. That important step is possible by adding comments into coding – and should never be disregarded just because you are coding online.
    If anyone has supplied trash to begin with, then you are quite right that it’s going to take time changing it, but the value of being thorough IS that it will improve development in the future.

    Let’s take the example of inline CSS vs. external CSS. Lazy people have used software that created inline CSS. It’s a really shitty job to change a domain deisgn when you have to change each file, whereas having ONE external CSS-file allows you to change the design in ONE simple step.

    But a good article, and some excellent points.

  • Test, Test, Test… There’s no point in refactoring broken code so it looks nicer.

    With or without the refactoring for style and readability I still don’t quite see where the $j object comes from. I’m pretty sure the code snippet example method does not work as expected since it’s using an object $j that’s not passed in to the method or created within the method, yet it is being added to the $jobs array over and over again. My guess is that the result of using this method would be 2 PHP Notices for every row that was served back from the DB and an empty array always being returned.

    I imagine this is just a an overlooked typo since it would make more sense if the $j variable was just renamed to $job but this is a good example to make a strong case for the fact that writing tests for the code you’re about to refactor before refactoring is more important than any other piece of advice that can be attached to refactoring. You’ll find out if the method even works in the first place and you will have evidence of the fact that it does work after you make changes.

    The short mention of testing at the end in my opinion needs to be moved to be the primary requirement of refactoring at the beginning. The only reason a code base is hard to test is because it’s not understood how to test it which is almost always due to there being no tests. Once you’re over the hurdle of understanding how to test it then the rest becomes quite easy. This example looks like it uses CodeIgniter which is known to be less than easy to test but if you mocked and set the DB (which isn’t difficult) then you could at least write a simple unit test to ensure that the data being passed back from this method is an array of `Job` objects.

  • Andrian Motelica

    $job->rev = $j->time * $job->euro;
    I think you should revise your $j’s

  • Leonardo DaVincci

    I’m confused. Is “refactoring for aesthetics” really “refactoring?” To me refactoring is a means to get your code to perform more efficiently, not necessarily to be read by human beings. I’m not arguing against making code more human readable. I’m just questioning the use of the word “refactoring” here.

    • For me we don’t code for machines but for humans. It’s why naming is important as well. If we can refactor to be more readable, in that case we should.

      Developing is as well working in a team (even if your are alone, you never know if you will get help one day). It means developing is as well communicating.

  • $job === $j ?

  • foreach ($res as $data) {
    $j = new Job(/* $data[‘…’] */);
    $j->rev = $j->time * $job->euro;
    $ret[] = $j;
    }

    Where is $job defined?

    At least the code in this article prove one thing: copy past too much and you will destroy everything.

  • Why do you need to allocate the $statement? Are you planning to use it later? Do you also need the variable $rows for something else than the foreach?
    Why are you using a foreach for mapping a set of elements that have a direct relationship with elements of other set? $job = f($row)? seems to be a function, not a loop.
    You can return array_map($f, $rows) instead of repeating ntime the foreach $results[] = $result pattern in your project.
    Remove unneeded code, to make it easier to read ;) not just rename the variables.

Recommended
Sponsors
Get the latest in PHP, once a week, for free.