Preventing Code Rot 101: Unit Testing

A wise coder once said: “When I commit my code, only God and I know what it does. After a while, only God knows.” It’s the basic definition of code rot and applies to pretty much all code that we write.

Sure, the code will still do exactly what it did when you checked it in or when you pushed it to production, but since it works, it becomes something magical: a proven truth, and you don’t touch proven truths. Over time, the code becomes a behemoth that nobody dares to touch, the legacy part of the codebase. It was designed “a long time ago,” it has functioned exactly the same way since its release, and nobody fully comprehends it. If something needs to be changed, you hack around it; you alter the input or the output, but leave the black box intact.

Unless you’re working on a fresh project right now, the codebase you work with most likely contains code like this that you take for granted. And the more challenging your current assignment (or fresh project) is, the more likely it is to become legacy code itself before long.

Since it’s infeasible to fully comprehend every problem your codebase solves, and every solution it applies, something else is needed to prevent code rot. One way or another, the code needs to stay refactorable if it is to live happily ever after. That’s where unit tests come in: they provide a simple way to write proofs that your code works as intended, and each proof makes the code it covers safe to refactor. You can change the code being proven, then have the tests show whether the new code still behaves the same way.

While I would love to tell you every little detail I know about unit testing and how to apply it to your code, I reluctantly agreed with my editor that 150,000 words is too long for one article. So I’m sticking with the basics.

Unit Testing Basics

In principle, a unit test is a small piece of code that does three things (a pattern commonly known as Arrange-Act-Assert):

  • Assemble – sets up an environment similar to what the code would run in during production, but without being dependent on any other components in your code
  • Act – executes the piece of code to be proven
  • Assert – verifies that the outcome of the execution is what was expected given the circumstances
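The three stages show up even in the smallest test. The sketch below is a framework-free illustration so it runs with the plain php CLI; the `addSurcharge()` function and the test name are hypothetical, and in PHPUnit the final comparison would be an assertion method rather than a returned boolean.

```php
<?php
// Hypothetical unit under test: adds a percentage surcharge to a net price.
function addSurcharge(float $netPrice, float $percentage): float
{
    return round($netPrice * (1 + $percentage / 100), 2);
}

// Framework-free test function; in PHPUnit this would be a test method.
function testAddSurchargeWillAddTwentyOnePercentToNetPrice(): bool
{
    // Assemble: known input, plus the outcome we expect for it.
    $netPrice = 100.00;
    $percentage = 21.0;
    $expected = 121.00;

    // Act: run only the unit being proven.
    $actual = addSurcharge($netPrice, $percentage);

    // Assert: compare the outcome with the expectation.
    return $actual === $expected;
}

echo testAddSurchargeWillAddTwentyOnePercentToNetPrice() ? "PASS\n" : "FAIL\n";
```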

The astute reader will notice that this doesn’t actually test the code for correctness. If a bug stems from an assumption made in the code, and the same developer writes the unit test, the bug won’t be found, because the test is written with the same assumption. Bugs caused by typos will be found by running the tests, just as they would be found if the code were executed in a staging environment. So unit testing is a bit of a misleading name: it’s more unit-proving than unit-testing.
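To illustrate the shared-assumption problem, here is a hypothetical leap-year check whose author believes every fourth year is a leap year. A test written with that same belief passes, even though the rule is incomplete (century years are leap years only when divisible by 400). This is a framework-free sketch; the function and test names are made up for the example.

```php
<?php
// The developer assumes "every fourth year is a leap year". That is incomplete:
// century years are only leap years when they are divisible by 400.
function isLeapYear(int $year): bool
{
    return $year % 4 === 0;
}

// The same developer writes the test with the same assumption in mind,
// so it only checks years where the assumption happens to hold.
function testIsLeapYearWillDetectLeapYears(): bool
{
    return isLeapYear(2024) === true
        && isLeapYear(2023) === false;
}

echo testIsLeapYearWillDetectLeapYears() ? "PASS\n" : "FAIL\n"; // prints PASS

// Yet the unit is wrong for 1900, which was not a leap year.
var_dump(isLeapYear(1900)); // bool(true)
```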

The unit to be proven is up to the developer writing the unit test, although the rule of thumb is that the unit has to act; that is, it has to have behavior. Simple entity classes which only store values do not need unit tests, since there is nothing to prove. You can rely on the programming language to store values in a variable; there’s no point in testing that.

On a related note, feel free to ignore code coverage numbers. They’re useless metrics. The only thing that matters is whether the acting components, or units, have proof. You can have a whopping 95% code coverage, but if the uncovered 5% is all core business logic, you still have (about-to-be) black boxes waiting to start rotting.

Assemble

The important part of the assemble stage is that the environment mimics production but does not use any components or units other than the one under test. If you need to prove a method that modifies a value coming from the database, simplify that to just modifying a value. If your code is tightly coupled to the database component that would provide the value in production, you can mock that component.

With a mock component, you design a fake component that is presented to the component under test as the real one, but behaves in a predictable way. Say the component you are testing gets a list of names from the database through a data layer component and then capitalizes the first character of every name. The mock object mimics the data layer component but does not actually talk to a database. Instead, it returns a static list of first names.

<?php
...
    /** ORIGINAL METHOD **/
    public function getCustomerFirstNames(DBLayer $db) {
        $dbvalues = $db->getQueryResults("SELECT firstname FROM customers");
        $firstnames = array();
        foreach($dbvalues as $firstname) {
            $firstnames[] = ucfirst($firstname);
        }
        return $firstnames;
    }
...

<?php
/** Mock for DB layer **/
class DBLayer
{
    public function getQueryResults($sql) {
        // fixed, predictable data instead of a real query
        return array("John", "tim", "bob", "Martin");
    }
}

<?php
...
    /** PHPUnit test case **/
    protected function setUp(): void {
        // the mock file is included instead of the real DBLayer file
        $this->db = new DBLayer();
    }

    public function testGetcustomerfirstnamesWillReturnFirstnamesModifiedByUcfirst() {
        // assemble (together with setUp(), which PHPUnit runs before every test method)
        $obj = new OriginalClass();
        $expected = array("John", "Tim", "Bob", "Martin");
        // act
        $results = $obj->getCustomerFirstNames($this->db);
        // assert
        $this->assertEquals($expected, $results, "GetCustomerFirstNames did not ucfirst() the list of names correctly");
    }
...

With this approach, two things have been achieved:

  • The component under test does not rely on any other component
  • The assembled environment in which we run the test is predictable; it will behave exactly the same every time we run the test
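The example above replaces the real data layer by including a mock file that declares a class with the same name. A possible alternative, sketched here under the assumptions that you run PHP 7 or later and that `DBLayer` is not declared `final`, is an anonymous subclass created inside the test: it satisfies the `DBLayer` type hint without swapping files. The `DBLayer` body below is a hypothetical stand-in for the real data layer.

```php
<?php
// Hypothetical stand-in for the real data layer, which would query a database.
class DBLayer
{
    public function getQueryResults($sql)
    {
        throw new RuntimeException("No database available in unit tests");
    }
}

// The class under test, containing the original method unchanged.
class OriginalClass
{
    public function getCustomerFirstNames(DBLayer $db)
    {
        $dbvalues = $db->getQueryResults("SELECT firstname FROM customers");
        $firstnames = array();
        foreach ($dbvalues as $firstname) {
            $firstnames[] = ucfirst($firstname);
        }
        return $firstnames;
    }
}

// The mock: an anonymous subclass passes the DBLayer type hint,
// but returns a fixed list instead of querying a database.
$mockDb = new class extends DBLayer {
    public function getQueryResults($sql)
    {
        return array("John", "tim", "bob", "Martin");
    }
};

$results = (new OriginalClass())->getCustomerFirstNames($mockDb);
echo $results === array("John", "Tim", "Bob", "Martin") ? "PASS\n" : "FAIL\n";
```

PHPUnit's built-in test doubles (`createMock()` in current versions, `getMock()` in older ones) generate a comparable substitute for you; the hand-written version simply keeps the mechanics visible.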

Act and Assert

With a suitable environment in place, the component under test can be executed. Given that the input is known, as well as which problem the code is supposed to solve and how it solves it, the expected outcome can easily be derived. And that is exactly what the assertion stage is for. This is the part that actually proves the code.

The assertion is simply stating: “given known input x, processed by known function f, the output is f(x)”. The developer who wrote function f and the accompanying unit test wrote this function to solve the particular problem at hand. The test removes the need to know or even comprehend the problem, or why it was solved in a particular manner. The code’s functionality was accepted (and hopefully is continuously being accepted by acceptance tests), so as long as the unit test passes, the code is proven correct.
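A single test often checks several known inputs x against their expected outputs f(x). Here is a minimal framework-free sketch of that idea, using a hypothetical `normaliseFirstName()` unit and PHP 7.1+ array destructuring; PHPUnit offers the same pattern through data providers.

```php
<?php
// Hypothetical unit under test: normalises a single first name.
function normaliseFirstName(string $name): string
{
    return ucfirst(strtolower(trim($name)));
}

// Each row states: given known input x, the expected output is f(x).
$cases = array(
    array("tim", "Tim"),
    array("  bob ", "Bob"),
    array("MARTIN", "Martin"),
);

$failures = 0;
foreach ($cases as [$input, $expected]) {
    $actual = normaliseFirstName($input);
    if ($actual !== $expected) {
        $failures++;
        echo "FAIL: '$input' gave '$actual', expected '$expected'\n";
    }
}
echo $failures === 0 ? "All cases passed\n" : "$failures case(s) failed\n";
```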

Because the unit test proves the code’s behavior, the code can now be refactored as much as needed, with the test serving as proof that its functionality remained unchanged. This counteracts the biggest cause of code rot, as the code is now safe to modify: its proven behavior can’t be broken unnoticed as long as it is covered by passing unit tests. If a test fails, the developer is notified quickly and can adjust or revert the change, even before the code is executed in a staging environment.
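To see the safety net at work, the loop from `getCustomerFirstNames()` is pulled out below into two hypothetical standalone functions: the original `foreach` version and a refactored `array_map()` version. One unchanged proof is run against both; had the refactoring changed the behavior, the second run would fail. This is a sketch for illustration, not a suggested API.

```php
<?php
// Original implementation (explicit loop).
function firstNamesOriginal(array $dbvalues): array
{
    $firstnames = array();
    foreach ($dbvalues as $firstname) {
        $firstnames[] = ucfirst($firstname);
    }
    return $firstnames;
}

// Refactored implementation (array_map); behaviour should be identical.
function firstNamesRefactored(array $dbvalues): array
{
    return array_map('ucfirst', $dbvalues);
}

// One unchanged proof, applied to whichever implementation is passed in.
function proveFirstNames(callable $implementation): bool
{
    $input = array("John", "tim", "bob", "Martin");
    $expected = array("John", "Tim", "Bob", "Martin");
    return $implementation($input) === $expected;
}

foreach (array('firstNamesOriginal', 'firstNamesRefactored') as $impl) {
    echo $impl, ': ', proveFirstNames($impl) ? "PASS" : "FAIL", "\n";
}
```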

Applied Unit Testing

Now that you have an understanding of what unit tests are and how they work, I’d like to touch on a few points that will make your unit testing experience a whole lot better.

Unit tests are not integration tests, acceptance tests, or any other form of implementation tests.
While every unit testing framework gives you a full toolkit, making it very tempting to sneak in some integration or acceptance tests… don’t. Unit tests should only prove that the original developer’s solution is still being applied despite any changes to the code under test. While you can write integration or acceptance tests with the same framework, you may want to consider using two separate frameworks, which makes it easier to keep the two kinds of tests apart.

Don’t forget to write integration/acceptance tests as well.
A unit test proves a tiny piece of code in isolation, but it does not prove that the system works. That is covered by integration and acceptance tests: slower-running test suites that combine all components and prove that the pieces work together as they should. A unit test is concise and fast, so it can be run before and after every change without costing you much time. An integration/acceptance test is something you run at the end of a development cycle, during the QA phase.

Keep your unit tests simple and concise.
Keep in mind that a unit test is marked as failed when at least one of its assertions fails, so a rule of thumb worth following is: the fewer assertions in a test, the better. A good test never mixes behavior proofs, so if it fails, you immediately know exactly which functionality is broken. If behavior proofs get mixed, or multiple assertions are present in the same test, you’ll need to start up a debugger just to figure out which behavior is now broken. All assertions could be failing, or just one; only the first failed assertion will be reported.
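One way to keep proofs from mixing is to give every behavior its own test. In this framework-free sketch (PHP 7.4+ for the arrow functions), the hypothetical `cleanName()` both trims and capitalizes, and each behavior gets a separate, single-assertion test, so the name of a failing test points straight at what broke.

```php
<?php
// Hypothetical unit with two separate behaviours: trimming and capitalising.
function cleanName(string $name): string
{
    return ucfirst(trim($name));
}

// One behaviour per test. The trim test uses an already-capitalised name,
// and the capitalise test uses a name without whitespace, so they stay apart.
$tests = array(
    'testCleanNameWillTrimSurroundingWhitespace' => fn() => cleanName("  Tim  ") === "Tim",
    'testCleanNameWillCapitaliseFirstCharacter'  => fn() => cleanName("tim") === "Tim",
);

foreach ($tests as $name => $test) {
    echo ($test() ? "PASS " : "FAIL "), $name, "\n";
}
```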

Name your unit tests verbosely.
The test method name will never be called manually, so there’s no need to use clever or concise naming conventions. Verbosity in the name serves as an extra form of documentation for what is being tested.

An added bonus is that when the TestDox output format is chosen, a unit testing framework like PHPUnit produces a checklist of human-readable lines that doubles as documentation. If you write your tests before the actual implementation, you can even use this output as a quick checklist of what’s finished and what still needs to be written.
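To give an idea of how a verbose method name becomes a checklist line, here is a rough approximation of the conversion. It is a hypothetical helper, not PHPUnit’s actual implementation; in current PHPUnit versions you would get the real report with the `--testdox` option.

```php
<?php
// Rough approximation of a TestDox-style conversion (hypothetical helper).
function testNameToSentence(string $methodName): string
{
    // Drop the conventional "test" prefix.
    $name = preg_replace('/^test/', '', $methodName);
    // Put a space before every capital letter except the first, then lowercase.
    $words = strtolower(trim(preg_replace('/(?<!^)([A-Z])/', ' $1', $name)));
    // Capitalise the start of the sentence again.
    return ucfirst($words);
}

echo testNameToSentence('testGetcustomerfirstnamesWillReturnFirstnamesModifiedByUcfirst'), "\n";
// prints: Getcustomerfirstnames will return firstnames modified by ucfirst
```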

Only write unit tests for code you own.
While unit testing frameworks give you so many tools that it becomes tempting to write tests for everything in sight, only code that you’ve written personally can be properly unit tested. Good code is a solution to a problem: it’s the result of thoroughly understanding the problem, understanding the possible solutions, and finally implementing one of them. If only the implementation is visible, the test would be written without the assumptions and understanding the original problem solver had. The unit test is then very likely to miss specific design choices, making it unreliable.

Write unit tests for bug fixes.
Before fixing the bug, write a test that proves the current code is wrong (because its assertion fails). Then fix the code, causing the unchanged assertion to pass. As a side effect, it’s a great way to start adding unit tests to a piece of code that isn’t covered yet. This helps cure code rot one little unit test at a time, as that part of the code is now proven to be correct (for the current definition of correct). It also prevents the bug from being reintroduced during refactoring, since reintroducing the buggy behavior would cause the unit test to fail.
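Here is a sketch of that workflow for a hypothetical bug report: names stored with a leading space, such as " tim", come back as " tim" instead of "Tim", because `ucfirst()` only looks at the first character, which is the space. Both versions of the hypothetical function are kept side by side purely so the sketch can show the before-and-after run; in a real codebase you would edit the one function and rerun the same test.

```php
<?php
// The buggy code: ucfirst() only changes the first byte, here a space.
function capitaliseBuggy(string $name): string
{
    return ucfirst($name);
}

// The fix: strip surrounding whitespace before capitalising.
function capitaliseFixed(string $name): string
{
    return ucfirst(trim($name));
}

// Written before the fix; the assertion itself never changes.
function testCapitaliseWillIgnoreLeadingWhitespace(callable $capitalise): bool
{
    return $capitalise(" tim") === "Tim";
}

echo "Before fix: ", testCapitaliseWillIgnoreLeadingWhitespace('capitaliseBuggy') ? "PASS" : "FAIL", "\n";
echo "After fix:  ", testCapitaliseWillIgnoreLeadingWhitespace('capitaliseFixed') ? "PASS" : "FAIL", "\n";
```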

Never change the code under test.
When you start writing tests for code already written, you will quickly notice that most code is hard to test because of tight coupling or other design choices. It will be very tempting to adjust the code under test a little to make it way easier to write a unit test for it, but don’t do this. The whole purpose of unit testing is to prove the existing code so it is safe to refactor it. Refactoring before writing the unit test is exactly the risk you want to avoid – you won’t know what you will break or which bugs you will introduce.
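Tests written this way are often called characterization tests: they record what the existing code does today, including surprising behavior, so any later refactoring that changes it is caught. A minimal framework-free sketch, assuming a hypothetical `formatFirstName()` that is left exactly as found:

```php
<?php
// Existing code under test, left exactly as it is.
function formatFirstName(string $name): string
{
    return ucfirst($name);
}

// Observed behaviour, recorded before any refactoring (input => current output).
$observed = array(
    "tim"          => "Tim",
    "o'brien"      => "O'brien",      // only the very first character changes
    "van der berg" => "Van der berg", // later words are left alone
    "TIM"          => "TIM",          // nothing is lowercased
);

foreach ($observed as $input => $expected) {
    $actual = formatFirstName($input);
    echo ($actual === $expected ? "PASS" : "FAIL"), " formatFirstName(\"{$input}\")\n";
}
```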

Rounding Up

Managers aren’t going to assign a full team to each problem and have them maintain that tiny codebase for all of eternity. So if you want to prevent your genius solutions from rotting away, write your own little virtual helpers, in the form of unit tests, to maintain that codebase for you.

If you’re proud of the code you write, you should have unit tests for it. If you don’t know how to write unit tests, then you should be learning how to write them. If your team doesn’t use unit tests, convince them that it’s a required tool.

There is so much more to be learned about unit tests and why you should write them; for further reading visit:

Feel free to ask around your local PHP User Group, or attend sessions about the subject at your next conference, as well. Most professional PHP developers have at least some experience with PHPUnit and unit testing in general.

A wiser coder would say: “When I commit my code, only God, the unit tests and I know what it does. After a while, only God and the unit tests know.”


  • http://wojciechfornal.com Wojciech Fornal

    I’d say that in many cases even God doesn’t know why some code works as it works. By the way, does God like spaghetti? ;)

  • Chris Emerson

    This is great – you always see articles about how to implement unit testing, or the syntax, or usage of the libraries, but nothing about what a good test is, how you should use it in practice, what kind of things you should actually test for etc. More of the above would be great!

  • http://www.adeveloper.org Hossein Baghayi

    Why are you creating a new class for mocking a functionality? (Database class for instance),
    Wouldn’t it be better to mock it using phpunit’s built-in functionalities? using getMock or something?

    • http://www.wolerized.com Remi Woler

      Because the code written serves a purpose of demonstration. Using a framework for the mock would hide the point in layers. This way, no matter which framework you are familiar with (including none at all), you still get what the code is doing without having to google it.