Types Are Mightier than Tests

When to Check Types
Can Types Specify “Behavior”?
Does Type Checking Reduce the Need for Tests?
Testing vs Type Checking
Reducing the Number of Possible Implementations
Abstracting Control Structures in Types
Summary
Comments

TDD replaces a type checker … in the same way that a strong drink replaces sorrows. ― Brent Yorgey

If you have a formal proof of a property of code you’re writing, and it can’t be encoded in the types, where is the proper place for it? ― Daniel Spiewak

The dependency of one class to another one should depend on the smallest possible interface ― Robert Martin

Programmers have always had long discussions about how a strong type system could help in writing good programs. They often lead to a comparison between dynamically and statically typed languages. Some even believe that dynamically typed languages have no type system.

Actually, we should not talk about statically and dynamically typed languages, but about compile time and run time type checking. As a matter of fact, programmers preferring run time type checking use to call their language “dynamically typed” because “dynamic” is considered a quality, while “static” is generally not. Adepts of compile time type checking prefer to talk about weakly typed vs strongly typed languages, since obviously it’s better to be strong than weak.

In the end, one question often remains: Where do tests fit into this?

When to Check Types

Checking types only at run time gives you more freedom. However, it is mostly the freedom of messing with types. If you make a type error, the compiler will not tell you, and the error will only appear at run time … or not. There are two principal reasons for errors not to show up:

Languages with run time type checking generally offer some implicit type conversion functionality. If you did not select the right type, the language will try to convert your type to one that fits. You have however no reason to believe that it will be correct, unless you did it on purpose.
If it can’t convert your type to any suitable one, it will throw an error. The consequences of this error might not be well known. If you are lucky, it will crash the program with a message telling you what the error was. If you are not, it might only crash a thread and let the application running in an undetermined state. Furthermore, if no error occurs, you might think that the program is correct. There could be an error in a not yet executed part of the program. And this error could come to life long after the program is in production.

For these reasons, a statically checked type system will be of better help to write correct programs. But another question arises: How strong should the type system be? What level of constraint should it impose on your programming?

Seeing the type checker as a constraint is a mistake. A strong type checker is mostly a help.

Java’s type system is weak, though. What does that mean? It means that few parts of the program “behavior” are abstracted in types. But you are free to create your own types, or to use libraries providing new types.

Specifying behavior with tests.

Can Types Specify “Behavior”?

This is an interesting question, but it raises new questions: What is behavior? And are programs a mean to specify behavior for a computer?

Answering yes to the last question does not give any information about what programs are in general, but only about the type of programming you are doing. Writing programs that specify behavior is called “imperative programing”. Writing programs that don’t is something different.

Functional programming is “non imperative”, meaning that you don’t write a program to specify how the computer should behave. You write an algebraic expression that, when evaluated, gives the expected result. But you do not describe how the program will behave to do this. This should not be surprising even for imperative programmers, because they use this every day. Consider the following example:

void displayMessage(String message) {}
    System.out.println(message);
}

When you write this, are you specifying behavior? You are certainly not specifying how the computer should behave to get the message printed to the console. You are just asking for an effect to be applied to a value. You are not even choosing this value. Specifying behavior is describing the way by which you will get the result, and this is done by composing instructions using control structures. But functional programming does not use control structures, nor instructions.

So to the question “Can types specify behavior”, for a functional programmer, the answer is no, since there is no behavior to specify. On the other hand, for an imperative programmer using an object oriented language, the answer is obviously yes, since this is what objects are (among other things): behavior encapsulated in types. Objects are not just types. But they are types. In Java, String, Integer, List are types. And theses types specify behavior. Not all the behavior for a program, but a part of it.

Does Type Checking Reduce the Need for Tests?

Let’s take an example: Say you want to concatenate two lists of strings. First you’d have to check the arguments for null and then somehow put the lists together:

List<String> concat(List<String> list1, List<String> list2) {
    if (list1 == null) {
        return list2;
    } else if (list2 == null) {
        return list1;
    } else {
        for (int i = 0; i < list2.size(); i++) {
            list1.add(list2.get(i));
        }
        return list1;
    }
}

This is what one might have done with old languages. Modern languages brought some new abstractions allowing us to simplify this kind of program. Using a “for each” loop, we could write:

List<String> concat(List<String> list1, List<String> list2) {
    if (list1 == null) {
        return list2;
    } else if (list2 == null) {
        return list1;
    } else {
        for (String s : list2) {
          list1.add(s);
        }
        return list1;
    }
}

Java even offers us a better abstraction:

List<String> concat(List<String> list1, List<String> list2) {
    if (list1 == null) {
        return list2;
    } else if (list2 == null) {
        return list1;
    } else {
        list1.addAll(list2);
        return list1;
    }
}

Or, using Java 8:

List<String> concat(List<String> list1, List<String> list2) {
    if (list1 == null) {
        return list2;
    } else if (list2 == null) {
        return list1;
    } else {
        list2.forEach(list1::add);
        return list1;
    }
}

All these abstractions are good, (especially addAll) but they do not change anything to the level of testing we need. In fact, since we implement null checking, we have to test that our implementation of null checking is effective. So we should test our program with each argument being null (and also with both of them being null, of course).

Can we use a type system that makes the checks for null unnecessary? Yes we can. We could use non-nullable types, like Kotlin offers. In Kotlin, the type List<String> can’t apply to null. List<String> is a subtype of List<String>?, which is nullable. By offering an non-nullable version, it frees us from checking for nulls, and hence it frees us for testing those checks.

Such a type with a constraint (not being null) is the simplest form of dependent types. Dependent types are types that depend on values. For example, with dependent types, a List of 5 elements of type T could be of a different type than a list of 6 elemens of type T. Wikipedia has a good article on dependent types.

In other languages, a non-nullable List might be represented by a parameterized type, such as Optional<List<String>>. The most visible difference between the two approaches is that Optional<List<String>> is not a parent type of List<String>. But they have several similarities, among which the fact that unlike null, they may represent absence of data without losing the type. And most importantly, they compose.

Think again about the contract: shouldn’t the fact that your method receives an argument of type List<String> mean that you could safely call size() on it (or any other method of List)? If you can’t trust the type, you can’t do anything good without a huge number of tests. Obviously, using a type system that handles this problem makes a great number of tests unnecessary.

Testing vs Type Checking

But our program has another problem. You will probably already have noticed that the program was flawed. In such case, you might think that you don’t need the type checker to tell you. But why then, would you want the tests to tell you that your program is incorrect?

Anyway, let’s do it. We might want to test the program with several unit tests, such as:

@Test
public void testAll() {
    List<String> list1 = new ArrayList<>(Arrays.asList("1", "2", "3"));
    List<String> list2 = new ArrayList<>(Arrays.asList("4", "5", "6"));
    List<String> list3 = new ArrayList<>(Arrays.asList("7", "8", "9"));
    List<String> list4 = new ArrayList<>(Arrays.asList("1", "2", "3", "4", "5", "6"));
    assertEquals(list4, concat(list1, list2));
    assertEquals(list1, concat(list1, null));
    assertEquals(list2, concat(null, list2));
    assertNull(concat(null, null));
    assertEquals(
            concat(list1, concat(list2, list3)),
            concat(concat(list1, list2), list3));
}

Of course, tests should be made individual, so that a failing test does not stop execution of the following ones, and also to be sure that they do not interfere, but this is just to save room. It works the same, and all these tests pass. Good? Not so!

What’s the need of testing if tests can’t find the error? Of course, I know what I have done wrong in the tested program, since I did it on purpose. But what have I done wrong in the tests?

This shows some very important facts about testing:

Testing does not prove a program to be correct. It does not even prove that it is sometimes correct (for the arguments used in tests). It only proves that we were not smart enough to prove it incorrect. What we would need here is a measure of tests correctness. Maybe tests for testing tests?
For testing to be efficient, we should put more intelligence in writing the tests than in writing the program. Otherwise, the program will most often beat the tests. But generally, we put as much intelligence as possible in the program. So not much is left for the tests.
Believing that more tests will make the program better is like believing that weighing yourself more often will make you lose weight. (Although, if the scale is on the tenth floor and you live on the first, it might eventually be not so wrong. But this would only be a side effect, so it would not work for functional programmers.)

In case you have not noticed why the program is wrong, run this:

@Test
public void testAll() {
    List<String> list1 = new ArrayList<>(Arrays.asList("1", "2", "3"));
    List<String> list2 = new ArrayList<>(Arrays.asList("4", "5", "6"));
    List<String> list3 = new ArrayList<>(Arrays.asList("7", "8", "9"));
    assertEquals(concat(list1, concat(list2, list3)),
                 concat(concat(list1, list2), list3));
    System.out.println(concat(list1, concat(list2, list3)));
    System.out.println(concat(concat(list1, list2), list3));
}

The tests pass, but the printed output is:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 4, 5, 6, 7, 8, 9, 7, 8, 9, 4, 5, 6, 7, 8, 9, 7, 8, 9]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 4, 5, 6, 7, 8, 9, 7, 8, 9, 4, 5, 6, 7, 8, 9, 7, 8, 9, 4, 5, 6, 7, 8, 9, 7, 8, 9, 7, 8, 9]

Of course, now, it is obvious. But wouldn’t it be better to rely on the type system to prevent this? Instead of checking that the arguments were not mutated, we could have either:

made a defensive copy
used immutable types
restricted the types

Can the compiler check that we have made a defensive copy? Probably not. And even with tests, this can only be partially checked.

Java offers immutable lists, but they are cumbersome to use, and not very efficient. Plus they are not really immutable, but often rather immutable views of mutable collections. And anyway, the biggest problem is that Java does not differentiate between mutable and immutable lists at the type level, but only at the behavior level.

A Java immutable list is of the same type as a mutable one. There is nothing we can do about this, so we have to create our own types, which means expressing this behavior in the type system. While this is not difficult (and in a real project not even necessary as we could use existing implementations, PCollections or Javaslang for example), there are other means to achieve our goal.

List behavior is of course not restricted to mutability or immutability: Many operations on lists are implemented in terms of control structures, such as for or while loops. Some of these operations have been translated into types, such as Iterable, or Enumeration What we can do is relying on such types instead of the original List type:

static List<String> concat(
        Iterable<String> list1, List<String> list2) {
    if (list1 == null) {
        return list2;
    } else if (list2 == null) {
        // compile error #1 - incompatible types:
        // Iterable<String> cannot be converted to List<String>
        return list1;
    } else {
        // compile error #2 - invalid method reference
        // cannot find method add() on Iterable<String>
        list2.forEach(list1::add);
        // compile error #3 - incompatible types:
        // Iterable<String> cannot be converted to List<String>
        return list1;
    }
}

Suddenly, our program no longer compiles! And this is a good thing because we have three errors, not only compile errors but conceptual errors! An Iterable, for example, is read-only but we try to add elements.

What we observe here is that by using a type exposing only the minimal required interface, the compiler may help us detect the problem. So we change our concat method to:

List<String> concat(Iterable<String> list1, List<String> list2) {
    if (list1 == null) {
        return list2;
    } else {
        ArrayList<String> result = new ArrayList<>();
        list1.forEach(result::add);
        if (list2 == null) {
          return result;
        } else {
          result.addAll(list2);
        }       
        return result;
    }
}

And now, the code is (somewhat more) correct. This does not mean that this is the ideal solution. It only means that the test did not help us to detect the problem. The type system did.

Reducing the Number of Possible Implementations

We can go much farther than this, though, with a better type system. It is common practice to measure the quality of the tests by measuring the branch coverage. But branch coverage does not mean much. Generally, programmers try to create as few tests as they can while getting the highest branch coverage.

Beside this, tests of this kind are implementation dependent, meaning they have to be updated when the implementation changes. This is cumbersome and is often not done.

Some programmers think that tests should not depend upon the implementation. They even think that the tests should be written prior to implementing what is to be tested. I could not agree more, if not for a slight condition: To be effective, these tests should test all possible implementations, since we do not know in advance what the implementation will be.

One could even imagine a way to check the strength of the test suite: A programmer should not be able to write an incorrect implementation without the test detecting it. Writing tests with this in mind seems like a huge task! (If you’re doing mutation testing, you at least know how to find incorrect implementations that pass the tests.)

Although not always. Once again, we can make the type system help us. Ideally, we would want to restrict the types so that there is only one possible implementation. This way, we would have only one test to write. Is it possible? Well, sometimes it is. Take the following example:

static <A, B, C> Function<A, C> transmogrify(B b, BiFunction<A, B, C> f) {
    // ...
}

Can you write a test for this method? You might think you can’t, since you don’t know what the method does. The method name will probably not help you much. So you might want to look at the implementation, but unfortunately, it is not available. Maybe it has not been implemented yet and you are doing TDD. What can you do?

It’s in fact quite easy:

@Test
public void testTransmogrify() {
    Double pound = 0.794546235;
    BiFunction<Integer, Double, Double> multiply = (a, b) -> a * b;
    Function<Integer, Double> toPound = transmogrify(pound, multiply);
    assertEquals(toPound.apply(45), 35.754 , 0.001);
}

This is not magic. In fact, there is only one possible implementation. Don’t believe me? Read On!

This is actually quite common in functional programming. Often, functional programmers have a method to implement, given its signature. And often, they just have to follow the types.

Let’s see how this happens. To start with, we must return a function, so we can write it as a lambda, with a parameter A a (but we do not need to indicate the type):

static <A, B, C> Function<A, C> transmogrify(B b, BiFunction<A, B, C> f) {
    return a ->
}

Next, we have to create a C to be the return value of the function. All we have at our disposal, to do this, is a B and a BiFunction that will create a C from an A and a B. Couldn’t be simpler! We use the A a we started the lambda expression with and the B b given to the method and apply the BiFunction:

static <A, B, C> Function<A, C> transmogrify(B b, BiFunction<A, B, C> f) {
    return a -> f.apply(a, b);
}

That’s it! Now you can give a more meaningful name to the function. What it does is partially applying the second argument of the BiFunction, producing a Function of a single argument. (More exactly, it applies the second element of the pair which is the unique argument of the BiFunction.) So we could name it partial2.

Such a function is useful when you want to repetitively apply a BiFunction with the same “second argument”. In our test, we transform a function that converts a currency into another given an amount and a rate, into a function converting an amount using a specific rate. In other words, we created a dollars to pounds converter from a general currency converter (since the conversion rate for dollars to pounds is “built in” the partially applied function).

Do we really need the test? With a true functional language, we would not. In Java, we can represent null values and exceptions with types, and that is exactly what we do if we want to program functionally in Java.

Alas, we can’t always avoid dealing with null and exceptions. So if we can’t see the implementation, we have to write one test, just to guarantee that the method does not return null nor throw an exception. We don’t have to test more than this – if it works for one argument value, it works for all. (Unfortunately, this is not absolutely true since the implementation could check some external data and throw exceptions or return null at random, so it is true only if the method is referentially transparent, which should always be the case in functional programming).

In any case, if we wrote the implementation, we can see that it does not throw an exception nor return null. The test is useful however to harden the code against future changes. If someone were to change the implementation towards throwing an excepting or returning null it will likely break existing code and the test is there to indicate that as early as possible. But note that one single test is enough to eliminate both conditions (returning null and throwing exception).

Specifying behavior with types — Published by Ken Hawkins under CC-BY 2.0 / SitePoint changed the background

Abstracting Control Structures in Types

The same principle applies to more complex problems. Not only can the type system be used to express exceptions or nulls, it can also express any other condition, such as cardinality, “futureness”, optionality, laziness, effects, state, parallelism, and more. The result is that we do not need any control structure.

This might seem unbelievable for imperative programmers, but true functional languages do not have control structures. No if ... else, no for or while loops, no switch ... case, no try ... catch, nothing. How do you manage the control flow then? (Brace yourself before reading on.) The answer is: You don’t! There is no control flow.

Let’s rewrite our concat example without any control flow. Of course, Java does not offer the types we need, so we are going to create them.

First, lets look at the algorithm. We have two lists, and we want to concatenate them by taking each element of the second and appending it to the first one. Remember the third quote at the beginning of the article?

The dependency of one class to another one should depend on the smallest possible interface ― Robert Martin

The type system will be of more use if we use the minimal required information. The first argument of the concat method does not have to be a list. Because we only want to append elements to the first argument, it needs to be Appendable. (And because we do not need any other behavior, it does no have to be of any more specific type.) Here’s the Appendable type:

public interface Appendable<T, U> {

    Appendable<T, U> append(T t);

    U get();
}

T is the type of the elements that get appended, U the collection that holds them (for example a List<T>). It has a method append that returns a new Appendable with the given element added (the type itself should be immutable), and another method get that returns the current collection of items.

Going back to the arguments, the second also doesn’t need to be a list – it needs to be iterable. In functional programming, we use another concept, though: Foldable.

To understand Foldable, think about a list of characters: we can start with an empty string, and successively append each character to the string. For this, we use a function taking a string and a character and returning a string. If we start from the left, and we use the string as the first argument, this is a left fold. If we start from the right, using the element as the first argument of the function, it is a right fold. So we can define the Foldable type:

public interface Foldable<T> extends RightFoldable<T>, LeftFoldable<T> {}

public interface LeftFoldable<T> {
    <U> U foldLeft(U seed, BiFunction<U, T, U> f);
}

public interface RightFoldable<T> {
    <U> U foldRight(U seed, BiFunction<T, U, U> f);
}

Of course, we need a List type implementing these interfaces. A functional one would be too long for this article but I’ve created a wrapper around a Java List that you can give a try. (This, of course, is not production code.) The implementation itself is not relevant, though – in a real-life project it would be provided either by the language, or by a library.

What is important is the code we have to write. Here is what our concat method should look like:

public static <T> List<T> concat(Appendable<T, List<T>> a, LeftFoldable<T> b) {
    return b.foldLeft(a, Appendable::append).get();
}

How would we test this method? Which “behavior” is test-worthy? The method can return null or throw an exception. It can also append any number of copies of the elements of its second argument to its first argument.

This seems to allow for an infinite number of implementations. In fact, it does, but all the wrong implementations can be eliminated by a single test. This can be a test of the result of concatenating two arbitrary lists (although the second one must not be empty), but there is no need to test for the expected result. Testing that the length of the result is equals to the sum of the lengths of the arguments is enough.

All other properties of our solution are expressed as types!

Summary

In this article, we have seen why testing is a very weak (but necessary) tool to check imperative program correctness. Then we have shown how functional programming with a strong type system considerably lowers the need for testing. In a perfect functional programming world, tests would not be necessary at all. But nothing is perfect, so we need some tests. Much, much less, though, than what imperative programmers are used to.

By the way, functional programming, for the same reasons, also lowers the need for logging and for debugging.