Property Based Testing with Vavr

    Gregor Trefs
    Share

    Usual unit testing is based on examples: We define the input for our unit and check for an expected output. Thus, we can ensure that for a specific sample our code works right. However, this does not prove the correctness for all possible inputs. Property based testing, also called property checking, takes another approach by adhering to the scientific theory of Falsifiability: As long as there is no evidence that a proposed property of our code is false, it is assumed to be true. In this article we are using Vavr (formerly called Javaslang) for property based testing to verify a simple FizzBuzz implementation.

    Before We Start

    I am using Vavr Streams to generate an infinite FizzBuzz sequence which should satisfy certain properties. Compared with Java 8 Streams, the main benefit in the scope of this article is that the Vavr variant provides a good API to retrieve individual elements with the get function.

    Property Based Testing

    Property based testing (PBT) allows developers to specify the high-level behaviour of a program in terms of invariants it should fullfill. A PBT framework creates test cases to ensure that the program runs as specified.

    A property is the combination of an invariant with an input values generator. For each generated value, the invariant is treated as a predicate and checked whether it yields true or false for that value.

    As soon as there is one value which yields false, the property is said to be falsified and checking is aborted. If a property cannot be falsified after a specific amount of sample data, the property is assumed to be satisfied.

    Reverse List Example

    Take a function that reverses a list as an example. It has several implicit invariants that must hold in order to provide proper functionality. One of them is, that the first element of a list is the last element of the same list reversed. A corresponding property combines this invariant with a generator which generates lists with random values and lengths.

    Asssume, that the generator generates the following input values: [1,2,3], ["a","b"] and [4,5,6]. The list reverse function returns [3,2,1] for [1,2,3]. The first element of the list is the last elment of the same list reversed. The check yields true. The list reverse funcion returns ["a", "b"] for ["a", "b"]. The first element of the list is not the last element of the same list reversed. The check yields false. The property is falsified.

    In Vavr the default amount of tries to falisfy a property is 1000.

    Property Checking FizzBuzz

    FizzBuzz is a child’s play counting from 1 on. Each child in turn has to say the current number out loud. If the current number is divisble by 3 the child should say “Fizz”, if it is divisble by 5 the answer should be “Buzz” and if it is divisble by both 3 and 5 the answer should be “FizzBuzz”. So the game starts like this:

    1, 2, Fizz, 4, Buzz, Fizz, 7, 8, Fizz, Buzz, 11, Fizz, 13, 14, FizzBuzz, …

    FizzBuzz is often taken as a sample exercise in job interviews where applicants should fizzbuzz the numbers from 1 to 100. FizzBuzz can also be used as a Kata simple enough to focus on the surroundings like language, libraries or tooling instead of the business logic.

    A stream seems to be a good way to model FizzBuzz. We will start with an empty one and gradually improve it to pass our tests.

    public Stream<String> fizzBuzz() {
        Stream.empty();
    }
    

    Every Third Element Should Start with Fizz

    From the description of FizzBuzz we can derive a first property about the FizzBuzz Stream: Every third element should start with Fizz. Vavr’s property checking implementation provides fluent APIs to create generators and properties.

    @Test
    public void every_third_element_starts_with_Fizz) {
        Arbitrary<Integer> multiplesOf3 = Arbitrary.integer()
            .filter(i -> i > 0)
            .filter(i -> i % 3 == 0);
    
        CheckedFunction1<Integer, Boolean> mustStartWithFizz = i ->
            fizzBuzz().get(i - 1).startsWith("Fizz");
    
        CheckResult result = Property
            .def("Every third element must start with Fizz")
            .forAll(multiplesOf3)
            .suchThat(mustStartWithFizz)
            .check();
    
        result.assertIsSatisfied();
    }
    

    Let’s go through it one by one:

    • Arbitrary.integer() generates arbitrary integers which are filtered for positive multiples of 3 using filter.
    • The function mustStartWithFizz is the invariant: For a given index, the corresponding element in the stream (get is 0-indexed) must start with Fizz. The call to suchThat further below requires a CheckedFunction1, which is like a function but may throw a checked exception (something we do not need here).
    • Property::def takes a short description and creates a Property. The call to forAll takes an Arbitrary and ensures that the invariant in suchThat must hold for all input values generated by the Arbitrary.
    • The check function uses the generator to create multiples of 3 in the range from 0 to 100. It returns a CheckResult which can be queried about the result or be asserted.

    The property is not satisfied, because for every generated multiple of 3 the corresponding element does not exist in the empty Stream. Let’s fix this by providing a suitable stream.

    public Stream<String> fizzBuzz() {
        return Stream.from(1).map(i -> i % 3 == 0 ? "Fizz": "");
    }
    

    Stream::from creates an infinite stream, which counts from the given number on. Every multiple of 3 is mapped to Fizz. The resulting stream contains a Fizz for every third element. Once we check the property again, it is satisfied and the result is printed on the console:

    Every third element must start with Fizz: OK, passed 1000 tests in 111 ms.

    Property Based Testing with Vavr

    Increase the Sample Size

    Per default, check tries to falsify a property 1000 times. Each time, the given Arbitrary generates a sample value which depends on a size parameter. Per default, the size parameter is set to 100. For Arbitrary.integer() the negative size parameter defines the lower bound and the positive size parameter defines the upper bound for the generated value. This means, a generated integer has a value between -100 and 100. We further limit the generated values to be only positive and to be multiples of 3. This means, we test each multiple of 3 in the range from 1 to 100 approximately 30 times.

    Fortunately, Vavr allows us to adapt the size and the number of tries by providing corresponding parameters to the check function.

    CheckResult result = Property
        .def("Every third element must start with Fizz")
        .forAll(multiplesOf3)
        .suchThat(mustStartWithFizz)
        .check(10_000, 1_000);
    

    Now, the generated multiples of 3 are drawn from a range between 1 and 10’000 and the property is checked 1000 times.

    Every Multiple of 5 Must End with Buzz

    A second property which can be derived from the FizzBuzz description is that every multiple of 5 must end with Buzz.

    @Test
    public void every_fifth_element_ends_with_Buzz() {
        Arbitrary<Integer> multiplesOf5 = Arbitrary.integer()
            .filter(i -> i > 0)
            .filter(i -> i % 5 == 0);
    
        CheckedFunction1<Integer, Boolean> mustEndWithBuzz = i ->
            fizzBuzz().get(i - 1).endsWith("Buzz");
    
        Property.def("Every fifth element must end with Buzz")
            .forAll(multiplesOf5)
            .suchThat(mustEndWithBuzz)
            .check(10_000, 1_000)
            .assertIsSatisfied();
    }
    

    This time, property checking and assertion are combined.

    This property is falsified just like the one checking Buzz was initially. But this time not because there are no elements (since the last test the fizzBuzz function creates an infinite stream) but because every fifth element is an empty string or sometimes Fizz but never Buzz.

    Let’s fix this by returning Buzz for every fifth element.

    public Stream<String> fizzBuzz() {
        return Stream.from(1).map(i ->
            i % 3 == 0 ? "Fizz" : (i % 5 == 0 ? "Buzz" : "")
        );
    }
    

    When we check the property again, it is falsified after some passed tests:

    Every fifth element must end with Buzz: Falsified after 7 passed tests in 56 ms.

    The message in the AssertionError describes the sample which falsified the property:

    Expected satisfied check result but was Falsified(propertyName = Every fifth element must end with Buzz, count = 8, sample = (30)).

    The reason is, that 30 is divisible by 3 and by 5 but in the fizzBuzz() implementation above divisibility by 3 has a higher precedence than divisibility by 5. However, a change in precedence does not solve the issue, because the first property that every third element should start with Fizz would be falsified. A solution is to return FizzBuzz if it is divisible by 3 and 5.

    private Stream<String> fizzBuzz() {
        return Stream.from(1).map(i -> {
            boolean divBy3 = i % 3 == 0;
            boolean divBy5 = i % 5 == 0;
    
            return divBy3 && divBy5 ? "FizzBuzz" :
                divBy3 ? "Fizz" :
                divBy5 ? "Buzz" : "";
            });
    }
    

    Both properties are satisfied with this approach.

    Every Non-Third and Non-Fifth Element Should Remain a Number

    A third property that can be derived from the FizzBuzz description is that every non-third and non-fifth element should remain a number.

    @Test
    public void every_non_fifth_and_non_third_element_is_a_number() {
        Arbitrary<Integer> nonFifthNonThird = Arbitrary.integer()
            .filter(i -> i > 0)
            .filter(i -> i % 5 != 0)
            .filter(i -> i % 3 != 0);
    
        CheckedFunction1<Integer, Boolean> mustBeANumber = i ->
            fizzBuzz().get(i - 1).equals(i.toString());
    
        Property.def("Non-fifth and non-third element must be a number")
            .forAll(nonFifthNonThird)
            .suchThat(mustBeANumber)
            .check(10_000, 1_000)
            .assertIsSatisfied();
        }
    

    The check of this property fails, because so far the fizzBuzz function maps non-Fizz and non-Buzz numbers to the empty String. Returning the number solves this issue.

    private Stream<String> fizzBuzz() {
        return Stream.from(1).map(i -> {
        boolean divBy3 = i % 3 == 0;
        boolean divBy5 = i % 5 == 0;
    
        return divBy3 && divBy5 ? "FizzBuzz" :
            divBy3 ? "Fizz" :
            divBy5 ? "Buzz" : i.toString();
        });
    }
    

    Other Properties

    Checking these three properties is enough to shape the fizzBuzz function. There is no need for a special FizzBuzz property because this case is already covered by the Fizz and Buzz properties. However, we could add another property which should not hold for all elements. For example, there should be no white space between Fizz and Buzz in FizzBuzz. I think this is a nice exercise and I leave it you, my dear reader, to define and check such a property.

    There is another flaw in our tests, though: Our properties are not specific enough. Checking that an element starts with Fizz or ends with Buzz is also satisfied by a stream which just contains FizzBuzz for any multiple of 3 and 5. Can you fix our properties?

    Benefits and Disadvantages

    As said before, PBT has the advantage that one can focus on specifying the expected behaviour of a program instead of defining each edge case. This shifts testing to a more declarative level.

    Compared with other implementations Vavr provides a nice fluent API which is unobtrusive regarding the used test framework. It works with JUnit 4, JUnit 5 and TestNG. Spock has a BDD mindest and I am not sure if the term specifcation has the same meaning in Spock as in PBT but this could be explored in another article. It seems to be a nice addition to the data-driven approach.

    A disadvantage is that rare bugs may not be discovered on the first run and may brake later builds. Further, coverage reports may differ from build to build. However, this is negligible.

    The crucial part of PBT is how well the underlying random number generator (RNG) performs and how well it is seeded. A bad RNG yields a bad distribution of test samples and this may result in “blind spots” of the tested code. Vavr relies on ThreadLocalRandom and this is more than okay.

    Vavr currently has a limited amount of supported types. It would be beneficial to also support generic subdomains like time and to provide a way to generate random entities and value objects of a core domain.

    Summary

    We have seen that Property based testing is based on the theory of Falsifiability. We learned that Vavr has a fluent API which codifies this idea. It enables us to define porperties as the combination of an invariant and an input values generator. Per default, an invariant is checked against 1000 generated samples.

    This enabled us to think about FizzBuzz on more declarative level. We defined and checked three properties and discussed the outcomes.

    Last, we talked about benefits and disadvantages of PBT.