PHP Master | Simplifying Test Data Generation with Faker

Key Takeaways

Faker is a popular open-source library that generates fake data similar to real data for testing purposes during the development process. It is highly extensible, allowing developers to define their own test data types, and comes with built-in providers for common data types.
The Faker library uses a structure of providers and formatters to generate test data. Providers are classes holding the data and the data generation formatter methods, while formatters are methods within provider classes that generate test data from a source or other formatters. Developers can create custom data sets and formatter implementations by extending the base providers.
Faker uses seeding to ensure consistency of test data across multiple test runs. By assigning a seed value for each test case, developers can replicate previous data by seeding the library’s random number generator, allowing for testing with the same data that caused an error even after the error has been fixed.

Testing is an iterative part of the development process that we carry out to ensure the quality of our code. A large portion of this entails writing test cases and testing each unit of our application using random test data.

Actual data for our application comes in when we release it to production, but during the development process we need fake data similar to real data for testing purposes. The popular open source library Faker provides us with the ability to generate different data suitable for a wide range of scenarios.

Here we’ll focus on generating random test data with Faker for use in our test cases.

How Faker Works

Faker comes with a set of built-in data providers which can be easily accessed to generate test data. Additionally, we can define our own test data types making it highly extensible. But first, let’s look at a basic example that shows how Faker works:

<?php
require "vendor/autoload.php";
$faker = Faker\Factory::create();
// generate data by accessing properties
for ($i = 0; $i < 10; $i++) {
    echo "<p>" . $faker->name . "</p>";
    echo "<p>" . $faker->address . "</p>";
}

The example assumes Faker was installed using Composer and uses the Composer autoloader to make the class definitions available. You can also use Faker by cloning it from its GitHub repository and using its included autoloader if you’re not using Composer.

To use Faker, we first need to obtain an instance from Faker\Factory. All of the default data providers are loaded automatically into the $faker object. Then we generate random data simply by accessing a formatter by name. The code above prints ten random person names and addresses drawn from the available data sources.

Providers are classes that hold the data and the necessary data generation formatter methods. Formatters are methods inside provider classes that generate test data directly from a source or by combining other formatters. Faker comes with the following built-in providers: Person, Address, PhoneNumber, Company, Lorem, Internet, DateTime, Miscellaneous, and UserAgent.

Let’s take a look at the Person class to get a better understanding of what the structure of a Faker provider looks like.

<?php
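namespace Faker\Provider;
class Person extends \Faker\Provider\Base
{
    // each {{token}} in a format names a formatter method
    protected static $formats = array(
        "{{firstName}} {{lastName}}",
    );
    protected static $firstName = array("John", "Jane");
    protected static $lastName = array("Doe");
    // combines other formatters according to a randomly chosen format
    public function name() {
        $format = static::randomElement(static::$formats);
        return $this->generator->parse($format);
    }
    // picks a random element directly from the data array
    public static function firstName() {
        return static::randomElement(static::$firstName);
    }
    public static function lastName() {
        return static::randomElement(static::$lastName);
    }
}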

Person acts as the provider, extending the base provider class Faker\Provider\Base. firstName() is a formatter which retrieves a random element directly from the internal $firstName data array. Formatters may also combine other formatters and return the data in a specific format, which is what name() does. All of the providers and formatters work based on this structure.

The built-in providers contain basic formatters with very limited data. If you are using Faker to automate the process of generating test data, you may need to create your own data sets and formatter implementations by extending the base providers.

<?php
namespace Faker\Provider;
class Student extends \Faker\Provider\Person
{
    protected static $formats = array(
        "{{lastName}} {{firstName}}",
        "{{firstName}} {{lastName}}"
    );
    protected static $firstName = array("Mark", "Adam");
    protected static $lastName = array("Clark", "Stewart");
    private static $prefix = array("Mr.", "Mrs.", "Ms.", "Miss", "Dr.");
    public static function prefix() {
        return static::randomElement(static::$prefix);
    }
    public static function firstName() {
        return static::prefix() . " " .
            static::randomElement(static::$firstName);
    }
}

Since Student is not a default provider, we have to add it to the Faker generator manually. If the same method is defined on more than one provider, the most recently added provider takes precedence over the others.

<?php
$faker = new Faker\Generator();
$faker->addProvider(new Faker\Provider\Student($faker));
echo $faker->firstName; // invokes Student::firstName()
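
The provider and formatter mechanics described so far can be sketched without Faker itself. The classes below (MiniGenerator, MiniPerson, MiniStudent) are hypothetical stand-ins written for this illustration, not Faker's source code. They assume only what the text states: each {{token}} in a format is replaced by the output of the formatter with that name, and the most recently added provider is consulted first.

```php
<?php
// Simplified stand-in for illustration only; this is NOT Faker's implementation.
class MiniGenerator
{
    private $providers = array();

    // The newest provider goes to the front so it is consulted first.
    public function addProvider($provider)
    {
        array_unshift($this->providers, $provider);
    }

    // Call the formatter on the first provider that defines it.
    public function format($formatter)
    {
        foreach ($this->providers as $provider) {
            if (method_exists($provider, $formatter)) {
                return call_user_func(array($provider, $formatter));
            }
        }
        throw new InvalidArgumentException("Unknown formatter: $formatter");
    }

    // Replace every {{token}} with the output of the matching formatter.
    public function parse($template)
    {
        return preg_replace_callback('/\{\{\s*(\w+)\s*\}\}/', function ($m) {
            return $this->format($m[1]);
        }, $template);
    }
}

// Hypothetical providers with fixed return values to keep the output predictable.
class MiniPerson
{
    public function firstName() { return "John"; }
    public function lastName()  { return "Doe"; }
}

class MiniStudent
{
    public function firstName() { return "Mark"; }
}

$generator = new MiniGenerator();
$generator->addProvider(new MiniPerson());
$generator->addProvider(new MiniStudent());

$fullName = $generator->parse("{{firstName}} {{lastName}}");
echo $fullName, "\n"; // prints "Mark Doe"
```

MiniStudent was added last, so its firstName() wins, while lastName() falls through to MiniPerson because MiniStudent does not define it.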

A More Complex Example

The built-in providers contain basic data types for testing, but real-world use cases often require more complexity. In such situations we need to create our own data providers and custom data sets to automate the testing procedure. Let’s build a Faker provider from scratch catering to a real-world scenario.

Assume we’re developing an email marketing service which sends thousands of emails containing various kinds of advertisements from clients. What data fields will we need for testing? Basically, we need a recipient (to) email address, a subject, a name, and content to test an email.

Let’s also assume there are three types of email templates:

advertisements with text/HTML-based content
advertisements with a single full-size image
advertisements containing links to other sites

The content field will be one of these templates, so we’ll also need the testing fields text content, image, and links.

Having understood the main requirements, we can create the provider as follows:

<?php
namespace Faker\Provider;
class EmailTemplate extends \Faker\Provider\Base
{
    // one format per email template type
    protected static $formats = array(
        '<p>Hello {{name}} </p>
        <p>{{text}}</p>
        <p>Newsletter by Example</p>',
        '<p>{{adImage}}</p>
        <p>Newsletter by Example</p>',
        '<p>Hello {{name}} </p>
        <p>{{link}}</p>
        <p>{{link}}</p>
        <p>{{link}}</p>
        <p>Newsletter by Example</p>'
    );
    protected static $toEmail = array(
        "test@example.com",
        "test1@example.com"
    );
    protected static $name = array("Mark", "Adam");
    protected static $subject = array("Subject 1", "Subject 2");
    protected static $adImage = array("img1.png", "img2.jpg");
    protected static $link = array("link1", "link2");
    protected static $text = array("text1", "text2");
    public static function toEmail() {
        return static::randomElement(static::$toEmail);
    }
    public static function name() {
        return static::randomElement(static::$name);
    }
    public static function subject() {
        return static::randomElement(static::$subject);
    }
    public static function adImage() {
        return static::randomElement(static::$adImage);
    }
    public static function link() {
        return static::randomElement(static::$link);
    }
    public static function text() {
        return static::randomElement(static::$text);
    }
    // picks one format at random and fills in its {{tokens}}
    public function template() {
        $format = static::randomElement(static::$formats);
        return $this->generator->parse($format);
    }
}

We have defined three formats to match the three different templates, and then we created data sets for each of the fields we are using in the test data generation process. All the fields should contain formatter methods similar to toEmail() and name() in the above code. The template() method takes one of the formats randomly and fills the necessary data using formatters.

We can get the test data using the code below and pass it to our email application.

<?php
$faker = new Faker\Generator();
$faker->addProvider(new Faker\Provider\EmailTemplate($faker));
$email = $faker->toEmail;
$subject = $faker->subject;
$template = $faker->template;

The advantage of the above technique is that we can test all three formats randomly using a single provider by calling formatters directly. But what if one of these formats is broken, or we need to test only one of the formats continuously? Commenting out or removing the formats manually isn’t an appealing option.

In this case I would recommend creating separate implementations for each format. We can define a base EmailTemplate class with one format and all of the formatter methods, and then create three different child implementations by extending it. Each child class contains only its unique format and inherits the formatters from the parent class. We can then use each email template independently by loading it separately into the Faker generator.
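
One way this could look is sketched below. The class names (BaseEmailTemplate, TextEmailTemplate, ImageEmailTemplate, LinksEmailTemplate) and the single $format property are assumptions made for this illustration, and the data sets are the placeholder values used earlier. The sketch relies on late static binding so that template() in the parent reads the format of whichever child was loaded.

```php
<?php
require "vendor/autoload.php";

// Hypothetical base provider: shared data sets, formatters, and a default format.
class BaseEmailTemplate extends \Faker\Provider\Base
{
    protected static $format = '<p>Newsletter by Example</p>';
    protected static $name = array("Mark", "Adam");
    protected static $text = array("text1", "text2");
    protected static $adImage = array("img1.png", "img2.jpg");
    protected static $link = array("link1", "link2");

    public static function name() {
        return static::randomElement(static::$name);
    }
    public static function text() {
        return static::randomElement(static::$text);
    }
    public static function adImage() {
        return static::randomElement(static::$adImage);
    }
    public static function link() {
        return static::randomElement(static::$link);
    }
    // static::$format resolves to the format of the child class in use
    public function template() {
        return $this->generator->parse(static::$format);
    }
}

// Each child overrides only the format and inherits every formatter.
class TextEmailTemplate extends BaseEmailTemplate
{
    protected static $format = '<p>Hello {{name}}</p><p>{{text}}</p>';
}
class ImageEmailTemplate extends BaseEmailTemplate
{
    protected static $format = '<p>{{adImage}}</p>';
}
class LinksEmailTemplate extends BaseEmailTemplate
{
    protected static $format = '<p>Hello {{name}}</p><p>{{link}}</p><p>{{link}}</p>';
}

// Load only the template under test.
$faker = new \Faker\Generator();
$faker->addProvider(new ImageEmailTemplate($faker));
$template = $faker->template();
echo $template, "\n"; // always the image-only template
```

Switching the test to another format then means changing a single class name in the addProvider() call rather than editing the list of formats.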

Consistency of Test Data

Generally we’ll run tests many times and record the data and results. When an error is encountered, we check the database or log files to figure out what data was in use. Once we’ve fixed the error, it is important to rerun the test cases with the same data that caused it. Faker supports seeding, so we can reproduce the previous data by seeding its random number generator.

Consider the following code:

<?php
$faker = Faker\Factory::create();
$faker->seed(1000);
echo $faker->name;

We’ve assigned a seed value of 1000. Now, no matter how many times we execute the above script, it produces the same name, and any further formatter calls return the same sequence of values on every run.

In application testing you should assign a seed to each test case and record it in your logs. Once the errors are fixed, you can look up the seeds of the test cases that failed and rerun them with those seeds, and therefore with the same data.
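
A minimal sketch of that workflow follows. The helper fakerForTestCase() and the test name are hypothetical, and error_log() stands in for whatever logging your test suite already uses. Pass no seed for a normal run, or pass a recorded seed to replay the data of a failed run.

```php
<?php
require "vendor/autoload.php";

// Hypothetical helper: creates a seeded generator and records the seed used.
function fakerForTestCase(string $testName, ?int $seed = null): \Faker\Generator
{
    // Pick a fresh seed unless one is supplied for a replay.
    $seed = $seed ?? random_int(1, 999999);
    error_log("[$testName] faker seed: $seed");

    $faker = \Faker\Factory::create();
    $faker->seed($seed);
    return $faker;
}

// Replaying with the same recorded seed yields the same data.
$firstRun = fakerForTestCase("testWelcomeEmail", 1000);
$nameA = $firstRun->name();

$replay = fakerForTestCase("testWelcomeEmail", 1000);
$nameB = $replay->name();

var_dump($nameA === $nameB); // bool(true)
```

Because the seed applies to PHP's shared random number generator, seed right before generating the data for a test case rather than once for the whole suite.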

Conclusion

Generating test data is something you should automate to prevent wasting time unnecessarily. Faker is a simple and powerful solution for generating random test data. The real power of Faker comes with its ability to extend default functionalities to suit more complex implementations.

So what is your test data generation strategy? Do you like to use Faker to automate test data generation? Let me know through the comments section.

Frequently Asked Questions (FAQs) about Simplifying Test Data Generation with Faker

How Can I Install Faker in My Project?

Installing Faker is straightforward with Composer, the dependency manager for PHP. Run the command composer require fzaninotto/faker to install the latest stable version. Composer records the exact version in composer.lock, so all developers working on your project use the same one. Note that the fzaninotto/faker package has since been archived; the community-maintained fork is published as fakerphp/faker.

How Can I Generate Fake Data for Different Locales?

Faker supports a multitude of locales. You can specify the locale when you create the Faker instance. For example, to generate data in French, you would use $faker = Faker\Factory::create('fr_FR');. This will generate data that is relevant and formatted correctly for the French locale.

Can I Create My Own Faker Providers?

Yes, you can create your own providers if the built-in ones do not meet your needs. To do this, you need to create a new class that extends \Faker\Provider\Base. Then, you can add your own methods for generating data. Once your provider class is ready, you can add it to the Faker generator by calling the addProvider method.

How Can I Use Faker to Generate Data for Database Testing?

Faker is an excellent tool for generating data for database testing. You can use it to create fake data for each field in your database. For example, you can use $faker->name for a name field, $faker->email for an email field, and so on. Once you have generated the data, you can insert it into your database using your preferred method.
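
As a rough sketch, the snippet below fills an in-memory SQLite table through PDO. The users table and its columns are a made-up schema for the example, and the pdo_sqlite extension is assumed to be available; swap in your own connection and table.

```php
<?php
require "vendor/autoload.php";

$faker = \Faker\Factory::create();

// In-memory SQLite database; the users table is a hypothetical example schema.
$pdo = new PDO("sqlite::memory:");
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)");

// A prepared statement keeps generated values safely separated from the SQL.
$insert = $pdo->prepare("INSERT INTO users (name, email) VALUES (:name, :email)");
for ($i = 0; $i < 20; $i++) {
    $insert->execute(array(
        ":name"  => $faker->name(),
        ":email" => $faker->email(),
    ));
}

$count = (int) $pdo->query("SELECT COUNT(*) FROM users")->fetchColumn();
echo "Inserted $count rows\n"; // prints "Inserted 20 rows"
```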

How Can I Generate Unique Values with Faker?

Faker provides a unique() modifier that you can use to generate unique values. For example, if you want to generate a unique email address, you can use $faker->unique()->email. This will ensure that the same email address is not generated more than once.
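
The short example below shows the modifier in use and what happens when a formatter runs out of distinct values. randomDigit is used for the second part because it can only return ten different values, so an eleventh unique request cannot be satisfied.

```php
<?php
require "vendor/autoload.php";

$faker = \Faker\Factory::create();

// Collect 50 email addresses with no repeats.
$emails = array();
for ($i = 0; $i < 50; $i++) {
    $emails[] = $faker->unique()->email();
}
var_dump(count($emails) === count(array_unique($emails))); // bool(true)

// Only the digits 0-9 exist, so asking for an eleventh unique digit fails.
$digits = array();
$exhausted = false;
try {
    for ($i = 0; $i < 11; $i++) {
        $digits[] = $faker->unique()->randomDigit();
    }
} catch (\OverflowException $e) {
    $exhausted = true;
    echo "No unique digits left after ", count($digits), " values\n";
}

// Passing true clears the record of values already handed out.
$faker->unique(true);
```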

Can I Use Faker with Laravel Factories?

Yes, Faker integrates very well with Laravel factories. When you generate a factory using the make:factory Artisan command, Laravel automatically sets up Faker for you. You can then use Faker to generate data for each field in your factory.

How Can I Generate Random Numbers with Faker?

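You can use the randomNumber method to generate random integers. Called without arguments, it returns a non-negative integer with a randomly chosen number of digits. You can also pass the maximum number of digits as a parameter, for example $faker->randomNumber(3) for a value between 0 and 999. To draw from an explicit range, use numberBetween, whose default range is 0 to 2147483647.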
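
A few typical calls are shown here as a quick illustration; the variable names are arbitrary. The second argument of randomNumber enables strict mode, which forces the result to have exactly the requested number of digits.

```php
<?php
require "vendor/autoload.php";

$faker = \Faker\Factory::create();

$upToThreeDigits = $faker->randomNumber(3);        // 0 to 999
$exactlyThree    = $faker->randomNumber(3, true);  // 100 to 999 (strict mode)
$inRange         = $faker->numberBetween(10, 20);  // 10 to 20, both inclusive

echo $upToThreeDigits, " ", $exactlyThree, " ", $inRange, "\n";
```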

Can I Generate Fake Data in Bulk with Faker?

Yes, you can generate fake data in bulk with Faker. You can use a loop to generate multiple sets of data. For example, you can use a for loop to generate 100 names like this: for ($i=0; $i < 100; $i++) { echo $faker->name, "\n"; }.

How Can I Generate Fake Dates and Times with Faker?

Faker provides several methods for generating fake dates and times. For example, you can use $faker->dateTime to generate a random DateTime object, $faker->date to generate a date string, and $faker->time to generate a time string.
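
For instance, the calls below cover the three formatters just mentioned plus dateTimeBetween, which restricts the result to a window. The format strings follow PHP's date() conventions, and the one-week window is just an example value.

```php
<?php
require "vendor/autoload.php";

$faker = \Faker\Factory::create();

$moment = $faker->dateTime();          // a DateTime object
$date   = $faker->date("Y-m-d");       // a string such as "1998-03-27"
$time   = $faker->time("H:i");         // a string such as "14:05"

// A DateTime somewhere within the last week.
$recent = $faker->dateTimeBetween("-1 week", "now");

echo $moment->format("Y-m-d H:i:s"), "\n";
echo $date, " ", $time, "\n";
echo $recent->format(DATE_ATOM), "\n";
```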

Can I Use Faker to Generate Fake Images?

Yes, you can use Faker to generate fake images. You can use the imageUrl method to get the URL of a random placeholder image. The original library pointed to the LoremPixel service for this; which placeholder service is used depends on the Faker version you have installed. You can specify the width, height, and category of the image as parameters to this method.