Building an Image Gallery Blog with Symfony Flex: Data Testing

This article is part of a series on building a sample application — a multi-image gallery blog — for performance benchmarking and optimizations. (View the repo here.)


In the previous article, we demonstrated how to set up a Symfony project from scratch with Flex, and how to create a simple set of fixtures and get the project up and running.

The next step on our journey is to populate the database with a somewhat realistic amount of data to test application performance.


Note: if you did the “Getting started with the app” step in the previous post, you’ve already followed the steps outlined in this post. If that’s the case, use this post as an explainer on how it was done.

As a bonus, we’ll demonstrate how to set up a simple PHPUnit test suite with basic smoke tests.

More Fake Data

Once your entities are polished, and you’ve had your “That’s it! I’m done!” moment, it’s a perfect time to create a more significant dataset that can be used for further testing and preparing the app for production.

Simple fixtures like the ones we created in the previous article are great for the development phase, where loading ~30 entities is done quickly, and it can often be repeated while changing the DB schema.

Testing app performance, simulating real-world traffic and detecting bottlenecks requires bigger datasets (i.e. a larger amount of database entries and image files for this project). Generating thousands of entries takes some time (and computer resources), so we want to do it only once.

We could try increasing the COUNT constant in our fixture classes to see what happens:

// src/DataFixtures/ORM/LoadUsersData.php
class LoadUsersData extends AbstractFixture implements ContainerAwareInterface, OrderedFixtureInterface
{
    const COUNT = 500;
    ...
}

// src/DataFixtures/ORM/LoadGalleriesData.php
class LoadGalleriesData extends AbstractFixture implements ContainerAwareInterface, OrderedFixtureInterface
{
    const COUNT = 1000;
    ...
}

Now, if we run bin/refreshDb.sh, after some time we’ll probably get a not-so-nice message like PHP Fatal error: Allowed memory size of N bytes exhausted.

Apart from slow execution, any error would leave us with an empty database, because the EntityManager is flushed only at the very end of the fixture class. Additionally, Faker downloads a random image for every gallery entry. For 1,000 galleries with 5 to 10 images per gallery, that would be 5,000 to 10,000 downloads, which is really slow.

There are excellent resources on optimizing Doctrine and Symfony for batch processing, and we’re going to use some of these tips to optimize fixtures loading.

First, we’ll define a batch size of 100 galleries. After every batch, we’ll flush and clear the EntityManager (i.e., detach persisted entities) and tell the garbage collector to do its job.

To track progress, let’s print out some meta information (batch identifier and memory usage).

Note: after calling $manager->clear(), all previously persisted entities are detached. The entity manager doesn’t know about them anymore, so if you reuse one of them (for example, as a relation on a new entity), you’ll probably get an “entity-not-persisted” error.

The key is to merge the entity back into the manager: $entity = $manager->merge($entity);

Without the optimization, memory usage keeps increasing while the LoadGalleriesData fixture class runs:

> loading [200] App\DataFixtures\ORM\LoadGalleriesData
100 Memory usage (currently) 24MB / (max) 24MB
200 Memory usage (currently) 26MB / (max) 26MB
300 Memory usage (currently) 28MB / (max) 28MB
400 Memory usage (currently) 30MB / (max) 30MB
500 Memory usage (currently) 32MB / (max) 32MB
600 Memory usage (currently) 34MB / (max) 34MB
700 Memory usage (currently) 36MB / (max) 36MB
800 Memory usage (currently) 38MB / (max) 38MB
900 Memory usage (currently) 40MB / (max) 40MB
1000 Memory usage (currently) 42MB / (max) 42MB

Memory usage starts at 24 MB and increases by 2 MB for every batch (100 galleries). If we tried to load 100,000 galleries (1,000 batches), we’d need 24 MB for the first batch plus 999 * 2 MB for the remaining 999 batches (99,900 galleries), which is ~2 GB of memory.
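The estimate above can be reproduced with a few lines of PHP. This is only a sketch: estimateFixtureMemoryMb() is a hypothetical helper, and it assumes that memory keeps growing linearly at the rate seen in the log.

```php
<?php
// Hypothetical helper: linear estimate based on the figures observed in the log
// (24 MB after the first batch, +2 MB for every further batch of 100 galleries).
function estimateFixtureMemoryMb(int $galleries, int $batchSize = 100, int $baseMb = 24, int $growthPerBatchMb = 2): int
{
    $batches = (int) ceil($galleries / $batchSize);

    // The first batch is already covered by the base figure.
    return $baseMb + max(0, $batches - 1) * $growthPerBatchMb;
}

echo estimateFixtureMemoryMb(1000), " MB\n";   // 42 MB, the last line of the log
echo estimateFixtureMemoryMb(100000), " MB\n"; // 2022 MB, roughly 2 GB
```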

After adding $manager->flush(), $manager->clear() and gc_collect_cycles() for every batch, removing SQL logging with $manager->getConnection()->getConfiguration()->setSQLLogger(null) and removing entity references by commenting out $this->addReference('gallery' . $i, $gallery);, memory usage becomes more or less constant from batch to batch.

// Define batch size outside of the for loop
$batchSize = 100;

...

for ($i = 1; $i <= self::COUNT; $i++) {
    ...

    // Save the batch at the end of the for loop
    if (($i % $batchSize) == 0 || $i == self::COUNT) {
        // memory_get_usage() returns bytes; convert to megabytes for the log
        $currentMemoryUsage = round(memory_get_usage(true) / 1024 / 1024);
        $maxMemoryUsage = round(memory_get_peak_usage(true) / 1024 / 1024);
        echo sprintf("%s Memory usage (currently) %dMB / (max) %dMB\n", $i, $currentMemoryUsage, $maxMemoryUsage);

        $manager->flush();
        $manager->clear();

        // here you should merge entities you're re-using with the $manager
        // because they aren't managed anymore after calling $manager->clear();
        // e.g. if you've already loaded category or tag entities
        // $category = $manager->merge($category);

        gc_collect_cycles();
    }
}
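To see when the batch condition fires, here is a small standalone sketch. flushPoints() is a hypothetical helper that is not part of the fixture classes; it only repeats the condition from the loop and collects the values of $i at which a flush would happen.

```php
<?php
// Hypothetical helper: returns every $i for which the batch condition is true.
function flushPoints(int $count, int $batchSize): array
{
    $points = [];

    for ($i = 1; $i <= $count; $i++) {
        // Same condition as in the fixture loop: a full batch, or the very last item.
        if (($i % $batchSize) === 0 || $i === $count) {
            $points[] = $i;
        }
    }

    return $points;
}

echo implode(', ', flushPoints(250, 100)), "\n"; // 100, 200, 250
```

The second half of the condition matters when COUNT is not a multiple of the batch size: without it, the last 50 entities in this example would never be flushed.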

As expected, memory usage is now stable:

> loading [200] App\DataFixtures\ORM\LoadGalleriesData
100 Memory usage (currently) 24MB / (max) 24MB
200 Memory usage (currently) 26MB / (max) 28MB
300 Memory usage (currently) 26MB / (max) 28MB
400 Memory usage (currently) 26MB / (max) 28MB
500 Memory usage (currently) 26MB / (max) 28MB
600 Memory usage (currently) 26MB / (max) 28MB
700 Memory usage (currently) 26MB / (max) 28MB
800 Memory usage (currently) 26MB / (max) 28MB
900 Memory usage (currently) 26MB / (max) 28MB
1000 Memory usage (currently) 26MB / (max) 28MB

Instead of downloading random images every time, we can prepare 15 random images and update the fixture script to randomly choose one of them instead of using Faker’s $faker->image() method.

Let’s take 15 images from Unsplash and save them in var/demo-data/sample-images.

Then, update the LoadGalleriesData::generateRandomImage method:

    private function generateRandomImage($imageName)
    {
        $images = [
            'image1.jpeg',
            'image10.jpeg',
            'image11.jpeg',
            'image12.jpg',
            'image13.jpeg',
            'image14.jpeg',
            'image15.jpeg',
            'image2.jpeg',
            'image3.jpeg',
            'image4.jpeg',
            'image5.jpeg',
            'image6.jpeg',
            'image7.jpeg',
            'image8.jpeg',
            'image9.jpeg',
        ];

        $sourceDirectory = $this->container->getParameter('kernel.project_dir') . '/var/demo-data/sample-images/';
        $targetDirectory = $this->container->getParameter('kernel.project_dir') . '/var/uploads/';

        $randomImage = $images[rand(0, count($images) - 1)];
        $randomImageSourceFilePath = $sourceDirectory . $randomImage;
        $randomImageExtension = explode('.', $randomImage)[1];
        $targetImageFilename = sha1(microtime() . rand()) . '.' . $randomImageExtension;
        copy($randomImageSourceFilePath, $targetDirectory . $targetImageFilename);

        $image = new Image(
            Uuid::getFactory()->uuid4(),
            $randomImage,
            $targetImageFilename
        );

        return $image;
    }
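As a possible variation on the method above, the file list could be discovered at runtime instead of being hardcoded. The following is a standalone sketch under that assumption: copyRandomSampleImage() is a hypothetical function that is not in the repository, and it leaves out the Image entity so that it can run on its own.

```php
<?php
// Hypothetical helper: picks a random .jpg/.jpeg file from $sourceDirectory and
// copies it to $targetDirectory under a unique name.
// Returns [original file name, new file name].
function copyRandomSampleImage(string $sourceDirectory, string $targetDirectory): array
{
    $candidates = glob(rtrim($sourceDirectory, '/') . '/*') ?: [];

    // Keep regular files with a JPEG extension only.
    $images = array_values(array_filter($candidates, function ($path) {
        $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

        return is_file($path) && in_array($extension, ['jpg', 'jpeg'], true);
    }));

    if (count($images) === 0) {
        throw new RuntimeException('No sample images found in ' . $sourceDirectory);
    }

    $source = $images[random_int(0, count($images) - 1)];

    // pathinfo() returns the last extension, even if the name contains several dots.
    $extension = pathinfo($source, PATHINFO_EXTENSION);
    $targetFilename = sha1(uniqid('', true) . random_int(0, PHP_INT_MAX)) . '.' . $extension;

    if (!copy($source, rtrim($targetDirectory, '/') . '/' . $targetFilename)) {
        throw new RuntimeException('Could not copy ' . $source);
    }

    return [basename($source), $targetFilename];
}
```

Using pathinfo() instead of explode('.', ...)[1] avoids picking the wrong segment for a file name such as photo.v2.jpeg.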

It’s a good idea to remove old files in var/uploads when reloading fixtures, so I’m adding the rm var/uploads/* command to the bin/refreshDb.sh script, immediately after dropping the DB schema.

Loading 500 users and 1000 galleries now takes ~7 minutes and ~28 MB of memory (peak usage).

Dropping database schema...
Database schema dropped successfully!
ATTENTION: This operation should not be executed in a production environment.

Creating database schema...
Database schema created successfully!
  > purging database
  > loading [100] App\DataFixtures\ORM\LoadUsersData
300 Memory usage (currently) 10MB / (max) 10MB
500 Memory usage (currently) 12MB / (max) 12MB
  > loading [200] App\DataFixtures\ORM\LoadGalleriesData
100 Memory usage (currently) 24MB / (max) 26MB
200 Memory usage (currently) 26MB / (max) 28MB
300 Memory usage (currently) 26MB / (max) 28MB
400 Memory usage (currently) 26MB / (max) 28MB
500 Memory usage (currently) 26MB / (max) 28MB
600 Memory usage (currently) 26MB / (max) 28MB
700 Memory usage (currently) 26MB / (max) 28MB
800 Memory usage (currently) 26MB / (max) 28MB
900 Memory usage (currently) 26MB / (max) 28MB
1000 Memory usage (currently) 26MB / (max) 28MB

Take a look at the fixture classes source: LoadUsersData.php and LoadGalleriesData.php.

Performance

At this point, the homepage rendering is very slow — way too slow for production.

A user can feel that the app is struggling to deliver the page, probably because the app is rendering all the galleries instead of a limited number.

Instead of rendering all galleries at once, we could update the app to render only the first 12 galleries immediately and introduce lazy loading. When the user scrolls to the end of the screen, the app will fetch the next 12 galleries and present them to the user.
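The paging arithmetic behind such a lazy load can be sketched as follows. pageToLimitOffset() is a hypothetical helper, not the app’s actual implementation; it only shows how a 1-based page number maps to a limit and an offset when 12 galleries are shown per page.

```php
<?php
// Hypothetical helper: translates a 1-based page number into limit/offset values.
function pageToLimitOffset(int $page, int $perPage = 12): array
{
    // Treat zero or negative page numbers as the first page.
    $page = max(1, $page);

    return ['limit' => $perPage, 'offset' => ($page - 1) * $perPage];
}

print_r(pageToLimitOffset(1)); // limit 12, offset 0: the galleries rendered immediately
print_r(pageToLimitOffset(3)); // limit 12, offset 24: the third chunk of 12
```

The two values could then be passed as the third and fourth arguments of a Doctrine repository’s findBy() call.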

Performance tests

To track the effect of our optimizations, we need to establish a fixed set of tests that we can rerun after every change and compare against the previous results.

We will use Siege for load testing. Here you can find more about Siege and performance testing. Instead of installing Siege on our machine, we can utilize Docker, a powerful container platform.

In simple terms, Docker containers are similar to virtual machines (but they aren’t the same thing). Besides building and deploying apps, Docker can be used to experiment with applications without actually installing them on your local machine. You can build your own images or use images available on Docker Hub, a public registry of Docker images.

It’s especially useful when you want to experiment with different versions of the same software (for example, different versions of PHP).

We’ll use the yokogawa/siege image to test the app.

Testing the home page

Testing the home page is not trivial, since there are Ajax requests executed only when the user scrolls to the end of the page.

We could expect all users to land on the home page (i.e., 100%). We could also estimate that 50% of them would scroll down to the end and therefore request the second page of galleries. We could also guess that 30% of them would load the third page, 15% would request the fourth page, and 5% would request the fifth page.

These numbers are based on predictions, and it would be much better if we could use an analytics tool to get actual insight into users’ behavior. But that’s impossible for a brand new app. Still, it’s a good idea to take a look at analytics data now and then and adjust your test suite after the initial deploy.

We’ll test the home page (and lazy load URLs) with two tests running in parallel. The first one will test the home page URL only, while the second one will test the lazy load endpoint URLs.

The lazy-load-urls.txt file contains a randomized list of lazily loaded page URLs in the predicted ratios:

  • 10 URLs for the second page (50%)
  • 6 URLs for the third page (30%)
  • 3 URLs for the fourth page (15%)
  • 1 URL for the fifth page (5%)

http://blog.app/galleries-lazy-load?page=2
http://blog.app/galleries-lazy-load?page=2
http://blog.app/galleries-lazy-load?page=2
http://blog.app/galleries-lazy-load?page=4
http://blog.app/galleries-lazy-load?page=2
http://blog.app/galleries-lazy-load?page=2
http://blog.app/galleries-lazy-load?page=3
http://blog.app/galleries-lazy-load?page=2
http://blog.app/galleries-lazy-load?page=2
http://blog.app/galleries-lazy-load?page=4
http://blog.app/galleries-lazy-load?page=2
http://blog.app/galleries-lazy-load?page=4
http://blog.app/galleries-lazy-load?page=2
http://blog.app/galleries-lazy-load?page=3
http://blog.app/galleries-lazy-load?page=3
http://blog.app/galleries-lazy-load?page=3
http://blog.app/galleries-lazy-load?page=5
http://blog.app/galleries-lazy-load?page=3
http://blog.app/galleries-lazy-load?page=2
http://blog.app/galleries-lazy-load?page=3
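
A list like this doesn’t have to be written by hand. The sketch below shows one way it might be generated; buildLazyLoadUrls() is a hypothetical helper, and the per-page counts simply mirror the predicted ratios.

```php
<?php
// Hypothetical generator: $urlsPerPage maps a page number to the number of
// times its URL should appear in the list.
function buildLazyLoadUrls(string $baseUrl, array $urlsPerPage): array
{
    $urls = [];

    foreach ($urlsPerPage as $page => $count) {
        for ($i = 0; $i < $count; $i++) {
            $urls[] = sprintf('%s/galleries-lazy-load?page=%d', $baseUrl, $page);
        }
    }

    // Randomize the order so pages are not requested in blocks.
    shuffle($urls);

    return $urls;
}

// 10 + 6 + 3 + 1 = 20 URLs, i.e. 50%, 30%, 15% and 5%
$urls = buildLazyLoadUrls('http://blog.app', [2 => 10, 3 => 6, 4 => 3, 5 => 1]);
echo implode("\n", $urls), "\n";
```

Redirecting the output to lazy-load-urls.txt gives a file in the same format as the list above.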

The script for testing homepage performance will run two Siege processes in parallel, one against the home page and another one against the generated list of URLs.

To execute a single HTTP request with Siege (in Docker), run:

docker run --rm -t yokogawa/siege -c1 -r1 blog.app

Note: if you aren’t using Docker, you can omit the docker run --rm -t yokogawa/siege part and run Siege with the same arguments.

To run a 1-minute test with 50 concurrent users against the home page, with each user waiting a random delay of up to 1 second between requests, execute:

docker run --rm -t yokogawa/siege -d1 -c50 -t1M http://blog.app

To run a 1-minute test with 50 concurrent users against URLs in lazy-load-urls.txt, execute:

docker run --rm -v `pwd`:/var/siege:ro -t yokogawa/siege -i --file=/var/siege/lazy-load-urls.txt -d1 -c50 -t1M

Do this from the directory where your lazy-load-urls.txt is located (that directory will be mounted as a read-only volume in Docker).

Running the test-homepage.sh script will start two Siege processes (in a way suggested by this Stack Overflow answer) and output the results.

Assume we’ve deployed the app on a server with Nginx and with PHP-FPM 7.1 and loaded 25,000 users and 30,000 galleries. The results from load testing the app home page are:

./test-homepage.sh

Transactions:               499 hits
Availability:               100.00 %
Elapsed time:               59.10 secs
Data transferred:           1.49 MB
Response time:              4.75 secs
Transaction rate:           8.44 trans/sec
Throughput:                 0.03 MB/sec
Concurrency:                40.09
Successful transactions:    499
Failed transactions:        0
Longest transaction:        16.47
Shortest transaction:       0.17

Transactions:               482 hits
Availability:               100.00 %
Elapsed time:               59.08 secs
Data transferred:           6.01 MB
Response time:              4.72 secs
Transaction rate:           8.16 trans/sec
Throughput:                 0.10 MB/sec
Concurrency:                38.49
Successful transactions:    482
Failed transactions:        0
Longest transaction:        15.36
Shortest transaction:       0.15

Even though app availability is 100% for both home page and lazy-load tests, response time is ~5 seconds, which is not something we’d expect from a high-performance app.
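
The derived figures in these reports are consistent with simple arithmetic on the raw numbers, which helps when reading them. As a sketch, using the values from the first report and assuming that concurrency is roughly the transaction rate multiplied by the average response time:

```php
<?php
// Values copied from the first report above.
$transactions = 499;
$elapsedSeconds = 59.10;
$responseTimeSeconds = 4.75;

// Requests completed per second.
$transactionRate = $transactions / $elapsedSeconds;

// Average number of requests in flight (assumed: rate * average response time).
$concurrency = $transactionRate * $responseTimeSeconds;

printf("Transaction rate: %.2f trans/sec\n", $transactionRate); // 8.44, as reported
printf("Concurrency: %.1f\n", $concurrency);                    // about 40.1, close to the reported 40.09
```

In other words, of the 50 simulated users, about 40 are waiting for a response at any given moment.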

Testing a single gallery page

Testing a single gallery page is a little bit simpler: we’ll run Siege against the galleries.txt file, where we have a list of single gallery page URLs to test.

From the directory where the galleries.txt file is located (that directory will be mounted as a read-only volume in Docker), run this command:

docker run --rm -v `pwd`:/var/siege:ro -t yokogawa/siege -i --file=/var/siege/galleries.txt -d1 -c50 -t1M

Load test results for single gallery pages are somewhat better than for the home page:

./test-single-gallery.sh
** SIEGE 3.0.5
** Preparing 50 concurrent users for battle.
The server is now under siege...
Lifting the server siege...      done.

Transactions:               3589 hits
Availability:               100.00 %
Elapsed time:               59.64 secs
Data transferred:           11.15 MB
Response time:              0.33 secs
Transaction rate:           60.18 trans/sec
Throughput:                 0.19 MB/sec
Concurrency:                19.62
Successful transactions:    3589
Failed transactions:        0
Longest transaction:        1.25
Shortest transaction:       0.10

Tests, Tests, Tests

To make sure we’re not breaking anything with improvements we implement in the future, we need at least some tests.

First, we require PHPUnit as a dev dependency:

composer req --dev phpunit

Then we’ll create a simple PHPUnit configuration by copying the phpunit.xml.dist file created by Flex to phpunit.xml and updating the environment variables (e.g., the DATABASE_URL variable for the test environment). Also, I’m adding phpunit.xml to .gitignore.

Next, we create basic functional/smoke tests for the blog home page and single gallery pages. Smoke testing is a “preliminary testing to reveal simple failures severe enough to reject a prospective software release”. Since it’s quite easy to implement smoke tests, there’s no valid reason why you should avoid them!

These tests only assert that the URLs you provide in the urlProvider() method result in a successful HTTP response, i.e., a 2xx status code, which is what isSuccessful() checks. A redirect (3xx) would not pass this assertion.

Simple smoke testing the home page and five single gallery pages could look like this:

namespace App\Tests;

use App\Entity\Gallery;
use Psr\Container\ContainerInterface;
use Symfony\Bundle\FrameworkBundle\Test\WebTestCase;
use Symfony\Component\Routing\RouterInterface;

class SmokeTest extends WebTestCase
{
    /** @var  ContainerInterface */
    private $container;

    /**
     * @dataProvider urlProvider
     */
    public function testPageIsSuccessful($url)
    {
        $client = self::createClient();
        $client->request('GET', $url);

        $this->assertTrue($client->getResponse()->isSuccessful());
    }

    public function urlProvider()
    {
        $client = self::createClient();
        $this->container = $client->getContainer();

        $urls = [
            ['/'],
        ];

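        // array_merge() appends the gallery URLs; the + operator would keep
        // key 0 from $urls and silently drop the first gallery URL
        $urls = array_merge($urls, $this->getGalleriesUrls());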

        return $urls;
    }

    private function getGalleriesUrls()
    {
        $router = $this->container->get('router');
        $doctrine = $this->container->get('doctrine');
        $galleries = $doctrine->getRepository(Gallery::class)->findBy([], null, 5);

        $urls = [];

        /** @var Gallery $gallery */
        foreach ($galleries as $gallery) {
            $urls[] = [
                '/' . $router->generate('gallery.single-gallery', ['id' => $gallery->getId()],
                    RouterInterface::RELATIVE_PATH),
            ];
        }

        return $urls;
    }

}

Run ./vendor/bin/phpunit and see if tests are passing:

./vendor/bin/phpunit
PHPUnit 6.5-dev by Sebastian Bergmann and contributors.

...

5 / 5 (100%)

Time: 4.06 seconds, Memory: 16.00MB

OK (5 tests, 5 assertions)

Note that it’s better to hardcode important URLs (e.g., for static pages or some well-known URLs) than to generate them within the test. Learn more about PHPUnit and TDD here.
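
When a data provider combines hardcoded URLs with generated ones, the way the two lists are joined matters. The sketch below uses made-up gallery paths purely for illustration and compares PHP’s array union operator with array_merge():

```php
<?php
$hardcoded = [['/']];
$generated = [['/gallery/1'], ['/gallery/2']]; // hypothetical paths, for illustration only

// The union operator keeps the left-hand value for keys that exist on both
// sides, so the generated entry with key 0 is dropped.
$union = $hardcoded + $generated;

// array_merge() renumbers numeric keys and appends every entry.
$merged = array_merge($hardcoded, $generated);

echo count($union), ' vs ', count($merged), "\n"; // 2 vs 3
```

For numerically indexed lists such as data provider rows, array_merge() is usually the safer choice.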

Stay Tuned

Upcoming articles in this series will cover details about PHP and MySQL performance optimization, improving overall performance perception and other tips and tricks for better app performance.
