Using Background Processing to Speed Up Page Load Times
This article is part of a series on building a sample application — a multi-image gallery blog — for performance benchmarking and optimizations. (View the repo here.)
In a previous article, we’ve added on-demand image resizing. Images are resized on the first request and cached for later use. By doing this, we’ve added some overhead to the first load; the system has to render thumbnails on the fly and is “blocking” the first user’s page render until image rendering is done.
The optimized approach would be to render thumbnails after a gallery is created. You may be thinking, “Okay, but we’ll then block the user who is creating the gallery?” Not only would it be a bad user experience, but it also isn’t a scalable solution. The user would get confused about long loading times or, even worse, encounter timeouts and/or errors if images are too heavy to be processed. The best solution is to move these heavy tasks into the background.
Background Jobs
Background jobs are the best way of doing any heavy processing. We can immediately notify our user that we’ve received their request and scheduled it for processing. The same way as YouTube does with uploaded videos: they aren’t accessible after the upload. The user needs to wait until the video is processed completely to preview or share it.
Processing or generating files, sending emails or any other non-critical tasks should be done in the background.
How Does Background Processing Work?
There are two key components in the background processing approach: job queue and worker(s). The application creates jobs that should be handled while workers are waiting and taking from the queue one job at a time.
You can create multiple worker instances (processes) to speed up processing, chop a big job up into smaller chunks and process them simultaneously. It’s up to you how you want to organize and manage background processing, but note that parallel processing isn’t a trivial task: you should take care of potential race conditions and handle failed tasks gracefully.
Our tech stack
We’re using the Beanstalkd job queue to store jobs, the Symfony Console component to implement workers as console commands and Supervisor to take care of worker processes.
If you’re using Homestead Improved, Beanstalkd and Supervisor are already installed so you can skip the installation instructions below.
Installing Beanstalkd
Beanstalkd is
a fast work queue with a generic interface originally designed for reducing the latency of page views in high-volume web applications by running time-consuming tasks asynchronously.
There are many client libraries available that you can use. In our project, we’re using Pheanstalk.
To install Beanstalkd on your Ubuntu or Debian server, simply run sudo apt-get install beanstalkd
. Take a look at the official download page to learn how to install Beanstalkd on other OSes.
Once installed, Beanstalkd is started as a daemon, waiting for clients to connect and create (or process) jobs:
/etc/init.d/beanstalkd
Usage: /etc/init.d/beanstalkd {start|stop|force-stop|restart|force-reload|status}
Install Pheanstalk as a dependency by running composer require pda/pheanstalk
.
The queue will be used for both creating and fetching jobs, so we’ll centralize queue creation in a factory service JobQueueFactory
:
<?php
namespace App\Service;
use Pheanstalk\Pheanstalk;
class JobQueueFactory
{
private $host = 'localhost';
private $port = '11300';
const QUEUE_IMAGE_RESIZE = 'resize';
public function createQueue(): Pheanstalk
{
return new Pheanstalk($this->host, $this->port);
}
}
Now we can inject the factory service wherever we need to interact with Beanstalkd queues. We are defining the queue name as a constant and referring to it when putting the job into the queue or watching the queue in workers.
Installing Supervisor
According to the official page, Supervisor is a
client/server system that allows its users to monitor and control a number of processes on UNIX-like operating systems.
We’ll be using it to start, restart, scale and monitor worker processes.
Install Supervisor on your Ubuntu/Debian server by running
sudo apt-get install supervisor
. Once installed, Supervisor will be running in the background as a daemon. Use supervisorctl
to control supervisor processes:
$ sudo supervisorctl help
default commands (type help <topic>):
=====================================
add exit open reload restart start tail
avail fg pid remove shutdown status update
clear maintail quit reread signal stop version
To control processes with Supervisor, we first have to write a configuration file and describe how we want our processes to be controlled. Configurations are stored in /etc/supervisor/conf.d/
. A simple Supervisor configuration for resize workers would look like this:
[program:resize-worker]
process_name=%(program_name)s_%(process_num)02d
command=php PATH-TO-YOUR-APP/bin/console app:resize-image-worker
autostart=true
autorestart=true
numprocs=5
stderr_logfile = PATH-TO-YOUR-APP/var/log/resize-worker-stderr.log
stdout_logfile = PATH-TO-YOUR-APP/var/log/resize-worker-stdout.log
We’re telling Supervisor how to name spawned processes, the path to the command that should be run, to automatically start and restart the processes, how many processes we want to have and where to log output. Learn more about Supervisor configurations here.
Resizing images in the background
Once we have our infrastructure set up (i.e., Beanstalkd and Supervisor installed), we can modify our app to resize images in the background after the gallery is created. To do so, we need to:
- update image serving logic in the
ImageController
- implement resize workers as console commands
- create Supervisor configuration for our workers
- update fixtures and resize images in the fixture class.
Updating Image Serving Logic
So far we’ve been resizing images on the first request: if the image file for a requested size doesn’t exist, it’s created on the fly.
We’ll now modify ImageController to return image responses for requested size only if the resized image file exists (i.e., only if the image has already been resized).
If not, the app will return a generic placeholder image response saying that the image is being resized at the moment. Note that the placeholder image response has different cache control headers, since we don’t want to cache placeholder images; we want to have the image rendered as soon as the resize process is finished.
We’ll create a simple event called GalleryCreatedEvent with Gallery ID as a payload. This event will be dispatched within the UploadController
after Gallery is successfully created:
...
$this->em->persist($gallery);
$this->em->flush();
$this->eventDispatcher->dispatch(
GalleryCreatedEvent::class,
new GalleryCreatedEvent($gallery->getId())
);
$this->flashBag->add('success', 'Gallery created! Images are now being processed.');
...
Additionally, we’ll update the flash message with “Images are now being processed.” so the user knows we still have some work to do with their images before they’re ready.
We’ll create GalleryEventSubscriber event subscriber that will react to the GalleryCreatedEvent
and request a resize job for every Image in the newly created Gallery:
public function onGalleryCreated(GalleryCreatedEvent $event)
{
$queue = $this->jobQueueFactory
->createQueue()
->useTube(JobQueueFactory::QUEUE_IMAGE_RESIZE);
$gallery = $this->entityManager
->getRepository(Gallery::class)
->find($event->getGalleryId());
if (empty($gallery)) {
return;
}
/** @var Image $image */
foreach ($gallery->getImages() as $image) {
$queue->put($image->getId());
}
}
Now, when a user successfully creates a gallery, the app will render the gallery page, but some of the images won’t be displayed as their thumbnails while still not ready:
Once workers are finished with resizing, the next refresh should render the full Gallery page.
Implement resize workers as console commands
The worker is a simple process doing the same job for every job he gets from the queue. Worker execution is blocked at the $queue->reserve()
call until either job is reserved for that worker, or a timeout happens.
Only one worker can take and process a Job. The job usually contains payload — e.g., string or serialized array/object. In our case, it’ll be UUID of a Gallery that’s created.
A simple worker looks like this:
// Construct a Pheanstalk queue and define which queue to watch.
$queue = $this->getContainer()
->get(JobQueueFactory::class)
->createQueue()
->watch(JobQueueFactory::QUEUE_IMAGE_RESIZE);
// Block execution of this code until job is added to the queue
// Optional argument is timeout in seconds
$job = $queue->reserve(60 * 5);
// On timeout
if (false === $job) {
$this->output->writeln('Timed out');
return;
}
try {
// Do the actual work here, but make sure you're catching exceptions
// and bury job so it doesn't get back to the queue
$this->resizeImage($job->getData());
// Deleting a job from the queue will mark it as processed
$queue->delete($job);
} catch (\Exception $e) {
$queue->bury($job);
throw $e;
}
You may have noticed that workers will exit after a defined timeout or when a job is processed. We could wrap worker logic in an infinite loop and have it repeat its job indefinitely, but that could cause some issues such as database connection timeouts after a long time of inactivity and make deploys harder. To prevent that, our worker lifecycle will be finished after it completes a single task. Supervisor will then restart a worker as a new process.
Take a look at ResizeImageWorkerCommand to get a clear picture how the Worker command is structured. The worker implemented in this way can also be started manually as a Symfony console command: ./bin/console app:resize-image-worker
.
Create Supervisor configuration
We want our workers to start automatically, so we’ll set an autostart=true
directive in the config.
Since the worker has to be restarted after a timeout or a successful processing task, we’ll also set an autorestart=true
directive.
The best part about background processing is the ease of parallel processing. We can set a numprocs=5
directive and Supervisor will spawn five instances of our workers. They will wait for jobs and process them independently, allowing us to scale our system easily. As your system grows, you’ll probably need to increase the number of processes. Since we’ll have multiple processes running, we need to define the structure of a process name, so we’re setting a process_name=%(program_name)s_%(process_num)02d
directive.
Last but not least, we want to store workers’ outputs so we can analyze and debug them if something goes wrong. We’ll define stderr_logfile
and stdout_logfile
paths.
The complete Supervisor configuration for our resize workers looks like this:
[program:resize-worker]
process_name=%(program_name)s_%(process_num)02d
command=php PATH-TO-YOUR-APP/bin/console app:resize-image-worker
autostart=true
autorestart=true
numprocs=5
stderr_logfile = PATH-TO-YOUR-APP/var/log/resize-worker-stderr.log
stdout_logfile = PATH-TO-YOUR-APP/var/log/resize-worker-stdout.log
After creating (or updating) the configuration file located in /etc/supervisor/conf.d/
directory, you have to tell Supervisor to re-read and update its configuration by executing the following commands:
supervisorctl reread
supervisorctl update
If you’re using Homestead Improved (and you should be!) you can use scripts/setup-supervisor.sh to generate the Supervisor configuration for this project: sudo ./scripts/setup-supervisor.sh
.
Update Fixtures
Image thumbnails won’t be rendered on the first request anymore, so we need to request rendering for every Image explicitly when we’re loading our fixtures in the LoadGalleriesData fixture class:
$imageResizer = $this->container->get(ImageResizer::class);
$fileManager = $this->container->get(FileManager::class);
...
$gallery->addImage($image);
$manager->persist($image);
$fullPath = $fileManager->getFilePath($image->getFilename());
if (false === empty($fullPath)) {
foreach ($imageResizer->getSupportedWidths() as $width) {
$imageResizer->getResizedPath($fullPath, $width, true);
}
}
Now you should feel how fixtures loading is slowed down, and that’s why we’ve moved it into the background instead of forcing our users to wait until it’s done!
Tips and Tricks
Workers are running in the background so that even after you deploy a new version of your app, you’ll have outdated workers running until they aren’t restarted for the first time.
In our case, we’d have to wait for all our workers to finish their tasks or timeout (5 minutes) until we’re sure all our workers are updated. Be aware of this when creating deploy procedures!