Writing Async Libraries – Let’s Convert HTML to PDF

This article was peer reviewed by Thomas Punt. Thanks to all of SitePoint’s peer reviewers for making SitePoint content the best it can be!


I can barely remember a conference where the topic of asynchronous PHP wasn’t discussed. I am pleased that it’s so frequently spoken about these days. There’s a secret these speakers aren’t telling, though…

Making asynchronous servers, resolving domain names, interacting with file systems: these are the easy things. Making your own asynchronous libraries is hard. And it’s where you spend most of your time!

Vector image of parallel racing arrows, indicating multi-process execution

Those things are easy because they were the proof of concept, built to make async PHP competitive with NodeJS. You can see this in how similar their early interfaces were:

var http = require("http");
var server = http.createServer();

server.on("request", function(request, response) {
    response.writeHead(200, {
        "Content-Type": "text/plain"
    });

    response.end("Hello World");
});

server.listen(3000, "127.0.0.1");

This code was tested with Node 7.3.0

require "vendor/autoload.php";

$loop = React\EventLoop\Factory::create();
$socket = new React\Socket\Server($loop);
$server = new React\Http\Server($socket);

$server->on("request", function($request, $response) {
    $response->writeHead(200, [
        "Content-Type" => "text/plain"
    ]);

    $response->end("Hello world");
});

$socket->listen(3000, "127.0.0.1");
$loop->run();

This code was tested with PHP 7.1 and react/http:0.4.2

Today, we’re going to look at a few ways to make your application code work well in an asynchronous architecture. Fret not – your code can still work in a synchronous architecture, so you don’t have to give anything up to learn this new skill. Apart from a bit of time…

You can find the code for this tutorial on GitHub. I’ve tested it with PHP 7.1 and the most recent versions of ReactPHP and Amp.

Promising Theory

There are a few abstractions common to asynchronous code. We’ve already seen one of them: callbacks. Callbacks, by their very name, describe how they treat slow or blocking operations. Synchronous code is fraught with waiting. Ask for something, wait for that thing to happen.

So, instead, asynchronous frameworks and libraries can employ callbacks. Ask for something, and when it happens: the framework or library will call your code back.

In the case of HTTP servers, we don’t preemptively handle all requests. We don’t wait around for requests to happen, either. We simply describe the code that should be called, should a request happen. The event loop takes care of the rest.
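
To make the idea concrete, here is a minimal sketch of a callback-driven loop. `TinyLoop` is a hypothetical stand-in written for this illustration, not part of ReactPHP or Amp. It only queues events and calls back into registered code, which is roughly the contract described above.

```php
<?php
// Hypothetical, minimal stand-in for an event loop: it only queues events
// and hands each one to the callbacks registered for it.
final class TinyLoop
{
    private $listeners = [];
    private $queue = [];

    // Describe the code that should be called, should $event happen.
    public function on(string $event, callable $callback): void
    {
        $this->listeners[$event][] = $callback;
    }

    // Record that something happened; nothing is called yet.
    public function emit(string $event, $payload): void
    {
        $this->queue[] = [$event, $payload];
    }

    // Drain the queue, calling back into user code for every event.
    public function run(): void
    {
        while ($this->queue) {
            [$event, $payload] = array_shift($this->queue);

            foreach ($this->listeners[$event] ?? [] as $callback) {
                $callback($payload);
            }
        }
    }
}

$log = [];
$loop = new TinyLoop();

$loop->on("request", function (string $path) use (&$log) {
    $log[] = "handled {$path}";
});

$loop->emit("request", "/");
$loop->emit("request", "/convert");
$loop->run();
```

A real loop would wait on sockets and timers instead of a pre-filled queue, but the shape of the user code stays the same.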

A second common abstraction is promises. Where callbacks are hooks waiting for future events, promises are references to future values. They look something like this:

readFile()
    ->then(function(string $content) {
        print "content: " . $content;
    })
    ->catch(function(Exception $e) {
        print "error: " . $e->getMessage();
    });

It’s a bit more code than callbacks alone, but it’s an interesting approach. We wait for something to happen, and then do another thing. If something goes wrong, we catch the error and respond sensibly. This may look simple, but it’s not spoken about nearly enough.

We’re still using callbacks, but we’ve wrapped them in an abstraction which helps us in other ways. One such benefit is that they allow multiple resolution callbacks…

$promise = readFile();
$promise->then(...)->catch(...);

// ...let's add logging to existing code

$promise->then(function(string $content) use ($logger) {
    $logger->info("file was read");
});
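
If it helps to see why several callbacks can share one promise, here is a deliberately small sketch. `TinyPromise` is hypothetical and much simpler than the Amp or ReactPHP implementations: it has no chaining and no error handling, and only shows that a promise is a reference to one future value that any number of listeners can wait on.

```php
<?php
// Hypothetical TinyPromise: not Amp's or ReactPHP's implementation.
final class TinyPromise
{
    private $callbacks = [];
    private $resolved = false;
    private $value;

    // Register a callback; run it right away if the value already exists.
    public function then(callable $callback): self
    {
        if ($this->resolved) {
            $callback($this->value);
        } else {
            $this->callbacks[] = $callback;
        }

        return $this;
    }

    // Supply the future value and notify everyone who asked for it.
    public function resolve($value): void
    {
        if ($this->resolved) {
            return; // a promise settles only once
        }

        $this->resolved = true;
        $this->value = $value;

        foreach ($this->callbacks as $callback) {
            $callback($value);
        }

        $this->callbacks = [];
    }
}

$seen = [];
$promise = new TinyPromise();

$promise->then(function (string $content) use (&$seen) {
    $seen[] = "content: {$content}";
});

// ...logging added later, without touching the first callback
$promise->then(function (string $content) use (&$seen) {
    $seen[] = "file was read";
});

$promise->resolve("hello");
```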

There’s something else I’d like us to focus on. It’s that promises provide a common language – a common abstraction – for thinking about how synchronous code can become asynchronous code.

Let’s take some application code and make it asynchronous, using promises…

Making PDF Files

It’s common for applications to generate some kind of summary document – be it an invoice or stock list. Imagine you have an e-commerce application which processes payments through Stripe. When customers purchase something, you’d like them to be able to download a PDF receipt of that transaction.

There are many ways you could do this, but a really simple approach would be to generate the document using HTML and CSS. You could convert that to a PDF document, and allow the customer to download it.

I needed to do something similar recently. I discovered that there aren’t many good libraries that support this kind of operation. I couldn’t find a single abstraction which would allow me to switch between different HTML → PDF engines. So I started to build my own.

I began thinking about what I needed the abstraction to do. I settled on an interface quite like:

interface Driver
{
    public function html($html = null);
    public function size($size = null);
    public function orientation($orientation = null);
    public function dpi($dpi = null);
    public function render();
}

For the sake of simplicity, I wanted all but the render method to function as both getters and setters. Given this set of expected methods, the next thing to do was to create an implementation, using one possible engine. I added domPDF to my project, and set about using it:

use Dompdf\Dompdf;
use Dompdf\Options;

class DomDriver extends BaseDriver implements Driver
{
    private $options;

    public function __construct(array $options = [])
    {
        $this->options = $options;
    }

    public function render()
    {
        $data = $this->data();
        $custom = $this->options;

        return $this->parallel(
            function() use ($data, $custom) {
                $options = new Options();

                $options->set(
                    "isJavascriptEnabled", true
                );

                $options->set(
                    "isHtml5ParserEnabled", true
                );

                $options->set("dpi", $data["dpi"]);

                foreach ($custom as $key => $value) {
                    $options->set($key, $value);
                }

                $engine = new Dompdf($options);

                $engine->setPaper(
                    $data["size"], $data["orientation"]
                );

                $engine->loadHtml($data["html"]);
                $engine->render();

                return $engine->output();
            }
        );
    }
}

I’m not going to go into the specifics of how to use domPDF. I think the docs do a good enough job of that, allowing me to focus on the async bits of this implementation.

We’ll look at the data and parallel methods in a bit. What’s important about this Driver implementation is that it gathers the data (if any have been set, otherwise defaults) and custom options together. It passes these to a callback we’d like to be run asynchronously.

domPDF isn’t an asynchronous library, and converting HTML → PDF is a notoriously slow process. So how do we make it asynchronous? Well, we could write a completely asynchronous converter, or we could use an existing synchronous converter but run it in a parallel thread or process.
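
As a rough illustration of the second option, the sketch below runs a slow, blocking script in a child process using only PHP's built-in `proc_open`, while the parent stays free to do other things. The child script here is a placeholder that sleeps and prints a number. It is an assumption for the example, not the converter itself.

```php
<?php
// Placeholder for slow, blocking work (a real converter call would go here).
$script = 'usleep(200000); print 6 * 7;';

$command = escapeshellarg(PHP_BINARY) . " -r " . escapeshellarg($script);

// Only the child's stdout is captured, through a pipe.
$pipes = [];
$process = proc_open($command, [1 => ["pipe", "w"]], $pipes);

if (!is_resource($process)) {
    throw new RuntimeException("could not start the child process");
}

// The child is now running; the parent can do other work in the meantime.
// (Polling like this is fine for tiny outputs; large outputs should be
// read as they arrive so the pipe buffer does not fill up.)
$ticks = 0;

while (proc_get_status($process)["running"]) {
    $ticks++;
    usleep(10000);
}

$output = stream_get_contents($pipes[1]);
fclose($pipes[1]);
proc_close($process);
```

Libraries such as amphp/parallel wrap this kind of plumbing and hand back a promise instead of a polling loop.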

That’s what I made the parallel method for:

abstract class BaseDriver implements Driver
{
    protected $html = "";
    protected $size = "A4";
    protected $orientation = "portrait";
    protected $dpi = 300;

    public function html($html = null)
    {
        return $this->access("html", $html);
    }

    private function access($key, $value = null)
    {
        if (is_null($value)) {
            return $this->$key;
        }

        $this->$key = $value;
        return $this;
    }

    public function size($size = null)
    {
        return $this->access("size", $size);
    }

    public function orientation($orientation = null)
    {
        return $this->access("orientation", $orientation);
    }

    public function dpi($dpi = null)
    {
        return $this->access("dpi", $dpi);
    }

    protected function data()
    {
        return [
            "html" => $this->html,
            "size" => $this->size,
            "orientation" => $this->orientation,
            "dpi" => $this->dpi,
        ];
    }

    protected function parallel(Closure $deferred)
    {
        // TODO
    }
}

Here I implemented the getter-setter methods, figuring that I could reuse them for the next implementation. The data method acts as a shortcut for collecting various document properties into an array, making them easier to pass to anonymous functions.
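
The dual getter-setter pattern is easy to try in isolation. The class below is a hypothetical stand-in (`PageSettings` is not part of the library) that reuses the same `access` idea. One consequence worth noticing is that `null` can never be stored, because `null` is what signals "act as a getter".

```php
<?php
// Hypothetical stand-in class, only here to show the dual-purpose accessor.
final class PageSettings
{
    private $size = "A4";
    private $dpi = 300;

    public function size($size = null)
    {
        return $this->access("size", $size);
    }

    public function dpi($dpi = null)
    {
        return $this->access("dpi", $dpi);
    }

    private function access($key, $value = null)
    {
        if (is_null($value)) {
            return $this->$key; // no argument: act as a getter
        }

        $this->$key = $value;   // argument given: act as a setter...

        return $this;           // ...and allow chaining
    }
}

$settings = (new PageSettings())->size("A3")->dpi(150);
```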

The parallel method started to get interesting:

use Amp\Parallel\Forking\Fork;
use Amp\Parallel\Threading\Thread;

// ...

protected function parallel(Closure $deferred)
{
    if (Fork::supported()) {
        return Fork::spawn($deferred)->join();
    }

    if (Thread::supported()) {
        return Thread::spawn($deferred)->join();
    }

    return null;
}

I’m a huge fan of the Amp project. It’s a collection of libraries supporting asynchronous architecture, and they’re key supporters of the async-interop project.

One of their libraries is called amphp/parallel, and it supports multi-threaded and multi-process code (via Pthreads and Process Control extensions). Calling join on a spawned fork or thread returns Amp’s implementation of a promise. That means the render method can be used like any other promise-returning method:

$promise = $driver
    ->html("<h1>hello world</h1>")
    ->size("A4")->orientation("portrait")->dpi(300)
    ->render();

$results = yield $promise;

This code is a bit loaded. Amp also provides an event loop implementation and all the helper code to be able to convert ordinary PHP generators to coroutines and promises. You can read about how this is even possible, and what it has to do with PHP’s generators in another post I’ve written.

The returned promises are also becoming standardized. Amp returns implementations of the Promise spec. It deviates slightly from the code I showed above, but still performs the same function.

Generators work like coroutines from languages that have them. Coroutines are interruptible functions, which means they can be used to do short bursts of work, and then pause while they wait for something. While paused, other functions can use the system resources.
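
Here is a small sketch of that pause-and-resume behaviour using nothing but plain PHP generators. The scheduler loop is a toy written for this example, not how Amp schedules coroutines. Each task does one burst of work, pauses at `yield`, and the other task gets a turn.

```php
<?php
// Each task is a generator: every `yield` is a point where it pauses.
function task(string $name, int $steps, array &$log): Generator
{
    for ($i = 1; $i <= $steps; $i++) {
        $log[] = "{$name} step {$i}"; // a short burst of work
        yield;                         // pause, let others run
    }
}

$log = [];
$tasks = ["a" => task("a", 2, $log), "b" => task("b", 3, $log)];
$started = [];

// Toy round-robin scheduler: give every unfinished task one turn per round.
while ($tasks) {
    foreach ($tasks as $key => $task) {
        if (isset($started[$key])) {
            $task->next();   // resume from the last pause
        } else {
            $task->rewind(); // first turn: run up to the first yield
            $started[$key] = true;
        }

        if (!$task->valid()) {
            unset($tasks[$key]); // the task has finished
        }
    }
}

// $log is now:
// ["a step 1", "b step 1", "a step 2", "b step 2", "b step 3"]
```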

In practice, this looks like:

use AsyncInterop\Loop;

Loop::execute(
    Amp\wrap(function() {
        $result = yield funcReturnsPromise();
    })
);

This looks way more complicated than just writing synchronous code to begin with. But what it allows for is that other things can happen while we would otherwise be waiting for funcReturnsPromise to complete.

Yielding promises is that abstraction I was talking about. It gives us the framework by which we can make functions that return promises. Code can interact with those promises in predictable and understandable ways.
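
The mechanics behind "yield something pending, get its value back" can be sketched without any framework. In this simplified illustration the yielded values are plain closures standing in for promises, and the hypothetical `run` function resolves them immediately. A real runner such as Amp's would wait on the event loop instead, but the generator's side of the conversation looks the same: values come back through `yield`, failures arrive as exceptions.

```php
<?php
// Toy coroutine runner: whatever the generator yields is treated as a
// pending result, resolved, and sent back in as the value of `yield`.
function run(Generator $coroutine)
{
    while ($coroutine->valid()) {
        $pending = $coroutine->current();

        try {
            $value = $pending();
        } catch (Throwable $e) {
            // A failed operation surfaces inside the coroutine as an exception.
            $coroutine->throw($e);
            continue;
        }

        $coroutine->send($value);
    }

    return $coroutine->getReturn();
}

$result = run((function () {
    $html = yield function () {
        return "<h1>hello world</h1>";
    };

    try {
        yield function () {
            throw new RuntimeException("engine failed");
        };
    } catch (RuntimeException $e) {
        $html .= " (recovered: " . $e->getMessage() . ")";
    }

    return $html;
})());
```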

Look at what it would be like to render PDF documents using our driver:

use AsyncInterop\Loop;

Loop::execute(Amp\wrap(function() {
    $driver = new DomDriver();

    // this is an AsyncInterop\Promise...
    $promise = $driver
        ->html("<h1>hello world</h1>")
        ->size("A4")->orientation("portrait")->dpi(300)
        ->render();

    $results = yield $promise;

    // write $results to an empty PDF file
}));

This is less useful than, say, generating PDFs in an asynchronous HTTP server. There’s an Amp library called Aerys which makes these kinds of servers easier to create. Using Aerys, you could create the following HTTP server code:

$router = new Aerys\Router();

$router->get("/", function($request, $response) {
    $response->end("<h1>Hello World!</h1>");
});

$router->get("/convert", function($request, $response) {
    $driver = new DomDriver();

    // this is an AsyncInterop\Promise...
    $promise = $driver
        ->html("<h1>hello world</h1>")
        ->size("A4")->orientation("portrait")->dpi(300)
        ->render();

    $results = yield $promise;

    $response
        ->setHeader("Content-type", "application/pdf")
        ->end($results);
});

(new Aerys\Host())
    ->expose("127.0.0.1", 3000)
    ->use($router);

Again, I’m not going to go into the details of Aerys now. It’s an impressive bit of software, well deserving of its own post. You don’t need to understand how Aerys works in order to see how natural our converter’s code looks alongside it.

My Boss Says “No Async!”

Why go through all this trouble, if you’re unsure how often you’ll be able to build asynchronous applications? Writing this code gives us valuable insight into a new programming paradigm. And, just because we’re writing this code as asynchronous doesn’t mean it can’t work in synchronous environments.

To use this code in a synchronous application, all we need to do is move the event loop code inside a decorator:

use AsyncInterop\Loop;

class SyncDriver implements Driver
{
    private $decorated;

    public function __construct(Driver $decorated)
    {
        $this->decorated = $decorated;
    }

    // ...proxy getters/setters to $decorated

    public function render()
    {
        $result = null;

        Loop::execute(
            Amp\wrap(function() use (&$result) {
                $result = yield $this->decorated
                    ->render();
            })
        );

        return $result;
    }
}

Using this decorator, we can write what appears to be synchronous code:

$driver = new SyncDriver(new DomDriver());

// this is a string...
$results = $driver
    ->html("<h1>hello world</h1>")
    ->size("A4")->orientation("portrait")->dpi(300)
    ->render();

// write $results to an empty PDF file

It’s still running the code asynchronously (in the background at least), but none of that is exposed to the consumer. You could use this in a synchronous application, and never know what was going on under the hood.

Supporting Other Frameworks

Amp has a particular set of requirements that make it unsuitable for some environments. For example, the base Amp (event loop) library requires PHP 7.0. The parallel library requires the Pthreads extension or the Process Control extension.

I didn’t want to impose these restrictions on everyone, and wondered how I could support a wider range of systems. The answer was to abstract the parallel execution code into another driver system:

interface Runner
{
    public function run(Closure $deferred);
}

I could implement this for Amp as well as for the (less restrictive, albeit much older) ReactPHP:

use React\ChildProcess\Process;
use SuperClosure\Serializer;

class ReactRunner implements Runner
{
    public function run(Closure $deferred)
    {
        $autoload = $this->autoload();

        $serializer = new Serializer();

        $serialized = base64_encode(
            $serializer->serialize($deferred)
        );

        $raw = "
            require_once '{$autoload}';

            \$serializer = new SuperClosure\Serializer();
            \$serialized = base64_decode('{$serialized}');

            return call_user_func(
                \$serializer->unserialize(\$serialized)
            );
        ";

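        // base64 output contains no quotes, so it is safe inside '...'
        $encoded = base64_encode($raw);

        $code = sprintf(
            "print eval(base64_decode('%s'));",
            $encoded
        );

        // escapeshellarg() quotes the whole script for the shell
        return new Process(sprintf(
            "exec php -r %s",
            escapeshellarg($code)
        ));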
    }

    private function autoload()
    {
        $dir = __DIR__;
        $suffix = "vendor/autoload.php";

        $path1 = "{$dir}/../../{$suffix}";
        $path2 = "{$dir}/../../../../{$suffix}";

        if (file_exists($path1)) {
            return realpath($path1);
        }

        if (file_exists($path2)) {
            return realpath($path2);
        }
    }
}

I’m used to passing around closures to multi-threaded and multi-process workers, because that’s how Pthreads and Process Control work. Using ReactPHP Process objects is entirely different as they rely on exec for multi-process execution. I decided to implement the same closure functionality I was used to. This isn’t essential to asynchronous code – it’s purely an expression of taste.

The SuperClosure library serializes closures and their bound variables. Most of the code here is what you’d expect to find inside a worker script. In fact, the only way (apart from serializing closures) to use ReactPHP’s child process library is to send tasks to a worker script.

Now, instead of loading our drivers with $this->parallel and Amp-specific code, we can pass runner implementations around. As async code, this resembles:

use React\EventLoop\Factory;

$driver = new DomDriver();

$runner = new ReactRunner();

// this is a React\ChildProcess\Process...
$process = $driver
    ->html("<h1>hello world</h1>")
    ->size("A4")->orientation("portrait")->dpi(300)
    ->render($runner);

$loop = Factory::create();

$process->on("exit", function() use ($loop) {
    $loop->stop();
});

$loop->addTimer(0.001, function($timer) use ($process) {
    $process->start($timer->getLoop());

    $process->stdout->on("data", function($results) {
        // write $results to an empty PDF file
    });
});

$loop->run();

Don’t be alarmed by how different this ReactPHP code looks from the Amp code. ReactPHP doesn’t implement the same coroutine foundation as Amp does. Instead, ReactPHP favors callbacks for most things. This code is still just running the PDF conversion in parallel, and returning the resulting PDF data.

With runners abstracted, we can use any asynchronous framework we’d like, and we can expect the abstractions of that framework to be returned by the driver we’re using.
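
One way to picture this: a runner that does nothing asynchronous at all still satisfies the same interface. The sketch below repeats the Runner interface from above and adds two hypothetical classes for illustration, `BlockingRunner` and `UppercaseDriver`. Neither is part of the library, and the "rendering" is just an uppercase conversion standing in for a PDF engine.

```php
<?php
interface Runner
{
    public function run(Closure $deferred);
}

// Hypothetical runner for environments without any async framework:
// it simply calls the closure and returns its result.
final class BlockingRunner implements Runner
{
    public function run(Closure $deferred)
    {
        return $deferred();
    }
}

// Hypothetical driver: a real one would call a PDF engine in the closure.
final class UppercaseDriver
{
    private $html = "";

    public function html(string $html): self
    {
        $this->html = $html;

        return $this;
    }

    // The driver returns whatever the runner's framework returns:
    // a string here, a promise or process with other runners.
    public function render(Runner $runner)
    {
        $html = $this->html;

        return $runner->run(function () use ($html) {
            return strtoupper($html);
        });
    }
}

$result = (new UppercaseDriver())
    ->html("<h1>hello</h1>")
    ->render(new BlockingRunner());
```

Because the driver never names a framework, swapping `BlockingRunner` for an Amp or ReactPHP runner changes the return type but not the driver code.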

Can I Use This?

What started out as an experiment became a multi-driver, multi-runner HTML → PDF library called Paper. It’s like the HTML → PDF equivalent of Flysystem, but it’s also a good example of how to write asynchronous libraries.

As you try to make async PHP applications, you’re going to find gaps in the library ecosystem. Don’t be discouraged by these! Instead, take the opportunity to think about how you’d make your own asynchronous libraries, using the abstractions ReactPHP and Amp provide.

Have you built an interesting async PHP application or library recently? Let us know in the comments.

Frequently Asked Questions (FAQs) on Converting HTML to PDF Asynchronously

What is the significance of asynchronous programming in converting HTML to PDF?

Asynchronous programming lets a slow HTML to PDF conversion run in the background, in a separate thread or process, while the rest of your code continues executing. This makes better use of resources and keeps the application responsive, especially when it has other requests to serve while a conversion is in progress.

How does ReactPHP help in creating asynchronous libraries?

ReactPHP is a low-level library for event-driven programming in PHP. It provides the core infrastructure for creating asynchronous libraries in PHP. With ReactPHP, you can write non-blocking code using PHP’s familiar syntax, making it easier to create high-performance applications.

What are the steps involved in converting HTML to PDF asynchronously?

The process of converting HTML to PDF asynchronously involves several steps. First, build the HTML (and CSS) that defines the structure and content of the document. Next, pass it to a converter such as domPDF, running that converter in a parallel thread or process through a library like Amp or ReactPHP. Finally, when the promise resolves or the child process emits its output, save or send the resulting PDF data. Because the conversion runs in parallel, your application can continue performing other tasks while it takes place.

Can I use other languages besides PHP for asynchronous programming?

Yes, you can use other languages for asynchronous programming. Node.js, for example, is a popular choice for building asynchronous applications due to its event-driven architecture. However, if you’re already familiar with PHP, libraries like ReactPHP make it easy to leverage the benefits of asynchronous programming without having to learn a new language.

How can I handle errors during the asynchronous conversion of HTML to PDF?

Error handling is an important aspect of asynchronous programming. With ReactPHP promises, a failed operation rejects the promise, and you handle it by passing a rejection callback as the second argument to then(). That callback receives the error when the conversion fails, allowing you to log it or take other appropriate action.

What are the benefits of converting HTML to PDF?

Converting HTML to PDF has several benefits. It allows you to create a static, portable version of a webpage that can be viewed offline, printed, or shared easily. PDFs also maintain the formatting and layout of the original HTML, ensuring that the content looks the same regardless of the device or platform it’s viewed on.

How can I optimize the performance of my asynchronous PHP application?

There are several ways to optimize the performance of your asynchronous PHP application. One approach is to use a library like ReactPHP, which provides a low-level interface for event-driven programming. This allows you to write non-blocking code, which can significantly improve the performance of I/O-heavy operations like converting HTML to PDF.

Can I convert HTML to PDF synchronously?

Yes, it’s possible to convert HTML to PDF synchronously. However, this approach can block the execution of your application until the conversion process completes, which can lead to performance issues in I/O-heavy applications. Asynchronous conversion, on the other hand, allows your application to continue executing other tasks while the conversion takes place, resulting in better performance and resource utilization.

What are the challenges of asynchronous programming in PHP?

Asynchronous programming in PHP can be challenging due to the language’s synchronous nature. However, libraries like ReactPHP provide the infrastructure needed to write non-blocking code in PHP. Understanding the event-driven programming model and mastering the use of Promises can also be challenging, but they are key to leveraging the benefits of asynchronous programming.

How can I test the performance of my asynchronous PHP application?

Testing the performance of your asynchronous PHP application involves measuring key metrics like response time, memory usage, and CPU utilization under different load conditions. Tools like Apache JMeter or Siege can be used to simulate load on your application and collect performance data. Additionally, profiling tools like Xdebug can help you identify bottlenecks in your code and optimize its performance.

Christopher Pitt

Christopher is a writer and coder, working at Over. He usually works on application architecture, though sometimes you'll find him building compilers or robots.
