By James Edwards

Reducing HTTP requests with generated data URIs

I’m not really a server-side guy, but I do dabble here and there, and the other day I had this really neat idea I’d like to share. Of course it might be old-hat to you experienced PHP programmers! But then I hope you’ll be interested in my implementation, and who knows — maybe I’m about to make somebody’s day!

The idea is this: you can reduce the number of HTTP requests that a page has to make for its images, by pre-processing the source-code and converting them to data URIs. In fact, as long as the total amount of data involved doesn’t threaten PHP’s memory limit, you can reduce the number to zero!

The data URI scheme is a means of including data in web-pages as though it were an external resource. It can be used for any kind of data, including images, scripts and stylesheets, and is supported in all modern browsers: Gecko browsers like Firefox and Camino; Webkit browsers like Safari, Konqueror and Chrome; Opera, of course; and IE8 in a limited fashion (but not IE7 or earlier).
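For reference, a data URI follows the syntax defined in RFC 2397:

```
data:[<mediatype>][;base64],<data>
```

So an embedded PNG begins something like `data:image/png;base64,iVBORw0KGgo...` — the media type, the base64 flag, then the encoded file data itself.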

As Google soon attested, though, I’m not the first to have had the idea of using them for page-optimization. But the implementations I saw all revolved around re-writing image paths manually, to point them at a script, something like this:


<img src="<?php echo data_uri('images/darwinfish.png'); ?>" alt="Darwin Fish" />
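
The `data_uri()` function in that snippet isn’t shown; a minimal sketch of such a helper might look like this (the function name comes from the example above, but the `$mime` parameter is my own assumption):

```php
<?php
// Hypothetical helper: read a file and return it as a base64 data URI.
// $mime must be supplied by the caller; a fuller version would detect it.
function data_uri($file, $mime = 'image/png')
{
   $contents = file_get_contents($file);
   return "data:$mime;base64," . base64_encode($contents);
}
?>
```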
   

What I’m proposing is a retrospective process that converts all the image paths for you, so you don’t have to do anything special when you’re authoring the page in the first place.

Code is where the heart is

The following example is a complete demo page, with original HTML and CSS, surrounded by PHP.

The page contains five <img> elements and one CSS background-image, yet in supported browsers it makes no additional HTTP requests at all:

<?php 
if($datauri_supported = preg_match("/(Opera|Gecko|MSIE 8)/", $_SERVER['HTTP_USER_AGENT'])) 
{ 
   ob_start(); 
}
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
   
   <title>Data URI Generator</title>
   <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
   
   <style type="text/css">
   
      body
      {
         background:url(images/texture.jpeg) #e2e2dc repeat;
         color:#554;
      }   
      
   </style>
   
</head>
<body>
   
   <p>
      <img src="images/dropcap.jpg" alt="dropcap.jpg" />
      <img src="images/firefox.png" alt="firefox.png" />
      <img src='images/specificity.jpg' alt='specificity.jpg' />
      <img src='images/darwinfish.png' alt='darwinfish.png' />
      <img src="images/rolleyes.gif" alt="rolleyes.gif" />
   </p>
   
</body>
</html>
<?php
   
if($datauri_supported)
{
   function create_data_uri($matches)
   {
      $filetype = explode('.', $matches[2]);
      $filetype = strtolower($filetype[count($filetype) - 1]);
      
      if(!preg_match('/^(gif|png|jp[e]?g|bmp)$/i', $filetype))
      {
         return $matches[0];
      }
      
      if(preg_match('/^\//', $matches[2]))
      {
         $matches[2] = $_SERVER['DOCUMENT_ROOT'] . $matches[2];
      }
   
      @$data = base64_encode(file_get_contents($matches[2]));
   
      return $matches[1] . "data:image/$filetype;base64,$data" . $matches[3];
   }
   
   
   $html = ob_get_contents();
   ob_end_clean();
   
   $html = preg_split("/\r?\n|\r/", $html);
   while(count($html) > 0)
   {
      $html[0] = preg_replace_callback("/(src=[\"'])([^\"']+)([\"'])/", 'create_data_uri', $html[0]);
      $html[0] = preg_replace_callback("/(url\(['\"]?)([^\"')]+)(['\"]?\))/", 'create_data_uri', $html[0]);
   
      echo $html[0] . "\r\n";
   
      array_shift($html);
   }
}
   
?>

How this all works

The heart of all this is the ability to build data URIs using base64-encoded image data.

But over and above that, there are a couple of key tricks we need to make this all work. Firstly, using the output buffer to pre-compile the output source, so we have a chance to parse it again before sending it to the browser.

You’ll see how, at the very top of the code, I’ve set a browser condition to decide whether to start the output buffer. That same condition is then used again, surrounding the main code at the bottom, so for browsers that don’t support this technique, all the scripting is bypassed and it just outputs the page as normal. Then what I’ve done is split the HTML by its line-breaks, so we can process, output then delete each line immediately, which avoids having to hold the entire source-code in memory.
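
As a minimal, standalone sketch of that first trick (the strings here are my own stand-ins, not from the demo page), the output-buffer pattern looks like this:

```php
<?php
ob_start();                      // start capturing output instead of sending it
echo "<p>Hello, world</p>";      // this goes into the buffer, not to the browser
$html = ob_get_contents();       // read back everything captured so far
ob_end_clean();                  // discard the buffer without sending it
// now we're free to transform the source before finally echoing it
echo str_replace('Hello', 'Goodbye', $html);  // prints "<p>Goodbye, world</p>"
```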

Secondly, to implement the actual parsing I’ve used the preg_replace_callback function, which identifies HTML and CSS paths with a pair of regular-expressions, and passes them through a process too complex for a simple replacement. (We have to look for src attributes and url properties separately, because the syntax is too different for a single regex to generate identical match arrays.)

Within the callback function we first have to work out the file-type, which is needed for the output data, and also acts as a condition for allowed types, so we can reject anything that’s not an image (such as a script src). The $matches array that’s passed to the function always contains the entire substring match as its first member (followed by backreferences from [1]), so if we identify a file we don’t want, we can just return that first match unmodified, and we’re done.
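
To make that concrete, here’s what the callback receives for a typical src attribute, using the first of the two patterns (shown with its escapes intact); the file path is a stand-in of my own:

```php
<?php
// Run the src-attribute pattern against a sample attribute to see
// the $matches array a preg_replace_callback callback would receive.
preg_match('/(src=["\'])([^"\']+)(["\'])/', 'src="images/foo.png"', $m);

// $m[0] => 'src="images/foo.png"'   (the entire substring match)
// $m[1] => 'src="'                  (backreference 1: opening delimiter)
// $m[2] => 'images/foo.png'         (backreference 2: the path itself)
// $m[3] => '"'                      (backreference 3: closing delimiter)
```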

The only other thing to do after that is check for web-root paths, which will need prepending with DOCUMENT_ROOT to create a usable file path. Once we have all that, we can grab and encode the image (with error-suppression in case the original path was broken), then compile and return the data URI. Too easy!

When is an optimization not an optimization?

When the cost is greater than the saving! And of course a process like this doesn’t come for free — there are several potential costs we have to consider.

Images as data URIs are roughly one-third larger than their originals. Such images also won’t cache — or at least, they won’t cache as images, but they will cache as part of the source-code. Caching in that way is more of a broad-brush than a fine-point, but it does at least allow for offline viewing.

There’s also the processing-overhead of doing each conversion in the first place, more so as the file gets larger. There’s no significant latency involved in loading it, as long as it’s on the same server, but base64 encoding is a comparatively expensive process.
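
That one-third figure falls straight out of how base64 works: every 3 bytes of input become 4 ASCII characters of output. A quick check (using a dummy string in place of real image data):

```php
<?php
// Base64 maps each 3-byte group of input onto 4 output characters,
// so the encoded form is 4/3 the size of the original (plus padding).
$raw = str_repeat('x', 3000);        // stand-in for a 3000-byte image file
echo strlen(base64_encode($raw));    // prints 4000
```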

So perhaps the optimum way to use this technique would be, not to use it for all images, but only to use it for large numbers of small images, like icons and background-slices. The additional code-size would be trivial, and the processing light, but the benefit of removing several-dozen requests could be great, creating a faster-loading page overall.

So let’s say, for example, that you only wanted to process GIF images; that’s easily done just by modifying the allowed file-types regex:

if(!preg_match('/^(gif)$/i', $filetype))
{
   return $matches[0];
}

Alternatively, you could use the filesize function to filter by size, and only proceed to conversion for those below a certain threshold:

if(filesize($matches[2]) > 1024)
{
   return $matches[0];
}

As far as large images go, I have read that browsers place limits on the size of data URIs; however I haven’t observed any such limitations in practice. Firefox, Opera, Safari, even IE8 were all perfectly happy displaying image-data more than 1MB in size. Ramping up the tests, I found myself hitting PHP’s memory limit without garnering any complaints from the browsers! Either I’m missing the point entirely, or there are no size limits.

Westward Ho!

While experimenting, I did try with JavaScript and CSS too; however that didn’t work in Internet Explorer, so I didn’t pursue it any further.

But a really interesting development from here, would be to see if it were possible to develop some kind of algorithm that calculates the cost vs. benefit of implementing this technique in different situations. Taking into account perhaps the size and complexity of the page itself, the number and size of each image, the ratio of repeating CSS images to ad-hoc content images, and the time it takes to convert and encode, compared with an average network request. Then bringing all that together to work out which images would benefit from conversion, and which are best left as they are. If we could do that, in a coherent yet automagical way, we’d have a pretty nifty WordPress plugin, hey!

But to be honest, I really don’t know where you’d start to work out something like that! There are several unquantifiables, and many judgement calls. It’s certainly something to think about though; and perhaps you — my dear reader — can offer some fresh insight?

Thumbnail credit: Stéfan
