Reducing HTTP requests with generated data URIs

I’m not really a server-side guy, but I do dabble here and there, and the other day I had this really neat idea I’d like to share. Of course it might be old-hat to you experienced PHP programmers! But then I hope you’ll be interested in my implementation, and who knows — maybe I’m about to make somebody’s day!

The idea is this: you can reduce the number of HTTP requests that a page has to make for its images, by pre-processing the source-code and converting them to data URIs. In fact, as long as the total amount of data involved doesn’t threaten PHP’s memory limit, you can reduce the number to zero!

The data URI scheme is a means of including data in web-pages as though it were an external resource. It can be used for any kind of data, including images, scripts and stylesheets, and is supported in all modern browsers: Gecko browsers like Firefox and Camino; Webkit browsers like Safari, Konqueror and Chrome; Opera, of course; and IE8 in a limited fashion (but not IE7 or earlier).

As Google soon attested though, I’m not the first to have had the idea of using them for page-optimization. But the implementations I saw all revolved around re-writing image paths manually, to point them at a script, something like this:


<img src="<?php echo data_uri('images/darwinfish.png'); ?>" alt="Darwin Fish" />
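
For reference, data_uri() isn’t a built-in PHP function; it’s a helper those implementations define for themselves. A minimal sketch of such a helper (my own guess at it, not code taken from any of them) might look like this:

<?php
// Hypothetical helper: guess the MIME type from the extension,
// encode the file contents, and return a data URI string.
function data_uri($path)
{
   $type = strtolower(pathinfo($path, PATHINFO_EXTENSION));
   if($type == 'jpg')
   {
      $type = 'jpeg';
   }
   $data = base64_encode(file_get_contents($path));
   return "data:image/$type;base64,$data";
}
?>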
   

What I’m proposing is a retrospective process that converts all the image paths for you, so you don’t have to do anything special when you’re authoring the page in the first place.

Code is where the heart is

The following example is a complete demo page, with original HTML and CSS, surrounded by PHP.

The page contains five <img> elements and one CSS background-image, yet in supported browsers it makes no additional HTTP requests at all:

<?php 
if($datauri_supported = preg_match("/(Opera|Gecko|MSIE 8)/", $_SERVER['HTTP_USER_AGENT'])) 
{ 
   ob_start(); 
}
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
   
   <title>Data URI Generator</title>
   <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
   
   <style type="text/css">
   
      body
      {
         background:url(images/texture.jpeg) #e2e2dc repeat;
         color:#554;
      }   
      
   </style>
   
</head>
<body>
   
   <p>
      <img src="images/dropcap.jpg" alt="dropcap.jpg" />
      <img src="images/firefox.png" alt="firefox.png" />
      <img src='images/specificity.jpg' alt='specificity.jpg' />
      <img src='images/darwinfish.png' alt='darwinfish.png' />
      <img src="images/rolleyes.gif" alt="rolleyes.gif" />
   </p>
   
</body>
</html>
<?php
   
if($datauri_supported)
{
   function create_data_uri($matches)
   {
      $filetype = explode('.', $matches[2]);
      $filetype = strtolower($filetype[count($filetype) - 1]);
      
      if(!preg_match('/^(gif|png|jp[e]?g|bmp)$/i', $filetype))
      {
         return $matches[0];
      }
      
      if(preg_match('/^\//', $matches[2]))
      {
         $matches[2] = $_SERVER['DOCUMENT_ROOT'] . $matches[2];
      }
   
      @$data = base64_encode(file_get_contents($matches[2]));
   
      return $matches[1] . "data:image/$filetype;base64,$data" . $matches[3];
   }
   
   
   $html = ob_get_contents();
   ob_end_clean();
   
   $html = preg_split("/\r?\n|\r/", $html);
   while(count($html) > 0)
   {
      $html[0] = preg_replace_callback("/(src=["'])([^"']+)(["'])/", 'create_data_uri', $html[0]);
      $html[0] = preg_replace_callback("/(url(['"]?)([^"')]+)(["']?))/", 'create_data_uri', $html[0]);
   
      echo $html[0] . "\r\n";
   
      array_shift($html);
   }
}
   
?>

How this all works

The heart of all this is the ability to build data URIs using base64-encoded image data.

But over and above that, there are a couple of key tricks we need to make this all work. Firstly, we use the output buffer to pre-compile the output source, so we have a chance to parse it again before sending it to the browser.

You’ll see how, at the very top of the code, I’ve set a browser condition to decide whether to start the output buffer. That same condition is then used again, surrounding the main code at the bottom, so for browsers that don’t support this technique, all the scripting is bypassed and the page is output as normal. Then what I’ve done is split the HTML by its line-breaks, so we can process, output and then delete each line immediately, which avoids having to hold the entire source-code in memory.
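
Stripped down to just that buffering trick, the skeleton of the page looks something like this (a simplified sketch of what the full listing above is already doing):

<?php
ob_start();                        // capture everything the page outputs
?>
<!-- ...the ordinary HTML goes here, authored as normal... -->
<?php
$html = ob_get_contents();         // grab the captured source
ob_end_clean();                    // and discard the buffer without sending it

$html = preg_split("/\r?\n|\r/", $html);
while(count($html) > 0)
{
   // ...rewrite $html[0] here...
   echo $html[0] . "\r\n";         // output the processed line
   array_shift($html);             // then delete it to free the memory
}
?>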

Secondly, to implement the actual parsing I’ve used the preg_replace_callback function, which identifies HTML and CSS paths with a pair of regular-expressions, and passes them through a process too complex for a simple replacement. (We have to look for src attributes and url properties separately, because the syntax is too different for a single regex to generate identical match arrays.)

Within the callback function we first have to work out the file-type, which is needed for the output data, and also acts as a condition for allowed types, so we can reject anything that’s not an image (such as a script src). The $matches array that’s passed to the function always contains the entire substring match as its first member (followed by the backreferences from [1] onwards), so if we identify a file we don’t want, we can just return that first match unmodified, and we’re done.
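
To make that concrete, here’s roughly what the match array contains for one of the <img> tags in the demo (illustrative only; the real code passes create_data_uri by name rather than using an inline callback):

<?php
preg_replace_callback("/(src=[\"'])([^\"']+)([\"'])/", function($matches)
{
   // $matches[0] => src="images/firefox.png"  (the entire substring match)
   // $matches[1] => src="                     (attribute name and opening quote)
   // $matches[2] => images/firefox.png        (the path we want to convert)
   // $matches[3] => "                         (closing quote)
   return $matches[0];
}, '<img src="images/firefox.png" alt="firefox.png" />');
?>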

The only other thing to do after the file-type check is to look for web-root paths, which will need prepending with DOCUMENT_ROOT to create a usable file path. Once we have all that, we can grab and encode the image (with error-suppression in case the original path was broken), then compile and return the data URI. Too easy!
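
Taken in isolation, that last step amounts to something like the following sketch, with an example path standing in for $matches[2]:

<?php
$path = '/images/darwinfish.png';                  // an example web-root path

if(preg_match('/^\//', $path))                     // a leading slash means web-root relative
{
   $path = $_SERVER['DOCUMENT_ROOT'] . $path;      // prepend to get a usable file path
}

@$data = base64_encode(file_get_contents($path));  // @ hides warnings if the path is broken

echo "data:image/png;base64,$data";
?>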

When is an optimization not an optimization?

When the cost is greater than the saving! And of course a process like this doesn’t come for free — there are several potential costs we have to consider.

Images converted to data URIs are roughly one-third larger than the original files, because base64 encodes every three bytes of binary data as four bytes of text. Such images also won’t cache as images; they’ll only cache as part of the source-code. Caching in that way is more of a broad-brush than a fine-point, but it does at least allow for offline viewing.

There’s also the processing overhead of doing each conversion in the first place, and it grows as the file gets larger. There’s no significant latency involved in reading the file, as long as it’s on the same server, but base64 encoding is a comparatively expensive process.
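
If you want to put numbers on those two costs for your own files, a quick measurement like this will do it (a rough sketch; the file path is just one of the demo images):

<?php
// Rough sketch: compare the raw size with the base64-encoded size,
// and time how long the encoding step takes.
$file = 'images/texture.jpeg';
$raw  = file_get_contents($file);

$start   = microtime(true);
$encoded = base64_encode($raw);
$elapsed = microtime(true) - $start;

printf("raw: %d bytes, encoded: %d bytes (+%.1f%%), encode time: %.4fs\n",
   strlen($raw), strlen($encoded),
   (strlen($encoded) / strlen($raw) - 1) * 100, $elapsed);
?>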

So perhaps the optimum way to use this technique would be not to use it for all images, but only for large numbers of small images, like icons and background-slices. The additional code-size would be trivial and the processing light, but the benefit of removing several dozen requests could be great, creating a faster-loading page overall.

So let’s say, for example, that you only wanted to process GIF images; that’s easily done just by modifying the allowed file-types regex:

if(!preg_match('/^(gif)$/i', $filetype))
{
   return $matches[0];
}

Alternatively, you could use the filesize function to filter by size, and only proceed to conversion for files below a certain threshold (remembering that web-root paths would need resolving against DOCUMENT_ROOT first, so that filesize can find the file):

if(filesize($matches[2]) > 1024)
{
   return $matches[0];
}

As far as large images go, I have read that browsers place limits on the size of data URIs; however, I haven’t observed any such limitations in practice. Firefox, Opera, Safari, even IE8 were all perfectly happy displaying image data more than 1MB in size. Ramping up the tests, I found myself hitting PHP’s memory limit without garnering any complaints from the browsers! Either I’m missing the point entirely, or there are no size limits.

Westward Ho!

While experimenting, I did try with JavaScript and CSS too; however that didn’t work in Internet Explorer, so I didn’t pursue it any further.

But a really interesting development from here would be to see whether it’s possible to develop some kind of algorithm that calculates the cost vs. benefit of implementing this technique in different situations. It might take into account the size and complexity of the page itself, the number and size of each image, the ratio of repeating CSS images to ad-hoc content images, and the time it takes to convert and encode compared with an average network request, then bring all that together to work out which images would benefit from conversion, and which are best left as they are. If we could do that, in a coherent yet automagical way, we’d have a pretty nifty WordPress plugin, hey!

But to be honest, I really don’t know where you’d start to work out something like that! There are several unquantifiables, and many judgement calls. It’s certainly something to think about though; and perhaps you — my dear reader — can offer some fresh insight?

Thumbnail credit: Stéfan


  • http://www.tyssendesign.com.au Tyssen

    One other thing I’d like to know is how this technique would impact on SEO, because in your demo page you’ve got masses of code to very little actual content. I don’t know if it’s still current but advice used to be that search engines stop reading pages at a certain point so try to get all your important content towards the top of the page. Using this technique would definitely have an impact on that.

  • Nicolas

    What about browser caching for an image used in multiple pages? Saving HTTP requests at the cost of using more bandwidth for each request: benefit is tough to assess. If any!

  • http://www.deanclatworthy.com Dean C

    What you’ve done here is interesting, but highly impractical for any large website. If the fact that the image is 1/3 larger isn’t enough to make you avoid it, then the fact that images aren’t cached would be enough to make me avoid ever using this technique ;)

  • Jake Noble

    If this is perhaps only useful for “large numbers of small images” then surely using CSS backgrounds with positioning is faster and it only creates one HTTP request.

    Still a really nice idea though!

  • http://logicearth.wordpress.com logic_earth

    For IE, even version 8 I wouldn’t use Data URIs. IE supports a much better way of joining multiple files into a single request. MHTML, which is “Multipurpose Internet Mail Extensions HyperText Markup Language” for those that didn’t know. Anyways it works almost like email’s multiple content.

  • Ulyses

I think that if it was really a solution (not yours specifically, but the concept by itself), we would have a mod_dataURI by now. It’s probably not.

  • http://www.lunadesign.org awasson

    That’s pretty nifty… I’m not sure about WP but I’ll bet it would be pretty easy to add this to a Drupal theme. I’ll have to give this a go.

    The question I guess is, is there a measurable performance hit or benefit in loading image data using the URI method?

  • http://pixopoint.com/ ryanhellyer

    LMAO. I thought this was a joke of some sort when I read the first paragraph, but it seems you have indeed managed to load a bunch of images onto a page with zero http requests!

    Big thumbs up :)

    Not sure I’ll ever use this technique though, but it’s nice to know that it is possible to have such fine-grained control over things like this when needed.

    On a related note:
    I’ve used multiple DIV tags to fake things like smooth dropshadows previously to reduce http requests for image based dropshadows and it works pretty well. Not only does it load faster and use zero http requests, but depending on the style and size of dropshadow, the total byte size downloaded can also be smaller.

  • http://pixopoint.com/ ryanhellyer

    Annoyingly, it doesn’t work well with Tiny URLs … http://tinyurl.com/2fdxkas

  • 2MHost.com

Nice to read, but if IE does not support this method, then it’s not usable... it’s the ugly truth.

    • ScallioXTX

      That’s why it says in the first line:
if($datauri_supported = preg_match("/(Opera|Gecko|MSIE 8)/", $_SERVER['HTTP_USER_AGENT']))
      i.e., it will only rewrite for IE 8 and up. IE 7 and below will be served the images the normal way.

  • Tim

    How does it affect the timing of the domready event and how long the page takes to download before rendering starts? With external images, yes you have more http requests, but you can start rendering before they start downloading can’t you?

  • Michael

    I still think having a CSS sprite is a much better approach

  • Anonymous

    I like the idea but not good to use in website. We need multi browser support technique. Thanks for the idea.

  • http://codefisher.org/ codefisher

    I would consider doing this in CSS, because then you can put in both the normal image, and the data url after it. Good browsers will load the data url and ignore the previous definition. To offset the extra size you can actually set the web server to compress the CSS file before it is sent. You would also want to save the output from something like this, so it does not have to be done on every request.

    I would however never do this in HTML, it just feels all wrong, along with all the reasons given by others who have posted before me.

    Michael

  • ndluthier

    I agree with codefisher since CSS files can be highly cacheable and you are providing graceful degradation for browsers that don’t support data URIs. It has the benefits of being a single request, savable offline, cached, and cross-browser.

  • http://www.brothercake.com/ brothercake

    Yes I tend to agree – on balance, the cachability of CSS files makes them an attractive proposition for this kind of optimization.

    It would also help with the issue that Tyssen mentioned – I think there probably would be an SEO impact, if spiders are abandoning pages after Xk, or making any kind of signal-noise assessment.

    So the next things to investigate would be pre-processing a CSS file in that way, and then doing some benchmarks to get typical figures on process time vs. normal request time, and try and get a sense of where the crossover points of efficiency are.

    RE: browser support – as others have mentioned, there’s no browser support issue here, because of the condition at the top. So browsers that don’t support data URIs just load the images as normal – then it’s like a progressive enhancement for those that do.

I’m also a bit puzzled about the size limit thing. In testing, I was able to load images up to 1MB in size this way and it worked fine; I thought that included IE8, but now that I come to test it again, it does seem to place very low limits on the size. I guess the response to that would be to stop considering IE8 a supported browser, and just let it degrade to normal with the rest!