Overcome Your Caching Conundrums

Dealing with browser caching is a balancing act. On the one hand, you want to minimize load times and bandwidth use by ensuring that images, scripts, and style sheets are cached by your visitors; on the other, you want those visitors to see the most recent versions of all your files.

In this article, I’ll show you a few methods for controlling how your site’s files are cached by browsers so you can achieve the best of both worlds: optimal performance, with every update reaching all of your users immediately and without a hitch.

The Source of the Problem

Let’s say you’re including a style sheet in your page, with a link tag such as this:

<link rel="stylesheet" href="/css/styles.css" type="text/css" />

Browsers will fetch the file /css/styles.css once and then cache that file. This is usually an ideal behavior. Your site will load faster for visitors who have cached your style sheet, so your server will handle fewer requests and consume less bandwidth.

By default, cached files expire relatively quickly. This means that while you or your team might experience a problem with a cached style sheet immediately upon making a change, infrequent site visitors (say once every few days) will probably be oblivious to any issues with cached content. However, it also means that visitors will often be downloading files unnecessarily, leading to longer load times and wasted bandwidth.

In the interest of improving performance (and perhaps prompted by YSlow, a great tool for benchmarking your site’s performance), you’ll want to tell browsers to cache files from your site for much longer periods of time. You can find out more about how to do this in this article by Wayne Shea, where he advocates configuring your server to attach “Expires” headers to your content, telling browsers to keep those files in cache for weeks or months.
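To make the idea of a far-future “Expires” header concrete, here is a minimal Python sketch of what such a header value looks like. It is only an illustration: the function name expires_header and the 30-day lifetime are assumptions, and in practice the header would normally be set in your server configuration rather than in application code.

```python
from email.utils import formatdate

SECONDS_PER_DAY = 86400


def expires_header(now: float, days: int) -> str:
    """Return an HTTP-date string `days` days after the Unix timestamp `now`.

    Hypothetical helper, for illustration only.
    """
    return formatdate(now + days * SECONDS_PER_DAY, usegmt=True)


# Using the Unix epoch as "now" keeps the output reproducible.
print("Expires:", expires_header(0, 30))  # Expires: Sat, 31 Jan 1970 00:00:00 GMT
```

A browser that receives this header may reuse its cached copy until the stated date without asking the server again, which is what makes the workarounds in the rest of this article necessary.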

But having done this, your site’s visitors may experience problems when you make changes to images, style sheets, or scripts. You might think this would be fairly harmless—simply presenting an out-of-date version of the site to some visitors seems like a relatively minor problem. However, if some elements (such as the HTML) are updated while others (such as JavaScript files) are loaded from the cache, elements of functionality might appear broken to your users.

The Solution

There are a few ways of working around this problem, and I’ll go over them in order of complexity. They all involve some variation on the same theme: tricking the browser into thinking it’s downloading a different file than the one it has cached.

Modify the Filenames

The simplest workaround is to rename your style sheets, images, and JavaScript files every time you update them. For example, you could include a version number in the names of your files, like this:

<link rel="stylesheet" href="/css/styles.1234.css" type="text/css" />

While this will work well enough, it can become tedious very quickly if you’re changing your files often or have a large number of static resources. It does, however, have the advantage of doubling as rudimentary source control—you’ll have a history of all the changes you’ve made to your site laid out in your file system.
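If you do go down this road, the renaming can at least be scripted. The following is a minimal Python sketch rather than part of the technique itself; the helper name versioned_filename is hypothetical, and it assumes the version number goes just before the file extension, as in styles.1234.css.

```python
import posixpath


def versioned_filename(url_path: str, version: int) -> str:
    """Insert a version number before the extension of a URL path.

    Hypothetical helper, for illustration only.
    """
    root, ext = posixpath.splitext(url_path)
    return f"{root}.{version}{ext}"


print(versioned_filename("/css/styles.css", 1234))  # /css/styles.1234.css
```

Note that this only computes the new name; the file on the server still has to be renamed (or copied) to match, and every reference to it updated.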

Query Strings

Another popular method is to add a query string to the end of your filenames, for example:

<link rel="stylesheet" href="/css/styles.css?v=1234" type="text/css" />

With this method, every time you change the version number in the query string, you’ll force browsers to grab a new version of your style sheet, image, or JavaScript file. The advantage of this method is that there’s no requirement to rename the files themselves; you only need to change the references in your HTML.

The query string method does, however, have a few caveats. First, there may be situations where it could conflict with another legitimate need for a query string after your filenames (though this is unlikely for static resources). The other and more significant drawback is that if you modify images used in your CSS (as background images, for example), you’ll also need to edit your style sheet to append a query string to each of the image references. Depending on the size of your style sheets, the number of images they refer to, and the frequency with which you modify those images, this can turn into a lot of work.
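The first caveat can be softened by adding the version parameter without disturbing any query string that is already there. Here is a minimal Python sketch of that idea; the function name with_version_query is hypothetical, and the parameter name v is simply the one used in the example above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_version_query(url: str, version: int) -> str:
    """Return `url` with v=<version> set, keeping any other query parameters.

    Hypothetical helper, for illustration only.
    """
    parts = urlsplit(url)
    # Drop an existing v parameter so the version is replaced, not duplicated.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "v"
    ]
    params.append(("v", str(version)))
    return urlunsplit(parts._replace(query=urlencode(params)))


print(with_version_query("/css/styles.css", 1234))
# /css/styles.css?v=1234
print(with_version_query("/css/styles.css?v=1&theme=dark", 1234))
# /css/styles.css?theme=dark&v=1234
```

This does nothing for the second caveat: image references inside your style sheets would still need to be edited one by one.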

The Path Method

The final method—and the one I want to focus on—involves including the version information in the path of the resource, rather than in the filename or query string. So, for example:

<link rel="stylesheet" href="/css.1234/styles.css" type="text/css" />

You probably think that this will involve just as much work as renaming the files themselves. While this is theoretically true, in practice there’s a clever workaround to save you having to do that.

What we’ll do is use a rewrite rule to make the URL /css.1234/styles.css (and any other numerical variation of that path) point to /css/styles.css on the server. To do this with Apache, you’ll need to use mod_rewrite. If you have mod_rewrite installed and permissions for .htaccess files, setting up a rule for this is straightforward. You can add the following lines to the .htaccess file located in your site’s root web folder:

RewriteEngine On
RewriteRule css[.][0-9]+/(.*)$ css/$1 [L]

This rule uses a regular expression to match any path containing css, followed by a period (.), followed by one or more digits, and finally a slash; whatever comes after the slash is captured as $1. Any path matching this pattern will be rewritten to just /css/<filename>. The [L] flag (meaning “last”) specifies that no further rewrite rules should be applied to this request. If the directory containing your style sheets has a different name than css, you’ll need to adjust the rule accordingly, replacing both instances of css with the name of your directory.
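If you would like to check what the pattern does before touching your server, the rule can be approximated in a few lines of Python. This is a sketch under two assumptions: that the rule lives in an .htaccess file in the site root, where mod_rewrite matches against the path without its leading slash, and that Python’s re module treats this particular pattern the same way Apache’s regex engine does. The function name rewrite is hypothetical.

```python
import re

# Same pattern as the RewriteRule for the css directory.
CSS_RULE = re.compile(r"css[.][0-9]+/(.*)$")


def rewrite(path: str) -> str:
    """Approximate the rule for a per-directory path (no leading slash).

    On a match, the whole path is replaced by the substitution css/$1,
    as mod_rewrite does. Non-matching paths are returned unchanged.
    """
    match = CSS_RULE.search(path)
    if match is None:
        return path
    return "css/" + match.group(1)


print(rewrite("css.1234/styles.css"))  # css/styles.css
print(rewrite("css.42/images/foo-background.png"))  # css/images/foo-background.png
print(rewrite("css/styles.css"))  # css/styles.css (no match, unchanged)
```

Because the captured group keeps everything after the slash, files in subfolders of the versioned directory are rewritten too.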

Images

What about images? You could manually apply any of the methods listed above to every image URL on your site; this includes all url() statements in your style sheets and every src attribute for every img tag.

That would be a lot of work, though. Fortunately for us, it turns out to be unnecessary. If you’re using the path method to handle caching of your CSS files, you can automatically reap the benefits of cache control for the images you load in your style sheets using url(), without any extra effort.

This technique relies on the oft-bemoaned fact that relative URLs in CSS url() references are resolved relative to the style sheet’s location, rather than the location of the styled document.

In order for this method to work, images used in your style sheets will have to be in a subfolder of your style sheet’s location (for example, /css/images/ rather than just /images/). This makes sense anyway, as the resources used by your style sheets are logically grouped with the style sheets themselves.

How does it work? In your style sheet, you might have a rule like this:

#foo {
    background-image: url(images/foo-background.png);
}

Following the path method described earlier, you would include your style sheet in your document with markup like this:

<link rel="stylesheet" href="/css.1234/styles.css" type="text/css" />

Because the images in a style sheet are relative to the path of the style sheet, the path to the image you’ve used is now /css.1234/images/foo-background.png! This URL will be rewritten by the RewriteRule, just like the URL to the CSS file itself.
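You can confirm this resolution step with Python’s standard URL handling, which follows the same relative-reference rules that browsers apply. In this small sketch the host name example.com and the function name resolve_css_url are assumptions made for illustration.

```python
from urllib.parse import urljoin, urlsplit


def resolve_css_url(stylesheet_url: str, reference: str) -> str:
    """Resolve a url() reference against the style sheet's own URL
    and return the path portion of the result."""
    return urlsplit(urljoin(stylesheet_url, reference)).path


print(resolve_css_url(
    "http://example.com/css.1234/styles.css",
    "images/foo-background.png",
))  # /css.1234/images/foo-background.png
```

The version number travels along with the directory, so the image request carries it without the style sheet ever mentioning it.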

Any time you change the CSS in your style sheet (or the images it references), you can simply change this version number, and all your site’s visitors will retrieve the correct styles and images. This is a lot less work than changing each image reference individually, and because it all happens automatically, there’s no risk of forgetting an image.

JavaScript

I’ve yet to explicitly mention JavaScript files, but of course any method used for CSS can also be used for JavaScript. So, for the path method, you’d have a script tag like:

<script src="/js.1234/scripts.js" type="text/javascript"></script>

and a rewrite rule such as:

RewriteRule js[.][0-9]+/(.*)$ js/$1 [L]

These instructions assume that your .js files are located in the /js/ directory, so you may need to adjust them to match up with the directory structure of your site.
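If your pages are generated from templates, the version number for both directories can live in one place. The following Python sketch shows one way that might look; the names SITE_VERSION and versioned_path are hypothetical, and it assumes root-relative paths whose first segment is the directory to be versioned.

```python
SITE_VERSION = 1234  # Hypothetical: bump this whenever static files change.


def versioned_path(url_path: str, version: int = SITE_VERSION) -> str:
    """Add .<version> to the first directory of a root-relative URL path."""
    first, sep, rest = url_path.lstrip("/").partition("/")
    if not sep:
        raise ValueError(f"expected a path with a directory, got {url_path!r}")
    return f"/{first}.{version}/{rest}"


print(versioned_path("/css/styles.css"))  # /css.1234/styles.css
print(versioned_path("/js/scripts.js"))  # /js.1234/scripts.js
```

The output of such a helper would go into the href and src attributes shown above, while the rewrite rules map the requests back to the real /css/ and /js/ directories.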

Conclusion

Only the smallest sites can afford to ignore browser caching. When you take account of caching and engineer your site to take advantage of it, you can significantly improve performance for your users, and save money on bandwidth while you’re at it. But like any technology, it’s easy to use incorrectly, quickly causing more headaches than it’s worth. Using the methods I’ve described, you should be able to gain the most out of browser caching with as little effort as possible.

It’s also important to remember that caching is only one part of a site optimization strategy; download and use YSlow on all your sites to make sure you’re giving your visitors the fastest possible experience. Good luck, and happy optimizing!
