Introduction
It's impossible to remember how many times
I've said this on the phone to a client: "Can't see it? Try
refreshing the page. Hit Ctrl+Shift+R. Still no change?"
Cache control problems can be frustrating and confusing. If the above
conversation sounds familiar, you need a cache control strategy for
whenever you update your client's web site.
In this issue, Zach Johnson takes a fresh look at this caching conundrum
and explains a cache control technique that's easy to implement. It could
just be the answer you're looking for!

Summary
Overcome your Caching Conundrums
by Zach Johnson
Caching is a balancing act. On the one hand, you aim to minimize load
times and bandwidth use by ensuring that images, scripts, and style sheets
are cached by your visitors; however, you still want to ensure that
they're accessing the most recent versions of all your files.
In this article, I'll show you a few methods for controlling how your
site's files are cached by browsers so you can achieve the best of both
worlds: maintaining optimal performance while ensuring that any updates
are seen immediately, and without a hitch, by all of your users.
The Source of the Problem
Let's say you're including a style sheet in your page, with a
link tag such as this:
<link rel="stylesheet"
href="/css/styles.css"
type="text/css" />
Browsers will fetch the file /css/styles.css once and then
cache that file. This is usually an ideal behavior. Your site will load
faster for visitors who have cached your style sheet, so your server will
handle fewer requests and consume less bandwidth.
By default, cached files expire relatively quickly. This means that
while you or your team might experience a problem with a cached style
sheet immediately upon making a change, infrequent site visitors (say once
every few days) will probably be oblivious to any issues with cached
content. However, it also means that visitors will often be downloading
files unnecessarily, leading to longer load times and wasted bandwidth.
In the interest of improving performance (and perhaps prompted by YSlow, a great tool for
benchmarking your site's performance), you'll want to tell browsers to
cache files from your site for much longer periods of time. You can find
out more about how to do this in an
article by Wayne Shea, where he advocates configuring your server to
attach "Expires" headers to your content, telling browsers to
keep those files in cache for weeks or months.
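To make the idea of a far-future "Expires" header concrete, here is a minimal Python sketch of what such a header looks like. It is only an illustration: in practice you would configure this on the web server rather than in application code. The function name far_future_headers, the 30-day default, and the extra Cache-Control header are assumptions of this sketch, not part of the article referred to above.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def far_future_headers(now: datetime, days: int = 30) -> dict[str, str]:
    """Build caching headers that keep a file fresh for `days` days.

    `now` must be a timezone-aware UTC datetime.
    """
    expires = now + timedelta(days=days)
    return {
        # HTTP dates are always expressed in GMT.
        "Expires": format_datetime(expires, usegmt=True),
        # Assumption: max-age is included as the relative-time
        # counterpart of Expires; the article only mentions Expires.
        "Cache-Control": f"max-age={days * 24 * 60 * 60}",
    }


headers = far_future_headers(datetime(2024, 1, 1, tzinfo=timezone.utc))
print(headers["Expires"])  # Wed, 31 Jan 2024 00:00:00 GMT
```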
But having done this, your site's visitors may experience problems when
you make changes to images, style sheets, or scripts. You might think this
would be fairly harmless—simply presenting an out-of-date version of
the site to some visitors seems like a relatively minor problem. However,
if some elements (such as the HTML) are updated while others (such as
JavaScript files) are loaded from the cache, elements of functionality
might appear broken to your users.
The Solution
There are a few ways of working around this problem, and I'll go over
them in order of complexity. They all involve some variation on the same
theme: tricking the browser into thinking it's downloading a different
file than the one it has cached.
Modify the Filenames
The simplest workaround is to rename your style sheets,
images, and JavaScript files every time you update them. So, for example,
you could include a version number in the names of your files, like this:
<link rel="stylesheet"
href="/css/styles.1234.css"
type="text/css" />
While this will work well enough, it can become tedious very quickly if
you're changing your files often or have a large number of static
resources. It does, however, have the advantage of doubling as rudimentary
source control—you'll have a history of all the changes you've made
to your site laid out in your file system.
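If your pages are generated by a script, the renamed reference can be produced by a small helper instead of being typed by hand. The following Python sketch shows one way to do it; the helper name versioned_filename is hypothetical, and the file on disk still has to be renamed to match.

```python
import posixpath


def versioned_filename(url_path: str, version: int) -> str:
    """Insert ".<version>" before the file extension of a URL path."""
    root, ext = posixpath.splitext(url_path)
    return f"{root}.{version}{ext}"


print(versioned_filename("/css/styles.css", 1234))  # /css/styles.1234.css
```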
Query Strings
Another popular method is to add a query string to the end of your
filenames, for example:
<link rel="stylesheet"
href="/css/styles.css?v=1234"
type="text/css" />
With this method, every time you change the version number in the query
string, you'll force browsers to grab a new version of your style sheet,
image, or JavaScript file. The advantage of this method is that there's no
requirement to rename the files themselves; you only need to change the
references in your HTML.
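As a rough illustration, the version parameter can be appended programmatically. This Python sketch uses only the standard library; the function name with_version and the default parameter name v are assumptions chosen to match the example above. It keeps any query parameters the URL already carries and replaces an earlier version value.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_version(url: str, version: int, param: str = "v") -> str:
    """Return `url` with `param=<version>` set in its query string."""
    parts = urlsplit(url)
    # Keep existing parameters, dropping any earlier version value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, str(version)))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_version("/css/styles.css", 1234))         # /css/styles.css?v=1234
print(with_version("/css/styles.css?v=1233", 1234))  # /css/styles.css?v=1234
```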
The query string method does, however, have a few caveats. First, there
may be situations where it could conflict with another legitimate need for
a query string after your filenames (though this is unlikely for static
resources). The other and more significant drawback is that if you modify
images used in your CSS (as background images, for example), you'll also
need to edit your style sheet to append a query string to each of the
image references. Depending on the size of your style sheets, the number
of images they refer to, and the frequency with which you modify those
images, this can turn into a lot of work. Next, we'll
investigate a time-saving technique you can use to avoid all that.
The Path Method
The final method -- and the one I want to focus on -- involves including
the version information in the path of the resource, rather than in the
filename or query string. So, for example:
<link rel="stylesheet"
href="/css.1234/styles.css"
type="text/css" />
You probably think that this will involve just as much work as renaming
the files themselves. While this is theoretically true, in practice
there's a clever workaround to save you having to do that.
What we'll do is use a rewrite rule to make the URL
/css.1234/styles.css (and any other numerical variation of
that path) point to /css/styles.css on the server. To do this
with Apache, you'll need to use mod_rewrite. If you have
mod_rewrite installed and permissions for
.htaccess files, setting up a rule for this is
straightforward. You can add the following lines to the
.htaccess file located in your site's root web folder:
RewriteEngine On
RewriteRule css[.][0-9]+/(.*)$ css/$1 [L]
This rule uses a regular expression to match any path containing
css, followed by a period (.), followed by one or more
digits, and finally a slash. Everything after the slash is captured by
(.*) and reinserted as $1, so any path matching
this pattern will be rewritten to just /css/<filename>.
The [L] (meaning "last") specifies that no further
rewrite rules should be applied to this request. If the directory
containing your style sheets has a different name than css,
you'll need to adjust the rule accordingly, replacing both instances of
css with the name of your directory.
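Before deploying the rule, it can help to check what the pattern does and does not match. The Python sketch below approximates the rule's behavior with the same regular expression. It is a simulation under stated assumptions, not mod_rewrite itself: the function name rewrite is hypothetical, and it assumes the rule lives in the .htaccess file in the site root, where the pattern is tested against the request path without its leading slash.

```python
import re

# Same pattern as the RewriteRule above; Apache's $1 is group 1 here.
CSS_RULE = re.compile(r"css[.][0-9]+/(.*)$")


def rewrite(url_path: str) -> str:
    """Approximate what the .htaccess rule does to a request path."""
    relative = url_path.lstrip("/")
    match = CSS_RULE.search(relative)
    if match is None:
        return url_path  # rule does not apply; the request is left alone
    # The substitution replaces the whole path, not just the matched part.
    return "/css/" + match.group(1)


print(rewrite("/css.1234/styles.css"))  # /css/styles.css
print(rewrite("/css/styles.css"))       # /css/styles.css (unchanged)
print(rewrite("/old/css.9/a.css"))      # /css/a.css
```

The last call shows that the pattern is unanchored, so it also matches when css.<digits>/ appears deeper in the path. If that is not what you want, starting the pattern with ^ should restrict it to paths that begin with the versioned directory.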
Images
What about images? You could
manually apply any of the methods listed above to every image URL on your
site; this includes all url() statements in your style sheets
and every src attribute for every img tag.
That would be a lot of work, though. Fortunately for us, it turns out to
be unnecessary. If you're using the path method to handle caching of your
CSS files, you can automatically reap the benefits of cache control for
the images you load in your style sheets using url(), without
any extra effort.
This technique relies on the oft-bemoaned fact that relative URLs in CSS
url() values are resolved relative to the style sheet's location, rather
than the location of the styled document.
In order for this method to work, images used in your style sheets will
have to be in a subfolder of your style sheet's location (for example,
/css/images/ rather than just /images/). This
makes sense anyway, as the resources used by your style sheets are
logically grouped with the style sheets themselves.
How does it work? In your style sheet, you might have a rule like this:
#foo {
  background-image: url(images/foo-background.png);
}
Following the path method described earlier, you would include your
style sheet in your document with markup like this:
<link rel="stylesheet"
href="/css.1234/styles.css"
type="text/css" />
Because the images in a style sheet are relative to the path of the
style sheet, the path to the image you've used is now
/css.1234/images/foo-background.png! This URL will be
rewritten by the RewriteRule, just like the URL to the CSS
file itself.
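The resolution step can be reproduced with Python's standard URL-joining function, which follows the same relative-reference rules browsers apply. The variable names are only for illustration, and the second image path (/images/foo-background.png) is a hypothetical counter-example showing why the images need to live beneath the style sheet's folder.

```python
from urllib.parse import urljoin

stylesheet_url = "/css.1234/styles.css"

# A relative reference inherits the versioned directory of the style sheet.
resolved = urljoin(stylesheet_url, "images/foo-background.png")
print(resolved)  # /css.1234/images/foo-background.png

# A root-relative reference ignores the style sheet's path, so the
# version number is lost and the rewrite rule never sees it.
unversioned = urljoin(stylesheet_url, "/images/foo-background.png")
print(unversioned)  # /images/foo-background.png
```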
Any time you change the CSS in your style sheet -- or the images it
references -- you can simply change this version number, and all your
site's visitors will retrieve the correct styles and images. As well as
being a lot less work than changing each image reference individually,
you'll also avoid forgetting any images, as it's all done automatically.
JavaScript
I've yet to explicitly mention JavaScript files, but of course any
method used for CSS can also be used for JavaScript. So, for the path
method, you'd have a script tag like:
<script
src="/js.1234/scripts.js"
type="text/javascript"></script>
and a rewrite rule such as:
RewriteRule js[.][0-9]+/(.*)$ js/$1 [L]
These instructions assume that your .js files are located
in the /js/ directory, so you may need to adjust them to
match up with the directory structure of your site.
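If your pages are generated by a script, keeping the version number in one place means a single edit updates every CSS and JavaScript reference. Here is a minimal Python sketch of that idea; ASSET_VERSION and versioned_path are hypothetical names, and the sketch assumes each resource sits under a top-level directory such as /css/ or /js/.

```python
ASSET_VERSION = 1234  # hypothetical: bump this whenever CSS, JS, or images change


def versioned_path(path: str, version: int = ASSET_VERSION) -> str:
    """Turn /css/styles.css into /css.<version>/styles.css."""
    directory, slash, rest = path.lstrip("/").partition("/")
    if not slash:
        raise ValueError(f"expected a path with a directory, got {path!r}")
    return f"/{directory}.{version}/{rest}"


print(versioned_path("/css/styles.css"))  # /css.1234/styles.css
print(versioned_path("/js/scripts.js"))   # /js.1234/scripts.js
```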
Conclusion
Only the smallest sites can afford to ignore browser caching. When you
take account of caching and engineer your site to take advantage of it,
you can significantly improve performance for your users, and save money
on bandwidth while you're at it. But like any technology, it's easy to use
incorrectly, quickly causing more headaches than it's worth. Using the
methods I've described, you should be able to get the most out of browser
caching with as little effort as possible.
It's also important to remember that caching is only one part of a site
optimization strategy; download and use YSlow on all your sites to
make sure you're giving your visitors the fastest possible experience.
Good luck, and happy optimizing!
See you next week for another issue of the Tech Times!
Andrew Tetlaw
Technical Editor, SitePoint
techtimes@sitepoint.com