There are many factors to consider when optimizing the download time of your web page. The one golden rule, however, is to minimize the number of HTTP network requests made by the browser. Those HTTP requests (and responses) are resource intensive, so the fewer your web page needs, the faster it will load.
In this article, we’ll look at how you can take advantage of the browser cache to reduce the number of HTTP requests. We’ll also reduce your web site’s network bandwidth requirements by limiting the number of full responses sent by your web server. I’ll be honest – web page optimization can be a little dry. However, stick with me, and you’ll walk away with a solid understanding of client-side caching as well as a handful of practical tips to improve the performance of your web site. Let’s get started!
So what is the browser cache?
A web browser stores objects – for example, images, HTML documents, style sheets – downloaded over the network in a special area called the browser cache. The way the cache works is simple: when the user navigates to a page, the web browser will first check if the browser cache already contains the content for that page. If the content is still fresh in the cache, another download is unnecessary. Easy, right?
What may be news to you is that the HTTP/1.1 protocol – the communications protocol in common use on the Web – allows you to specify what content is cacheable, and for how long the downloaded content can be considered fresh by the browser cache. This information is specified in the response headers returned by the web server. Response headers are lines of text describing the page being sent (and the server that’s sending it). You can actually view this information if you’re using the Firefox browser and have the Live HTTP Headers extension installed.
The parts of the response header relating to cache control are, not surprisingly, called the cache control directives.
Right, so how do we specify something as cacheable?
Using either one of the following headers in the server response will tell the browser that the content is cacheable:
Cache-Control: max-age=<a duration in seconds>
Expires: <a GMT date in the format specified by RFC 1123>
Only one of these is needed, but if both headers are present in the server response, the Cache-Control header takes priority over the Expires header.
- If you use the Cache-Control header, the cache entry will be considered fresh until the duration that you specified (in seconds) has elapsed.
- If you use the Expires header, on the other hand, the cache entry is considered fresh until the expiry date arrives. The RFC 1123 standard specifies the following date-time format: Tue, 01 Jan 2008 13:37:41 GMT.
To specify an expiration time in the near future, it's better to use the max-age directive in the Cache-Control header, to avoid clock synchronization errors between the browser and the server. For expiration times far into the future, the Expires header is a safer bet - it's more readable to humans and less error-prone.
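To make the two headers concrete, here's a minimal sketch in Python that builds each one. The function names `max_age_header` and `expires_header` are made up for this illustration; the only library call is `email.utils.formatdate`, which produces dates in the RFC 1123 style when `usegmt=True`.

```python
import time
from email.utils import formatdate


def max_age_header(seconds):
    """Build a Cache-Control header: fresh for `seconds` from now."""
    return ("Cache-Control", f"max-age={int(seconds)}")


def expires_header(expires_at):
    """Build an Expires header from a Unix timestamp (seconds since epoch)."""
    # usegmt=True makes formatdate end the date with "GMT", as HTTP expects.
    return ("Expires", formatdate(expires_at, usegmt=True))


if __name__ == "__main__":
    # Near future: one hour of freshness, expressed as a duration.
    print(": ".join(max_age_header(60 * 60)))
    # Far future: 30 days from now, expressed as an absolute date.
    print(": ".join(expires_header(time.time() + 30 * 24 * 60 * 60)))
```

Note that the duration-based header doesn't depend on the server's clock at all, which is why it suits short lifetimes.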
What happens when the cache entry expires?
If the browser requests an object that's in the browser cache but has expired, the object may still be valid. The browser can then check with the server again to see if the cached entry is still usable. It does this by including an If-Modified-Since header in the request. This is called validating the cache entry. If the cached entry in the browser is still the same, it's unnecessary for the server to resend the unchanged item. If the server finds instead that the content has changed - that is, it's no longer valid - a full response or refreshed page is returned to the browser.
An example sequence of events for a conditional request is shown below:
1. The browser navigates to a page and makes an HTTP request.
2. The server returns a successful response that includes a Cache-Control header and a Last-Modified header.
3. The browser stores the content in the cache.
4. Time elapses; the browser navigates to the same page again and finds that the content has expired in its cache.
5. The browser makes a conditional request to the server with an If-Modified-Since cache validator. The same GMT time from the Last-Modified header of the original response (point 2) is used in the conditional request here.
6. If the server finds that the content is the same, it responds with a status code 304, indicating that the content hasn't been modified. The browser can then reuse the cached item, saving the cost of downloading it again.
Conditional requests carry a small overhead: the browser's request includes the If-Modified-Since header, and the server's response includes the Cache-Control and Last-Modified headers. But when you compare that to sending the entire response, you're looking at a significant saving in network traffic.
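The server's side of the exchange described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than how any particular web server does it: `respond` is a hypothetical function, the 60-second `max_age` default is arbitrary, and the resource's modification time is passed in as a Unix timestamp.

```python
from email.utils import formatdate, parsedate_to_datetime


def respond(request_headers, last_modified_ts, body, max_age=60):
    """Return (status, headers, body) for a GET on one resource."""
    headers = {
        "Cache-Control": f"max-age={max_age}",
        "Last-Modified": formatdate(last_modified_ts, usegmt=True),
    }
    since = request_headers.get("If-Modified-Since")
    if since:
        try:
            since_ts = parsedate_to_datetime(since).timestamp()
        except (TypeError, ValueError):
            since_ts = None  # unparseable date: fall through to a full response
        # HTTP dates have one-second resolution, so compare whole seconds.
        if since_ts is not None and int(last_modified_ts) <= since_ts:
            return 304, headers, b""  # Not Modified: no body is sent
    return 200, headers, body


if __name__ == "__main__":
    # First visit: a full response, including the validator.
    status, headers, _ = respond({}, 1_200_000_000, b"<html>...</html>")
    print(status, headers["Last-Modified"])
    # Later visit: the browser echoes Last-Modified back as If-Modified-Since.
    revisit = {"If-Modified-Since": headers["Last-Modified"]}
    print(respond(revisit, 1_200_000_000, b"<html>...</html>")[0])
```

The 304 response has an empty body, which is where the bandwidth saving comes from.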
When does a browser make a conditional request?
There are two conditions that need to be met before a browser will make a conditional request.
First, the resource must already be in the cache, even if it has expired. Second, according to section 13.3 of the HTTP 1.1 specification, the server response for the resource must have included a cache validator, such as the Last-Modified header. If these conditions are in place, then the browser may issue a conditional request.
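Seen from the browser's side, the decision might look something like the sketch below. Real browsers are far more involved; `plan_request` and the dictionary keys `fresh_until` and `last_modified` are invented for this example and don't correspond to any browser API.

```python
def plan_request(cache_entry, now):
    """Decide how to fetch a resource.

    cache_entry is None (not cached) or a dict with:
      "fresh_until"   - Unix timestamp until which the entry is fresh
      "last_modified" - the Last-Modified value from the original response,
                        if the server sent one
    Returns (action, extra_request_headers).
    """
    if cache_entry is None:
        return "full", {}  # condition 1 fails: nothing in the cache
    if now < cache_entry["fresh_until"]:
        return "use_cache", {}  # still fresh: no request needed at all
    validator = cache_entry.get("last_modified")
    if validator:
        # Both conditions met: cached (though expired) and a validator exists.
        return "conditional", {"If-Modified-Since": validator}
    return "full", {}  # condition 2 fails: no validator to send


if __name__ == "__main__":
    entry = {"fresh_until": 100, "last_modified": "Thu, 01 Jan 1970 00:00:00 GMT"}
    print(plan_request(entry, now=50))
    print(plan_request(entry, now=150))
```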
What makes for the best performance?
For static components, use Expires cache control headers with a date in the far future - not too far, though! According to the HTTP 1.1 protocol, an HTTP server should not send an Expires header with a date more than one year into the future.
Of course, these static components might be updated from time to time - for example, if you make some CSS fixes or upload some new images. One way to make sure the browser fetches the updated resources straight away is to add the version number of the static components as if it were a GET variable. Simply put a version number at the end of the item's URL in the markup.
Here's an example. Suppose a document's HTML markup references version 1 of a style sheet, which we'll call special.css. We could reference special.css as follows in the HTML document:
<link href="special.css?v=1" rel="stylesheet" type="text/css" />
Later, when the design changes, the corresponding HTML markup could be:
<link href="special.css?v=2" rel="stylesheet" type="text/css" />
The browser will see the new GET variable, treat the file as if it were new, and download the fresh style sheet.
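If your pages are generated by a script, you probably don't want to edit version numbers by hand. Here's one possible helper, sketched in Python with the standard `urllib.parse` module; the function name `versioned_url` is an assumption for this example, and the parameter name `v` simply follows the markup above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def versioned_url(url, version):
    """Return `url` with its `v` query parameter set to `version`."""
    parts = urlsplit(url)
    # Keep any other query parameters, but drop an existing v=... value.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "v"]
    query.append(("v", str(version)))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    href = versioned_url("special.css?v=1", 2)
    print(f'<link href="{href}" rel="stylesheet" type="text/css" />')
```

Bumping the version in one place then updates every reference the helper generates.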
For content that changes frequently, we should ensure the browser makes that conditional request so that it can get the freshest content - but only when that content has changed. To make this work, we should include both the Last-Modified and Cache-Control: max-age=0, must-revalidate headers in the server response. That way, the browser will always make a conditional request when the component is referenced in the HTML markup. If the component is unaltered, the server can return a response with a 304 Not Modified status to indicate that the content is unchanged, instead of sending the full response.
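Putting the two strategies side by side, a server-side policy could be sketched as follows. Treat it as a starting point, not a recommendation for your site: the list of "static" file extensions and the function name `cache_headers` are assumptions made for this example, and timestamps are Unix seconds.

```python
import posixpath
from email.utils import formatdate

ONE_YEAR = 365 * 24 * 60 * 60  # HTTP 1.1 advises against going beyond this

# Assumption: which file types count as rarely changing static components.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".gif"}


def cache_headers(path, last_modified_ts, now):
    """Choose caching headers for the resource at `path`."""
    headers = {"Last-Modified": formatdate(last_modified_ts, usegmt=True)}
    extension = posixpath.splitext(path)[1].lower()
    if extension in STATIC_EXTENSIONS:
        # Static component: fresh for a year; change the URL to force a refetch.
        headers["Expires"] = formatdate(now + ONE_YEAR, usegmt=True)
    else:
        # Frequently changing content: always revalidate with the server.
        headers["Cache-Control"] = "max-age=0, must-revalidate"
    return headers


if __name__ == "__main__":
    import time
    now = time.time()
    print(cache_headers("/special.css", now - 3600, now))
    print(cache_headers("/news.html", now - 3600, now))
```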
Put it into practice
By now, you should have a couple of tricks up your sleeve to give your visitors a snappy experience while browsing your site, and to save some network traffic (and some cash as a direct result). If you're curious about your site's performance, Yahoo's Firefox plugin YSlow assesses each of the components on a page, and grades the page from A to F on various optimization techniques.
And to get started with configuring your headers, check out Apache's mod_expires and IIS content expiration documentation to learn how to set this up on your favorite web server.
Now, go out there and save some bytes!