How to Solve Caching Conundrums
The web wouldn’t operate without caching. Between you and the server, there is a browser and any number of proxy servers, all of which can cache responses. Applications handle much of this transparently, which dramatically reduces Internet traffic. However, it can also be the cause of bizarre web application quirkiness if you’re not very careful …
Set Your Headers
Caching is controlled by the HTTP status code and the Last-Modified, ETag and Cache-Control headers returned with every response. For subsequent requests to the same URL, the browser/proxy will either:
- retrieve the previous data from its own cache
- ask the server to verify whether the data has changed, or
- make a fresh request.
The Cache-Control header primarily determines this action. It holds one or more comma-separated directives, the most common of which fall into three groups:
no-store or no-cache
no-store
stops the browser and all proxy servers caching the returned data. Every request will therefore incur a trip back to the server.
The alternative is no-cache. The browser/proxy may store the response, but it must check with the server before reusing it. The server sends a Last-Modified (date/time) and/or an ETag (response hash/checksum) header with the response. The browser/proxy passes these values back on subsequent requests in the If-Modified-Since and If-None-Match headers and, if the response has not changed, the server returns a 304 Not Modified status, which instructs the browser/proxy to use its own cached data. Otherwise, the new data is passed back with a 200 OK status.
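The revalidation exchange can be sketched in Node.js. This is a minimal illustration rather than a production implementation: the helper names makeEtag and revalidate, and the shape of the object they return, are assumptions made for this example, and the ETag is simply a SHA-256 hash of the body.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper: builds a validator (ETag) from the response body.
function makeEtag(body) {
  const hash = crypto.createHash('sha256').update(body).digest('hex');
  return `"${hash}"`; // ETag values are quoted strings
}

// Hypothetical helper: decides between 200 OK and 304 Not Modified.
// `requestHeaders` uses lowercase keys, as Node.js provides them.
function revalidate(requestHeaders, body) {
  const etag = makeEtag(body);
  const headers = { 'Cache-Control': 'no-cache', ETag: etag };

  if (requestHeaders['if-none-match'] === etag) {
    // The cached copy is still current, so no body is sent.
    return { status: 304, headers, body: '' };
  }
  return { status: 200, headers, body };
}

// First request: no validator, so the full response is returned.
const first = revalidate({}, '{"messages":[]}');
// Second request: the browser echoes the ETag back in If-None-Match.
const second = revalidate({ 'if-none-match': first.headers.ETag }, '{"messages":[]}');

console.log(first.status, second.status); // 200 304
```

If the body changes between requests, the hashes no longer match and the function falls through to the 200 branch with fresh data.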
public or private
Setting Cache-Control
to public
means the response is the same for everyone and the data can be cached in browser or proxy stores. It’s the default behavior, so it’s not necessary to set it.
private responses are intended for a single user. For example, the URL https://myapp.com/messages returns a set of messages unique to each logged-in user, even though every user requests the same URL. Therefore, the browser can cache the response, but proxy server caching is not permitted.
max-age
This specifies the maximum time in seconds a response remains valid. For example, max-age=60
indicates the browser/proxy can cache the data for one minute before making a new request.
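As a rough sketch of how a cache might apply max-age, the following checks whether a stored response is still within its lifetime. The helper names parseMaxAge and isFresh are hypothetical, and the logic is a deliberate simplification: real caches also take other headers and directives into account.

```javascript
// Hypothetical helper: extracts max-age (in seconds) from a Cache-Control value.
// Returns null when the directive is absent.
function parseMaxAge(cacheControl) {
  const match = /(?:^|,)\s*max-age=(\d+)/i.exec(cacheControl);
  return match ? Number(match[1]) : null;
}

// Hypothetical helper: true while a stored response may be reused
// without contacting the server. Times are in milliseconds.
function isFresh(cacheControl, storedAtMs, nowMs) {
  const maxAge = parseMaxAge(cacheControl);
  if (maxAge === null) return false;
  return (nowMs - storedAtMs) / 1000 < maxAge;
}

const storedAt = 0;
console.log(isFresh('private,max-age=60', storedAt, 59000)); // true
console.log(isFresh('private,max-age=60', storedAt, 61000)); // false
```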
Your server, language and framework often control these settings, so you rarely need to tinker — but you can. Presume you wanted to cache an individual user’s JSON response to an Ajax request for 30 seconds. In PHP:
header('Cache-Control: private,max-age=30');
echo json_encode($data);
or a Node.js/Express router:
res
  .set('Cache-Control', 'private,max-age=30')
  .json(data);
Differentiate Page and Ajax Data URLs
Setting HTTP headers may not be enough, because browsers work in slightly different ways when you hit the back button.
In Firefox and Safari, hitting back will attempt to show the previous page in its last known state — presuming the URL has been changed with an updated #hash
, or by intercepting actions with history API events.
In Chrome and Edge, hitting back shows the previous page in its initial starting state — although your JavaScript will initialize and perhaps change the DOM if necessary.
In practice, it rarely matters which browser you’re using, but there are some weird edge cases. Presume your application presents a paginated table of records, which the user can search and page through with navigation buttons. We’re good developers, so we’ll use progressive enhancement to ensure the system works in all browsers:
- The user enters the page at http://myapp.com/list/
- Submitting the form to change filters or navigate to a new page will change the URL and make a new request, for example http://myapp.com/list/?search=bob&page=42. The system works in any browser with or without JavaScript.
- We introduce JavaScript enhancements so a full-page refresh isn’t required. The code intercepts the form submit and history back/next events so, while the URL still changes, the application performs an Ajax request in the background.
- The Ajax request calls the same URL, such as http://myapp.com/list/?search=bob&page=42, but sets the X-Requested-With HTTP header to XMLHttpRequest (done by jQuery and all good Ajax libraries). The server recognizes this header so, instead of returning the full page HTML, it returns JSON-encoded record data. Our JavaScript uses this to update the DOM.
In summary, our server can either return HTML or JSON for the same URL, depending on the state of the request’s X-Requested-With
header. Unfortunately, this can cause a problem with Chrome and Edge, because either HTML or JSON could be cached.
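To see why one URL with two representations is risky, consider this deliberately simplified model. The render function and the Map-based cache are assumptions for illustration only; no browser cache is implemented this way, but the sketch shows how a store keyed by URL alone can end up holding JSON where HTML was expected.

```javascript
// Hypothetical server logic: same URL, two possible representations.
function render(requestHeaders, records) {
  if (requestHeaders['x-requested-with'] === 'XMLHttpRequest') {
    return { type: 'application/json', body: JSON.stringify(records) };
  }
  return { type: 'text/html', body: `<table><!-- ${records.length} rows --></table>` };
}

// Simplified stand-in for a cache keyed by URL alone.
const cache = new Map();
const url = 'http://myapp.com/list/?search=bob&page=42';
const records = [{ name: 'bob' }];

// A full page load stores HTML under the URL ...
cache.set(url, render({}, records));
// ... and a later Ajax call to the same URL replaces it with JSON.
cache.set(url, render({ 'x-requested-with': 'XMLHttpRequest' }, records));

console.log(cache.get(url).type); // application/json
```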
Presume you randomly navigate around the record list and, at http://myapp.com/list/?search=bob&page=42
, you click a link to another (non-list) page, followed by the browser back button to return. Chrome looks at its cache, sees JSON data for that URL and presents it to the user! Hitting refresh will fix the problem because a request will be made without the X-Requested-With
header. What’s more bizarre is that Firefox works as expected and restores the actual page’s state.
The fix: ensure your page and data URLs are never the same. When navigating to http://myapp.com/list/?search=bob&page=42
, the Ajax call should use a different URL: it can be as simple as http://myapp.com/list/?search=bob&page=42&ajax=1
. This ensures Chrome can cache both the HTML and JSON requests separately, but JSON is never presented, because the Ajax URL never appears within the browser address bar.
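One way to keep the two URLs apart is to derive the data URL from the page URL. The sketch below uses the standard URL API; the helper names toAjaxUrl and wantsJson are hypothetical, and the ajax=1 parameter is just the example name used above.

```javascript
// Hypothetical client-side helper: derives the data URL from the page URL.
function toAjaxUrl(pageUrl) {
  const url = new URL(pageUrl);
  url.searchParams.set('ajax', '1');
  return url.toString();
}

// Hypothetical server-side check: JSON is returned only for the data URL.
function wantsJson(requestUrl) {
  return new URL(requestUrl).searchParams.get('ajax') === '1';
}

const pageUrl = 'http://myapp.com/list/?search=bob&page=42';
const dataUrl = toAjaxUrl(pageUrl);

console.log(dataUrl);            // http://myapp.com/list/?search=bob&page=42&ajax=1
console.log(wantsJson(pageUrl)); // false
console.log(wantsJson(dataUrl)); // true
```

Because the server now decides on the URL rather than on a request header, each cached entry always holds the same kind of response.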
(Note: please don’t use this example as justification for avoiding progressive enhancement! The same problem can affect JavaScript-dependent single-page applications, especially those providing links to external URLs.)
Unfortunately, there’s a further complication …
Beware Self-signed SSL Certificates
Ideally, your application is using the encrypted HTTPS protocol. However, there’s no need to purchase SSL certificates for all 57 members of your team, because you can use a fake, self-signed certificate and click proceed whenever a browser complains.
Be aware that Chrome (and presumably most Blink-based browsers) refuses to cache page data when a fake certificate is encountered. It’s similar to setting Cache-Control
to no-store
on every request.
Your test sites will work exactly as expected, and you’ll never experience the same page/data URL issues described above. The cache is never used, and all requests return to the server. The same application on a live server with a real SSL certificate will cache data. Your users may report seeing strange JSON responses from Chrome — which you won’t be able to reproduce locally.
These are the sorts of nightmare challenges that continue to plague web development! I hope you found this overview helpful. Feel free to share your own nightmares in the comments. We’re all in this together …