How to Solve Caching Conundrums

The web wouldn’t operate without caching. Between you and the server, there is a browser and any number of proxy servers which cache responses. Much of this is handled transparently, and it dramatically reduces Internet traffic. However, it can also be the cause of bizarre web application quirkiness if you’re not very careful …

Key Takeaways

Caching is a critical aspect of how the web functions. It’s controlled by the HTTP status code and the Last-Modified, ETag and Cache-Control headers returned with every response. The Cache-Control header primarily determines caching action, with options such as no-store or no-cache, public or private, and max-age.
To avoid caching issues, especially with Chrome and Edge, it’s crucial to differentiate page and Ajax Data URLs. The Ajax call should use a different URL to ensure both HTML and JSON requests are cached separately, preventing JSON from being presented when it shouldn’t be.
Developers should be aware of complications arising from self-signed SSL certificates. Chrome, and most Blink-based browsers, refuse to cache page data when a fake certificate is encountered, leading to potential inconsistencies between test sites and live servers.

Set Your Headers

Caching is controlled by the HTTP status code and the Last-Modified, ETag and Cache-Control headers returned with every response. For subsequent requests to the same URL, the browser/proxy will either:

retrieve the previous data from its own cache
ask the server to verify whether the data has changed, or
make a fresh request.

The Cache-Control header primarily determines this action. It can set up to three comma-separated values:

no-store or no-cache
no-store stops the browser and all proxy servers caching the returned data. Every request will therefore incur a trip back to the server.

The alternative is no-cache. The browser/proxy may store the response, but must check with the server before reusing it. The server includes a Last-Modified (date/time) and/or an ETag (response hash/checksum) header in its response. The browser sends these values back on subsequent requests in the If-Modified-Since and If-None-Match headers and, if the response has not changed, the server returns a 304 Not Modified status, which instructs the browser/proxy to use its own cached data. Otherwise, the new data is passed back with a 200 OK status.
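The revalidation round trip can be sketched in Node.js as follows. This is a minimal illustration rather than a complete implementation: the function names `makeEtag` and `respond` are hypothetical, the ETag is assumed to be a SHA-1 hash of the body, and only a single exact If-None-Match value is compared.

```javascript
const { createHash } = require('node:crypto');

// Derive an ETag from the response body (assumption: a quoted SHA-1 hash).
function makeEtag(body) {
  const hash = createHash('sha1').update(body).digest('hex');
  return `"${hash}"`;
}

// Decide what to send back, given the request headers and the current body.
// `requestHeaders` uses lower-case keys, as Node.js provides them.
function respond(requestHeaders, body) {
  const etag = makeEtag(body);
  const headers = { 'Cache-Control': 'no-cache', ETag: etag };

  // When revalidating, the browser echoes the stored ETag in If-None-Match.
  if (requestHeaders['if-none-match'] === etag) {
    // Unchanged: no body is sent and the browser reuses its cached copy.
    return { status: 304, headers, body: '' };
  }

  // First request, or the data has changed: send the full response.
  return { status: 200, headers, body };
}
```

Inside an HTTP handler, the returned status, headers and body would be written to the response. Frameworks such as Express generate ETags and 304 responses for you by default, so code like this is mainly useful for understanding what happens on the wire.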

public or private
Setting Cache-Control to public means the response is the same for everyone and the data can be cached in browser or proxy stores. It’s the default behavior, so it’s not necessary to set it.

private responses are intended for a single user. For example, the URL https://myapp.com/messages returns a set of messages unique to each logged-in user, even though every user requests the same URL. Therefore, the browser can cache the response, but proxy server caching is not permitted.

max-age
This specifies the maximum time in seconds a response remains valid. For example, max-age=60 indicates the browser/proxy can cache the data for one minute before making a new request.
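To see how the directives combine, here is a simplified sketch of the decision a browser cache makes for a stored response. The function names `parseCacheControl` and `cacheAction` are hypothetical, quoted directive arguments are not handled, and a response without max-age is simply revalidated (real browsers apply further heuristics in that case).

```javascript
// Parse a Cache-Control value such as "private,max-age=30" into an object.
function parseCacheControl(value) {
  const directives = {};
  for (const part of value.split(',')) {
    const [name, arg] = part.trim().split('=');
    if (name) directives[name.toLowerCase()] = arg === undefined ? true : arg;
  }
  return directives;
}

// Decide what a cache may do with a stored response of a given age.
// Returns 'use-cache', 'revalidate' or 'fetch'.
function cacheAction(cacheControl, ageSeconds) {
  const d = parseCacheControl(cacheControl);
  if (d['no-store']) return 'fetch';       // nothing was stored
  if (d['no-cache']) return 'revalidate';  // stored, but must be checked first
  const maxAge = Number(d['max-age']);
  if (Number.isFinite(maxAge) && ageSeconds < maxAge) return 'use-cache';
  return 'revalidate';                     // stale, or no max-age given
}

console.log(cacheAction('private,max-age=30', 10)); // use-cache
console.log(cacheAction('private,max-age=30', 45)); // revalidate
console.log(cacheAction('no-store', 0));            // fetch
```

The three return values correspond to the three outcomes listed earlier: use the cached data, ask the server to verify it, or make a fresh request.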

Your server, language and framework often control these settings, so you rarely need to tinker — but you can. Presume you wanted to cache an individual user’s JSON response to an Ajax request for 30 seconds. In PHP:



header('Cache-Control: private,max-age=30');
echo json_encode($data);

or a Node.js/Express router:



res
    .set('Cache-Control', 'private,max-age=30')
    .json(data);

Differentiate Page and Ajax Data URLs

Setting HTTP headers may not be enough, because browsers work in slightly different ways when you hit the back button.

In Firefox and Safari, hitting back will attempt to show the previous page in its last known state — presuming the URL has been changed with an updated #hash, or by intercepting actions with history API events.

In Chrome and Edge, hitting back shows the previous page in its initial starting state — although your JavaScript will initialize and perhaps change the DOM if necessary.

In practice, it rarely matters which browser you’re using, but there are some weird edge cases. Presume your application presents a paginated table of records, which the user can search and page through with navigation buttons. We’re good developers, so we’ll use progressive enhancement to ensure the system works in all browsers:

The user enters the page at http://myapp.com/list/.
Submitting the form to change filters or navigate to a new page will change the URL and make a new request — for example, http://myapp.com/list/?search=bob&page=42. The system works in any browser with or without JavaScript.
We introduce JavaScript enhancements so a full-page refresh isn’t required. The code intercepts the form submit and history back/next events so, while the URL still changes, the application performs an Ajax request in the background.
The Ajax request calls the same URL, such as http://myapp.com/list/?search=bob&page=42, but sets the X-Requested-With HTTP header to XMLHttpRequest (jQuery does this automatically; with fetch() you must add the header yourself). The server recognizes this header so, instead of returning the full page HTML, it returns JSON-encoded record data. Our JavaScript uses this to update the DOM.
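The server-side check in the last step might look something like this sketch. The function names `isAjaxRequest` and `chooseFormat` are hypothetical, and the header object is assumed to use lower-case keys, as Node.js provides them.

```javascript
// Return true when the client flagged the request as Ajax.
function isAjaxRequest(headers) {
  return (headers['x-requested-with'] || '').toLowerCase() === 'xmlhttprequest';
}

// Choose the response format for /list/ based on that header alone.
// Note that the URL plays no part in the decision.
function chooseFormat(headers) {
  return isAjaxRequest(headers) ? 'json' : 'html';
}

console.log(chooseFormat({ 'x-requested-with': 'XMLHttpRequest' })); // json
console.log(chooseFormat({}));                                       // html
```

Because the format depends only on a request header, two different responses end up associated with one URL.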

In summary, our server can either return HTML or JSON for the same URL, depending on the state of the request’s X-Requested-With header. Unfortunately, this can cause a problem with Chrome and Edge, because either HTML or JSON could be cached.

Presume you randomly navigate around the record list and, at http://myapp.com/list/?search=bob&page=42, you click a link to another (non-list) page, followed by the browser back button to return. Chrome looks at its cache, sees JSON data for that URL and presents it to the user! Hitting refresh will fix the problem because a request will be made without the X-Requested-With header. What’s more bizarre is that Firefox works as expected and restores the actual page’s state.

The fix: ensure your page and data URLs are never the same. When navigating to http://myapp.com/list/?search=bob&page=42, the Ajax call should use a different URL: it can be as simple as http://myapp.com/list/?search=bob&page=42&ajax=1. This ensures Chrome can cache both the HTML and JSON requests separately, but JSON is never presented, because the Ajax URL never appears within the browser address bar.
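A minimal sketch of this fix, using the standard URL API. The helper names `toAjaxUrl` and `hasAjaxFlag` are hypothetical, and the `ajax=1` parameter follows the example above.

```javascript
// Client side: derive the data URL from the page URL by adding ajax=1.
function toAjaxUrl(pageUrl) {
  const url = new URL(pageUrl);
  url.searchParams.set('ajax', '1');
  return url.toString();
}

// Server side: return JSON only when the ajax flag is present in the URL.
function hasAjaxFlag(requestUrl) {
  return new URL(requestUrl).searchParams.get('ajax') === '1';
}

console.log(toAjaxUrl('http://myapp.com/list/?search=bob&page=42'));
// http://myapp.com/list/?search=bob&page=42&ajax=1
```

The page URL shown in the address bar stays unchanged; only the background request uses the flagged URL, so the two responses can never share a cache entry.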

(Note: please don’t use this example as justification for avoiding progressive enhancement! The same problem can affect a JavaScript-dependent Single-Page Application, especially one that links to external URLs.)

Unfortunately, there’s a further complication …

Beware Self-signed SSL Certificates

Ideally, your application is using the encrypted HTTPS protocol. However, there’s no need to purchase SSL certificates for all 57 members of your team, because you can use a fake, self-signed certificate and click proceed whenever a browser complains.

Be aware that Chrome (and presumably most Blink-based browsers) refuses to cache page data when a fake certificate is encountered. It’s similar to setting Cache-Control to no-store on every request.

Your test sites will work exactly as expected, and you’ll never experience the same page/data URL issues described above. The cache is never used, and all requests return to the server. The same application on a live server with a real SSL certificate will cache data. Your users may report seeing strange JSON responses from Chrome — which you won’t be able to reproduce locally.

These are the sorts of nightmare challenges that continue to plague web development! I hope you found this overview helpful. Feel free to share your own nightmares in the comments. We’re all in this together …

Frequently Asked Questions (FAQs) on Solving Caching Conundrums

What is caching and why is it important in web development?

Caching is a technique used in computing to store data for future use. It is important in web development because it significantly improves the efficiency and speed of data retrieval. When a user visits a website, the browser stores some data from the site. When the user revisits the site, the browser can quickly load the page using the stored data, reducing the load time and providing a better user experience.

How does caching work with JSON responses?

When a web application makes a request to a server for a JSON response, the server sends back the data along with HTTP headers. These headers can include caching directives, which tell the browser how long to store the data. If the same request is made again within the cache period, the browser can use the cached data instead of making another request to the server, saving time and resources.

What are the common issues with caching JSON responses?

One common issue is that the browser may not cache the JSON response if the server does not send the correct HTTP headers. Another issue is stale data. If the data on the server changes but the browser has a cached version, the user may see outdated information. To avoid this, developers need to implement strategies to invalidate or update the cache when the data changes.

How can I control caching behavior in my web application?

You can control caching behavior by setting the appropriate HTTP headers in your server responses. For example, the “Cache-Control” header can be used to specify how long the data should be cached. Other headers like “ETag” and “Last-Modified” can be used to validate the cache and ensure the data is up-to-date.

What is the Cache API and how can it be used with JSON responses?

The Cache API is a system that allows you to store and retrieve network requests and their responses. It can be used with JSON responses to cache them for future use. You can add a response to the cache using the cache.put() method and retrieve it using the cache.match() method.
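As a rough sketch, a cache-first JSON lookup could be written like this. The function name `getJsonCached` and the cache name `json-v1` are assumptions for illustration. In a browser, `caches` and `fetch` are globals (the Cache API requires a secure context); they are passed as parameters here only so the function can be exercised elsewhere.

```javascript
// Fetch JSON through a named cache: return a stored response when one exists,
// otherwise fetch it, store a copy, and return the parsed data.
async function getJsonCached(url, cacheStorage = globalThis.caches, fetchFn = globalThis.fetch) {
  const cache = await cacheStorage.open('json-v1');

  // cache.match() resolves to undefined when nothing is stored for the URL.
  const hit = await cache.match(url);
  if (hit) return hit.json();

  const response = await fetchFn(url);
  if (response.ok) {
    // A response body can only be read once, so store a clone.
    await cache.put(url, response.clone());
  }
  return response.json();
}
```

Note that entries stored this way do not expire on their own and are not governed by Cache-Control, so the application has to decide when to delete or replace them.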

How can I ensure my JSON responses are being cached correctly?

You can use browser developer tools to inspect the network traffic and check the HTTP headers of your server responses. If the headers are set correctly and the browser supports caching, you should see the responses being cached.

How can I invalidate or update the cache when my data changes?

There are several strategies to invalidate or update the cache. One common method is to use a “cache-busting” technique, where you change the URL of the resource whenever the data changes. This forces the browser to fetch the new data and update the cache.
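One way to sketch cache busting is to derive a version parameter from the content itself, so the URL changes only when the data does. The function name `versionedUrl`, the `v` parameter and the example URL are all hypothetical.

```javascript
const { createHash } = require('node:crypto');

// Append a short content hash to the URL as a version parameter.
// Changed content produces a new URL, which the browser has never cached.
function versionedUrl(resourceUrl, content) {
  const hash = createHash('sha256').update(content).digest('hex').slice(0, 8);
  const url = new URL(resourceUrl);
  url.searchParams.set('v', hash);
  return url.toString();
}

// Different content yields different URLs for the same resource.
console.log(versionedUrl('https://myapp.com/data/records.json', '[1,2,3]'));
console.log(versionedUrl('https://myapp.com/data/records.json', '[1,2,3,4]'));
```

With versioned URLs, the responses themselves can be given a long max-age, because a stale copy is never requested again once the URL changes.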

Can I cache large JSON objects in the browser?

Yes, you can cache large JSON objects in the browser. However, keep in mind that each browser has a limit on the amount of data it can store. If you exceed this limit, the browser may start evicting older data from the cache.

What are the benefits of caching JSON responses in the browser?

Caching JSON responses in the browser can significantly improve the performance of your web application. It reduces the number of requests to the server, saving bandwidth and reducing server load. It also provides a faster user experience, as the browser can load cached data much quicker than it can fetch new data from the server.

Are there any security concerns with caching JSON responses?

Yes, there can be security concerns with caching sensitive data. If a user’s device is compromised or shared, an attacker could potentially access the cached data, and a shared proxy could serve one user’s response to another. To mitigate this risk, mark per-user responses as private, set Cache-Control to no-store on sensitive data, and serve everything over HTTPS.