Page Caching with HTTP
If you’ve got the sort of application that benefits from full page caching, then chances are you’ve already had a look at Rails-style page caching. For the unfamiliar, page caching is useful when you’ve got an action in your app whose output is not unique per visitor, so it can be saved and reused in its entirety the next time a request comes your way.
Page caching in Rails writes the result of your action out to disk so that your HTTP server can dish it out on the next request without touching Rails. This works great, but there are a few things that might cause you some problems:
- The page cache writes to disk, so if you’ve got multiple servers that you want to protect behind a common cache layer you’ll have to do extra work.
- The page cache, because it writes directly to disk, doesn’t support time based cache expiry.
- Disks tend to be slow; perhaps caching to memory will squeeze more out of your servers?
Standard HTTP caching will help us get around this. The basic principle is: rather than calling the page caching filter in our controller, we set some HTTP headers which tell our caching technology of choice how to treat the response.
Standards-based caches are also implemented in many clients, meaning resources which we’ve marked as cacheable can also be cached in the browser. This saves us not only the additional DB and CPU resources of rendering a new page but also the actual bandwidth of the request.
Types of HTTP Cache
The easiest cache to implement is a time based cache like max-age. When we create our response we set a cache header that says ‘expire me in 3 hours’ or ‘expire me at 15:00’. Using the ActionController method expires_in will let you set the appropriate headers for time based expiration. You’ll want to avoid caching errors, so it’s best to set the headers in an after filter, which won’t be called should your app 500.
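As a rough illustration of what those headers look like, here is a plain-Ruby sketch. The helper name expiry_headers is made up for this example and is not part of Rails; in a real controller you would call expires_in (for instance expires_in 3.hours, :public => true) and let Rails build the Cache-Control header for you. The sketch returns no caching headers for anything other than a 200, so errors are never marked as cacheable.

```ruby
require "time"

# Hypothetical helper (not a Rails API): returns the caching headers for a
# response, or no headers at all when the response was not a success.
def expiry_headers(status, max_age:, now: Time.now)
  return {} unless status == 200 # never tell caches to keep an error page

  {
    # 'expire me in max_age seconds'
    "Cache-Control" => "public, max-age=#{max_age}",
    # 'expire me at this time', for caches that only understand Expires
    "Expires" => (now + max_age).httpdate
  }
end

three_hours = 3 * 60 * 60
puts expiry_headers(200, max_age: three_hours).inspect
puts expiry_headers(500, max_age: three_hours).inspect
```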
We can also cache slightly more intelligently using conditional GETs, based on last-modified or etag. After the first request, the client will send back either the last-modified date or the etag token (probably both), in the If-Modified-Since and If-None-Match request headers. It’s the application’s job to identify whether we can send an empty response back (and thus use the cached resource) or whether we should draw out our page again and use that.
Using conditional caches we’re able to ensure that our users have the most up to date version of our page while still saving some resources on our server – although there is additional expense since we have to get our resources out of their store in order to check if they’re newer than the cached values would suggest.
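The decision the application has to make can be sketched in plain Ruby. The method name conditional_response and the choice of an MD5 digest of the body as the etag are assumptions for this example; in Rails itself the fresh_when and stale? controller helpers do this comparison for you.

```ruby
require "digest"
require "time"

# Hypothetical helper: decides between a full 200 and an empty 304.
# last_modified is assumed to be a Time with whole-second precision.
def conditional_response(request_headers, body, last_modified)
  etag = %("#{Digest::MD5.hexdigest(body)}")
  headers = { "ETag" => etag, "Last-Modified" => last_modified.httpdate }

  if_none_match = request_headers["If-None-Match"]
  if_modified_since = request_headers["If-Modified-Since"]

  # If-None-Match takes precedence over If-Modified-Since when both are sent.
  fresh =
    if if_none_match
      if_none_match == etag
    elsif if_modified_since
      last_modified <= Time.httpdate(if_modified_since)
    else
      false
    end

  # 304 carries no body: the client reuses the copy it already has.
  fresh ? [304, headers, ""] : [200, headers, body]
end

updated_at = Time.utc(2024, 1, 1, 12, 0, 0)
_status, headers, _body = conditional_response({}, "<html>cats</html>", updated_at)
second = conditional_response({ "If-None-Match" => headers["ETag"] }, "<html>cats</html>", updated_at)
puts second.first
```

Note that the page still has to be produced (or at least its timestamp looked up) before the comparison can happen, which is the extra expense mentioned above.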
Smarter people than I have named these two styles of cache ‘strong’ and ‘weak’. ‘Strong’ caches – the time based expiration kind – are able to serve requests without any sort of conditional get, while ‘weak’ caches cannot.
On high-ish traffic websites, let’s say more than 1000 requests a minute, with content that can’t be allowed to look out of date for long, I like to use micro-caches. A micro-cache is a timed, expires_in style cache which has a very short lifespan. A timespan of a minute allows you to drastically reduce the workload of your system (from 1000 requests a minute down to 1) while still keeping your content looking and feeling fresh.
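To make the arithmetic concrete, here is a small simulation. The MicroCache class is a made-up in-memory stand-in for whatever cache sits in front of the app; it only exists to count how often the expensive render actually runs.

```ruby
# Hypothetical in-memory micro-cache, for illustration only.
class MicroCache
  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @entries = {}
  end

  # Returns the cached value for key while it is fresh; otherwise runs the
  # block (the expensive render), stores the result and returns it.
  def fetch(key, now: Time.now)
    entry = @entries[key]
    return entry[:value] if entry && now < entry[:expires_at]

    value = yield
    @entries[key] = { value: value, expires_at: now + @ttl }
    value
  end
end

renders = 0
cache = MicroCache.new(60)
start = Time.now

# Simulate 1000 requests spread evenly over one minute.
1000.times do |i|
  cache.fetch("/", now: start + i * 0.06) do
    renders += 1
    "<html>home page</html>"
  end
end

puts "renders in the first minute: #{renders}"
```

Only the first request pays for a render; every other request in that minute is served from the stored copy.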
So, now that we’ve set up whichever type of cache is most useful to our application, it’s time to beef up our caching layer even more. Left as is, we’ll be caching requests per user and not globally. For a global cache we’ll want to implement some sort of cache that sits between our HTTP server and our Rails app.
Rack::Cache, as I’d hope you could guess, is a bit of rack middleware that sits just in front of your Rails application. Rails responses with the correct headers will be stored in whichever Rack::Cache compatible store you choose (process memory, files, memcache, redis) and, come the next request, will be returned straight from that store without any of the overhead of Rails.
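To show the idea of a cache sitting in front of the app, here is a heavily simplified sketch. TinyHttpCache is a made-up class, not Rack::Cache’s implementation: it follows the Rack calling convention but needs no gems, and it skips most of what a real HTTP cache has to handle. It also shows why the response needs to be marked public before a shared cache will reuse it across users.

```ruby
# Hypothetical, heavily simplified stand-in for an HTTP cache middleware.
# It follows the Rack convention (call(env) returns [status, headers, body])
# but ignores Vary, validation, request methods and much more.
class TinyHttpCache
  def initialize(app, clock: -> { Time.now })
    @app = app
    @clock = clock
    @store = {} # path => { response:, expires_at: }
  end

  def call(env)
    key = env["PATH_INFO"]
    entry = @store[key]
    # Fresh copy in the store: answer without touching the app at all.
    return entry[:response] if entry && @clock.call < entry[:expires_at]

    response = @app.call(env)
    status, headers, _body = response
    cache_control = headers["Cache-Control"].to_s
    max_age = cache_control[/max-age=(\d+)/, 1]

    # Only store successful responses the app marked as shareable.
    if status == 200 && max_age && cache_control.include?("public")
      @store[key] = { response: response, expires_at: @clock.call + max_age.to_i }
    end
    response
  end
end

hits = 0
app = lambda do |_env|
  hits += 1
  [200, { "Cache-Control" => "public, max-age=60" }, ["hello"]]
end

cached_app = TinyHttpCache.new(app)
3.times { cached_app.call({ "PATH_INFO" => "/" }) }
puts "app was called #{hits} time(s)"
```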
Rack::Cache is installed with Rails 3 and is alarmingly easy to set up: just make sure you’ve required it and you’re good to go. You’ll most likely find better performance numbers if you change the settings for the entity store and the meta store from their defaults. The Metastore is small and is checked on every request; it stores details about the request, including header values. In-memory storage, or even something like Redis or Memcached, is a good choice for the Metastore.
The Entity store is the actual storage of your cached pages. It is only accessed on a cache hit and contains a significantly larger payload. Data stores like the file store become usable here but, personally, I’d stick it in Memcached or Redis and be done with it.
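As a sketch of what that could look like, the options below point both stores at a local Memcached. The storage URIs follow the format used in the Rack::Cache documentation; the host, port and constant name are assumptions for this example. The hash is plain Ruby so it can be inspected on its own. How you hand it to Rack::Cache depends on your Rails version, so treat the wiring in the comments as something to check against your own setup.

```ruby
# Assumed setup: a Memcached instance on localhost:11211.
RACK_CACHE_OPTIONS = {
  # Small, read on every request: keep it somewhere fast.
  metastore: "memcached://localhost:11211/meta",
  # Larger payloads, read only on a cache hit.
  entitystore: "memcached://localhost:11211/body",
  verbose: false
}.freeze

# Possible wiring in config/environments/production.rb (check your version):
#   config.middleware.use Rack::Cache, RACK_CACHE_OPTIONS   # explicit middleware
#   config.action_dispatch.rack_cache = RACK_CACHE_OPTIONS  # Rails 3.1 and later

puts RACK_CACHE_OPTIONS.inspect
```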
More Caching Options
Rack::Cache is certainly cool and is a breeze to set up but, being Ruby, it’s not going to be your fastest choice. Luckily, there are loads of other options for the budding cacher.
Whereas Rack::Cache sits behind your server (Nginx or Apache perhaps) but, at least conceptually, ahead of your application code, a proxy server sits in front of the server, silently forwarding requests.
One of the best known caching proxy servers is the fantastically named Varnish. There are a bunch of settings and tweaks you can apply to Varnish to make it behave in different ways, but it’s out of the scope of this article to go into every nook and cranny.
Firstly, you’ll need to install Varnish, which I shall leave as an exercise for your package manager and your patience. Once you’ve got it up and running, pop this in the config.
This tells Varnish that you’ll be running a cache for facebookforcats (you’re expecting a lot of traffic) on port 3000, locally. Be sure to run your application on a port which is not 80. To start the cache:
varnishd -a :80 -f catalyst.vcl -s file,/var/cache/varnish.cache,1024M
This tells Varnish to start listening on port 80, using the config file ‘catalyst.vcl’, and to create a cache file at /var/cache/varnish.cache no more than 1 GB in size.
Well, that’s it for this week. If I’ve missed anything out or have been unclear (or indeed, incorrect) do let me know in the comments and I’ll do my best to fix it.