Getting Started with Varnish

Caching is everywhere. A request from your browser to its destination will pass a cache at almost every node. Your browser has a cache, most web proxy servers cache requests, and web servers like Apache and nginx will potentially be caching.

Varnish is a reverse proxy server; it sits in front of your web server and serves your server's content, and only your server's. Reverse proxy servers are tightly coupled to the web server and can act on messages received from it. For example, a cached page can be refreshed with a purge command from the backend, something you cannot do with caches closer to the client. This means reverse proxy servers can, in some cases, cache content longer than the other types of caches.

Simply put, Varnish does one thing: serve web content super fast. It keeps your web server happy by handling most of the high traffic, and serves your visitors quickly. In other words, Varnish makes sure you are prepared for spikes in traffic and helps keep your bounce rate low.

As a caching service, Varnish is not unique, but, when it comes to performance, it really shines. The head architect of Varnish is Poul-Henning Kamp (who is also a prominent FreeBSD developer). Every architectural decision is made with improved performance or better flexibility in mind, which is why Varnish has few dependencies, its own configuration language, and is optimized for modern hardware running FreeBSD and GNU/Linux. All this results in a reverse proxy server that takes full advantage of the operating system in order to serve content extremely fast.

In this article we’ll set up Varnish to cache a simple web page for two minutes. We’ll walk through setting up a web server and configuring Varnish to cache our page. To keep things simple, we’ll set up both on the same host and use a Debian-based distro. You can later customize your setup as your needs dictate.

Your First Varnish Setup

Varnish fetches content from a backend server, which is the web server where your content is generated (in this case Apache). Our example will use a simple PHP Hello World page that updates every time it's refreshed. So, let's set that up first:

$ sudo apt-get install apache2 php5
$ echo '<?php echo date("h:i:s") . " Hello, world!";' | sudo tee /var/www/world.php

Test that the page works by opening it in your browser. The time should update every time you refresh the page.
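If you prefer the command line, you can check the page with curl instead (this assumes the default Apache document root of /var/www; the timestamp in the output will of course differ):

```shell
$ curl http://localhost/world.php
09:41:05 Hello, world!
```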

Let’s install Varnish next. It’s currently at version 3.x, and because it is developed and optimized for FreeBSD and GNU/Linux, you’ll find that all the main package systems support Varnish.

$ sudo apt-get install varnish

To configure Varnish, there are two files we need to edit: /etc/default/varnish and /etc/varnish/default.vcl. The first sets the options for varnishd, the Varnish daemon, and the second configures Varnish itself.

The location of the varnishd options file depends on your choice of operating system. For Debian-based systems, make sure the DAEMON_OPTS variable in /etc/default/varnish is:

DAEMON_OPTS="-a :6081  
             -T localhost:6082  
             -f /etc/varnish/default.vcl  
             -S /etc/varnish/secret  
             -s malloc,256m"

The -a option tells varnishd to listen on port 6081 for requests. You can change this to port 80 when you are finally ready to let Varnish handle requests. The -f option tells varnishd the location of the second configuration file. The -T option sets the location of the management interface, which is where you can make changes to Varnish at runtime. The -S option sets the location of the authentication secret for the management interface. The -s option decides how cached objects are stored; here, in 256 MB of memory.
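For example, once Varnish is running you can talk to the management interface with varnishadm, pointing it at the port and secret file set above (the session below is illustrative; exact output varies by version):

```shell
$ varnishadm -T localhost:6082 -S /etc/varnish/secret
varnish> status
Child in state running
```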

Make sure the uncommented lines in /etc/varnish/default.vcl are as follows:

backend default {
    .host = "127.0.0.1"; 
    .port = "80"; 
}

Varnish communicates with the backend over HTTP; it also supports multiple backends, should you need them. This setup will fetch content from localhost on port 80.
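As a sketch of what multiple backends could look like in Varnish 3 VCL, here are two backends balanced with a round-robin director (the hosts and ports here are made up for illustration):

```
backend app1 { .host = "192.168.0.10"; .port = "8080"; }
backend app2 { .host = "192.168.0.11"; .port = "8080"; }

director www round-robin {
  { .backend = app1; }
  { .backend = app2; }
}

sub vcl_recv {
  # send every request to the director, which alternates between backends
  set req.backend = www;
}
```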

Now we can start varnishd:

$ sudo service varnish start

Try opening our Hello World page through Varnish by adding the port we set varnishd to listen on to the URL (i.e. http://localhost:6081/world.php). When you refresh the page, you’ll notice that the time only updates once every two minutes (a default Varnish setup will cache a page for two minutes, given that it is a GET or HEAD request without an Authorization or Cookie header).
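You can also see Varnish at work by inspecting the response headers with curl. The Age header counts how many seconds the object has been in the cache, and X-Varnish carries the request IDs (output abbreviated; your values will differ):

```shell
$ curl -I http://localhost:6081/world.php
HTTP/1.1 200 OK
Age: 17
X-Varnish: 1977011992 1977011988
Via: 1.1 varnish
```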

Configuring Varnish

With our current setup, requests to localhost without a specified port will default to port 80 and are routed directly to Apache. When you feel comfortable that Varnish is caching what you want, you can switch Apache to listen on a different port, say 8080, and Varnish to listen on port 80 instead of 6081 in /etc/default/varnish. Remember to change the backend configuration in /etc/varnish/default.vcl too; it should be 8080 and not 80.

To configure Varnish further you need to know VCL, the Varnish configuration language (you might have noticed the file extension of the second file was .vcl). Having its own configuration language is a significant factor for achieving Varnish’s goals of flexibility and performance.

When Varnish processes a request, the request will go through a set of states. VCL code is how you decide what Varnish does with the request in each of these states, and this is why VCL is called a state engine. Each state has its own function in VCL which runs by default. You can edit the behavior of each state by redefining its function in default.vcl. Note that unless you add a return statement, redefining a function prepends the default VCL function. If you want to know what a default VCL function does, there is a copy of all the functions in a commented section at the bottom of the file.
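For instance, prepending your own logic to vcl_recv might look like this sketch, which bypasses the cache for a hypothetical /admin area. Because of the return (pass) statement, the default vcl_recv never runs for those URLs, while all other requests fall through to it:

```
sub vcl_recv {
  if (req.url ~ "^/admin") {
    # never cache the admin area; go straight to the backend
    return (pass);
  }
  # no return here, so the default vcl_recv runs next
}
```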

It helps to see an example, so let’s edit vcl_fetch and change the time a page is cached in Varnish to 5 seconds. In default.vcl, add the following function:

sub vcl_fetch { 
  set beresp.ttl = 5s; 
} 

The Varnish daemon will need to be reloaded for this to take effect, so restart the service with:

$ sudo service varnish restart

When you refresh your Hello World page now, the time will update every 5 seconds.

When Varnish looks for a page in the cache but can’t find it, it’s a miss; if it does find it, it’s a hit. The function vcl_fetch controls the state your request will eventually be in after a miss. In vcl_fetch, Varnish has fetched the page from the backend and now has to decide how to cache the page, if at all. Each function in VCL has a set of objects available to it; vcl_fetch has several, including the backend response object beresp. By setting ttl (time to live) on the beresp object, we are telling Varnish that the page should be stored in cache for 5 seconds before it is invalid. Once 5 seconds have passed, a new request to the page means Varnish will need to fetch the page from the backend again.

Note that in the example above, there is no return statement in vcl_fetch, which means the default function will run afterwards. This is good practice when writing VCL: the default VCL is there for a reason, and you should have a good reason to skip it.
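Building on the same idea, you could set different TTLs per kind of content, again without a return so the default vcl_fetch still runs (the URL pattern here is only an example):

```
sub vcl_fetch {
  if (req.url ~ "\.(css|js|png|jpg)$") {
    # static assets change rarely, so cache them longer
    set beresp.ttl = 1h;
  } else {
    set beresp.ttl = 5s;
  }
}
```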

Varnish also has tools for analyzing and viewing its results and performance. These are helpful for fine-tuning your configuration for high-load scenarios. To see one of them in action, run varnishlog -c while you refresh your Hello World page. The -c option filters the Varnish log to client-side entries as the requests arrive; to see entries to and from the backend instead, use the -b option.
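Two other tools worth knowing are varnishstat, which shows counters such as cache hits and misses, and varnishtop, which continuously ranks log entries, for instance the most requested URLs (the counter values below are illustrative):

```shell
$ varnishstat -1 | grep cache_hit
cache_hit          4711       1.23 Cache hits
$ varnishtop -i RxURL
```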

Conclusion

That’s it: you now have a simple setup for experimenting with your site’s performance. In this article you’ve learned the basics of Varnish, what it is, and what the default setup gives you. We’ve also had a quick look at VCL, Varnish’s configuration language. VCL might be a little hard to wrap your head around, mostly because it is an unusual way of doing configuration, but it is surprisingly simple to use once you accept this approach, and more flexible than a plain configuration file.

If you want to continue learning Varnish, visit varnish-cache.org for more information. You can also ask for help on IRC; the #varnish channel on the Freenode network is full of skilled and helpful people. Finally, something all web developers should do at some point is read RFC 2616. Its sections on caching headers are relevant for anyone who wants a better understanding of Varnish and caching. Good luck!

  • Alexander Cogneau

Nice write-up, but is there any significant advantage to using Varnish as a reverse proxy over nginx, for example?

    • Christian

Looking at benchmarks, there doesn’t appear to be a significant advantage outside of separation of concerns. Varnish is built from the ground up to serve as a reverse proxy cache and it does its job quite well: we placed Varnish in front of a typical LAMP stack and saw an 800% improvement when serving the static/cached response of the application server (our PHP output). Of course, when serving static/cached responses, the session-based elements of the page need to be determined using ESI or client-side Ajax requests (we opted for the latter). All of this could have been accomplished with nginx, but it was far easier to simply place Varnish in front of the existing stack; also, I am a big fan of using specialization over kitchen sinks.

  • Anand

    Awesome tuts June indeed.