Debugging Varnish

Richo Healey

This article was originally published on the 99designs Tech Blog

At 99designs we heavily (ab)use Varnish to make our app super fast, but also to do common, simple tasks without having to invoke our heavy-by-contrast PHP stack. As a result, our Varnish config is pretty involved, containing more than 1000 lines of VCL, and a non-trivial amount of embedded C.

When we started seeing regular segfaults, it was a pretty safe assumption that one of us had goofed writing C code. So how do you track down a transient segfault in a system like Varnish? Join us down the rabbit hole…

Get a core dump

The first step is to modify your production environment to provide you with useful core dumps. There are a few steps in this:

  1. First of all, configure the kernel to provide core dumps by setting a few sysctls:

    echo 1 > /proc/sys/kernel/core_uses_pid
    echo 2 > /proc/sys/fs/suid_dumpable
    mkdir /mnt/cores
    chmod 777 /mnt/cores
    echo /mnt/cores/core > /proc/sys/kernel/core_pattern
    

    In order, this:

    • Tells the kernel to append PIDs to core file names, making it easy to marry up the cores with the logs
    • Tells the kernel that suid binaries are allowed to dump core
    • Creates a place to store cores on AWS’s ephemeral storage (if, like us, you’re on EC2)
    • Tells the kernel to write core files there
  2. With this done, and no known way to trigger the bug, play the waiting game.

  3. When Varnish explodes, it’s showtime. Copy the core file, along with the shared object that Varnish emits from compiling the VCL (located in /var/lib/varnish/$HOSTNAME), over to a development instance and let the debugging begin.

Locate the crash point

If you have access to the excellent LLDB from the LLVM project, use that. In our case, getting it to work on Ubuntu 12.04 would have involved upgrading half the system, resulting in an environment too dissimilar to production, so we stuck with gdb.

If you spend a lot of time in a debugger, you’ll probably want a helper like fG!’s gdbinit or voltron to make your life easier. I use voltron but, thanks to some clumsiness in gdb’s API, immediately ran into some bugs.

Finally, with a working debugging environment, it’s time to dig into the crash. Your situation is going to be different from ours, but here’s how we went about debugging a problem like this recently:

Debugging the core dump with voltron

As you can see in the code pane, the faulting instruction is mov 0x0(%rbp),%r14, trying to load the value pointed to by RBP into r14. Looking in the register view we see that RBP is NULL.

Inspecting the source, we see that the faulting routine is inlined, and that the compiler has hijacked RBP (the base pointer for the current stack frame) to use as argument storage for the inline routine.

The offending assembly code

Of specific interest is this portion:

   0x000000000045a7c9 <+265>:   mov    0x223300(%rip),%rbp        # 0x67dad0 <pagesize_mask>
   0x000000000045a7d0 <+272>:   not    %rbp
   0x000000000045a7d3 <+275>:   and    0x10(%r14),%rbp
   0x000000000045a7d7 <+279>:   cmpb   $0x0,0x223303(%rip)        # 0x67dae1 <opt_junk>
=> 0x000000000045a7de <+286>:   mov    0x0(%rbp),%r14
   0x000000000045a7e2 <+290>:   mov    0x28(%r14),%r15

In plain English, this:

  • Loads a RIP-relative address into rbp (pagesize_mask)
  • Inverts rbp bitwise
  • ANDs it with the value 16 bytes into the structure pointed to by r14 (mapelm->bits)
  • Checks whether the opt_junk byte is zero
  • Tries to load the value at the address now in rbp into r14, which faults.

Which is emitted by:

static inline void
arena_dalloc_small(arena_t *arena, arena_chunk_t *chunk, void *ptr,
    arena_chunk_map_t *mapelm)
{
    arena_run_t *run;
    arena_bin_t *bin;
    size_t size;

    run = (arena_run_t *)(mapelm->bits & ~pagesize_mask);
    assert(run->magic == ARENA_RUN_MAGIC);
    bin = run->bin; // XXX KABOOM
    size = bin->reg_size;
...

We now know that the fault is caused by a mapelm struct with a bits member
set to zero; but why are we getting passed this broken struct with garbage in
it?

Digging in deeper

Since this function is declared inline, it’s actually folded into the calling frame. The only reason it appears in the backtrace at all is that the callsite is present in the DWARF debugging data.

We can poke at the value by inferring its location from the upstream assembly, but it’s easier to jump into the next upstream frame and inspect that:

(gdb) frame 1
#1  arena_dalloc (arena=0x7f28c4000020, ptr=0x7f28c40008c0, chunk=0x7f28c4000000) at jemalloc_linux.c:3939
3939    in jemalloc_linux.c
(gdb) info locals
pageind = <optimized out>
mapelm = 0x7f28c4000020
(gdb) p *mapelm
$3 = {link = {rbn_left = 0x300000001, rbn_right_red = 0x100002fda}, bits = 0}
(gdb)

So this looks like an element in a red-black tree, with two neighbours and a zero bits member. Let’s double-check:

(gdb) ptype *mapelm
type = struct arena_chunk_map_s {
    struct {
        arena_chunk_map_t *rbn_left;
        arena_chunk_map_t *rbn_right_red;
    } link;
    size_t bits;
}
(gdb) ptype arena_run_t
type = struct arena_run_s {
    arena_bin_t *bin;
    unsigned int regs_minelm;
    unsigned int nfree;
    unsigned int regs_mask[1];
}
(gdb)

Wait, what?

Looking back to get our bearings:

run = (arena_run_t *)(mapelm->bits & ~pagesize_mask);

The code is trying to generate a pointer to the arena run structure by taking the bits member of the mapelm struct and ANDing it with the inverse of pagesize_mask to locate the start of a page. Because bits is zero, the result is the start of the zero page: a NULL pointer.

This is enough to see how it’s crashing, but doesn’t give us much insight into why. Let’s go digging.

Looking back at the code snippet, we see an assertion that the arena_run_t structure’s magic member is correct; knowing that, we can go looking for other such structures in memory. A quick grep turns up:

./lib/libjemalloc/malloc.c:#  define ARENA_RUN_MAGIC 0x384adf93

pagesize_mask is just the page size minus one, meaning that ANDing any address with the inverse of pagesize_mask gives you the address of the beginning of that page.

We can therefore just search every writable page in memory for the magic number at the correct offset.

...Or can we?

typedef struct arena_run_s arena_run_t;
struct arena_run_s {
#ifdef MALLOC_DEBUG
    uint32_t    magic;
#  define ARENA_RUN_MAGIC 0x384adf93
#endif

    /* Bin this run is associated with. */
    arena_bin_t *bin;
...

The magic number, and the magic member of the struct (conveniently located in the first 4 bytes of each run’s page), only exist in debug builds.

Aside: can we abuse LD_PRELOAD for profit?

At this point all signs point to either a double free in Varnish’s thread pool implementation, leading to an empty bucket (bits == 0), or a bug in its memory allocation library, jemalloc.

In theory, it should be pretty easy to rule out jemalloc by swapping in another malloc implementation. We could do that by putting, say, tcmalloc in front of it in the symbol resolution path using LD_PRELOAD.

We’ll add:

export LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.0

to /etc/varnish/default and bounce Varnish. Then we move all the old core files out of the way, wait (and benchmark!).

However, there’s a flaw in our plan. Older versions of Varnish (remember that we’re on an LTS release of Ubuntu) vendor in a copy of jemalloc and statically link it, meaning that the symbols free and malloc are resolved at link time, not runtime. No easy preload hacks for us.
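For reference, here’s roughly what such an interposer looks like when malloc is resolved dynamically; this is a minimal sketch, not tcmalloc’s actual code:

```c
/* shim.c — minimal sketch of LD_PRELOAD-style malloc interposition.
 * Build as a preload object:  gcc -shared -fPIC shim.c -o shim.so -ldl
 * Use:                        LD_PRELOAD=./shim.so some_program
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

static void *(*real_malloc)(size_t);
static unsigned long malloc_calls;   /* crude instrumentation counter */

void *malloc(size_t size) {
    if (!real_malloc) {
        /* Look up the next malloc in the symbol resolution chain
         * (libc's, or whatever allocator was preloaded before us). */
        real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
    }
    malloc_calls++;
    return real_malloc(size);
}
```

Against a statically linked jemalloc, though, the binary never consults the dynamic linker for malloc at all, so a shim like this is simply never called.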

Rebuilding Varnish

The easy solution won’t work, so let’s do the awkward one: rebuild varnish!

apt-get source varnish

Grab a copy of the Varnish source and link it against tcmalloc. Before building, I deleted lib/libjemalloc and used grep to hunt down every reference to jemalloc in the codebase (which amounted to just a few changes to the configure script and makefiles), then added -ltcmalloc_minimal to CFLAGS.

As an aside, the Ubuntu packages for tcmalloc ship /usr/lib/libtcmalloc_minimal.so.0 but not /usr/lib/libtcmalloc_minimal.so, which means the linker can’t find the library; I had to create the symlink manually.

With this new Varnish in production, we haven’t yet seen the same crash, so it appears that it was a bug in jemalloc, probably a nasty interaction between libpthread and jemalloc (the crash was consistently inside thread initialization).

Try it yourself?

Let’s hope not. But if you do a lot of Varnish hacking with custom extensions, occasional C bugs are to be expected. This post walked through one tricky Varnish bug, to give you an idea of the tools and tricks for debugging similarly hairy segfaults.

If you’re messing around with voltron, you might find my voltron config and the tmux script I use to setup my environment a useful starting point.
