Docker for Rubyists
Docker is a pretty awesome tool and one that I think will have a pretty big impact on how we do development. But, if you haven’t had much experience with Linux containers (or virtual machines), Docker’s underlying ideas as well as its benefits may be a little difficult to understand.
This article will give you an overview of what Docker is, how it works and a little bit about how it relates to Ruby/Rails.
Virtual Machines and Linux Containers
Say we want to run a Linux distribution inside a Windows 7 machine (or vice versa). We can use a virtual machine to do this.
Essentially, virtual machines create an environment within an operating system (in our example, Windows 7; this is called the host OS) that another operating system can “live” inside (in our example, the Linux distribution; this is the guest OS). Completely isolating a guest operating system efficiently is no easy task. In fact, it is such a difficult problem that, for a long time, it was seen as impossible. Today, though, we have several different hypervisor implementations.
There are two different philosophies of virtualization: full virtualization and paravirtualization.
Full virtualization is where the guest operating system has no idea that it is being run in a virtualized environment. The obvious upside to this is that the operating system does not have to be modified in order for it to be virtualized. However, there are also significant downsides, one of which is that guest operating systems sometimes need to know about both the virtualized and the real hardware.
Paravirtualization, on the other hand, requires small modifications to guest operating systems in exchange for higher efficiency and more transparency. The term was first used in the Denali virtualization project, from which the concept grew. Today, there are many different virtualization options: Xen, VMware, etc. If you want to learn more about virtual machines, check out the Xen research paper, which is fairly accessible if you have a little background in operating system design.
Virtual machines allow us to divide up server resources easily. Amazon EC2 essentially runs VMs on top of powerful physical servers. We all know how much easier PaaS providers have made deployment, which used to be a difficult and expensive process. But virtual machines incur a lot of overhead: each one carries its own filesystem, virtual peripherals, etc.
Containers (see next section) are a bit like lightweight virtual machines. They are lightweight because every container uses the same underlying operating system as the host, yet processes inside a container can still be completely isolated from the rest of the operating system.
Containers, chroot and Docker
The basic idea behind containers is that processes within a container run under a separate root filesystem/directory. Basically, that means that processes get to create their own files/directories completely independent of the host operating system and other containers on the same system (if any).
There is an operation on *nix systems called “chroot” that can sort of do this. In essence, it allows you to run a program, but with a different root directory. So, you can run “bash” with the root directory being “~/anything”, and if you then try to run “ls”, you would find that bash attempts to find “ls” in “~/anything/bin/ls” rather than “/bin/ls”.
The benefit is that you can divide the underlying filesystem into many different root filesystems, with different processes running under each piece, so that processes are separated from one another. However, this doesn’t solve the entire problem: you still only have one network stack, and setting all of this up by hand is incredibly tedious and annoying.
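The root-directory swap that chroot performs can be sketched in a few lines of Ruby. This is only a simulation of the path lookup, not a real chroot: the helper name “host_path” and the directory “/home/me/anything” (standing in for “~/anything”) are made up for illustration.

```ruby
# Hypothetical helper: given a new root directory and a path as a chrooted
# process sees it, return where that path really lives on the host.
def host_path(new_root, path_inside)
  # Resolve against "/" first so that ".." can never climb above the new root
  inside = File.expand_path(path_inside, "/")
  File.join(new_root, inside)
end

# "/home/me/anything" is an assumed directory, standing in for "~/anything"
puts host_path("/home/me/anything", "/bin/ls")
puts host_path("/home/me/anything", "/../../bin/ls")
```

Both calls print “/home/me/anything/bin/ls”, which is why a chrooted process cannot reach the host’s own “/bin/ls”. Ruby does expose the real system call as Dir.chroot, but it needs root privileges to run.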
Linux containers grew out of chroot as a response to some of its drawbacks. But, like a lot of things related to Linux, containers can be a bit difficult to use practically. So, Docker came along and made Linux containers awesome and usable for people who didn’t want to spend a ton of time learning about the intricacies of the Linux container system. Docker has some pretty awesome features, including: filesystem, network and resource isolation, logging, and (personal favorite) an interactive shell.
Docker allows you to create containers that you can take from your workstation to the server and expect them to run the same way there (because everything is isolated).
Obviously, to do anything useful in a container (unlike our example of chroot above), you need a basic set of utilities such as “ls”, “mkdir”, etc. Docker allows you to have images which provide this set of binaries. Then, you can use the “docker run” command, or Docker’s config file (the Dockerfile), to detail what you want your container to do (e.g. install rvm, install ruby, run your code, etc.).
Writing all these config files obviously seems like a lot of work. If you think about it, everybody who uses Rails probably uses roughly the same container layouts. Fortunately, someone already solved that problem for us.
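To make the “docker run” part a bit more concrete, here is a rough Ruby sketch that only assembles the command line and does not execute anything. The helper “docker_run_command” is hypothetical, and “ubuntu” is an assumed image name picked for illustration; “-i” and “-t” are the flags that give you the interactive shell mentioned earlier.

```ruby
require "shellwords"

# Hypothetical helper: assembles a `docker run` command line as a string.
# It only builds the command; nothing is executed here.
def docker_run_command(image, command, interactive: false)
  args = ["docker", "run"]
  # -i keeps STDIN open and -t allocates a terminal, which together
  # give you an interactive shell inside the container
  args += ["-i", "-t"] if interactive
  args << image
  args.concat(command)
  # shelljoin escapes each argument so the result can be pasted into a shell
  args.shelljoin
end

# "ubuntu" is an assumed image name, used purely for illustration
puts docker_run_command("ubuntu", ["ls", "/bin"])
puts docker_run_command("ubuntu", ["/bin/bash"], interactive: true)
```

The first line printed is “docker run ubuntu ls /bin” and the second is “docker run -i -t ubuntu /bin/bash”. Building the arguments as an array and escaping them at the end is just one way to avoid quoting mistakes.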
Dokku is absolutely fantastic. It is a “mini-Heroku” in under 100 lines of Bash! It uses Docker under the hood to make and manage Linux containers. You can install it with the “pipe to bash” installer style (for which I have expressed my distaste in the past: you’re piping unknown code into your shell, and with root privileges at that):
wget -qO- https://raw.github.com/progrium/dokku/master/bootstrap.sh | sudo bash
You can deploy an app with git by simply pushing:
git remote add dokku git@HOSTNAME:APPNAME
git push dokku master
This solves a really important problem (in my eyes) – you can get the simplicity of Heroku with a single EC2 or DigitalOcean instance! This used to be quite an annoying procedure with every new app you wanted deployed on EC2.
Wrapping it Up
Hopefully this clears up a few points about Linux containers, what Docker is and what benefits it gives us as web developers. I find the introduction to the concept on the docker.io website quite terse, and this article hopes to fill the gap.