Giant Killing with Beanstalkd

If you have ever dabbled in Service Oriented Architecture (SOA), or even read some interesting articles about it, you have probably come across the term “Message Queue”.

The really terse explanation of a Message Queue, or MQ, is that it allows services within your architecture to adopt a “fire and forget” approach to interacting with other services. By placing a queue in the system, non-time-sensitive operations may be carried out at the leisure of the services that care about them, regardless of technology or programming language.

As an example, let’s take a “send to a friend” feature within a Job Board application. Once the user has completed the form and clicked “Send”, do we really want the nitty-gritty of sending an email to a friend to live in our Job Board application?

Background Jobs

A common approach to this problem is to use background workers like Resque or Sidekiq. For the problem at hand, these are fine and arguably more suitable. The only problems I have with that approach are:

  1. The logic of sending email lives in an application that does not necessarily care about email.
  2. I will probably duplicate the process of communicating with my SMTP server across a few applications within the architecture.
  3. Background workers know a little too much about their origin, i.e. what models they came from and what they can access (my whole app stack).

If your architecture is growing, it may be worth considering moving some background workers to an MQ. For me, MQs just work. You drop in some data, and a daemon or application that cares about that message picks it up some time later and acts on it. Meanwhile, the originator has carried on focusing on its core business.

As the architecture grows and you add more services, some of these services may need to send email as well. At that point, you have established a clear, trusted method of sending email. You simply drop some data in the email queue and it will get sent.

Beanstalkd

Hopefully by now you are getting the gist of why MQs are awesome. There are a few open source MQs available; the most notable are RabbitMQ (there is a nice article on RubySource with the details) and my personal favourite, which we will be using today: Beanstalkd.

Getting started with Beanstalkd really couldn’t be simpler. On OS X, you want to use Homebrew (brew install beanstalkd), or on a Debian flavour of Linux you can use sudo apt-get install beanstalkd. It seems pretty well supported by most package managers across platforms. You can see the details in the Beanstalkd download docs.

Once installed, you can open the terminal and execute beanstalkd. This will start up a Beanstalkd instance on localhost, using its default port 11300, in the foreground. It’s not always ideal to run it in the foreground, so my typical command looks something like:

beanstalkd -b ~/beanstore &

This persists the queue data in a binlog under the directory ~/beanstore instead of keeping it only in memory, and runs the process in the background (the ampersand). For development, these settings are fine. When it comes to production, I would suggest you have a read of the docs pertaining to the admin tool that ships with Beanstalkd.

Beanstalkd Lingo

Beanstalkd has some nice vocabulary for describing the main players and operations. Let’s walk through them.

Tubes

A tube is a namespace for your messages. A Beanstalkd instance can have multiple tubes. On a vanilla boot, Beanstalkd will have a single tube named default.

The idea is that you want a certain process to listen for messages coming in on a specific tube. As mentioned, tubes just act as namespaces for the consumers of the queue.
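
To make that concrete, here is a minimal sketch, assuming a local Beanstalkd on the default port and the Beaneater gem; the tube names are hypothetical. Two producers use separate tubes, and a consumer watches only the one it cares about:

require 'beaneater'

beanstalk = Beaneater::Pool.new(['localhost:11300'])

# Hypothetical tube names; each tube namespaces one concern
emails  = beanstalk.tubes['emails']
reports = beanstalk.tubes['reports']

emails.put 'an email job'
reports.put 'a report job'

# A consumer watches only the tube it cares about
beanstalk.tubes.watch!('emails')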

Jobs

Jobs are what we place in a tube. It’s common for me to place JSON in a tube and unmarshal it at the other end.

Beanstalkd doesn’t really care about the content of the job, so things like YAML, plain text or Thrift would be just fine.

In a normal, happy path operation, jobs have 2 states:

  1. Ready – Waiting to be processed.
  2. Reserved – Being processed.

If all goes well, the job is deleted. If there is a problem with the job, say our SMTP server is down, the job is put in a state of “Buried”. It will remain “Buried” until the tube is “kicked”. This simply places the job back into the “Ready” state. So, with the SMTP server back up, we kick the tube and the world keeps spinning.

One other state we haven’t covered is “Delayed”. This simply means the job does not enter the state of “Ready” until some pre-determined interval has elapsed. I personally have not used this state much, so won’t cover it any more than mentioning that it exists.
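
For completeness, here is a rough sketch of how those states look through Beaneater, assuming a local instance and a hypothetical 'emails' tube; the delay value and kick bound are illustrative:

require 'beaneater'

beanstalk = Beaneater::Pool.new(['localhost:11300'])
tube = beanstalk.tubes['emails'] # hypothetical tube name

# This job stays "Delayed" for 60 seconds before it becomes "Ready"
tube.put 'send this later', delay: 60

# With the SMTP server back up, kick up to 10 "Buried" jobs back to "Ready"
tube.kick(10)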

OM NOM NOM

Now that we have Beanstalkd running on our development boxes, we want to get some jobs into the queue. To achieve that, my weapon of choice is the Beaneater gem. Getting a job into a tube is as simple as:

require 'beaneater'
require 'json'

# Connect to the local Beanstalkd instance and grab (or create) a tube
beanstalk = Beaneater::Pool.new(['localhost:11300'])
tube = beanstalk.tubes['my-tube']
job = {some: 'key', value: 'object'}.to_json

tube.put job
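
If you want to check that the job actually landed, a quick peek at the tube does the trick without reserving anything. This is a sketch assuming the same connection as above, and that peek returns nil when nothing is waiting:

# Look at the next "Ready" job without reserving it
peeked = tube.peek(:ready)
puts peeked.body if peeked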

And that is it. Now we get to the interesting bit: consuming the tube and all the jobs that live there.

I am a big fan of a daemon process handling that. If the tubes start getting too full, we can spin up more daemons to help clear the backlog of jobs. Of course, we can also kill them off as required.

So far I have used the Dante gem for wrapping scripts into daemons. It seemed a bit lighter than Daemon Kit, and I like to keep my daemons from getting bloated. The benefit of using Dante over something like ruby script/my_mailer_script.rb is, for me, nothing more than Dante giving you Process ID (PID) file generation out of the box. With that, I can keep the daemons in check with monit.
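
As a rough sketch, wrapping a Beaneater consumer in Dante looks something like the following. The process name, tube name and SMTP hand-off are all hypothetical; the point is simply that Dante daemonizes the loop and writes the PID file for monit:

#!/usr/bin/env ruby
require 'dante'
require 'beaneater'
require 'json'

# 'mailer-daemon' and 'emails' are hypothetical names
Dante.run('mailer-daemon') do |opts|
  beanstalk = Beaneater::Pool.new(['localhost:11300'])

  beanstalk.jobs.register('emails') do |job|
    payload = JSON.parse(job.body)
    # ... hand the payload off to whatever actually talks to SMTP
  end

  beanstalk.jobs.process!
end

You would kick that off with something like ./mailer_daemon -d -P /path/to/pid_file.pid, at which point monit can watch the PID file.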

Beaneater provides a really nice API for consuming jobs in 2 ways. The first is manually stepping through the process of reserving a job, working on it, then deleting if it completes correctly or burying if an exception is raised. It looks something like this:

beanstalk.tubes.watch!('my-tube')
loop do
  job = beanstalk.tubes.reserve
  begin
    # ... process the job
    job.delete
  rescue Exception => e
    job.bury
  end
end

A couple of things here are worth mentioning. Yes, I’m using an infinite loop, and the reserve method on the tube will actually sit and wait for a job to be “Ready”, reserve it, and continue.
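
If you would rather not block forever, reserve also takes a timeout. A small sketch, assuming Beaneater raises Beaneater::TimedOutError when nothing shows up in time:

begin
  job = beanstalk.tubes.reserve(5) # wait at most 5 seconds for a "Ready" job
  # ... process the job
  job.delete
rescue Beaneater::TimedOutError
  # nothing arrived within the window, carry on
end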

Beaneater provides a better interface for long running tasks and the above can simply be condensed into:

beanstalk.jobs.register('my-tube') do |job|
  # ... process the job
end

beanstalk.jobs.process!

This method wraps the behaviour (albeit in a much better way) of the previous example, reserving, processing, then deleting or burying based on the outcome.

No Magic Beans

The beauty of Beanstalkd is its absolute simplicity. There is really not much more I would be willing to dive into as an introduction. In terms of getting things running quickly, it is no more complicated than any of the background worker solutions discussed earlier.

It does make sense to be pragmatic in your adoption of MQs, to be honest. Resque, Sidekiq etc. all have their place and work very well, but Beanstalkd addresses a few more problems, namely, interfacing between services which may or may not be written in Ruby (.NET clients for Beanstalkd are available).

In fact, the entire thing is completely language agnostic. The neckbeard way of communicating with Beanstalkd is via its own protocol over TCP. The Beaneater gem, as you will probably have guessed, abstracts all that protocol stuff into a well packaged API for us. It is safe to say I’ll be leaning on the Beaneater gem when using Beanstalkd for some time to come.

If I had any advice on designing/composing tube consumers, it would be to stick to the Single Responsibility Principle (SRP) as much as possible. There will come a time when you will have to kick a buried job. If that job writes to a database AND sends an email, what happens when the sending of the email blows up? Replaying said message will result in a duplicate database entry. The more you split the processing of a job into the smallest reasonable responsibilities, the less you have to worry about performing duplicate actions.
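
In practice, that just means the producer drops two small jobs into two tubes instead of one composite job. A sketch, with hypothetical tube names and payload:

require 'beaneater'
require 'json'

beanstalk = Beaneater::Pool.new(['localhost:11300'])
payload   = {user_id: 42, friend_email: 'friend@example.com'}.to_json

# One consumer records the share, another sends the email. Replaying a
# buried email job can no longer create a duplicate database row.
beanstalk.tubes['db-writes'].put payload
beanstalk.tubes['emails'].put payload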

I really urge you to look to Beanstalkd as your application architecture grows. In my personal experience, I have found it simple to get running, straightforward to manage and maintain, and the Ruby client, Beaneater, is one of the better interfaces I have used.


  • http://bonsaierp.com Boris Barroso

    I have tried the backburner gem, what do you think about it? Can you show an example of using Dante and Beaneater with a Rails app?

  • http://bangline.co.uk Dave Kennedy

    Hi Boris,

    I haven’t used the backburner gem. Quickly looking over the docs, it looks like a delayed/background job implementation using Beanstalkd instead of Redis.

    When building apps I usually go down the Resque / Sidekiq line, but as I mention in the article, as the architecture grows it becomes apparent some things are just not a delayed job. It seems more appropriate to drop a chunk of data in a queue and forget about it (a daemon listening will deal with it). So I have no real opinion of backburner I’m afraid!

    I don’t directly use Dante with Rails; its purpose for me is mainly to wrap a Ruby script as a daemon and provide a PID for monit to pick up on.

    I knocked up an example of using a daemon with beanstalkd for you here https://gist.github.com/bangline/5897674

    You would kick that off with `./queue_daemon -P /path/to/pid_file.pid`

  • Benoit LeBlanc

    Hi Dave,

    Have you experienced any issues when you try to put a job to the queue?

    Whenever I try to do:

    tube.put "5"

    I get the following:

    {
      :status => "INSERTED",
      :body => nil,
      :id => "3",
      :connection => #
    }

    As you notice the body is nil, and no matter what I try to send through the tube, it is always an empty body. The receiving side tries to process the job, but because of an empty body it can’t do anything.

    Any ideas?

  • http://bangline.co.uk Dave Kennedy

    Hi Benoit,

    Well, I hadn’t seen this before, so I threw up a quick irb. I saw what you were seeing, but when I peeked at the job, the body was set as expected.

    https://gist.github.com/bangline/5962377

    Any help?

  • Benoit LeBlanc

    Hi Dave,

    Thanks for taking a look.

    It did help me.

    It was my bad, I was assuming because it said nil that it didn’t transfer anything, but the problem was that I wasn’t properly parsing the JSON on the other side, so when I did:

    job.body["whatever"]

    it was giving me an empty string, but once I did:

    JSON.parse(job.body)["whatever"]

    Everything was fine.

    Thanks again, as this article resolved a major headache I was having with communication between processes.

    Cheers,