Understanding Concurrent Programming With Ruby’s Goliath

PostRank recently released a new Ruby web server: Goliath. It uses an event loop in the same manner as node.js and nginx to achieve high levels of concurrency, but adds some special sauce that allows traditionally complicated asynchronous code to be written in a synchronous style.

For example, asynchronous code in Ruby typically looks like this (using the eventmachine library):

require 'eventmachine'
require 'em-http'

EM.run {
  EM::HttpRequest.new('http://www.sitepoint.com/').get.callback {|http|
    puts http.response
  }
}

This is neat in that it allows the application to do other things while the HTTP request completes (it is “non-blocking”), but to fetch two sites in succession, you need to nest callbacks:

EM::HttpRequest.new('http://www.sitepoint.com/').get.callback {|http|
  # extract_next_url is a fake method, you get the idea
  url = extract_next_url(http.response)

  EM::HttpRequest.new(url).get.callback {|http2|
    puts http2.response
  }
}

As you can imagine, this pattern gets messy fast. Goliath allows us to write the above code in the simple synchronous fashion we are familiar with:

http = EM::HttpRequest.new("http://www.sitepoint.com").get
# extract_next_url is a fake method, you get the idea
url = extract_next_url(http.response)
http2 = EM::HttpRequest.new(url).get

…yet behind the scenes it still executes asynchronously! Other code can be run while the HTTP requests are running.

This blows my mind. How does it work? Let’s find out.

Fibers

From the documentation, Goliath claims to work its magic by “leveraging Ruby fibers introduced in Ruby 1.9+”. This first hint sends us to the ruby rdocs to find:

Fibers are primitives for implementing light weight cooperative concurrency in Ruby. Basically they are a means of creating code blocks that can be paused and resumed, much like threads. The main difference is that they are never preempted and that the scheduling must be done by the programmer and not the VM.
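To make that description a little more concrete, here is a minimal sketch that uses only Ruby's built-in Fiber class. The method name fiber_ping_pong and the log messages are made up for illustration; the interesting part is the order in which the lines run.

```ruby
# Records the order in which the main program and the fiber take turns.
# fiber_ping_pong is a hypothetical name used only for this sketch.
def fiber_ping_pong
  log = []

  fiber = Fiber.new do
    log << "fiber: step 1"
    Fiber.yield            # pause here and hand control back to the caller
    log << "fiber: step 2"
  end

  log << "main: before first resume"
  fiber.resume             # runs the block up to Fiber.yield
  log << "main: fiber is paused"
  fiber.resume             # continues after Fiber.yield until the block ends
  log << "main: fiber finished"
  log
end

puts fiber_ping_pong
```

Each resume runs the fiber until its next Fiber.yield (or the end of the block), so the log alternates between main and fiber entries. Nothing switches unless the code asks for it, which is what "never preempted" means in the quote above.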

Urgh, too many big words. Let’s just dive in and start poking around the Goliath code. The Goliath documentation contains a full example for proxying a site:

require 'goliath'

require 'em-synchrony'
require 'em-synchrony/em-http'

class HelloWorld < Goliath::API
  def response(env)
    req = EM::HttpRequest.new("http://www.google.com/").get
    resp = req.response

    [200, {}, resp]
  end
end

# to play along at home:
#   $ gem install goliath
#   $ gem install em-http-request --pre
#   $ ruby hello_world.rb -sv

We know that for this to occur in an asynchronous manner there must be some funny business going on in that #get call, so let’s try and find that. My spider sense tells me it will be somewhere in em-synchrony/em-http:

$ gem unpack em-synchrony
Unpacked gem: '/Users/xavier/Code/tmp/em-synchrony-0.3.0.beta.1'
$ cd em-synchrony-0.3.0.beta.1
# I used tab completion on the next line to find the exact path
$ cat lib/em-synchrony/em-http.rb

That reveals:

# em-synchrony/lib/em-synchrony/em-http.rb
begin
  require "em-http"
rescue LoadError => error
  raise "Missing EM-Synchrony dependency: gem install em-http-request"
end

module EventMachine
  module HTTPMethods
    %w[get head post delete put].each do |type|
      class_eval %[
        alias :a#{type} :#{type}
        def #{type}(options = {}, &blk)
          f = Fiber.current

          conn = setup_request(:#{type}, options, &blk)
          conn.callback { f.resume(conn) }
          conn.errback  { f.resume(conn) }

          Fiber.yield
        end
      ]
    end
  end
end

Jackpot! Fibers! It appears to be monkey-patching the existing em-http library, so before we go too much further let’s find out what normal em-http code looks like without fibers. There is a handy example on the em-http-request wiki:

EventMachine.run {
  http = EventMachine::HttpRequest.new('http://google.com/').get :query => {'keyname' => 'value'}

  http.errback { p 'Uh oh'; EM.stop }
  http.callback {
    p http.response_header.status
    p http.response_header
    p http.response

    EventMachine.stop
  }
}

It looks very similar to the code above, which is promising, and when we dig in a bit further it becomes even more so.

$ gem unpack em-http
ERROR:  While executing gem ... (Gem::RemoteFetcher::FetchError)
    SocketError: getaddrinfo: nodename nor servname provided, or not known (http://rubygems.org/latest_specs.4.8.gz)
# Oh noes it doesn't work!
# Search for em gems
$ gem list em- 

*** LOCAL GEMS ***

em-http-request (1.0.0.beta.2, 0.3.0)
em-socksify (0.1.0)
em-synchrony (0.3.0.beta.1)

$ gem unpack em-http-request # Ah that is probably it
$ cd em-http-request-1.0.0.beta.2
$ ack "get" lib/
lib/em-http/http_connection.rb
4:    def get    options = {}, &blk;  setup_request(:get,   options, &blk); end

Note on the last line that get defers straight to setup_request, which is the same call that is made in the fiber example above. Yep, pretty much the same. Now we can head back to the fiber code.

f = Fiber.current
conn = setup_request(:#{type}, options, &blk)
conn.callback { f.resume(conn) }
conn.errback  { f.resume(conn) }
Fiber.yield

It appears that rather than immediately doing any work when a callback triggers, resume is called on the current fiber, presumably starting this fiber back up at the point yield was called. Checking the documentation for Fiber.yield validates this, and the last sentence also explains how the conn variable is returned from this method:

Yields control back to the context that resumed the fiber, passing along any arguments that were passed to it. The fiber will resume processing at this point when resume is called next. Any arguments passed to the next resume will be the value that this Fiber.yield expression evaluates to.
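Here is a small sketch of that last sentence in action, using only plain Ruby. FakeConnection, synchronous_get and demo_synchronous_get are hypothetical names invented for this example; FakeConnection only imitates the callback part of an em-http connection and is not the real API.

```ruby
require 'fiber' # needed for Fiber.current on older Rubies; harmless on newer ones

# Hypothetical stand-in for an em-http connection: it stores one callback and
# fires it when #succeed is called (in real life, when the response arrives).
class FakeConnection
  attr_reader :response

  def callback(&blk)
    @callback = blk
  end

  def succeed(response)
    @response = response
    @callback.call
  end
end

# Same shape as the patched #get: register a callback that resumes the
# current fiber, then pause. Whatever is passed to resume is returned here.
def synchronous_get(conn)
  f = Fiber.current
  conn.callback { f.resume(conn) }
  Fiber.yield
end

def demo_synchronous_get
  conn = FakeConnection.new
  result = nil

  request = Fiber.new do
    result = synchronous_get(conn).response # reads like blocking code
  end

  request.resume           # runs until Fiber.yield, then returns here
  paused_result = result   # still nil: the fiber is parked
  conn.succeed("hello")    # fires the callback, which resumes the fiber
  [paused_result, result]
end

p demo_synchronous_get
```

While the fiber is parked, the caller is free to do other work; the value handed to f.resume is what synchronous_get eventually returns, which is why the code inside the fiber can treat it like an ordinary return value.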

Using it

We now have an idea of how Goliath works its magic, though it may still be a fuzzy one. Let’s see if we have it right by trying to write some code that emulates it.

Remember that this fiber trick is simply a way of simplifying callback-littered code, so we should be able to first write a non-fiber-aware method and then clean it up. I like to start with a dirt simple example, so we are going to write a basic Goliath class that blocks for one second then renders some text.

class Surprise < Goliath::API
  def response(env)
    sleep 1
    [200, {}, "Surprise!"]
  end
end

Hit that in your web browser and bingo, it waits for a second. Not so fast though, tiger: what happens when we issue multiple simultaneous requests?

$ ab -n 3 -c 3 127.0.0.1:9000/ | grep "Time taken"
Time taken for tests:   3.011 seconds

Alas, our webserver was only serving one request at a time. That’s not web scale. The sleep call not only blocks our response, but the entire server. That’s why we moved to evented programming in the first place. Let’s try a classic EventMachine timer instead:

class Surprise < Goliath::API
  def response(env)
    EventMachine.add_timer 1, proc {
      [200, {}, "Surprise!"]
    }
  end
end

Of course this does not work, because the #response method needs to appear synchronous. What happens in this case is that #add_timer returns straight away (with a timer signature rather than a Rack-style response array) and Goliath immediately tries to render that, exploding in the process. The timer triggers sometime later, and no code is still around to care. We cannot send the result of our timer proc as the return value for the method.

We need to combine the synchronous nature of the first example with the asynchronous elements of the second: a beautiful Frankenstein. Hopefully you have caught on that we can use fibers to do the stitching.

class Surprise < Goliath::API
  def response(env)
    f = Fiber.current
    EventMachine.add_timer 1, proc { f.resume }
    Fiber.yield

    [200, {}, "Surprise!"]
  end
end

We steal the pattern we saw in em-synchrony/em-http above: grab the current fiber, then set up a resume call in the asynchronous callback, which picks execution back up at the Fiber.yield. Testing this with ab, we see that this indeed solves our concurrency issue:

$ ab -n 3 -c 3 127.0.0.1:9000/ | grep "Time taken"
Time taken for tests:   1.009 seconds

These fiber things are pretty cool.
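If you do not have Goliath handy, the same effect can be imitated in plain Ruby. The sketch below is a toy, not how EventMachine or Goliath are actually implemented: TinyLoop, fiber_sleep and serve_three_requests are hypothetical names, and it assumes each request is run inside its own fiber.

```ruby
require 'fiber' # needed for Fiber.current on older Rubies; harmless on newer ones

# Hypothetical miniature event loop: it only knows about timers.
class TinyLoop
  def initialize
    @timers = []
  end

  # Non-blocking: just records the block to run after `seconds`.
  def add_timer(seconds, &blk)
    @timers << [Time.now + seconds, blk]
  end

  # Runs timers in due order until none are left.
  def run
    until @timers.empty?
      @timers.sort_by!(&:first)
      due_at, blk = @timers.shift
      delay = due_at - Time.now
      sleep(delay) if delay > 0
      blk.call
    end
  end
end

# Reads synchronously, but only parks the current fiber, not the whole loop.
def fiber_sleep(event_loop, seconds)
  f = Fiber.current
  event_loop.add_timer(seconds) { f.resume }
  Fiber.yield
end

def serve_three_requests(pause = 0.2)
  event_loop = TinyLoop.new
  responses = []
  started = Time.now

  3.times do |i|
    # One fiber per pretend request (an assumption of this sketch).
    Fiber.new do
      fiber_sleep(event_loop, pause)
      responses << [200, {}, "Surprise #{i}!"]
    end.resume
  end

  event_loop.run
  [responses, Time.now - started]
end

responses, elapsed = serve_three_requests
puts "#{responses.size} responses in #{elapsed.round(2)} seconds"
```

All three fibers register their timers before any of them wakes up, so the total time should be close to one pause rather than three, mirroring the ab result above.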

Wrapping Up

In exploring the Goliath source code and associated libraries we discovered how it pulls off its asynchronous-masquerading-as-synchronous trick, and were able to put that knowledge into practice with a simple example.

To practice your code reading, here are some other research tasks for you to try:

  • Find where Goliath calls into the #response method and see if there are any other lurking fiber tricks to be found.
  • Investigate one of the other libraries that em-synchrony provides an API for, such as em-mongo.
  • Rack-fiber_pool uses fibers in a similar context; check it out and see what it is getting up to.

Let us know how you go in the comments. Tune in next week for more exciting adventures in the code jungle.


  • http://www.timonv.nl Timon Vonk

    Great post! Always good to see something cool taken apart.

  • http://www.saturnflyer.com Jim

    Thanks! This i-dont-know-what-im-doing walk-through really helps to make it easy to understand.

  • http://railsperformance.blogspot.com John McCaffrey

    Xavier,

    Great post! Nice detective work. (like watching an episode of CSI or Law and Order)

    I’ve been following the event-machine stack, trying to work out which areas of my existing apps would be improved by adding in em components, particularly finding ways to get the biggest bang out of heroku (which uses thin and postgres, which can both be used with event machine stuff).

    Thanks for breaking it down

  • http://www.deploymentzone.com Charles Feduke

    Really nice article. I’ve been looking at Goliath since it was announced wondering if it is something we should be using. Understanding the internals helps me wrap my head around it… and now I know how to easily peer into the internals of other gems!

  • http://www.igvita.com/ Ilya Grigorik

    Great post! To someone who hasn’t seen this pattern before (with Fibers and all), this feels like black magic at first, but in reality, it is a very straightforward trick.

  • http://luigimontanez.com Luigi Montanez

    Seriously, one of the best articles featuring code spelunking I’ve ever read. Stayed with it to the very end.

  • http://jackdempsey.me Jack Dempsey

    Thanks Xavier, I too am a big fan of the style. It’s an honest capturing of what it’s like to learn this sort of thing for the first time (or in my case with fibers, the 10th).

    Everything was clear up til the end, which is where the details get fuzzy for me. If I could add in my $.02 to this final chunk:

    class Surprise < Goliath::API
      def response(env)
        f = Fiber.current
        EventMachine.add_timer 1, proc { f.resume }
        Fiber.yield
        [200, {}, "Surprise!"]
      end
    end

    This code is running in the context of a currently executing fiber. This is why you can't just pop into irb and call Fiber.current (you have to require 'fiber' before you do that).
    The EventMachine call is important to clearly understand. As documented here http://eventmachine.rubyforge.org/classes/EventMachine.html#M000013 "EventMachine#add_timer is a non-blocking call." So after that line executes we immediately go to the next line.
    Fiber.yield returns control to the calling fiber. Things are halted at this point.
    Goliath goes and processes the next request and goes through the same steps, in this case, 2 more times
    One second after that first add_timer call, the callback is executed and the f.resume call resolves and grabs control back from goliath, at the point at which it was stopped (the Fiber.yield location). We then return the rack result at this point.
    This happens 2 more times for the other callbacks and the result is that 3 responses occur in just over 1 full second.

    How's this sound to everyone else? Clear? Correct? :-)

  • http://xaviershay.com/ Xavier Shay

    Thanks everyone for the feedback, I was curious as to how this format would work. Seems like it’s a winner :)

    Jack: That’s a great description! (I tried to fix the formatting but didn’t have much luck – I’ll chat to the editors and see if we can sort it out.)

  • Dan Cheail

    Let’s see what happens…


    class Comment < Foo

    def something
    "bar"
    end

    end

  • http://benmanns.com Benjamin Manns

    Thanks for the overview; this was definitely very helpful.

  • http://perfectskies.com DBackeus

    Recommend gemedit to avoid the hassle of gem unpacking when peeping installed gems.

    gem install gemedit
    gem edit em-synchrony
