Understanding Concurrent Programming With Ruby's Goliath

Key Takeaways

Goliath is a Ruby web server that uses an event loop, similar to node.js and nginx, to achieve high levels of concurrency. It allows traditionally complex asynchronous code to be written in a synchronous style, making it easier to manage and understand.
Goliath uses Ruby fibers, introduced in Ruby 1.9+, as a means of creating code blocks that can be paused and resumed, much like threads. This allows other code to run while HTTP requests are being processed, enhancing efficiency and concurrency.
Goliath’s event-driven architecture allows for more responsive applications, as it can quickly switch between tasks as needed. It is particularly suited for applications that need to handle a large number of concurrent connections, such as chat servers, real-time analytics, or high-traffic APIs.
While Goliath can be used with Rails or other Ruby frameworks, its event-driven architecture may require a different approach to structuring your application. It’s crucial to design your application to take advantage of Goliath’s non-blocking nature.

PostRank recently released a new Ruby web server: Goliath. It uses an event loop in the same manner as node.js and nginx to achieve high levels of concurrency, but adds some special sauce that allows traditionally complicated asynchronous code to be written in a synchronous style.

For example, asynchronous code in Ruby typically looks like this (using the eventmachine library):

require 'eventmachine'
require 'em-http'

EM.run {
  EM::HttpRequest.new('https://www.sitepoint.com/').get.callback {|http|
    puts http.response
  }
}

This is neat in that it allows the application to do other things while the HTTP request completes (it is “non-blocking”), but to fetch two sites in succession, you need to nest callbacks:

EM::HttpRequest.new('https://www.sitepoint.com/').get.callback {|http|
  # extract_next_url is a fake method, you get the idea
  url = extract_next_url(http.response)

  EM::HttpRequest.new(url).get.callback {|http2|
    puts http2.response
  }
}

As you can imagine, this pattern gets messy fast. Goliath allows us to write the above code in the simple synchronous fashion we are familiar with:

http = EM::HttpRequest.new("http://www.sitepoint.com").get

# extract_next_url is a fake method, you get the idea
url = extract_next_url(http.response)
http2 = EM::HttpRequest.new(url).get

…yet behind the scenes it still executes asynchronously! Other code can be run while the HTTP requests are running.

This blows my mind. How does it work? Let’s find out.

Fibers

From the documentation, Goliath claims to work its magic by “leveraging Ruby fibers introduced in Ruby 1.9+”. This first hint sends us to the Ruby RDoc to find:

Fibers are primitives for implementing light weight cooperative concurrency in Ruby. Basically they are a means of creating code blocks that can be paused and resumed, much like threads. The main difference is that they are never preempted and that the scheduling must be done by the programmer and not the VM.

Urgh, too many big words. Let’s just dive in and start poking around the Goliath code. The Goliath documentation contains a full example for proxying a site:

require 'goliath'
require 'em-synchrony'
require 'em-synchrony/em-http'

class HelloWorld < Goliath::API
  def response(env)
    req = EM::HttpRequest.new("http://www.google.com/").get
    resp = req.response

    [200, {}, resp]
  end
end

# to play along at home:
#   $ gem install goliath
#   $ gem install em-http-request --pre
#   $ ruby hello_world.rb -sv

We know that for this to occur in an asynchronous manner there must be some funny business going on in that #get call, so let’s try and find that. My spider sense tells me it will be somewhere in em-synchrony/em-http…

$ gem unpack em-synchrony
Unpacked gem: '/Users/xavier/Code/tmp/em-synchrony-0.3.0.beta.1'
$ cd em-synchrony-0.3.0.beta.1
# I used tab completion on the next line to find the exact path
$ cat lib/em-synchrony/em-http.rb

That reveals:

# em-synchrony/lib/em-synchrony/em-http.rb
begin
  require "em-http"
rescue LoadError => error
  raise "Missing EM-Synchrony dependency: gem install em-http-request"
end

module EventMachine
  module HTTPMethods
    %w[get head post delete put].each do |type|
      class_eval %[
        alias :a#{type} :#{type}
        def #{type}(options = {}, &blk)
          f = Fiber.current

          conn = setup_request(:#{type}, options, &blk)
          conn.callback { f.resume(conn) }
          conn.errback  { f.resume(conn) }

          Fiber.yield
        end
      ]
    end
  end
end
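
Two idioms in that listing are worth seeing in isolation: alias keeps the original method reachable under a new name, and class_eval with a string redefines one method per HTTP verb. Here is a minimal sketch of the same shape on a made-up TinyClient class (it is not part of em-http), with plain string wrapping standing in for the fiber logic:

```ruby
# TinyClient is a hypothetical class used only to illustrate the idiom.
class TinyClient
  def get(path)
    "GET #{path}"
  end

  def head(path)
    "HEAD #{path}"
  end
end

# Reopen the class and wrap each method, keeping the original as aget / ahead.
class TinyClient
  %w[get head].each do |type|
    class_eval %[
      alias :a#{type} :#{type}

      def #{type}(path)
        "wrapped(" + a#{type}(path) + ")"
      end
    ]
  end
end

client = TinyClient.new
puts client.get("/")   # goes through the new wrapper
puts client.aget("/")  # still reaches the original implementation
```

The aliased name is what lets the replacement method call through to the behaviour it replaced.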

Jackpot! Fibers! It appears to be monkey-patching the existing em-http library, so before we go too much further let’s find out what normal em-http code looks like without fibers. There is a handy example on the em-http-request wiki:

EventMachine.run {
  http = EventMachine::HttpRequest.new('http://google.com/').get :query => {'keyname' => 'value'}

  http.errback { p 'Uh oh'; EM.stop }
  http.callback {
    p http.response_header.status
    p http.response_header
    p http.response

    EventMachine.stop
  }
}

It looks very similar to the code above, which is promising, and when we dig in a bit further it becomes even more so.

$ gem unpack em-http
ERROR:  While executing gem ... (Gem::RemoteFetcher::FetchError)
    SocketError: getaddrinfo: nodename nor servname provided, or not known (http://rubygems.org/latest_specs.4.8.gz)
# Oh noes it doesn't work!
# Search for em gems
$ gem list em-

*** LOCAL GEMS ***

em-http-request (1.0.0.beta.2, 0.3.0)
em-socksify (0.1.0)
em-synchrony (0.3.0.beta.1)

$ gem unpack em-http-request # Ah that is probably it
$ cd em-http-request-1.0.0.beta.2
$ ack "get" lib/
lib/em-http/http_connection.rb
4:    def get    options = {}, &blk;  setup_request(:get,   options, &blk); end

Note on the last line that get defers straight to setup_request, which is the same call that is made in the fiber example above. Yep, pretty much the same. Now we can head back to the fiber code.

f = Fiber.current

conn = setup_request(:#{type}, options, &blk)
conn.callback { f.resume(conn) }
conn.errback  { f.resume(conn) }

Fiber.yield

It appears that, rather than doing any work directly when a callback triggers, resume is called on the fiber that made the request, presumably picking execution back up at the point where yield was called. Checking the documentation for Fiber.yield validates this, and its last sentence also explains how the conn variable is returned from this method:

Yields control back to the context that resumed the fiber, passing along any arguments that were passed to it. The fiber will resume processing at this point when resume is called next. Any arguments passed to the next resume will be the value that this Fiber.yield expression evaluates to.
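
To see that hand-off without EventMachine in the way, here is a small sketch in plain Ruby. The method name trace_fiber_handoff and the "conn" string are made up for illustration; only Fiber.new, Fiber.yield and resume come from Ruby itself.

```ruby
# Records the order of events as control passes between the caller and a fiber.
def trace_fiber_handoff
  log = []

  fiber = Fiber.new do
    log << "fiber started"
    value = Fiber.yield                     # pause; control returns to the caller of resume
    log << "fiber resumed with #{value.inspect}"
    :finished                               # the block's value is what the final resume returns
  end

  fiber.resume                              # runs the block up to Fiber.yield
  log << "caller continues while the fiber is paused"
  result = fiber.resume("conn")             # "conn" becomes the value of Fiber.yield

  [log, result]
end

log, result = trace_fiber_handoff
puts log
puts "final resume returned #{result.inspect}"
```

This is the same mechanism that lets the patched #get return conn: the callback passes conn to resume, so Fiber.yield evaluates to it, and as the last expression in the method it becomes the return value.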

Using it

We now have an idea of how Goliath works its magic, though it may still be a fuzzy one. Let’s see if we have it right by trying to write some code that emulates it.

Remember that this fiber trick is simply a way of simplifying callback-littered code, so we should be able to first write a non-fiber-aware method and then clean it up. I like to start with a dirt simple example, so we are going to write a basic Goliath class that blocks for one second then renders some text.

require 'goliath'

class Surprise < Goliath::API
  def response(env)
    sleep 1
    [200, {}, "Surprise!"]
  end
end

Hit that in your web browser and bingo, it waits for a second. Not so fast though, tiger. What happens when we issue multiple simultaneous requests?

$ ab -n 3 -c 3 127.0.0.1:9000/ | grep "Time taken"

Time taken for tests:   3.011 seconds

Alas, our webserver was only serving one request at a time. That’s not web scale. The sleep call not only blocks our response, but the entire server. That’s why we moved to evented programming in the first place. Let’s try a classic EventMachine timer instead:

require 'goliath'

class Surprise < Goliath::API
  def response(env)
    EventMachine.add_timer 1, proc {
      [200, {}, "Surprise!"]
    }
  end
end

Of course this does not work, because the #response method needs to appear synchronous. What happens in this case is that #add_timer returns immediately, before the timer has fired, and Goliath tries to render whatever it returned instead of a proper response, exploding in the process. The timer triggers sometime later, and no code is still around to care. We cannot send the result of our timer proc as the return value for the method.
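
The problem can be reproduced without Goliath at all. The sketch below uses a made-up FakeTimers class as a stand-in for an event loop's timer list (it is not EventMachine's API, and the timer id it returns is a placeholder) to show that the method's return value and the proc's return value are two different things.

```ruby
# Hypothetical stand-in for an event loop's timer list; not EventMachine's API.
class FakeTimers
  def initialize
    @callbacks = []
  end

  # Store the callback for later and return at once with a made-up timer id.
  def add_timer(callback)
    @callbacks << callback
    @callbacks.size
  end

  # Fire every stored callback, as an event loop would do some time later.
  def fire_all
    @callbacks.map(&:call)
  end
end

# Same shape as the broken Surprise#response: the method's value is whatever
# add_timer returned, never the array built inside the proc.
def broken_response(timers)
  timers.add_timer(proc { [200, {}, "Surprise!"] })
end

timers = FakeTimers.new
returned = broken_response(timers)  # available immediately, but not a response
late = timers.fire_all              # the real response, produced with nobody waiting for it

puts "method returned: #{returned.inspect}"
puts "timer produced:  #{late.first.inspect}"
```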

We need to combine the synchronous nature of the first example with the asynchronous elements of the second: a beautiful Frankenstein. Hopefully you have caught on that we can use fibers to do the stitching.

require 'goliath'

class Surprise < Goliath::API
  def response(env)
    f = Fiber.current
    EventMachine.add_timer 1, proc { f.resume }
    Fiber.yield

    [200, {}, "Surprise!"]
  end
end

We steal the pattern we saw in em-synchrony/em-http above: grab the current fiber, then set up a resume call in the asynchronous callback, which resumes execution at the Fiber.yield. Testing this with ab, we see that this indeed solves our concurrency issue:

$ ab -n 3 -c 3 127.0.0.1:9000/ | grep "Time taken"

Time taken for tests:   1.009 seconds
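
If you would like to watch the same effect without installing anything, the following sketch emulates it in plain Ruby. ToyReactor, surprise_response and serve_concurrently are invented for this illustration; the reactor is a crude stand-in for EventMachine's timer handling, and running each request in its own fiber is an assumption about how a server would call #response, not a copy of Goliath's code.

```ruby
# Toy single-threaded reactor: a stand-in for EventMachine, for illustration only.
class ToyReactor
  def initialize
    @timers = [] # pairs of [fire_at, callback]
  end

  # Schedule the block to run after `delay` seconds.
  def add_timer(delay, &callback)
    @timers << [now + delay, callback]
  end

  # Run until no timers remain, firing each one when it is due.
  def run
    until @timers.empty?
      @timers.sort_by!(&:first)
      fire_at, callback = @timers.shift
      wait = fire_at - now
      sleep(wait) if wait > 0
      callback.call
    end
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

# Same shape as the fiber-aware response method: park the fiber, let the timer wake it.
def surprise_response(reactor, delay)
  fiber = Fiber.current
  reactor.add_timer(delay) { fiber.resume }
  Fiber.yield
  [200, {}, "Surprise!"]
end

# Start `count` requests, each in its own fiber, and return [responses, elapsed_seconds].
def serve_concurrently(count, delay)
  reactor = ToyReactor.new
  responses = []
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

  count.times do
    Fiber.new { responses << surprise_response(reactor, delay) }.resume
  end
  reactor.run

  [responses, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
end

# Delay shortened to 0.2 seconds so the demo finishes quickly.
responses, elapsed = serve_concurrently(3, 0.2)
puts "#{responses.size} responses in #{elapsed.round(2)}s"
```

All three requests park on their timers before the reactor starts waiting, so the total time should be close to one delay rather than three.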

These fiber things are pretty cool.

Wrapping Up

In exploring the Goliath source code and associated libraries we discovered how it pulls off its asynchronous-masquerading-as-synchronous trick, and were able to put that knowledge into practice with a simple example.

To practice your code reading, here are some other research tasks for you to try:

Find where Goliath calls into the #response method and see if there are any other lurking fiber tricks to be found.
Investigate one of the other libraries that em-synchrony provides an API for, such as em-mongo.
Rack-fiber_pool uses fibers in a similar context, check it out and see what it is getting up to.

Let us know how you go in the comments. Tune in next week for more exciting adventures in the code jungle.

Frequently Asked Questions (FAQs) about Concurrent Programming with Ruby Goliath

What is the difference between concurrency and parallelism in Ruby?

Concurrency and parallelism are often used interchangeably, but they mean different things. Concurrency refers to a program’s ability to have more than one task in progress at the same time, which doesn’t necessarily mean the tasks are running at the exact same instant. For example, a program could start a task, then switch to a different task before the first one finishes. Parallelism, on the other hand, is when multiple tasks are literally executing at the same instant, which is only possible on systems with multiple processors or cores.
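
As a rough illustration of concurrency without parallelism, the sketch below interleaves two tasks on a single thread using fibers. The interleave helper and the task names are made up for the example.

```ruby
# Runs each named task in its own fiber and switches between them round-robin.
# Returns the order in which the steps actually ran.
def interleave(*task_names, steps: 2)
  order = []

  fibers = task_names.map do |name|
    Fiber.new do
      steps.times do |i|
        order << "#{name} step #{i + 1}"
        Fiber.yield # hand control back so another task can make progress
      end
    end
  end

  # Keep giving each unfinished fiber a turn until all of them are done.
  until fibers.none?(&:alive?)
    fibers.each { |fiber| fiber.resume if fiber.alive? }
  end

  order
end

puts interleave("A", "B")
```

Both tasks are in progress at once, yet only one of them is ever executing at any given moment, which is the situation an evented server like Goliath is in.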

How does Goliath handle concurrency?

Goliath is a non-blocking Ruby web server framework designed for handling concurrent connections. It uses an event-driven architecture to handle multiple connections concurrently, without the need for threads. This is achieved through the use of EventMachine and fibers, which allow Goliath to pause and resume processing as needed, without blocking the entire server.

What are the benefits of using Goliath for concurrent programming?

Goliath offers several benefits for concurrent programming. Firstly, it allows for efficient handling of multiple connections without the overhead of threads. This can lead to improved performance and scalability. Secondly, Goliath’s event-driven architecture allows for more responsive applications, as it can quickly switch between tasks as needed. Finally, Goliath is built on Ruby, which means you can take advantage of Ruby’s rich ecosystem and expressive syntax.

How does Goliath compare to other Ruby concurrency libraries?

Goliath is built around an event-driven architecture, which allows it to handle multiple connections concurrently without the need for threads. This can lead to improved performance and scalability compared to approaches that rely on a thread or process per connection. However, Goliath’s approach may require a different way of thinking about your code, as you need to structure your application to take advantage of its non-blocking nature.

Can I use Goliath with Rails or other Ruby frameworks?

Yes, Goliath can be used with Rails or other Ruby frameworks. However, keep in mind that Goliath’s event-driven architecture may require you to structure your application differently than you would with a traditional Rails application. You’ll need to ensure that your application is designed to take advantage of Goliath’s non-blocking nature.

How do I install and get started with Goliath?

Goliath is available as a gem, so you can install it with gem install goliath. An application is a Ruby file that defines a class inheriting from Goliath::API with a response method, as in the HelloWorld example above; run the file with ruby to start the server.

What are some common use cases for Goliath?

Goliath is particularly well-suited for applications that need to handle a large number of concurrent connections, such as chat servers, real-time analytics, or high-traffic APIs. Its event-driven architecture allows it to handle these connections efficiently, without the overhead of threads.

What is EventMachine and how does it work with Goliath?

EventMachine is a Ruby library for building event-driven applications. It provides a loop that can handle events such as incoming connections or data. Goliath uses EventMachine to handle its connections, allowing it to pause and resume processing as needed without blocking the entire server.

What are fibers and how does Goliath use them?

Fibers are a feature in Ruby that allow for lightweight, cooperative multitasking. They can be paused and resumed, allowing for non-blocking I/O operations. Goliath uses fibers to handle its connections, which allows it to handle multiple connections concurrently without the need for threads.

How can I debug a Goliath application?

Debugging a Goliath application can be done in a similar way to other Ruby applications. You can use tools like pry for interactive debugging, or puts statements for simple logging. However, keep in mind that because of Goliath’s event-driven nature, the order of execution may not be as straightforward as in a traditional Ruby application.
