Threads in Ruby


Threads

Ruby has many cool features that attract developers, such as the ability to create classes at runtime, alter the behavior of any particular object, and inspect live objects in memory using ObjectSpace, plus an extensive ecosystem of testing libraries. All these things make a developer’s life easier. Today we will discuss one of the most fundamental concepts in computer science, threads, and how Ruby supports them.

Introduction

First of all let’s define “thread”. According to Wikipedia

In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by an operating system scheduler.

A thread is a light-weight process. Threads that belong to the same process share that process’s resources. This is why it is more economical to have threads in some cases.
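Spawning a thread in Ruby is a one-liner. A minimal sketch:

```ruby
# Thread.new starts running its block immediately, concurrently
# with the main thread.
t = Thread.new { 1 + 1 }

# Thread#value joins the thread (waits for it to finish) and then
# returns the block's result.
puts t.value # => 2
```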

Let’s see how threads can be useful to us.

Basic Example

Consider the following code

def calculate_sum(arr)
  sum = 0
  arr.each do |item|
    sum += item
  end
  sum
end

@items1 = [12, 34, 55]
@items2 = [45, 90, 2]
@items3 = [99, 22, 31]

puts "items1 = #{calculate_sum(@items1)}"
puts "items2 = #{calculate_sum(@items2)}"
puts "items3 = #{calculate_sum(@items3)}"

The output of the above program would be

items1 = 101
items2 = 137
items3 = 152

This is a very simple program that helps in understanding why we should use threads. In the above code listing, we have 3 arrays and are calculating their sums. All of this is pretty straightforward. However, there is a problem: we cannot get the sum of the items2 array until we have received the items1 result, and the same goes for items3. Let’s change the code a bit to show what I mean.

def calculate_sum(arr)
  sleep(2)
  sum = 0
  arr.each do |item|
    sum += item
  end
  sum
end

In the above code listing we have added a sleep(2) instruction, which pauses execution for 2 seconds and then continues. This means items1 will get its sum after 2 seconds, items2 after 4 seconds (2 for items1 + 2 for items2), and items3 after 6 seconds. This is not what we want.

Our items arrays don’t depend upon each other, so it would be ideal to have their sums calculated independently. This is where threads come in handy.

Threads allow us to move different parts of our program into different execution contexts which can execute independently. Let’s write a threaded/multithreaded version of the above program:

def calculate_sum(arr)
  sleep(2)
  sum = 0
  arr.each do |item|
    sum += item
  end
  sum
end

@items1 = [12, 34, 55]
@items2 = [45, 90, 2]
@items3 = [99, 22, 31]

threads = (1..3).map do |i|
  Thread.new(i) do |i|
    items = instance_variable_get("@items#{i}")
    puts "items#{i} = #{calculate_sum(items)}"
  end
end
threads.each {|t| t.join}

The calculate_sum method is the same as our previous code sample where we added sleep(2). Our items arrays are the same too. The most important change is the way we have called calculate_sum on each array. We wrapped the calculate_sum call corresponding to each array in a Thread.new block. This is how to create threads in Ruby.

We have done a bit of metaprogramming to get each items array according to the index i in the loop. Note that each thread starts running as soon as Thread.new is called; at the end of the program, we call join on each thread so the main thread waits for all of them to finish before exiting.
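As a side note, the instance_variable_get metaprogramming is only one way to do this. An alternative is to let each thread return its result: Thread#value joins the thread and returns its block’s return value, which avoids both the metaprogramming and the unordered puts output. A sketch (without the sleep, for brevity):

```ruby
def calculate_sum(arr)
  arr.reduce(0) { |sum, item| sum + item }
end

arrays = [[12, 34, 55], [45, 90, 2], [99, 22, 31]]

# Spawn one thread per array; each block's return value is the sum.
threads = arrays.map { |arr| Thread.new { calculate_sum(arr) } }

# Thread#value joins the thread and returns its block's result,
# so the sums come back in the original array order.
sums = threads.map(&:value)
puts sums.inspect # => [101, 137, 152]
```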

If you run the above code sample, you might see the following output (I say might because the order in which the sums are printed can differ from run to run):

items2 = 137
items3 = 152
items1 = 101

Instead of getting the result for the items2 array after 4 seconds and the items3 array after 6 seconds, we received the sums of all arrays after about 2 seconds. This shows us the power of threads: instead of calculating the sum of one array at a time, we calculated the sums of all arrays concurrently. We saved 4 seconds, which is a clear win in performance and efficiency.
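You can verify the saving yourself with Benchmark from the standard library. A small sketch (with the sleep shortened to 0.2 seconds so it runs quickly; the ratio is the same):

```ruby
require 'benchmark'

def slow_sum(arr)
  sleep(0.2) # simulate slow work
  arr.reduce(0) { |sum, item| sum + item }
end

arrays = [[12, 34, 55], [45, 90, 2], [99, 22, 31]]

# One array after another: roughly 3 * 0.2 = 0.6 seconds.
sequential = Benchmark.realtime { arrays.each { |arr| slow_sum(arr) } }

# All three at once: roughly 0.2 seconds, since the sleeps overlap.
threaded = Benchmark.realtime do
  arrays.map { |arr| Thread.new { slow_sum(arr) } }.each(&:join)
end

puts format('sequential: %.1fs, threaded: %.1fs', sequential, threaded)
```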

Race Conditions

Every feature comes with a price. Threads are good, but if you are writing multithreaded application code then you should be aware of handling race conditions. What is a race condition? According to Wikipedia

Race conditions arise in software when separate computer processes or threads of execution depend on some shared state. Operations upon shared states are critical sections that must be mutually exclusive. Failure to obey this rule opens up the possibility of corrupting the shared state.

In simple words, if we have shared data that can be accessed by multiple threads, then that data should still be consistent (not corrupted) after all threads finish execution.

Example

class Item
  class << self; attr_accessor :price end
  @price = 0
end

(1..10).each { Item.price += 10 }

puts "Item.price = #{Item.price}"

We have created a simple Item class with a class-level instance variable price (exposed through an accessor on the singleton class). Item.price is incremented in a loop. Run this program and you will see the following output:

Item.price = 100

Now let’s see a multithreaded version of this code

class Item
  class << self; attr_accessor :price end
  @price = 0
end

threads = (1..10).map do |i|
  Thread.new(i) do |i|
    item_price = Item.price # Reading value
    sleep(rand(0..2))
    item_price += 10        # Updating value
    sleep(rand(0..2))
    Item.price = item_price # Writing value
  end
end

threads.each {|t| t.join}

puts "Item.price = #{Item.price}"

Our Item class is the same. However, we have changed the way we are incrementing the value of price. We have deliberately used sleep in the above code to widen the window between the read, update, and write steps, making concurrency problems easy to observe. Run this program multiple times and you will see output like the following:

Item.price = 40

The output is both incorrect and inconsistent. It is not 100 anymore; across runs you might see 30, 40, 70, and so on. This is what a race condition does: our data is no longer correct and is corrupted differently each time we run the program.

Mutual Exclusion

To fix race conditions, we have to coordinate the program so that while one thread is working on shared data, the other threads wait until it finishes. This is called mutual exclusion, and we use this concept to remove race conditions from our programs.

Ruby provides a very neat and elegant way for mutual exclusion. Observe:

class Item
  class << self; attr_accessor :price end
  @price = 0
end

mutex = Mutex.new

threads = (1..10).map do |i|
  Thread.new(i) do |i|
    mutex.synchronize do 
      item_price = Item.price # Reading value
      sleep(rand(0..2))
      item_price += 10        # Updating value
      sleep(rand(0..2))
      Item.price = item_price # Writing value
    end
  end
end

threads.each {|t| t.join}

puts "Item.price = #{Item.price}"

Now run this program and you will see the following output:

Item.price = 100

This is because of mutex.synchronize. One and only one thread can execute the block wrapped in mutex.synchronize at any time. Other threads have to wait until the thread currently inside the block completes.

We have made our code threadsafe.
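Under the hood, mutex.synchronize is roughly shorthand for an explicit lock/unlock pair wrapped in begin/ensure, so the mutex is released even if the block raises. A sketch using a simple counter:

```ruby
mutex = Mutex.new
price = 0

threads = (1..10).map do
  Thread.new do
    # Roughly what mutex.synchronize { price += 10 } expands to:
    mutex.lock
    begin
      price += 10
    ensure
      mutex.unlock # released even if the block raises
    end
  end
end

threads.each { |t| t.join }
puts "price = #{price}" # => price = 100
```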

Rails is threadsafe and uses an instance of the Mutex class to avoid race conditions when multiple threads try to access the same code. If you look at the source of the Rack::Lock middleware, you will see that @mutex.lock is used to block other threads that try to access the same code. For in-depth detail about multithreading in Rails, read my article. You can also visit the Ruby Mutex class page for reference on the Mutex class.
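To illustrate the idea, here is a simplified sketch of a Rack::Lock-style middleware (this is not the actual Rack source, just the pattern): the downstream app’s call is wrapped in a mutex so only one request is processed at a time.

```ruby
# Simplified sketch of a Rack::Lock-style middleware -- not the
# actual Rack source. It serializes requests through one mutex.
class SimpleLock
  def initialize(app, mutex = Mutex.new)
    @app   = app
    @mutex = mutex
  end

  def call(env)
    @mutex.lock
    begin
      @app.call(env) # only one thread in here at a time
    ensure
      @mutex.unlock
    end
  end
end

# A trivial Rack-style app: a lambda returning [status, headers, body].
app = SimpleLock.new(->(env) { [200, {}, ['ok']] })
status, _headers, body = app.call({})
puts "#{status} #{body.first}" # => 200 ok
```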

Types of Threads in Different Ruby Versions

In Ruby 1.8, there were “green” threads. Green threads were implemented and controlled by the interpreter. Here are some pros and cons of green threads:

Pros

  • Cross platform (managed by the VM)
  • Unified behavior / control
  • Lightweight -> faster, smaller memory footprint

Cons

  • Not optimized
  • Limited to 1 CPU
  • A blocking system call (e.g. I/O in a C extension) can block all threads

As of Ruby 1.9, Ruby uses native threads. With native threads, each thread created by Ruby maps directly to a thread at the operating system level. Most modern language runtimes use native threads, so it makes sense for Ruby to do the same. Here are some pros of native threads:

Pros

  • Run on multiple processors
  • Scheduled by the OS
  • Blocking I/O operations don’t block other threads.

Even though we have native threads in Ruby 1.9, only one thread executes Ruby code at any given time, even on a multi-core processor. This is because of the GIL (Global Interpreter Lock), also known as the GVL (Global VM Lock), that MRI uses (JRuby and Rubinius do not have a GIL and, as such, have “real” parallel threads). The GIL prevents other threads from running while one thread is executing. However, MRI is smart enough to switch control to a waiting thread when the running thread blocks on an I/O operation.

Working with threads is quite easy in Ruby, but we have to be careful about various pitfalls and concurrency problems. I hope you enjoyed this article and can apply threading in your own Ruby programs going forward.


  • Anonymous

    Nice writeup!

  • Anonymous

    Great summary, and I’m glad you included a warning about race conditions. Good also to know that jRuby and Rubinius aren’t affected by GIL. One nit pick however: you alternating between using the function name `calculate_sum` and the (incorrectly spelled) name `calcualte_sum` in the introduction.

  • max

    Thanks you. This is very easy article for understanding.

  • Nate

    Great intro for folks new to threads in Ruby. I’m pretty sure green threads don’t block all other threads when performing I/O. If they did, there would be no benefit to using them. If fact, you would only incur a context switching penalty. See http://yehudakatz.com/2010/08/14/threads-in-ruby-enough-already/ for more info.

  • Kannan

    This is very useful article for understanding about thread. Thanks.

  • @ahsan_s

    Not sure I understood why only one thread can run at a time in certain situations. Doesn’t that defeat the purpose of having threads in the first place?

    • tra

      To understand it search for the difference between concurrency and parallelism. The latter means doing thing faster by running parts of task in parallel – for example on 2 core CPU you could encode file 2 times faster (in theory). Concurrency, on the other hand, doesn’t really means at the same time. On 1 core CPU you still has many programs running. You don’t wanna music stopping when user move the mouse. Hence the threads.

  • Anonymous

    Really, what a nice article. Well explained and extremely well exemplified. Thanks for sharing !