An Introduction to Celluloid, Part I

Key Takeaways

Celluloid is a concurrent object-oriented programming framework in Ruby designed to simplify the creation of multithreaded programs. It abstracts complexities of concurrent programming, such as thread management and synchronization, into a simple, easy-to-use API.
Rather than exposing raw threads and locks, Celluloid uses an actor-based model for managing concurrency. Objects, or actors, run concurrently in their own threads and communicate through asynchronous messages, reducing complexities associated with threads and locks.
In addition to its unique actor-based model, Celluloid provides features for fault-tolerance, including supervision trees and linking, which are not commonly found in other concurrency libraries.
Celluloid also offers built-in deadlock protection and automatic handling and restarting of crashed actors, making it easier to write concurrent applications. However, it’s important to note that asynchronous message delivery isn’t perfect and may lead to issues such as undelivered messages or non-responsive actors.

A few years ago, there used to be a very easy way to optimize code. If you found out that your processor-heavy code was running a bit slower than you wanted it to, the simple solution was just to wait for the next hardware iteration, which would magically amp up the clock rate on the CPUs and your app would suddenly run faster.

Unfortunately, those times are now past. Because of a little something called “transition energy” inside the logic gates (the incredibly tiny electronically controlled “switches” that run processors), it has become close to impossible to push the clock rate any further at a reasonable price. A smart solution was quickly uncovered, though: instead of trying to push one processor to run at breakneck speed, we spread the workload over multiple processors. What an excellent idea!

For people who write software, understanding concurrency and using it well has suddenly become incredibly important. In order to scale up with the hardware, your code MUST use concurrency! This seemed simple enough to everyone at first; it couldn’t be that hard, right? Well, it was. The mechanism that is (arguably) most popular for concurrency is threads, which are immensely complicated. A few small mistakes here and there can wreak havoc. If you’re really interested in this sort of stuff, I’d recommend you read up a little.

If you were brave enough to click a few of those links, it shouldn’t take long to realize that thread management isn’t always a walk in the park. Consider, as an example, deadlocking.

(dead) Locking

Suppose two people share a single copy of a book that they both want to read. They can’t both read it at the same time, so one person has to wait until the other is done. Now, consider a situation where each person thinks that the other isn’t done reading the book and, as such, decides not to start reading it. Neither person ever reads the book.

This situation is known as a deadlock.

Now suppose your program has two “threads” (analogous to the two people we discussed) that both write to a file called “steve”. They cannot both write to the file at the same time, so the file is “locked” while one thread is writing to it. A similar situation can arise for shared variables.

Normally this works just great, but the problem occurs when some error leaves each thread waiting for a lock that never gets released, so the file stays locked for both of them. If you’ve written any amount of thread-driven code, you know this happens a lot. When it does, it is a complete pain to track down and fix.
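To make the stuck-lock situation concrete, here is a minimal sketch using only Ruby's standard library. The method name and the two locks are made up for illustration. Two threads each take one lock and then reach for the other's; the sketch uses `try_lock` instead of a blocking `lock` so that it reports the problem rather than hanging forever.

```ruby
# Hypothetical demo: two threads take two locks in opposite order.
def opposite_order_attempts
  lock_a = Mutex.new
  lock_b = Mutex.new
  holding  = Queue.new # threads announce that they hold their first lock
  attempts = Queue.new # threads report whether the second lock was free
  go       = Queue.new # main thread signals "both locks are now held"
  release  = Queue.new # main thread signals "you may let go"

  start = lambda do |first, second|
    Thread.new do
      first.synchronize do
        holding << true
        go.pop
        acquired = second.try_lock   # a plain `lock` here would block forever
        second.unlock if acquired
        attempts << acquired
        release.pop                  # keep holding `first` until told to stop
      end
    end
  end

  threads = [start.call(lock_a, lock_b), start.call(lock_b, lock_a)]
  2.times { holding.pop }            # both threads now hold their first lock
  2.times { go << true }
  outcomes = Array.new(2) { attempts.pop }
  2.times { release << true }
  threads.each(&:join)
  outcomes
end

p opposite_order_attempts # => [false, false]
```

Neither thread can get its second lock while the other one holds it. With ordinary blocking locks, that is exactly the point where both threads would wait on each other indefinitely.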

Race Conditions

In deadlocking, the two threads wait around for each other forever. However, deadlocking has a cousin that is just as bad, if not worse.

Race conditions occur when two threads try to read and write a variable at the same time. Both threads read the variable, then each one races to write its new value back, and the result depends on which write lands last.

This causes all kinds of problems, because one thread’s changes to the variable can be completely clobbered by the other’s.
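A small sketch of such a lost update, again with nothing but the standard library. The method name is hypothetical, and the `sleep` is only there to widen the window between reading and writing so the race shows up reliably.

```ruby
# Hypothetical demo: two threads each add 1 to a shared counter.
def lost_update_demo(lock: nil)
  counter = 0
  increment = lambda do
    seen = counter        # read the shared value
    sleep 0.1             # simulate work; gives the other thread time to read too
    counter = seen + 1    # write back, possibly overwriting the other update
  end

  threads = Array.new(2) do
    Thread.new { lock ? lock.synchronize(&increment) : increment.call }
  end
  threads.each(&:join)
  counter
end

puts lost_update_demo                  # usually 1: one increment was lost
puts lost_update_demo(lock: Mutex.new) # 2: the lock serializes read and write
```

Wrapping the read and the write in one `Mutex#synchronize` block fixes the result, at the price of bringing locks (and their deadlock risk) back into the picture.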

Solution?

The two problems mentioned above are just the tip of the iceberg – there’s all kinds of other issues to be dealt with when using threads.

Around the 1980s, people began thinking about where all of these problems were really coming from and how to deal with them. They found that nearly all of these issues are caused by sharing of state (i.e. variables, files, etc.) and locking. If you forget to lock a single shared resource, there’s going to be a boatload of trouble waiting for you with threads.

Multiple solutions were proposed. One of them is evented I/O, which for Rubyists means EventMachine.

Along with that, some bright academics came up with a new model that borrowed some ideas from quantum physics. Instead of sharing state, the entire concurrency system would be based on passing messages. They called this the actor model.

Actor Model

In the actor model, each object is an actor. Each actor can send messages to and receive messages from other actors, and may also create other actors if need be. The main point around which all of this pivots is that all communication can be asynchronous and no state is shared between the actors. That means messages can be sent while others are being received. Also, when an actor sends a message, it doesn’t always have to wait for a response. Any state that needs to be shared between actors is passed around entirely through messages.
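The idea can be sketched in plain Ruby before any library gets involved. This is not how Celluloid is implemented; `CounterActor` and its `tell` method are hypothetical names, and the mailbox is just a `Queue` from the standard library.

```ruby
# Hypothetical actor: a thread that owns its state and reads from a mailbox.
class CounterActor
  def initialize
    @count   = 0           # private state, only touched by the actor's own thread
    @mailbox = Queue.new   # thread-safe queue from Ruby's standard library
    @thread  = Thread.new { run }
  end

  # Asynchronous send: enqueue the message and return immediately.
  def tell(message, reply_to = nil)
    @mailbox << [message, reply_to]
    nil
  end

  # Ask the actor to finish, then wait for its thread to exit.
  def stop
    tell(:stop)
    @thread.join
  end

  private

  # The actor's main loop: handle one message at a time, in arrival order.
  def run
    loop do
      message, reply_to = @mailbox.pop
      case message
      when :increment then @count += 1
      when :report    then reply_to << @count
      when :stop      then break
      end
    end
  end
end

actor   = CounterActor.new
replies = Queue.new
1000.times { actor.tell(:increment) } # none of these calls wait for the actor
actor.tell(:report, replies)          # the answer comes back as a message too
puts replies.pop                      # 1000
actor.stop
```

Because only the actor's own thread ever touches `@count`, there is nothing to lock: the mailbox is the single point of contact with the outside world.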

If you didn’t understand a large portion of this, that’s perfectly fine, just follow along.

Of course, none of this happens by itself. A library called [Celluloid](http://celluloid.io/) makes this stuff happen, and that’s what we’ll be using!

Celluloid

Celluloid brings the actor model to Ruby, making writing concurrent applications incredibly easy.

First of all, Celluloid comes with built-in deadlock protection: all of the messaging between actors is handled in such a way that deadlocking is darn near impossible, as long as you’re not doing something crazy or messing with native (i.e. C) code.

If you’re familiar with Erlang (it is okay if you’re not), Celluloid borrows one of its most important ideas: fault tolerance. Celluloid can automatically restart crashed actors through supervisors, so you don’t have to worry about every last thing that could go wrong.

There are all kinds of other features (linking, futures, etc.) that make threading a breeze, but there are a few things to keep in mind.

The GIL

The “regular” or vanilla Ruby that we’re all used to is MRI (Matz’s Ruby Interpreter); since Ruby 1.9, its bytecode virtual machine is called YARV.

Now, there is, debatably, a problem with this interpreter. The thing is, threads inside MRI/YARV don’t really run in parallel: a lock called the Global Interpreter Lock (GIL) lets only one thread execute Ruby code at any given moment. Ruby isn’t the only language with an interpreter that has this – so does Python (CPython, to be precise), and don’t even get me started on threading with PHP.

When a new thread is created, its computation isn’t actually performed at the same time as the other threads’ work. The interpreter switches between threads quickly enough to create the illusion that this is what is happening. (Threads that are blocked waiting on I/O do release the lock, so I/O-bound work still benefits.)
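One practical consequence is worth seeing in code: on MRI, threads still overlap while they are blocked waiting (on sleep, or on I/O), because the lock is released during the wait. The sketch below uses `sleep` as a stand-in for a blocking read; the helper name is made up, and the exact timings will vary from machine to machine.

```ruby
# Hypothetical helper: run `count` simulated blocking calls and time them.
def timed_waits(count, seconds, threaded:)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  if threaded
    Array.new(count) { Thread.new { sleep seconds } }.each(&:join)
  else
    count.times { sleep seconds }
  end
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

one_by_one = timed_waits(4, 0.2, threaded: false) # roughly 0.8 seconds
in_threads = timed_waits(4, 0.2, threaded: true)  # roughly 0.2 seconds
puts format('one by one: %.2fs, in threads: %.2fs', one_by_one, in_threads)
```

Replace the `sleep` with a CPU-heavy loop and the threaded version stops winning on MRI, which is where an interpreter without the lock comes in.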

Fortunately, there is a solution. Just use a different interpreter! Take your pick:

[JRuby](http://jruby.org)
[Rubinius](http://rubini.us/)

If you do choose to go with one of the above interpreters, make sure you are operating in 1.9 mode in order to be compatible with Celluloid.

Diving In

Let’s get started with Celluloid by writing a small actor – let’s make it read a file we tell it to, and then display the results when they are wanted.

[gist id=”3155240″]

Running that (with your choice of interpreter), you should get a dump of the kernel log followed by a message from Celluloid telling you that two actors have been terminated. (This assumes you’re using a POSIX system; Windows users can replace that line with any file of their choice.)

Alright, so, what just happened?

We defined the FilePutter actor and created an instance of it, which Celluloid automatically pushes into its own thread! There’s no difference in how we call the methods; it works just as it would for a regular class, and it dumps the contents of a file.

First, calling load_file loads the file, then we proceed on to printing the contents. Not too complicated.

But, one thread isn’t all that interesting; how about five? Easy enough:

[gist id=”3155267″]

And, just like that, we’ve created five threads which each read the files in the “files” array.

But all of the methods we’ve called so far have been called synchronously, meaning that we have to wait for them to end before proceeding. What if we just pushed them to the side and moved on?

This is where Celluloid really shines:

[gist id=”3155297″]

This is where it gets interesting. First of all, we combined the loading of the file and printing the contents into one method, namely load_file_and_print. Then, notice that inside the loop over the files array, we don’t call load_file_and_print; instead, we call load_file_and_print! (i.e. with a bang).

Wrapping It Up

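When we call the given method with the bang, Celluloid runs that call asynchronously, allowing our program to move right along without waiting for the file to load or the printing to occur. (Later Celluloid releases replaced the bang syntax with the async proxy, as in actor.async.load_file_and_print.)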

As we know, methods with an exclamation mark at the end are usually noted as “dangerous” in Ruby. As an example:

[gist id=”3155327″]

That changes the value of “a” itself, which could lead to problems.
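The embedded example is not reproduced here, so the following is only a minimal sketch of the kind of bang method the text is talking about, assuming `String#upcase!` as the example.

```ruby
a = "hello".dup   # dup so the example also works where string literals are frozen
b = a.upcase      # non-bang: returns a new string, leaves `a` alone
puts a            # hello
puts b            # HELLO

a.upcase!         # bang: modifies `a` in place
puts a            # HELLO
```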

The same caveat applies to bang methods in Celluloid: asynchronous message delivery isn’t perfect. The message might not be delivered, the actor might not respond, etc.

But, how do you figure out when this happens?

Also, threads are never really completely independent of each other – how do you get them talking to each other?

This and several other important actor model features and niceties are coming in Part II, so stay tuned!

If you felt this article went a bit too slow, you’ll be satisfied with Part II :)

Please ask questions if you have any in the comments section.

Frequently Asked Questions (FAQs) about Celluloid

What is the main purpose of Celluloid?

Celluloid is a concurrent object-oriented programming framework in Ruby. It is designed to help developers build multithreaded programs more easily and efficiently. It provides a simple and natural way to build fault-tolerant concurrent programs in Ruby, allowing you to create systems that can handle multiple tasks at the same time, improving performance and responsiveness. It does this by abstracting some of the complexities of concurrent programming, such as thread management and synchronization, into a simple, easy-to-use API.

How does Celluloid differ from other concurrency libraries?

Unlike libraries that expose raw threads and locks, Celluloid is built around an actor-based model for managing concurrency. This model allows objects (actors) to run concurrently in their own threads, and communicate with each other through asynchronous messages. This approach simplifies the process of writing concurrent code, as it abstracts away many of the complexities associated with threads and locks. Furthermore, Celluloid also provides features for fault-tolerance, such as supervision trees and linking, which are not commonly found in other concurrency libraries.

How do I install Celluloid?

To install Celluloid, you need to have Ruby installed on your system. Once you have Ruby, you can install Celluloid by running the following command in your terminal: gem install celluloid. This will download and install the latest version of Celluloid from RubyGems, the Ruby community’s gem hosting service.

How do I create an actor in Celluloid?

In Celluloid, you create an actor by defining a class that includes the Celluloid module. Here’s a simple example:

class MyActor
  include Celluloid

  def perform_task
    # Task implementation goes here
  end
end

You can then create an instance of this class, which will automatically be an actor running in its own thread.

How do actors communicate in Celluloid?

Actors in Celluloid communicate with each other through asynchronous messages. When an actor wants to send a message to another actor, it uses the async method, followed by the method name and any arguments. For example, actor.async.perform_task. The message will be added to the recipient actor’s mailbox and processed in the order it was received.

What is a supervision tree in Celluloid?

A supervision tree is a mechanism in Celluloid for managing the lifecycle of actors and handling failures. It is a hierarchical structure where each node is an actor, and the parent node is responsible for starting, monitoring, and restarting its child nodes in case of failures. This provides a robust way to build fault-tolerant systems.
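The restart behavior at the heart of supervision can be illustrated without Celluloid. The sketch below is a single-level, standard-library stand-in, not Celluloid's supervisor API; `supervise` and `max_restarts` are hypothetical names.

```ruby
# Hypothetical helper: run the block in a worker thread and restart it on failure.
def supervise(max_restarts:)
  restarts = 0
  begin
    worker = Thread.new do
      Thread.current.report_on_exception = false # the supervisor handles the error
      yield restarts
    end
    worker.value # waits for the worker; re-raises its exception here if it crashed
  rescue StandardError
    restarts += 1
    retry if restarts <= max_restarts
    raise
  end
end

result = supervise(max_restarts: 3) do |restarts|
  raise 'worker crashed' if restarts < 2 # fail on the first two runs
  "done after #{restarts} restarts"
end
puts result # done after 2 restarts
```

A real supervision tree nests this idea: supervisors watch workers, and higher-level supervisors watch the supervisors.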

How do I handle exceptions in Celluloid?

In Celluloid, exceptions are handled in an Erlang-inspired way. When an unhandled exception occurs inside an actor, the actor crashes and its thread is cleaned up. If the failing method was called synchronously, the exception is also raised in the caller; later calls to the dead actor raise a Celluloid::DeadActorError. Linked actors and supervisors are notified of the crash, which lets a supervisor restart the actor and the system recover in a controlled manner.

Can I use Celluloid with Rails?

Yes, you can use Celluloid with Rails. However, it’s important to make sure your Rails application is set up to handle concurrency. Rails versions before 4.0 were not thread-safe by default and needed thread-safe mode enabled in the configuration (config.threadsafe!); from Rails 4.0 onward that behavior is the default, but your own code and the gems you use still have to be thread-safe.

What are futures in Celluloid?

Futures are a feature in Celluloid that allows you to perform computations in the background and retrieve the result later. When you call a method on an actor with the future method, it returns a Future object immediately, and the method is executed in the background. You can then call value on the Future object to retrieve the result when it’s ready.
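In Celluloid this is written as actor.future.some_method followed by a call to value. The general idea can be sketched with a plain thread; `SimpleFuture` below is a hypothetical stand-in built on the standard library, not Celluloid's Future class.

```ruby
# Hypothetical stand-in for a future: start the work now, collect the result later.
class SimpleFuture
  def initialize(&work)
    @thread = Thread.new(&work)
  end

  # Blocks until the work is finished, then returns its result.
  def value
    @thread.value
  end
end

future = SimpleFuture.new { (1..100).reduce(:+) } # starts running in the background
# ... the caller is free to do other things here ...
puts future.value # 5050
```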

Is Celluloid still maintained?

As of now, the Celluloid project is not actively maintained, and new releases have been rare since 2016. However, the library is still widely used and the community continues to provide support for it. If you’re starting a new project, you might want to consider other options for concurrency in Ruby, such as concurrent-ruby or async.