An Introduction to Celluloid, Part II

This is the second article in a three-part series. If you missed the first one, you can find it here.

Celluloid has a ton more awesome tools to make concurrent programming incredibly easy in Ruby.

Let’s take a look at them.

Key Takeaways

Celluloid provides futures, allowing the return value of a method called on an actor to be used elsewhere, making concurrent programming easier. This can be especially useful when writing a library where the result of a function must be a future.
Celluloid offers a mechanism known as a supervisor for error handling. This allows for automatic restarting and tracking of actors when they crash, simplifying the process of threading code.
Communication between actors in Celluloid is facilitated through the actor Registry, where actors can be given names. It’s also possible to pass futures between actors to get return values. This, combined with the ability to make blocking calls inside actors and the use of pools to distribute workload, makes Celluloid a powerful tool for concurrent programming.

Futures

There are times when we don’t just want to discard the return value of a method we’ve called on an actor; instead, we might want to use it somewhere else. For that, Celluloid provides futures. The best way to learn about them is to see them in action.

We’ll write a small script that computes the SHA1 checksums of an array of files and then prints them to the console.

Without further ado, here it is:

[gist id="3169115"]

First, consider the checksum method. It is quite straightforward: we use Digest::SHA1 to compute the checksum of the contents of the file the actor is given.
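To make the checksum step concrete outside of an actor, here is a plain-Ruby sketch. It assumes the method receives a file path, and the gist's exact signature may differ. It uses Digest::SHA1.file from Ruby's standard library, which hashes a file's contents.

```ruby
require "digest/sha1"
require "tempfile"

# Digest::SHA1.file reads the file and hashes its contents;
# hexdigest returns the checksum as a 40-character hex string.
def checksum(path)
  Digest::SHA1.file(path).hexdigest
end

# Write a throwaway file so the example is self-contained.
Tempfile.create("checksum-example") do |f|
  f.write("abc")
  f.flush # make sure the bytes are on disk before hashing by path
  puts checksum(f.path) # => a9993e364706816aba3e25717850c26c9cd0d89d
end
```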

Look at the files.each loop. This is where it gets interesting.

First, we create the actor and assign it a file. Then, instead of calling the checksum method directly, we call it through a future. This immediately returns a Celluloid::Future object instead of blocking until the checksum has been computed.

Then, we take this future object and pass it on to the output method inside the actor.
The output method needs the value of the checksum, so it gets it from the future object’s value method, which blocks until a value is available. Problem solved!
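As a rough analogy in plain Ruby (not Celluloid), a stdlib Thread can play the role of the future. Thread#value, like Celluloid::Future#value, blocks until the result is ready. The output method mirrors the one described above. The in-memory "files" are hypothetical stand-ins so the sketch runs without touching disk.

```ruby
require "digest/sha1"

# The consumer receives a future and only waits when it needs the value.
def output(name, future)
  line = "#{name}: #{future.value}" # blocks here if the checksum isn't done yet
  puts line
  line
end

# Hypothetical in-memory "files" (name => contents).
files = { "a.txt" => "abc", "empty.txt" => "" }

# Start every checksum first, so they can run concurrently...
futures = files.map do |name, data|
  [name, Thread.new { Digest::SHA1.hexdigest(data) }]
end

# ...then hand each future to the consumer.
lines = futures.map { |name, future| output(name, future) }
```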

You might be thinking, “hey, this does pretty much the same thing as the last example!” However, in the last example, we dumped everything into a single method in order to do the file-related operations asynchronously. With futures, we can cleanly separate our code.

Also, there are use cases where only a future will do. For example, if you are writing a library, the checksum function should return a future, since the library’s users need to be able to plug in their own code to consume the result.

Making Any Block Concurrent

There is a very cool use for futures: they let us push a block of code to another thread incredibly easily.

Check it out:

[gist id="3169318"]

We use Celluloid::Future to push a block into its own thread. Celluloid manages everything about that thread, and we can use the block’s return value later on by calling the future’s value method. So, this little part of Celluloid can be plugged into literally any application, and once mastered, it can be incredibly useful.
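To see roughly what Celluloid::Future.new { ... } gives you, here is a minimal hand-rolled stand-in built on a stdlib Thread. MiniFuture is a hypothetical name used only for illustration. It is not how Celluloid implements futures, but the calling pattern has the same shape: start the block now, ask for its value later.

```ruby
# Hypothetical MiniFuture: a stand-in for Celluloid::Future.new { ... },
# built on a plain Ruby Thread (not Celluloid's implementation).
class MiniFuture
  def initialize(&block)
    @thread = Thread.new(&block) # start running the block right away
  end

  # Blocks until the block has finished, then returns its result.
  # If the block raised, Thread#value re-raises that exception here.
  def value
    @thread.value
  end
end

slow_sum = MiniFuture.new { (1..1_000_000).sum }
# The main thread could do other work here while the sum runs.
puts slow_sum.value # => 500000500000
```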

Use it wisely!

Catching Errors – Supervisors

To see how error handling works in Celluloid, we’re going to build a simple tool that gets the HTML of various websites.

Here it is, with the stuff we’ve learned so far:

[gist id="3169206"]

If everything goes well, the markup is printed with puts.

But, what if things start going wrong? We’re not really doing much about that.

For that purpose, Celluloid provides a mechanism known as a supervisor. Here it is in action:

[gist id="3169282"]

There are several new concepts here, so pay close attention.

First of all, the MarkupPutter class is left untouched. In other words, the implementation of the business logic is left unchanged!

Now, we call the supervise method on the MarkupPutter class. This does three things. First, it creates (and starts) an actor that is an instance of MarkupPutter. Second, it returns a supervisor object, which can do some interesting things. Finally, it takes its first argument (“mp”) and registers the actor under that name in the registry.

The Celluloid registry is a bit like a phonebook – the actors that are in there can be accessed by name. So, on the next line, we use the Celluloid registry to look up :mp.
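The phonebook idea boils down to a thread-safe map from names to objects. The sketch below is a hypothetical MiniRegistry in plain Ruby, written only to illustrate the concept. In Celluloid itself, you look a registered actor up with Celluloid::Actor[:mp].

```ruby
# Hypothetical MiniRegistry: a thread-safe "phonebook" mapping names to
# objects (an illustration of the idea, not Celluloid's registry).
class MiniRegistry
  def initialize
    @entries = {}
    @lock = Mutex.new # guard the hash against concurrent access
  end

  def []=(name, obj)
    @lock.synchronize { @entries[name] = obj }
  end

  def [](name)
    @lock.synchronize { @entries[name] }
  end
end

registry = MiniRegistry.new
registry[:mp] = "markup putter stand-in" # register under a name
puts registry[:mp]                       # look it up later by name
```

Callers only need to know the name, not hold a direct reference to the object.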

The code after that is quite straightforward – simply using a future to output the markup.

With two lines of code added, Celluloid automatically takes care of restarting and keeping track of actors when they crash!

If one of the actors hits some kind of exception (e.g. the website does not respond and the request times out), Celluloid immediately restarts the actor. If you’ve written this kind of threading code the old-fashioned way, you know that this is a very finicky and difficult process, but Celluloid handles it entirely for us!
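To appreciate what the supervisor saves you from writing, here is a rough plain-Ruby approximation of restart-on-crash. This is a sketch, not Celluloid’s implementation. The hypothetical restart_on_crash helper re-runs a block on the same thread each time it raises, whereas Celluloid restarts a crashed actor as a fresh instance.

```ruby
# Hypothetical helper (plain Ruby, not Celluloid's supervisor): runs the
# block on a new thread and re-runs it whenever it raises, allowing up to
# max_restarts restarts. The block receives the attempt number.
def restart_on_crash(max_restarts:, &work)
  Thread.new do
    attempts = 0
    begin
      attempts += 1
      work.call(attempts)
    rescue StandardError => e
      warn "attempt #{attempts} crashed (#{e.message}); restarting"
      retry if attempts <= max_restarts
      raise # give up; the exception surfaces through Thread#value
    end
  end
end

# Simulated flaky fetch: times out twice, then succeeds.
fetcher = restart_on_crash(max_restarts: 3) do |attempt|
  raise "request timed out" if attempt < 3
  "<html></html> (attempt #{attempt})"
end
puts fetcher.value
```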

Communication Between Actors

In nearly all applications, actors will not be working in isolated environments – they will be communicating with other actors.

Just to explain how communication between actors works in Celluloid, we’ll write three actors to print out “Hello, world!” when run correctly. Check it out:

[gist id="3169390"]

We start out by defining three actors, each of which says part of the “Hello, world!\n” message. HelloSpaceActor uses the registry to look up the WorldActor instance and calls say_msg on it; WorldActor then does the same for NewlineActor.

So, long story short, actors communicate through the actor Registry, where we can give them names.
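Here is a rough plain-Ruby model of the hello-world chain. Each "actor" is a thread draining its own mailbox (a Queue), and actors find each other by name in a shared registry Hash. ChainActor, REGISTRY and OUTPUT are hypothetical names. A single parameterized class stands in for the article’s three actor classes, and none of this is Celluloid code.

```ruby
# Shared name -> actor registry and a queue collecting the printed parts.
REGISTRY = {}
OUTPUT = Queue.new

class ChainActor
  def initialize(name, part, next_name)
    @part = part
    @next_name = next_name
    @mailbox = Queue.new
    REGISTRY[name] = self
    @thread = Thread.new { run } # each actor processes messages on its own thread
  end

  # Sending a message just drops it in the mailbox; the caller doesn't wait.
  def say_msg
    @mailbox << :say
  end

  private

  def run
    loop do
      @mailbox.pop # wait for a message
      OUTPUT << @part
      # Look up the next actor by name and pass the message along
      # (the last actor has no successor, so &. makes this a no-op).
      REGISTRY[@next_name]&.say_msg
    end
  end
end

ChainActor.new(:newline, "\n", nil)
ChainActor.new(:world, "world!", :newline)
hello = ChainActor.new(:hello_space, "Hello, ", :world)
hello.say_msg

message = Array.new(3) { OUTPUT.pop }.join # wait for all three parts
print message
```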

Futures are another way to make actors work together: actors can pass futures to one another in order to get return values.

Blocking Calls Inside Actors

If you have experience with EventMachine, you know that you can’t freely mix EventMachine with other IO libraries: a blocking call stalls the whole event loop, so the library needs to be EventMachine-compatible.

As a result, you can’t take advantage of the full Ruby ecosystem; instead, you are stuck with the far smaller set of EventMachine-compatible libraries.
With Celluloid, this isn’t the case!

Since each actor runs in its own thread, it is perfectly okay for method calls inside an actor to block: only that one actor is blocked!

But beware: do not make calls that block indefinitely inside actors (such as listening on a socket). While the actor is stuck, every message sent to it waits unprocessed, which is bad!
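Both points can be seen with a tiny plain-Ruby model of an actor: one thread processing one mailbox, one message at a time. MiniActor and tell are hypothetical names, sleep stands in for a long blocking call, and this is not Celluloid’s implementation. A slow message delays later messages to the same actor, while a different actor carries on unaffected.

```ruby
# Hypothetical MiniActor (plain Ruby, not Celluloid): one thread draining
# one mailbox, so messages are handled one at a time, in arrival order.
class MiniActor
  def initialize(name, log)
    @mailbox = Queue.new
    @thread = Thread.new do
      loop do
        msg = @mailbox.pop # wait for the next message
        break if msg == :stop
        msg.call           # a blocking call here stalls this actor only
        log << name        # record which actor finished a message
      end
    end
  end

  # Enqueue a message (a block) without waiting for it to run.
  def tell(&blk)
    @mailbox << blk
  end

  # Let the actor finish its queued messages, then wait for its thread.
  def stop
    @mailbox << :stop
    @thread.join
  end
end

log = Queue.new
slow = MiniActor.new(:slow, log)
fast = MiniActor.new(:fast, log)

slow.tell { sleep 0.3 } # stands in for a long blocking call
slow.tell { }           # queued behind the blocking message
fast.tell { }           # a different actor is not held up
[slow, fast].each(&:stop)

order = Array.new(log.size) { log.pop }
p order
```

The fast actor finishes first even though it was messaged last, while the slow actor’s second message has to wait its turn.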

Pooling

If you have read up a bit on how web servers operate, you know how important thread pools are. Pools in Celluloid are awesome, and they are completely transparent. I think they are probably my favorite feature of Celluloid (with so much cool stuff, it’s hard to choose!).

We’ll write a simple example to demonstrate how amazing they are:

[gist id="3169620"]

First, we define the PrimeWorker class. The “Worker” in the name signifies that it is to be used with a pool – threads that are part of thread pools are usually called workers.

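The prime method in PrimeWorker prints a number if it is prime. It relies on the prime? method that the ‘mathn’ library made available on Ruby 1.9. Since mathn has been removed from the standard library, on newer Rubies use require ‘prime’ instead, or write your own prime number checker if you like.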
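If you’d rather not depend on a library at all, a simple trial-division check is enough for this example. The prime? helper below is a hypothetical stand-in, not the library method.

```ruby
# Trial division: n is prime if no d in 2..floor(sqrt(n)) divides it.
def prime?(n)
  return false if n < 2
  (2..Integer.sqrt(n)).none? { |d| (n % d).zero? }
end

primes = (1..30).select { |n| prime?(n) }
p primes # => [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```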

The interesting part is when we introduce the pool by calling the pool method on PrimeWorker.

The “pool” object responds to all of PrimeWorker’s methods, but behind the scenes it creates as many PrimeWorker actors as your processor has cores. So, on a quad-core processor, it creates four actors. When methods are called on “pool”, Celluloid decides which actor in the pool handles each call.
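For comparison, here is what a bare-bones worker pool looks like in plain Ruby. This is a sketch, not Celluloid’s pool. run_pool is a hypothetical helper that starts one thread per core (Etc.nprocessors) and has them all pull jobs from a shared Queue. Results arrive in whatever order the workers finish, so the example sorts them at the end.

```ruby
require "etc"

# Hypothetical worker pool: `size` threads (default: one per core), all
# pulling jobs from one shared queue until they see a stop marker.
def run_pool(jobs, size: Etc.nprocessors, &work)
  queue = Queue.new
  jobs.each { |job| queue << job }
  size.times { queue << :done } # one stop marker per worker, after all jobs
  results = Queue.new

  workers = Array.new(size) do
    Thread.new do
      while (job = queue.pop) != :done
        results << work.call(job)
      end
    end
  end
  workers.each(&:join) # wait for every worker to drain the queue
  Array.new(results.size) { results.pop }
end

# Return each prime below 50 (nil otherwise), then tidy up the results.
raw = run_pool(1...50) do |n|
  n if n >= 2 && (2..Integer.sqrt(n)).none? { |d| (n % d).zero? }
end
primes = raw.compact.sort
p primes
```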

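Following that, we map over a large range, calling prime on pool for each number (remember, it is called asynchronously because of the bang). This automatically distributes the workload across the pool’s actors. On a Ruby whose threads run in parallel, such as JRuby, that means across your processors. On MRI, the global interpreter lock lets only one thread execute Ruby code at a time, so CPU-bound work like this gains concurrency but not parallelism.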

Wow. It took maybe four or five extra lines of code to achieve complete concurrency. That’s amazing.

At the end of the program, there is a sleep call, and there is a good reason for it. Since we are calling prime asynchronously, the main thread (the one running the script) reaches the end of the program as soon as it has finished telling all the actors “hey, remember to print out this prime”. The actors haven’t finished printing the primes by then, and once the main thread exits, the whole program ends, so their output never reaches the terminal.

The sleep call keeps the main thread alive long enough for all the output to come out correctly. Also notice that since we are calling prime asynchronously, there is no guarantee about the order in which the primes are printed.

Wrapping It Up

I hope you enjoyed the article, and that you’re as excited about Celluloid as I am.

So far, we’ve discussed how to use the various parts of Celluloid separately, with small examples.

In Part 3, we’ll cover how all of this ties together, create some more complex programs, and cover more features, such as Linking.

Do ask any questions you have in the comments section below :)