An Introduction to Celluloid, Part II

Dhaivat Pandya
This entry is part 2 of 3 in the series An Introduction to Celluloid


This is the second article in a three-part series. If you missed the first one, start with An Introduction to Celluloid, Part I.

Celluloid has a ton more awesome tools to make concurrent programming incredibly easy in Ruby.

Let’s take a look at them.

Futures

There are times when we don’t just want to discard the return value of a method we’ve called on an actor; instead, we might want to use it somewhere else. For that, Celluloid provides futures. The best way to learn about them is to see them in action.

We’ll write a small script that computes the SHA1 checksum of an array of files, then outputs them to the console.

Without further ado, here it is:

First of all, consider the checksum method. It is quite straightforward: we use Digest::SHA1 to compute the checksum of the contents of the file that the actor is given.

Look at the files.each loop. This is where it gets interesting.

First, we create the actor and assign it a file. Then, instead of just calling the checksum method, we call it using a future. The call returns a Celluloid::Future object immediately instead of blocking.

Then, we take this future object and pass it on to the output method inside the actor.
Inside the output method, the value of the checksum is needed! So, it is obtained from the future object’s value method, which blocks until a value is available. That solves the problem!
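
To make the idea concrete without depending on the gem, here is a rough plain-Ruby sketch of what a future does. SimpleFuture is a hypothetical stand-in built on Thread, not Celluloid’s own class, and the temporary file is only there so the sketch runs anywhere. Treat it as an illustration of the "return immediately, block on value" behaviour described above, not as how Celluloid is implemented.

```ruby
require 'digest/sha1'
require 'tempfile'

# Hypothetical stand-in for a future: starts the block on its own thread
# right away and hands back the result when #value is called.
class SimpleFuture
  def initialize(&block)
    @thread = Thread.new(&block)
  end

  # Blocks until the block has finished, then returns its result.
  def value
    @thread.value
  end
end

# Computes the SHA1 checksum of a file's contents.
def checksum(path)
  Digest::SHA1.hexdigest(File.binread(path))
end

# Create a throwaway file so the sketch is self-contained.
file = Tempfile.new('celluloid-sketch')
file.write('hello')
file.flush

future = SimpleFuture.new { checksum(file.path) } # returns immediately
sha1 = future.value                                # blocks until ready
puts "#{File.basename(file.path)}: #{sha1}"

file.close! # close and delete the temporary file
```

The line that creates the future and the line that asks for its value can live in completely different methods, which is what lets the checksum and output steps stay separate.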

You might be thinking, “hey, this does pretty much the same thing as the last example!” However, in the last example, in order to do the file-related operations asynchronously, we dumped everything into a single method. With futures, we are able to cleanly separate our code.

Also, there are use cases where it is only possible to use futures. For example, if one is writing a library, the result of the checksum function must be a future since the user of the library should be able to add in their own code.

Making Any Block Concurrent

There is a very cool use for futures: they allow us to push a block of code to another thread incredibly easily.

Check it out:

We use Celluloid::Future to push a block into its own thread. Celluloid manages everything about that thread, and we can use the block’s return value later on (through the future’s value method, of course). This little part of Celluloid can be plugged into practically any application and, once mastered, can be incredibly useful.
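
The same shape can be sketched with nothing but Ruby’s Thread, whose value method waits for the block and returns its result. This is an analogy under that assumption, not Celluloid’s API; the sleep is a made-up placeholder for slow work.

```ruby
# Start the slow work on its own thread; Thread.new returns immediately.
slow_sum = Thread.new do
  sleep 0.1 # placeholder for slow I/O
  (1..1_000).reduce(:+)
end

# The main thread is free to do something else in the meantime.
squares = (1..5).map { |n| n * n }

# Thread#value blocks until the block has finished, like a future's value.
total = slow_sum.value
puts "squares: #{squares.inspect}, total: #{total}"
```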

Use it wisely!

Catching Errors – Supervisors

To see how error handling works in Celluloid, we’re going to build a simple tool that gets the HTML of various websites.

Here it is, with the stuff we’ve learned so far:

If everything goes well, the markup is printed with puts.

But, what if things start going wrong? We’re not really doing much about that.

For that purpose, Celluloid provides a mechanism known as a supervisor. Here it is in action:

There are several new concepts here, so pay close attention.

First of all, the MarkupPutter class is left untouched. In other words, the implementation of the business logic is left unchanged!

Now, we call the supervise_as method on the MarkupPutter class. This does three things. First, it creates (and puts into motion) an actor that is an instance of MarkupPutter. Second, it returns a supervisor object, which can do some interesting things. Finally, it takes its first argument (which is :mp) and puts an entry of that name in the registry.

The Celluloid registry is a bit like a phonebook – the actors that are in there can be accessed by name. So, on the next line, we use the Celluloid registry to look up :mp.

The code after that is quite straightforward – simply using a future to output the markup.

With two lines of code added, Celluloid automatically takes care of restarting and keeping track of actors when they crash!

In case one of the actors hits some kind of exception (e.g. the website does not respond to the request and the request times out), the actor is immediately restarted by the Celluloid core. If you’ve written this kind of threading code the old fashioned way, you know that this is a very finicky and difficult process, but it is handled entirely by Celluloid for us!
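
To see the restart idea in miniature, here is a plain-Ruby sketch. TinySupervisor and Fetcher are hypothetical names invented for this illustration, and the "unreachable site" is simulated with a string check rather than a real HTTP request. The sketch only replaces a crashed object with a fresh one; it makes no attempt to retry the failed call, and it does not claim to mirror Celluloid’s internals.

```ruby
# Hypothetical supervisor: owns one worker and replaces it after a crash.
class TinySupervisor
  attr_reader :restarts

  # The factory block builds a fresh worker whenever one is needed.
  def initialize(&factory)
    @factory = factory
    @restarts = 0
    @actor = factory.call
  end

  # Forwards a call to the current worker. If the call raises, the worker
  # is swapped for a new instance and nil is returned (no retry).
  def call(method, *args)
    @actor.public_send(method, *args)
  rescue StandardError => e
    @restarts += 1
    @actor = @factory.call
    warn "worker crashed (#{e.message}); restarted"
    nil
  end
end

# Hypothetical worker that fails for hosts containing "down".
class Fetcher
  def get_markup(url)
    raise ArgumentError, "cannot reach #{url}" if url.include?('down')

    "<html>#{url}</html>"
  end
end

supervisor = TinySupervisor.new { Fetcher.new }
first = supervisor.call(:get_markup, 'down.example') # crashes, worker replaced
second = supervisor.call(:get_markup, 'up.example')  # served by the new worker
puts second
```

The business logic in Fetcher knows nothing about restarts, which is the same separation the MarkupPutter example relies on.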

Communication Between Actors

In nearly all applications, actors will not be working in isolated environments – they will be communicating with other actors.

Just to explain how communication between actors works in Celluloid, we’ll write three actors to print out “Hello, world!” when run correctly. Check it out:

We start out by defining three actors, each of which says a part of the “Hello, world!\n” message. HelloSpaceActor uses the registry to look up the WorldActor instance and calls say_msg on it; WorldActor then does the same for NewlineActor.

So, long story short, the actor communication is done through the actor registry, where we are able to give the actors names.
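
The lookup-by-name pattern can be sketched in plain Ruby. Here REGISTRY is just a Hash standing in for Celluloid’s registry, the objects are ordinary objects rather than actors on their own threads, and the exact split of the message between the three classes is an assumption. The point is only to show how each participant finds the next one by name.

```ruby
# Hypothetical registry: a plain Hash standing in for the actor registry.
REGISTRY = {}

class HelloSpaceActor
  def say_msg(out)
    out << 'Hello, '
    REGISTRY[:world].say_msg(out) # look up the next participant by name
  end
end

class WorldActor
  def say_msg(out)
    out << 'world!'
    REGISTRY[:newline].say_msg(out)
  end
end

class NewlineActor
  def say_msg(out)
    out << "\n"
  end
end

# Register the participants under names, like entries in a phonebook.
REGISTRY[:world] = WorldActor.new
REGISTRY[:newline] = NewlineActor.new

message = String.new # mutable buffer that collects the pieces
HelloSpaceActor.new.say_msg(message)
print message
```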

As we know, another way to make actors work together is futures – have futures passed around between actors in order to get return values.

Blocking Calls Inside Actors

If you have experience with EventMachine, you know that you can’t mix EventMachine with any other library for IO – the library needs to be EventMachine compatible.

As such, you aren’t able to utilize the full power of the Ruby community. Instead, you are stuck with the far smaller EventMachine community.
With Celluloid, this isn’t the case!

Since the actors are all in their own threads, it is perfectly okay for method calls inside actors to block, since it only blocks that one actor!

But beware: do not make indefinitely blocking calls in actors (such as listening on a socket). Doing so holds up every message sent to that actor, which is bad!
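
A small plain-Ruby sketch shows both halves of this point. MiniActor is a hypothetical class (one thread draining one queue of jobs in order), not Celluloid’s actor implementation. A blocking call stalls only the actor that made it, but anything queued behind that call on the same actor has to wait.

```ruby
# Hypothetical mini-actor: one thread processing one mailbox, in order.
class MiniActor
  def initialize
    @mailbox = Queue.new
    @thread = Thread.new do
      while (job = @mailbox.pop) != :stop
        job.call
      end
    end
  end

  # Queues a job and returns immediately.
  def async(&job)
    @mailbox << job
  end

  # Lets queued jobs finish, then shuts the thread down.
  def stop
    @mailbox << :stop
    @thread.join
  end
end

log = Queue.new # thread-safe record of what finished, in order
slow = MiniActor.new
fast = MiniActor.new

slow.async { sleep 0.2; log << :slow_done } # blocks only this actor
slow.async { log << :queued_behind_slow }   # waits for the sleep to end
fast.async { log << :fast_done }            # a different actor, unaffected

slow.stop
fast.stop
events = Array.new(log.size) { log.pop }
puts events.inspect
```

If the sleep were replaced by a call that never returns, the second job on the slow actor would never run, which is exactly the situation to avoid.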

Pooling

If you have read up a bit about how web servers operate, you know how important thread pools are. Pools in Celluloid are awesome; they are completely transparent. I think they are probably my favorite feature of Celluloid (with so much cool stuff, it’s hard to choose!).

We’ll write a simple example to demonstrate how amazing they are:

First, we define the PrimeWorker class. The “Worker” in the name signifies that it is to be used with a pool – threads that are part of thread pools are usually called workers.

The function of the prime method in PrimeWorker is to print a number if it is prime (this uses the ‘mathn’ module introduced in 1.9 – you can write your own prime number checker if you like).

The interesting part is when we introduce the pool by calling the pool method on PrimeWorker.

The “pool” object has all the methods of PrimeWorker, but, it actually creates as many instances of PrimeWorker as the processor has cores. Therefore, if you have a quad core processor, that would create four actors. When methods are called on “pool”, Celluloid decides which actor out of the pool to invoke.

Following that, we have a map over a large range, in which we call prime (remember, it is called asynchronously because of the bang) on pool. This automatically distributes the workload over your processors!

Wow. It took maybe four or five extra lines of code to achieve complete concurrency. That’s amazing.

At the end of the program, there is a sleep call. There is a good reason for this. Since we are calling prime asynchronously, the main thread (which is the Ruby thread) exits when it is done telling all the actors “hey, remember to print out this prime”. However, the actors aren’t done actually printing the primes by the time the main thread exits, so the output never reaches the terminal.

But the sleep call keeps the main thread alive long enough for all the output to come out correctly. Also notice that since we are calling prime asynchronously, there is no guarantee about the order in which the primes are printed.
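
For comparison, here is roughly what a pool looks like when built by hand with plain threads. Everything in it is an assumption made for illustration: the prime? helper is a hand-written checker, the range is kept small, and the workers are bare threads sharing a queue rather than Celluloid actors. It waits for the workers with join instead of sleeping, and sorts the results because the order in which workers finish is not guaranteed.

```ruby
require 'etc'

# Hand-written primality check by trial division.
def prime?(n)
  return false if n < 2

  (2..Integer.sqrt(n)).none? { |d| (n % d).zero? }
end

# Fill a queue with the numbers to test.
jobs = Queue.new
(2..50).each { |i| jobs << i }

found = Queue.new # thread-safe collection of results

# One worker per processor core, all pulling from the same queue.
workers = Array.new(Etc.nprocessors) do
  Thread.new do
    loop do
      begin
        n = jobs.pop(true) # non-blocking pop; raises ThreadError when empty
      rescue ThreadError
        break
      end
      found << n if prime?(n)
    end
  end
end

# Wait for every worker to finish instead of sleeping for a fixed time.
workers.each(&:join)

primes = Array.new(found.size) { found.pop }.sort
puts primes.inspect
```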

Wrapping It Up

I hope you enjoyed the article, and that you’re as excited about Celluloid as I am.

So far, we’ve discussed how the various parts of Celluloid can be used separately, with small examples.

In Part 3, we’ll cover how all of this ties together, create some more complex programs, and cover more features, such as Linking.

Do ask any questions you have in the comments section below :)


  • Chris

    I’ve done some benchmarks using the pooling example (I’ve changed the number range to (2..500000)) and it takes about 21 seconds when Celluloid is completely removed from the code, i.e. no threads/actors. When Celluloid is added and the pooling feature is used, it takes a cripplingly long time, too long to even wait for the benchmark to complete.

    • Dhaivat Pandya

      The square operation is too “simple” (not computationally intensive) for it to be worthwhile using the actor pool.

  • Esteban

    Isn’t there a way to tell the program to wait for the actors before finishing?
    Something like Thread#join ?

    • Dhaivat Pandya

      Hi,

      You can use supervision groups – those will most likely be covered in Part 3.

  • http://bibwild.wordpress.com Jonathan Rochkind

    Super useful, thanks! I’m going to have to read it a couple more times to fully understand what’s going on.

    But one question about supervise. In your example you’re doing supervise_as inside the loop.

    websites.each do |website|
    supervisor = MarkupPutter.supervise_as :mp, website

    But you say ” puts an entry of that name in the registry.” So wait, aren’t you four times (once for each website) putting an actor in the registry as :mp? Over-writing the previous one each time? That doesn’t seem right?

    Worse, if you don’t over-write, might you possibly end up using the _same_ actor each time in the loop instead of creating a new one like last time, since you’re looking it up with ` Celluloid::Actor[:mp]` each time. If you re-use the same actor 4 times in the loop, you’re going to lose the whole point here — which is doing each HTTP request in a separate thread. If it’s only one Actor, then those :get_markup calls, even with the future, are going to stack up and be processed serially by a single actor, instead of occurring concurrently in their own threads. IF it’s only one actor involved. But if it’s actually multiple actors, but you’re over-writing them in the Actor registry…. then I’m not sure the old ones that have been overwritten are really supervised?

    What’s up with that?

    • Dhaivat Pandya

      The question you bring up is important.

      The registry doesn’t have anything to do with supervision – it is for the programmer’s use, not for Celluloid. So, by overwriting the registry, we are not actually doing anything to the actors.

  • http://bibwild.wordpress.com Jonathan Rochkind

    okay, cool. Is there a way to supervise without using the registry then? In your example with supervise_as, the registry seems to have something to do with supervision, yeah? In your example, the only way to get the actual actor you just created (to be supervised) is out of the registry, right?

    And I’m right that you’re over-writing the registry here, right? In this example, if it’s the whole of your application, you get the thing out of the registry right after you put it in before you over-write it. But in a more complicated app if this code were executed in another thread (say, another actor) — it’s possible someone else would over-write the registry after you supervise_as, and before you pull it out of the registry, and you’d wind up with the wrong actor (say, one already busy doing something else instead of the fresh new one you wanted to get).

    So a way to supervise without using the registry seems to be important, there is one?

  • http://bibwild.wordpress.com Jonathan Rochkind

    Ah, I see there is in the docs. https://github.com/celluloid/celluloid/wiki/Supervisors

    In fact, for your particular example… I don’t think supervision gets you anything. It’ll restart the actor if it’s crashed…. but you never re-use actors in that example, you just call one method on each actor. If the method raises an exception, the supervisor will re-create the killed actor… but it won’t re-execute the method or anything. And that re-created killed actor will never be referenced by anyone else again (in fact, it’s lost, nobody even has a ref to it, since only the ‘last’ actor was registered), there was no reason to re-create it. The supervision doesn’t actually help you deal with things going wrong in your particular example. Am I missing anything?

    • Dhaivat Pandya

      The supervision ties together in Part 3 :)

  • Jabari

    You did:

    (2..1000).to_a.map do |i|
      pool.prime! i
    end

    Simpler and faster:

    (2..1000).each {|i| pool.prime! i }

  • Jabari

    Also, for more idiomatic Ruby code:

    This

    def prime(number)
      if number.prime?
        puts number
      end
    end

    is more idiomatic Ruby written like this:

    def prime(number)
      puts number if number.prime?
    end

    or this as oneliner:

    def prime(number); puts number if number.prime? end