How many times have you developed a web application with some functionality that would benefit from running an external program or even forking a separate process? This is not something you generally like to do from your web app, because you want it to run as quickly and efficiently as possible while keeping the site functional for end users. So how do we get a fast but full-featured application that can process more than the average app we're used to?
Years ago, the developers of a web site faced a similar problem. Users were uploading too many pictures, all of which had to be processed into their respective accounts. This included resizing, such as creating thumbnails or reducing the raw image size to something more appropriate for the web site to display. These pictures mostly consisted of avatars, millions of them, overloading the app. The team thought of a solution: have a server that spawns processes to do things such as processing pictures, outside of the web server application. This resulted in Gearman.
What is Gearman?
Gearman provides a distributed application framework for farming out work to multiple machines or processes. It allows applications to complete tasks in parallel, to load balance processing, and to call functions between languages. The framework can be used in a variety of applications. Gearman is multi-threaded and is known to be able to carry out 50 thousand jobs per second.
Some of the well-known sites using the C version of Gearman are:
– Digg: 45+ servers, 400 thousand jobs a day
– Yahoo: 120+ servers, 12 million jobs a day
Gearman was originally written in Perl, but the job servers and client API were recently rewritten in C by Eric Day because he wanted better performance.
The figure describes the type of setup you might use for image resizing. Traditionally, the image resizing would have been implemented completely within the web application. The user would upload an image, and PHP would have to run the image conversion within the HTTP request that serves the page. The page load would not finish until the resizing was complete. Now, with Gearman, the web app can request an image resizing by way of a Gearman client that talks to the Gearman job server. Gearman allows you to separate some functionality from your web application, letting other parts of your environment take care of it.
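As a rough sketch of that hand-off, the upload handler could queue the resize as a background job and return immediately. The job name resize_image, the JSON payload layout, and both helper functions are assumptions made for illustration (they are not part of Gearman), and the sketch assumes PHP 7.1 or later with the job server on the default port 4730.

```php
<?php
// Build the payload that a hypothetical "resize_image" worker would receive.
function build_resize_payload(string $path, int $width, int $height): string
{
    return json_encode(['path' => $path, 'width' => $width, 'height' => $height]);
}

// Queue the resize as a background job; doBackground() returns a job handle
// right away instead of waiting for the worker to finish.
function queue_resize(GearmanClient $client, string $path, int $width, int $height): string
{
    return $client->doBackground('resize_image', build_resize_payload($path, $width, $height));
}

// Usage inside the upload handler (needs the gearman extension and a running job server):
// $client = new GearmanClient();
// $client->addServer('127.0.0.1', 4730);
// $handle = queue_resize($client, '/tmp/upload.jpg', 100, 100);
```

Because the job runs in the background, the page can be served before the thumbnail exists, so the application needs some way to show a placeholder until the worker is done.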
Installing and Running Gearman
Installing it is easy and straightforward if you just follow their comprehensive tutorial, assuming you have a high-level understanding of roles like a job server, clients, and workers; if you run into any problems while trying to install it, let us know in the comments below.
Gearman and PHP
Using Gearman with PHP makes for an ideal, easy-to-use combination. You have a Gearman client, which is usually your application: the code that sends jobs to the Gearman job server. You also have the worker component, which uses the PHP Gearman worker library to register itself as the handler for a named job and specifies the function to call for that job. The PHP extension extends the procedural interface to provide a native object-oriented interface as well.
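A minimal worker sketch for the roles described above might look like the following. The function names reverse_job and run_reverse_worker are made up for this example, the job name reverse is simply the one used in the client example later in this article, and the host and port are assumed defaults.

```php
<?php
// The actual work: reverse the string carried in the job's workload.
function reverse_job($job): string
{
    return strrev($job->workload());
}

// Connect to the job server, register a handler for the "reverse" job,
// and keep processing jobs until work() fails or reports an error.
function run_reverse_worker(string $host = '127.0.0.1', int $port = 4730): void
{
    $worker = new GearmanWorker();
    $worker->addServer($host, $port);
    $worker->addFunction('reverse', 'reverse_job');

    while ($worker->work()) {
        if ($worker->returnCode() !== GEARMAN_SUCCESS) {
            echo "Worker error, return code: " . $worker->returnCode() . "\n";
            break;
        }
    }
}

// From a command-line script (not from a web request): run_reverse_worker();
```

Workers are long-running processes, so they are normally started from the command line or a process supervisor rather than from the web server.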
To install the Gearman extension into PHP, refer to your OS documentation. On Ubuntu and Linux Mint, for example, it's as easy as sudo apt-get install php5-gearman. On some systems, however, you might have to build it manually, place it in your PHP extensions folder (you can find out where it is by looking at your phpinfo()), and then enable it in the php.ini file like so:
extension=gearman.so
To check if Gearman is successfully installed, see your phpinfo() again or just run a test method:
var_dump( gearman_version() );
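If the extension is not loaded, calling gearman_version() directly ends in a fatal "undefined function" error. A slightly more defensive check, sketched here with a hypothetical helper named gearman_status, reports the problem instead:

```php
<?php
// Return a human-readable status line for the gearman extension.
function gearman_status(): string
{
    if (!extension_loaded('gearman')) {
        return 'gearman extension is NOT loaded';
    }
    return 'gearman extension loaded, library version ' . gearman_version();
}

echo gearman_status() . "\n";
```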
Let's look at an example:
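$client = new GearmanClient();
// Connect to the job server; localhost is the default, spelled out for clarity.
$client->addServer('localhost');
// Register a callback for each stage of a task (must happen before adding tasks).
$client->setCreatedCallback("reverse_created");
$client->setStatusCallback("reverse_status");
$client->setCompleteCallback("reverse_complete");
$client->setFailCallback("reverse_fail");
$client->setDataCallback("reverse_data");
// $data_array is application context passed along with the task; placeholder value.
$data_array = array('mydata' => 'task');
$task = $client->addTask("reverse", "mydata", $data_array);
$task2 = $client->addTaskLow("reverse", "task", NULL);
if (! $client->runTasks()) { echo "ERROR: " . $client->error() . "\n"; exit(1); }
function reverse_created($task) { echo "CREATED: " . $task->jobHandle() . "\n"; }
function reverse_status($task) { echo "STATUS: " . $task->jobHandle() . " - " . $task->taskNumerator() .
    "/" . $task->taskDenominator() . "\n"; }
function reverse_complete($task) { echo "COMPLETE: " . $task->jobHandle() . ", " . $task->data() . "\n"; }
function reverse_fail($task) { echo "FAILED: " . $task->jobHandle() . "\n"; }
function reverse_data($task) { echo "DATA: " . $task->data() . "\n"; }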
In this client example, a GearmanClient object is instantiated. Next, the Gearman client API addServer() method is called to add a server to be used for the client connection. Multiple servers could be added if desired. An empty argument would default to localhost; it was explicitly specified here for the sake of clarity.
Next, we set up some callbacks for various stages of the tasks – the concept of callbacks should be familiar to every intermediate developer.
addTask adds a task to be run in parallel with other tasks. addTaskLow adds a low-priority task to be run in parallel with other tasks. To perform the work, we call GearmanClient::runTasks(). Note that enough workers need to be available for the tasks to run in parallel. Tasks with a lower priority will be chosen from the queue after those of higher priority.
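To illustrate the priority variants, the sketch below queues a mixed batch and runs it in one go. addTaskHigh, addTask, and addTaskLow are methods of GearmanClient; the helpers task_method_for and run_prioritised and the priority labels are hypothetical names chosen for this example, which assumes PHP 7.1 or later.

```php
<?php
// Map a priority label to the matching GearmanClient method name.
function task_method_for(string $priority): string
{
    $methods = ['high' => 'addTaskHigh', 'normal' => 'addTask', 'low' => 'addTaskLow'];
    if (!isset($methods[$priority])) {
        throw new InvalidArgumentException("Unknown priority: $priority");
    }
    return $methods[$priority];
}

// Queue each [priority, workload] pair for the "reverse" job, then run them all.
// $client is expected to be a connected GearmanClient.
function run_prioritised($client, array $jobs): bool
{
    foreach ($jobs as [$priority, $workload]) {
        $method = task_method_for($priority);
        $client->$method('reverse', $workload);
    }
    return $client->runTasks();
}

// Usage (needs the gearman extension, a job server, and a "reverse" worker):
// $client = new GearmanClient();
// $client->addServer('localhost');
// run_prioritised($client, [['low', 'report'], ['high', 'avatar'], ['normal', 'mail']]);
```

The order in which the tasks are added does not decide the order in which they run; the job server hands out higher-priority work first as workers become free.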
In this short introduction to Gearman, you learned about multitasking with PHP applications. You can now move certain functionality out of your web application and achieve better performance, while putting otherwise idle parts of your system to work.
In a future article, we'll cover a detailed real world use case of Gearman with a working demo. For now, please don't hesitate to leave a comment below if you'd like anything covered in more detail.