Upgrading from Pthreads v2 to v3: What to Look out For
A fair amount has changed for the pthreads extension with the release of pthreads v3. This article aims to cover the necessary information for those who are looking to upgrade their applications from pthreads v2 to v3.
If you’re unfamiliar with pthreads, check out my introduction to pthreads instead!
A big thank you to Joe Watkins for proofreading and helping to improve my article!
Generic Changes
There have been a few general changes made in pthreads v3. The first, and perhaps most prominent, is that pthreads cannot be used in any environment other than the command line interface. It was never meant to be used in a web server environment (e.g. in an FCGI process) due to safety and scaling concerns, so the advice from pthreads v2 has now been enforced.
There have also been some changes to workers. Previously, there was a need to keep track of the work objects given to workers, otherwise if they were destroyed before having been executed by the worker thread, a segmentation fault would occur. This was well-known behavior and was demonstrated succinctly in the Multi-Threading in PHP with pthreads gist with the following snippet:
class W extends Worker {
    public function run(){}
}
class S extends Stackable {
    public function run(){}
}
/* 1 */
$w = new W();
/* 2 */
$j = array(
    new S(), new S(), new S()
);
/* 3 */
foreach ($j as $job)
    $w->stack($job);
/* 4 */
$j = array();
$w->start();
$w->shutdown();
This is no longer an issue because the workers themselves now track the stacked work objects.
Furthermore, the meaning of method modifiers has changed in pthreads v3. In pthreads v2, method modifiers had a special meaning in the context of Threaded objects. Specifically, protected methods had implicit synchronized access (allowing them to be safely executed by multiple contexts), and private methods could only be executed by the context they were tied to. These differing semantics have now been removed due to reliability concerns.
For example, take the following snippet:
class ExampleThread extends Thread {
    public $value = 0;

    public function run()
    {
        $this->exclusive();
    }

    protected function exclusive()
    {
        for ($i = 0; $i < 10000; ++$i) {
            ++$this->value;
        }
    }
}

class Test extends ExampleThread {
    public function callExclusive()
    {
        $this->exclusive();
    }
}

$thread = new Test();
$thread->start();
$thread->callExclusive();
$thread->join();
var_dump($thread->value);
In pthreads v2, calling the ExampleThread::exclusive method from both the main context and the new thread context was safe. The value output at the end of the script would always be int(20000). But in pthreads v3, this value can be anything from 1 to 20000 due to race conditions between the two unsynchronized for loops.
In order to achieve the exact same behavior in pthreads v3, we must explicitly synchronize access using the built-in Threaded::synchronized method. This need only be applied to the body of the ExampleThread::exclusive method:
protected function exclusive()
{
    $this->synchronized(function () {
        for ($i = 0; $i < 10000; ++$i) {
            ++$this->value;
        }
    });
}
With respect to removing the private method modifier semantics, this has only lifted a previous restriction. Thus, code that utilized that behavior should not need any changing.
Removed Classes
The Mutex and Cond classes have been removed. This is because their functionality was not needed due to the synchronization features already provided by the Threaded class. Using mutual exclusion locks and conditions in PHP code was never particularly safe either, since deadlocks could easily occur from erroneous code.
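As a rough illustration of what replaces Mutex and Cond, the sketch below coordinates two contexts using only Threaded::synchronized, Threaded::wait and Threaded::notify. It assumes PHP 7 (ZTS, CLI) with pthreads v3 loaded; the anonymous class and its $done flag are hypothetical names chosen for this example, not part of the extension.

```php
<?php
// Sketch only: assumes pthreads v3 on a ZTS build of PHP 7, run from the CLI.
// The $done property is a hypothetical flag used as the wait predicate.
$thread = new class extends Thread {
    public $done = false;

    public function run()
    {
        // Take the object's lock, update shared state, then wake a waiter.
        $this->synchronized(function () {
            $this->done = true;
            $this->notify();
        });
    }
};

$thread->start();

// Wait on the predicate in a loop, so that a notification sent before
// the main context started waiting is not missed.
$thread->synchronized(function ($thread) {
    while (!$thread->done) {
        $thread->wait();
    }
}, $thread);

$thread->join();
var_dump($thread->done);
```

Checking the flag inside the synchronized block, rather than calling wait unconditionally, is what keeps the main context from blocking forever if the thread finishes first.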
The Collectable class that extended Threaded has also been removed. Now, we have a Collectable interface instead, which is implemented by Threaded. The interface only enforces an isGarbage method. The setGarbage method is no longer needed because pthreads automatically handles when a task should be considered garbage (when the task has finished executing). The Threaded class implements a default Threaded::isGarbage method that should be used in the vast majority of cases. The default implementation always returns true, since any task in the task queue is garbage (the task cannot be collected before being executed). Only in rare cases should a custom implementation be needed, and so overriding the Threaded::isGarbage method should be a rarity.
The following is a brief example of utilizing the built-in garbage collector in pthreads:
$worker = new Worker();
for ($i = 0; $i < 10; ++$i) {
    $worker->stack(new class extends Threaded {});
}
$worker->start();
while ($worker->collect()); // blocks until all tasks have finished executing and have been collected
$worker->shutdown();
Finally, the Stackable class that was previously aliased to the Threaded class has been removed. Any classes that extended Stackable should now be changed to extend Threaded.
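A minimal sketch of that migration, under the assumption that the v2 code had a task class extending Stackable. The Job class name is hypothetical. Because v3 workers track stacked tasks themselves, the example also does not hold on to its own references to the tasks.

```php
<?php
// Sketch only: assumes pthreads v3 on a ZTS build of PHP 7, run from the CLI.
// v2 declaration was: class Job extends Stackable { ... }
class Job extends Threaded // "Job" is a hypothetical task class
{
    public function run()
    {
        // the task's work would go here
    }
}

$worker = new Worker();
for ($i = 0; $i < 3; ++$i) {
    // No separate array of tasks is kept; the worker tracks what it is given.
    $worker->stack(new Job());
}

$worker->start();
while ($worker->collect()); // wait until every task has run and been collected
$worker->shutdown();
```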
Removed Methods
The following methods have been removed:
- Threaded::getTerminatedInfo – due to it being unsafe to serialize exceptions. There are no built-in alternatives, but since PHP 7 has converted the vast majority of fatal errors to exceptions, catch-all exception handlers can be used instead:
    $task = new class extends Thread {
        public function run()
        {
            try {
                $this->task();
            } catch (Throwable $ex) {
                // handle error here
                var_dump($ex->getMessage());
            }
        }

        private function task()
        {
            $this->data = new Threaded();
            $this->data = new StdClass(); // RuntimeException thrown
            var_dump(1); // never reached
        }
    };
    $task->start() && $task->join();
  (See below for the new Volatile class addition and subsequently why the above code is erroneous.)
- Threaded::from – since PHP 7 has anonymous classes, which are far more preferable to use.
- Threaded::isWaiting – due to it simply not being needed when synchronizing. A thread should not have to question whether it is waiting for something, and as such, there are no alternatives to this method.
- Threaded::lock and its counterpart Threaded::unlock – for the same reasons the Mutex and Cond classes were removed. Given that synchronization now syncs the properties table of Threaded objects, that should be used instead.
- Thread::kill – due to it not being safe to perform. There are no alternatives – code should simply not need to kill a thread in such a high-level environment.
- Thread::detach – due to it not being safe. There are no alternatives – any code relying on this will need to be rewritten.
- Worker::isWorking – due to it not being necessary. In order to see if a worker has any tasks left, the Worker::getStacked method should be used, which will return the size of the remaining stack.
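For code that previously polled Worker::isWorking, the replacement described above might look like the following sketch. It assumes pthreads v3 on PHP 7 (ZTS, CLI); the task body, the task count and the polling interval are arbitrary choices for illustration.

```php
<?php
// Sketch only: assumes pthreads v3 on a ZTS build of PHP 7, run from the CLI.
$worker = new Worker();
for ($i = 0; $i < 5; ++$i) {
    $worker->stack(new class extends Threaded {
        public function run()
        {
            usleep(1000); // simulate a little work
        }
    });
}
$worker->start();

// v2 equivalent: while ($worker->isWorking()) { ... }
// Poll the size of the remaining stack instead.
while (($remaining = $worker->getStacked()) > 0) {
    usleep(1000); // arbitrary interval, just to avoid a busy loop
}

while ($worker->collect()); // free the finished tasks before shutting down
$worker->shutdown();
```

Polling is shown only because it mirrors the old isWorking usage; where the goal is simply to wait for all tasks, the collect loop on its own is enough.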
Changed Methods
The following methods have been changed:
- Worker::unstack – it no longer accepts a parameter (which previously removed the passed task from the stack). This means that the default now simply removes just the first task (the oldest one) from the stack, rather than removing all tasks from the stack.
- Pool::collect – it now returns the number of tasks to be collected, and the collector callback is now optional. If a collector callback is not used, the default Worker::collector method is used.
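The changed Pool::collect behavior can be sketched as follows, assuming pthreads v3 on PHP 7 (ZTS, CLI). The pool size and task count are arbitrary, and the $remaining variable is introduced here only to make the integer return value visible.

```php
<?php
// Sketch only: assumes pthreads v3 on a ZTS build of PHP 7, run from the CLI.
$pool = new Pool(2); // two workers; the size is an arbitrary choice

for ($i = 0; $i < 6; ++$i) {
    $pool->submit(new class extends Threaded {
        public function run()
        {
            // the task's work would go here
        }
    });
}

// No callback is passed, so the default collector is used.
// The return value is the number of tasks still left to collect.
do {
    $remaining = $pool->collect();
} while ($remaining > 0);

$pool->shutdown();
```

The more compact form `while ($pool->collect());` behaves the same way; the do/while version is used here only to name the return value.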
New Classes
The Volatile class has been added due to the new immutability semantics of Threaded classes: properties of a Threaded object that themselves hold Threaded objects are now immutable. The Volatile class allows code that previously depended on the mutability of such members to reassign them once again.
For example, the following code snippet would have worked on pthreads v2:
class Task extends Threaded
{
    public function __construct()
    {
        $this->data = new Threaded();
        $this->data = new StdClass(); // previously ok, but not in pthreads v3
    }
}
new Task();
But now in pthreads v3, the reassignment of $this->data will throw a RuntimeException due to it being a Threaded property from a Threaded class. In order to validly reassign the property, the Task class should extend Volatile instead:
class Task extends Volatile
{
    public function __construct()
    {
        $this->data = new Threaded();
        $this->data = new StdClass();
    }
}
new Task();
Arrays being assigned to properties of Threaded objects are now automatically coerced to Volatile objects instead of Threaded objects so that their behavior remains largely unchanged.
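The coercion described above can be observed with a small sketch like the one below. It assumes pthreads v3 on PHP 7 (ZTS, CLI); the $store object and its items property are hypothetical names used only for this example.

```php
<?php
// Sketch only: assumes pthreads v3 on a ZTS build of PHP 7, run from the CLI.
$store = new class extends Threaded {
    public function __construct()
    {
        // A plain PHP array is assigned here; per the behavior described
        // above, pthreads stores it as a Volatile object.
        $this->items = ['a', 'b'];
    }
};

var_dump($store->items instanceof Volatile);
```

Because the stored value is a Volatile rather than a plain Threaded object, code that later replaces its members should keep working much as it did with arrays in v2.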
Whilst this new immutability constraint increases complexity a little, it was introduced for the significant performance gains it gives to accessing Threaded properties of Threaded objects.
New Methods
The following methods have been added:
- Worker::collect – this was introduced to allow tasks that have finished executing on a worker’s stack to be freed. An optional collector function may be passed in; however, the default collector (from Worker::collector) should be sufficient in the vast majority of cases. For example, the following:
    $worker = new Worker();
    var_dump(memory_get_usage()); // original memory usage
    for ($i = 0; $i < 500; ++$i) {
        $worker->stack(new class extends Threaded {});
    }
    var_dump(memory_get_usage()); // memory usage after stacking 500 tasks
    $worker->start();
    while ($worker->collect());
    $worker->shutdown();
    var_dump(memory_get_usage()); // memory usage after worker shutdown
  Outputs:
    int(372912)
    int(486304)
    int(374528)
  With the line that invokes Worker::collect, the memory usage returns nearly to its original level. Without it, the memory usage would not have changed between stacking the 500 tasks and shutting down the worker. While the memory would have eventually been freed upon destroying the object, it is better to explicitly free this memory (particularly for long running processes that may need to execute many tasks). So always collect the garbage left by workers (as well as pools).
- Worker::collector – this was introduced as the default implementation used by the Worker::collect method. We can override this method when we would like to delay the collecting of spent objects. As mentioned above, the default collector will be sufficient in the vast majority of cases, so only override this method if you know what you’re doing!
- Threaded::notifyOne – this complements Threaded::notify by allowing a signal to be sent to only one of the waiting synchronized contexts.
Conclusion
There have been a number of changes to pthreads v3, making the extension both more performant and more robust. Some things have become simpler, particularly around shared resources that can only be handled via the synchronization mechanisms (Threaded::wait and Threaded::notify). Other things have increased a little in complexity, particularly with respect to the new immutability restrictions (in exchange for much better performance). But overall, pthreads v3 has received a nice cleanup and is looking ever better.
Are you using it? Did you have to update from v2 to v3? Tell us about it – we’d love to write about a hands-on upgrading example.