By Thomas Punt

Upgrading from Pthreads v2 to v3: What to Look out For

A fair amount has changed for the pthreads extension with the release of pthreads v3. This article aims to cover the necessary information for those who are looking to upgrade their applications from pthreads v2 to v3.

If you’re unfamiliar with pthreads, check out my introduction to pthreads instead!

A big thank you to Joe Watkins for proofreading and helping to improve my article!

Generic Changes

There have been a few general changes made in pthreads v3. The first, and perhaps most prominent, is that pthreads can no longer be used in any environment other than the command line interface. It was never meant to be used in a web server environment (e.g. in an FCGI process) due to safety and scaling concerns, so the advice from pthreads v2 is now enforced.

There have also been some changes to workers. Previously, there was a need to keep track of the work objects given to workers, otherwise if they were destroyed before having been executed by the worker thread, a segmentation fault would occur. This was well-known behavior and was demonstrated succinctly in the Multi-Threading in PHP with pthreads gist with the following snippet:

class W extends Worker {
    public function run(){}
}
class S extends Stackable {
    public function run(){}
}
/* 1 */
$w = new W();
/* 2 */
$j = array(
    new S(), new S(), new S()
);
/* 3 */
foreach ($j as $job)
    $w->stack($job);
/* 4 */
$j = array();
$w->start();
$w->shutdown();

This is no longer an issue because the workers themselves now track the stacked work objects.
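For comparison, here is a minimal sketch of how the same pattern might look under pthreads v3. It assumes PHP 7 (ZTS, CLI) with the pthreads v3 extension loaded, and it swaps Stackable for Threaded, since Stackable no longer exists in v3 (see Removed Classes). The class names SimpleWorker and Job are made up for illustration.

```php
<?php
// Sketch only: assumes PHP 7 (ZTS, CLI) with the pthreads v3 extension loaded.
// SimpleWorker and Job are hypothetical names used for illustration.

class SimpleWorker extends Worker
{
    public function run() {}
}

class Job extends Threaded
{
    public function run() {}
}

$worker = new SimpleWorker();

// Stack three jobs while holding references to them in this context.
$jobs = [new Job(), new Job(), new Job()];
foreach ($jobs as $job) {
    $worker->stack($job);
}

// Drop our own references before the worker has started.
// In v2 this could lead to a segmentation fault; in v3 the worker
// tracks the stacked objects itself.
$jobs = [];
unset($job);

// Number of tasks still waiting on the worker's stack.
$stackedBeforeStart = $worker->getStacked();

$worker->start();
$worker->shutdown();
```

The point of the sketch is only that the userland array can be emptied without consequence; in real code the tasks would do actual work in their run methods.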

Furthermore, the meaning of method modifiers has changed in pthreads v3. In pthreads v2, method modifiers had a special meaning in the context of Threaded objects. Specifically, protected methods had implicit synchronized access (allowing them to be safely executed by multiple contexts), and private methods could only be executed by the context they were tied to. These differing semantics have now been removed due to reliability concerns.

For example, take the following snippet:

class ExampleThread extends Thread {
    public $value = 0;

    public function run()
    {
        $this->exclusive();
    }

    protected function exclusive()
    {
        for ($i = 0; $i < 10000; ++$i) {
            ++$this->value;
        }
    }
}

class Test extends ExampleThread {
    public function callExclusive()
    {
        $this->exclusive();
    }
}

$thread = new Test();
$thread->start();
$thread->callExclusive();
$thread->join();

var_dump($thread->value);

In pthreads v2, calling the ExampleThread::exclusive method from both the main context and the new thread context was safe. The value output at the end of the script would always be int(20000). But in pthreads v3, this value can be anything from 1 to 20000 due to race conditions between the two unsynchronized for loops.

In order to achieve the exact same behavior in pthreads v3, we must explicitly synchronize access using the built-in Threaded::synchronized method. This need only be applied to the body of the ExampleThread::exclusive method:

protected function exclusive()
{
    $this->synchronized(function () {
        for ($i = 0; $i < 10000; ++$i) {
            ++$this->value;
        }
    });
}

With respect to removing the private method modifier semantics, this has only lifted a previous restriction. Thus, code that utilized that behavior should not need any changing.

Removed Classes

The Mutex and Cond classes have been removed. This is because their functionality was not needed due to the synchronization features already provided by the Threaded class. Using mutual exclusion locks and conditions in PHP code was never particularly safe either, since deadlocks could easily occur from erroneous code.
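As a rough illustration of what replaces a Mutex/Cond pair, the following sketch uses only Threaded::synchronized, Threaded::wait and Threaded::notify to make the main context wait for a signal from a thread. It assumes PHP 7 (ZTS, CLI) with pthreads v3 loaded; the $done property is a hypothetical flag introduced for this example.

```php
<?php
// Sketch only: assumes PHP 7 (ZTS, CLI) with the pthreads v3 extension loaded.

$thread = new class extends Thread {
    public $done = false; // hypothetical completion flag

    public function run()
    {
        // ... real work would happen here ...

        // Set the flag and signal the waiting context while holding
        // the object's synchronization lock.
        $this->synchronized(function () {
            $this->done = true;
            $this->notify();
        });
    }
};

$thread->start();

// Wait in the main context until the thread has signalled completion.
// The flag is checked inside the synchronized block, so a notification
// sent before we begin waiting is not missed.
$thread->synchronized(function ($thread) {
    while (!$thread->done) {
        $thread->wait();
    }
}, $thread);

$thread->join();

$done = $thread->done;
```

Checking a predicate in a loop around wait, rather than calling wait unconditionally, is the usual way to avoid waiting forever on a signal that has already been sent.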

The Collectable class that extended Threaded has also been removed. Now, we have a Collectable interface instead, which is implemented by Threaded. The interface only enforces an isGarbage method. The setGarbage method is no longer needed because pthreads automatically handles when a task should be considered garbage (when the task has finished executing). The Threaded class implements a default Threaded::isGarbage method that should be used in the vast majority of cases. The default implementation always returns true, since any task that reaches the collector has already been executed (a task cannot be collected before being executed). Only in rare cases should a custom implementation be needed, so overriding the Threaded::isGarbage method should be a rarity.

The following is a brief example of utilizing the built-in garbage collector in pthreads:

$worker = new Worker();

for ($i = 0; $i < 10; ++$i) {
    $worker->stack(new class extends Threaded {});
}

$worker->start();

while ($worker->collect()); // blocks until all tasks have finished executing and have been collected

$worker->shutdown();

Finally, the Stackable class that was previously aliased to the Threaded class has been removed. Any classes that extended Stackable should now be changed to extend Threaded.

Removed Methods

The following methods have been removed:

  • Threaded::getTerminatedInfo – due to it being unsafe to serialize exceptions. There are no built-in alternatives, but since PHP 7 has converted the vast majority of fatal errors to exceptions, catch-all exception handlers can be used instead:

    $task = new class extends Thread {
        public function run()
        {
            try {
                $this->task();
            } catch (Throwable $ex) {
                // handle error here
                var_dump($ex->getMessage());
            }
        }
        private function task()
        {
            $this->data = new Threaded();
            $this->data = new StdClass(); // RuntimeException thrown
            var_dump(1); // never reached
        }
    };
    $task->start() && $task->join();
    

    (See below for the new Volatile class addition and subsequently why the above code is erroneous.)

  • Threaded::from – since PHP 7 has anonymous classes, which are preferable to use.

  • Threaded::isWaiting – due to it simply not being needed when synchronizing. A thread should not have to question whether it is waiting for something, and as such, there are no alternatives to this method.

  • Threaded::lock and its counterpart Threaded::unlock – for the same reasons the Mutex and Cond classes were removed. Given that synchronization now syncs the properties table of Threaded objects, that should be used instead.

  • Thread::kill – due to it not being safe to perform. There are no alternatives – code should simply not need to kill a thread in such a high-level environment.

  • Thread::detach – due to it not being safe. There are no alternatives – any code relying on this will need to be rewritten.

  • Worker::isWorking – due to it not being necessary. In order to see if a worker has any tasks left, the Worker::getStacked method should be used, which will return the size of the remaining stack.
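To make two of these replacements concrete, here is a small sketch that uses an anonymous class where Threaded::from might previously have been used, and Worker::getStacked where Worker::isWorking might previously have been used. It assumes PHP 7 (ZTS, CLI) with pthreads v3 loaded; the variable names and the usleep call are placeholders.

```php
<?php
// Sketch only: assumes PHP 7 (ZTS, CLI) with the pthreads v3 extension loaded.

$worker = new Worker();

// An anonymous class takes the place of the removed Threaded::from.
for ($i = 0; $i < 3; ++$i) {
    $worker->stack(new class extends Threaded {
        public function run()
        {
            usleep(1000); // stand-in for real work
        }
    });
}

// Worker::getStacked takes the place of Worker::isWorking:
// it reports how many tasks are still waiting to be executed.
$pendingBeforeStart = $worker->getStacked();

$worker->start();

// Keep collecting until every task has been executed and freed.
while ($worker->collect());

$pendingAfterCollect = $worker->getStacked();

$worker->shutdown();
```

Comparing the result of getStacked against zero gives the same yes/no answer isWorking used to, with the added benefit of knowing how much work is left.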

Changed Methods

The following methods have been changed:

  • Worker::unstack – it no longer accepts a parameter (which previously removed the passed task from the stack). It now simply removes the first task (the oldest one) from the stack, rather than removing all tasks from the stack.

  • Pool::collect – it now returns the number of tasks to be collected, and the collector callback is now optional. If a collector callback is not used, the default Worker::collector method is used.
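The two changed methods can be exercised with a short sketch like the one below. It assumes PHP 7 (ZTS, CLI) with pthreads v3 loaded, and the pool size and task counts are arbitrary values chosen for illustration.

```php
<?php
// Sketch only: assumes PHP 7 (ZTS, CLI) with the pthreads v3 extension loaded.

$worker = new Worker();
$worker->stack(new class extends Threaded {});
$worker->stack(new class extends Threaded {});

// Worker::unstack takes no argument in v3 and removes the oldest task.
$worker->unstack();
$remaining = $worker->getStacked(); // one task should be left

$worker->start();
while ($worker->collect());
$worker->shutdown();

// Pool::collect can be called without a collector callback in v3.
// Its return value is the number of tasks still to be collected,
// so looping until it reaches zero drains the pool.
$pool = new Pool(2);

for ($i = 0; $i < 4; ++$i) {
    $pool->submit(new class extends Threaded {});
}

while ($pool->collect());

$pool->shutdown();
```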

New Classes

The Volatile class has been added because of the new immutability semantics of Threaded classes: properties of a Threaded object that are themselves Threaded objects are now immutable. Extending Volatile restores the mutability of such members for code that previously depended on it.

For example, the following code snippet would have worked on pthreads v2:

class Task extends Threaded
{
    public function __construct()
    {
        $this->data = new Threaded();
        $this->data = new StdClass(); // previously ok, but not in pthreads v3
    }
}

new Task();

But now in pthreads v3, the reassignment of $this->data will throw a RuntimeException because it is a Threaded property of a Threaded object. In order to validly reassign the property, the Task class should extend Volatile instead:

class Task extends Volatile
{
    public function __construct()
    {
        $this->data = new Threaded();
        $this->data = new StdClass();
    }
}

new Task();

Arrays being assigned to properties of Threaded objects are now automatically coerced to Volatile objects instead of Threaded objects so that their behavior remains largely unchanged.

Whilst this new immutability constraint increases complexity a little, it was introduced for the significant performance gains it gives to accessing Threaded properties of Threaded objects.
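The array coercion described above can be observed with a sketch along these lines. It assumes PHP 7 (ZTS, CLI) with pthreads v3 loaded; the items property is a hypothetical name used only for this example.

```php
<?php
// Sketch only: assumes PHP 7 (ZTS, CLI) with the pthreads v3 extension loaded.

$task = new class extends Threaded {
    public function __construct()
    {
        // Assigning an array to a property of a Threaded object
        // stores it as a Volatile object rather than a plain array.
        $this->items = ['a', 'b'];
    }
};

// The property now holds a Volatile object...
$isVolatile = $task->items instanceof Volatile;

// ...which can still be counted like an array.
$count = count($task->items);

var_dump($isVolatile, $count);
```

Because the stored value is an object, functions that insist on a real array argument will not accept it directly, which is worth keeping in mind when porting v2 code.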

New Methods

The following methods have been added:

  • Worker::collect – this was introduced to allow tasks that have finished executing on a worker’s stack to be freed. An optional collector function may be passed in, but the default collector (Worker::collector) should be sufficient in the vast majority of cases.

    For example, the following:

    $worker = new Worker();
    
    var_dump(memory_get_usage()); // original memory usage
    
    for ($i = 0; $i < 500; ++$i) {
        $worker->stack(new class extends Threaded {});
    }
    
    var_dump(memory_get_usage()); // memory usage after stacking 500 tasks
    
    $worker->start();
    while ($worker->collect());
    $worker->shutdown();
    
    var_dump(memory_get_usage()); // memory usage after worker shutdown
    

    Outputs:

    int(372912)
    int(486304)
    int(374528)
    

    With the line that invokes Worker::collect, memory usage returns nearly to its original level. Without it, memory usage would have stayed at roughly the level reached after stacking the 500 tasks, even after shutting down the worker. While the memory would eventually have been freed upon destroying the worker object, it is better to free it explicitly (particularly for long-running processes that may need to execute many tasks). So always collect the garbage left by workers (as well as pools).

  • Worker::collector – this was introduced as the default implementation used by the Worker::collect method. We can override this method for when we would like to delay the collecting of spent objects. As mentioned above, the default collector will be sufficient in the vast majority of cases, so only override this method if you know what you’re doing!

  • Threaded::notifyOne – this complements Threaded::notify by allowing a signal to be sent to only one of the waiting synchronized contexts.
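For the rare case where a custom collector is wanted, the following sketch passes a closure to Worker::collect and counts how many tasks it frees. It assumes PHP 7 (ZTS, CLI) with pthreads v3 loaded and that the collector callback runs in the context that calls collect; $collected and $collector are names invented for this example.

```php
<?php
// Sketch only: assumes PHP 7 (ZTS, CLI) with the pthreads v3 extension loaded.

$worker = new Worker();

for ($i = 0; $i < 5; ++$i) {
    $worker->stack(new class extends Threaded {});
}

$worker->start();

$collected = 0;

// Custom collector: return true to let the task be freed,
// false to keep it around for a later collection pass.
$collector = function (Collectable $task) use (&$collected) {
    if ($task->isGarbage()) {
        ++$collected;
        return true;
    }

    return false;
};

// Loop until nothing is left to execute or collect.
while ($worker->collect($collector));

$worker->shutdown();

var_dump($collected);
```

A collector that returns false for some tasks is how collection can be delayed, for example until a result has been read from the task; the default collector remains the better choice unless there is a concrete need like that.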

Conclusion

There have been a number of changes to pthreads v3, making the extension both more performant and more robust. Some things have become simpler, particularly around shared resources that can only be handled via the synchronization mechanisms (Threaded::wait and Threaded::notify). Other things have increased a little in complexity, particularly with respect to the new immutability restrictions (in exchange for much better performance). But overall, pthreads v3 has received a nice cleanup and is looking ever better.

Are you using it? Did you have to update from v2 to v3? Tell us about it – we’d love to write about a hands-on upgrading example.
