PHP’s Quest for Performance: From C to hhvm
While the focus of recent core PHP development has been on new language features, performance has become a growing concern in recent years. Although PHP's performance is sufficient for many users, the ability to serve more requests on fewer servers becomes increasingly important as PHP sees wider use by large sites like Wikipedia and Facebook.
Some efforts have been made in this area over the last few years, both within and outside the PHP internals team. However, understanding exactly what’s going on requires a bit of background in both history and concepts.
Origins of PHP
In 1994, Rasmus Lerdorf wanted a simpler way to display his résumé to Internet users and monitor traffic to it. At the time, web development was done by writing CGI programs in C, which could be tedious and required knowledge of many low-level concepts like manual memory management. And so Rasmus began work on what was to become PHP.
Truly appreciating how novel this idea was for its time requires an understanding of a few concepts. You’ve probably heard that computers operate on ones and zeros, the digits of the binary number system. Computer processors build on this foundation by defining a set of numerical instructions known as machine code; assembly language gives those instructions human-readable names and is translated into that binary form. These instructions are limited to fairly simple operations, like adding two numbers together.
Compilers, like the C compiler Rasmus used to build PHP, go a step further: they accept code written in a higher-level language and convert it into binary for a specific processor. These tools make programming languages more human-readable, which in turn lets more complex tasks be accomplished with less code than assembly would require. They also make it possible to compile the same code for multiple processors with different instruction sets.
PHP itself is somewhat different: it’s an interpreted language rather than a compiled language like C. Instead of your program being compiled, the PHP interpreter is compiled for your server’s particular processor, and that interpreter executes your program. This layer of abstraction lets you run the same PHP code, without compiling it, on any processor the interpreter has been compiled for. Java, which appeared a year later in 1995, touted a similar ability with the phrase “write once, run anywhere.”
Interpreted languages don’t come without a cost, though. Compiled programs generally run faster because they exist as instructions that the processor can execute directly. Interpreters must essentially implement the function of the processor hardware in software. For example, the PHP interpreter converts PHP code into opcodes, its equivalent of assembly, and then executes them. Interpreters are often called virtual machines for this reason: like the (system) virtual machines you use to run a guest operating system inside a host operating system, they’re an abstraction within an abstraction that offers convenience in exchange for performance. In the case of PHP, the Zend Engine (1.0 in PHP 4, 2.0 in PHP 5) is its virtual machine. This is where our story begins.
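To make the idea of opcodes more concrete, here is a minimal sketch of a stack-based bytecode interpreter, written in Python for brevity. The opcode names (PUSH_CONST, LOAD_VAR, ADD, MUL, RETURN) and the stack-machine design are assumptions made up for illustration; the Zend Engine is written in C, and its real opcodes and operand layout differ. What the sketch does share with any interpreter is the core loop: fetch an instruction, decode it, and carry out in software the work a processor would do in hardware.

```python
from enum import Enum, auto


class Op(Enum):
    # Hypothetical opcodes for illustration only (not Zend Engine opcodes).
    PUSH_CONST = auto()  # push a literal value onto the stack
    LOAD_VAR = auto()    # push the value of a named variable
    ADD = auto()         # pop two values, push their sum
    MUL = auto()         # pop two values, push their product
    RETURN = auto()      # pop the top value and return it


def execute(program, variables):
    """Run a list of (opcode, operand) pairs on a tiny stack machine."""
    stack = []
    pc = 0  # program counter: index of the next instruction to run
    while True:
        op, arg = program[pc]  # fetch
        pc += 1
        if op is Op.PUSH_CONST:  # decode and execute
            stack.append(arg)
        elif op is Op.LOAD_VAR:
            stack.append(variables[arg])
        elif op is Op.ADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op is Op.MUL:
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        elif op is Op.RETURN:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op!r}")


# Hand-written bytecode for the PHP expression: return $a + $b * 2;
program = [
    (Op.LOAD_VAR, "a"),
    (Op.LOAD_VAR, "b"),
    (Op.PUSH_CONST, 2),
    (Op.MUL, None),
    (Op.ADD, None),
    (Op.RETURN, None),
]

print(execute(program, {"a": 3, "b": 4}))  # prints 11
```

Every instruction passes through the `if`/`elif` dispatch on each execution, and that per-instruction overhead is the price interpreters pay for not running as native code.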
Will It Compile?
One developer who explored this question for PHP was Paul Biggar. In 2006 he joined the phc project, whose goals included translating PHP into equivalent C code that could then be compiled down to native code. Its development continues today, though releases aren’t very frequent.
Not long after Paul began his efforts, in early 2007, Roadsend began a project with similar aims under the name pcc, which was later changed to rphp, or Roadsend PHP. At the time of writing, however, the project had not seen a commit in roughly a year. Its support forums still see the occasional post, but most indicate that working with Roadsend is rather troublesome. The project’s IRC channel also appears inactive.
Other projects port some form of PHP to other platforms: Phalanger runs it on the .NET CLR, and Quercus and Project Zero run it on the JVM. These projects remain at least somewhat active. Unlike phc and rphp, though, they compile source code to bytecode that must be executed by a runtime, rather than to native code the processor can execute directly.
Enhancing PHP Performance
In early 2008, the original traits RFC paved the way for the RFC standard and process that would later be adopted by the PHP internals team. Two years later, Dmitry Stogov and Stanislav “Stas” Malyshev, two members of the PHP internals team, filed a proposal, later implemented, outlining potential performance improvements of up to 20% in the Zend Engine. In between, PHP 5.3 had been released in 2009 and included a switch from the flex lexer to re2c, which also resulted in a slight performance improvement. It was clear that the performance of PHP had become more than just an academic concern.
In mid-February 2010, Facebook announced an initiative to double its overall responsiveness. At about the same time, it released HipHop for PHP, a project with goals similar to those of phc and rphp. HipHop transforms PHP code into C++ code that is then compiled by the g++ compiler. This followed numerous performance-enhancing contributions from Facebook to the APC extension and the Zend Engine, and HipHop seemed the logical next step for Facebook to keep using PHP and maintain its substantial existing PHP codebase.
Implementing HipHop required sacrificing some of PHP’s more dynamic features, like eval() support, as well as reimplementing PHP’s runtime and rewriting several of its extensions. However, it gave developers the performance of compiled code without having to write C or C++ and deal with the tedium that comes with such languages. Indeed, Facebook reported that CPU usage decreased by 50% after deploying HipHop to its servers.
The project drew a number of opinions from members of the PHP community upon its initial release, including Sebastian Bergmann, lead developer of PHPUnit; David Coallier, PEAR President and CTO of echolibre; and Brandon Savage, a developer at Mozilla. Overall, its reception was positive, and development on the project continued. Facebook worked with several popular projects in the PHP community, including Drupal and WordPress, to help them remove code using dynamic features not supported by HipHop and to fix bugs found while attempting to deploy those projects on HipHop. Facebook has also continued to improve HipHop’s performance, even to the extent of implementing more efficient memory allocation.
The New Virtual Machine
Recently, Facebook released the HipHop Virtual Machine, or hhvm, which received coverage from several sources like Ars Technica and ReadWriteWeb. Like the Zend Engine, hhvm is a virtual machine, but it blurs the line between compiler and virtual machine.
hhvm is what’s called a Just-In-Time (or JIT) compiler. Rather than going through a C++ compilation phase ahead of time to translate PHP to native code, hhvm translates PHP to an intermediate bytecode language called HipHop Byte Code (HHBC), which is then translated to native code at runtime, as it’s needed.
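The translate-on-demand idea behind a JIT can be sketched in a few lines, under heavy simplifying assumptions. In this hypothetical toy, generated Python functions stand in for native machine code, and the stack-machine opcode names are invented for illustration; they are not HHBC’s actual instruction set. `translate()` turns a bytecode listing into a Python function, and the hypothetical `ToyJit` class does that translation only the first time a piece of code is called, caching the result so later calls skip it. hhvm itself emits real machine code and uses far more sophisticated strategies.

```python
def translate(program):
    """Translate stack-machine bytecode into Python source, then compile it once.

    Generated Python stands in for native code in this toy; the opcode
    names are hypothetical and not HHBC instructions.
    """
    lines = ["def compiled(variables):", "    stack = []"]
    for op, arg in program:
        if op == "LOAD_VAR":
            lines.append(f"    stack.append(variables[{arg!r}])")
        elif op == "PUSH_CONST":
            lines.append(f"    stack.append({arg!r})")
        elif op == "ADD":
            lines.append("    r = stack.pop(); stack.append(stack.pop() + r)")
        elif op == "MUL":
            lines.append("    r = stack.pop(); stack.append(stack.pop() * r)")
        elif op == "RETURN":
            lines.append("    return stack.pop()")
        else:
            raise ValueError(f"unknown opcode {op!r}")
    namespace = {}
    exec("\n".join(lines), namespace)  # compile the generated source
    return namespace["compiled"]


class ToyJit:
    """Translate each named piece of code on first call, then reuse it."""

    def __init__(self):
        self.cache = {}        # name -> translated function
        self.translations = 0  # how many times translation actually ran

    def call(self, name, program, variables):
        fn = self.cache.get(name)
        if fn is None:  # first call: translate "just in time"
            fn = translate(program)
            self.cache[name] = fn
            self.translations += 1
        return fn(variables)


# Bytecode for the PHP expression: return $a + $b * 2;
program = [("LOAD_VAR", "a"), ("LOAD_VAR", "b"), ("PUSH_CONST", 2),
           ("MUL", None), ("ADD", None), ("RETURN", None)]

jit = ToyJit()
print(jit.call("expr", program, {"a": 3, "b": 4}))   # prints 11
print(jit.call("expr", program, {"a": 10, "b": 1}))  # prints 12
print(jit.translations)                              # prints 1
```

The second call reuses the cached translation instead of redoing the work, which is the trade a JIT makes: pay a translation cost once at runtime, then run the faster translated form on every later call.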
hhvm is intended to replace the deployment of statically compiled HipHop-based binaries to production. At its initial release, hhvm was reported to provide an overall relative performance increase of 60%.
It’s users and members of the community who drive the innovative development of PHP and its architecture. Facebook is just one of many companies that have contributed to the ongoing success of PHP. Not only has it contributed changes back to the original projects, it has also offered its own work on PHP to the community and worked with other PHP projects to help them take advantage of tools like HipHop.
Indeed, PHP has come a long way from its humble beginnings in 1995. While HipHop may not be suitable for all projects or all servers, it can provide performance enhancements to large-scale projects and help them substantially reduce their hardware requirements. It’s definitely a project to keep on your radar moving forward, and it is a testament to the vibrancy of the PHP ecosystem.