HHVM revisited

Just over two years have passed since the last post about HHVM by Matt Turland. What changed in that time? Did anything? Let's see just how successful PHP's quest for performance was.

HHVM – what was that again?

Like Matt says in his article, HHVM is

[...] a Just-In-Time (or JIT) compiler. Rather than going through the C++ compilation phase to translate PHP to native code, hhvm translates PHP to an intermediary bytecode language called HipHop Byte Code (HHBC) that can be translated to native code in real time when it’s needed [...]

In other words, your PHP code is executed at near-native (near C++) speed on the fly. HipHop started out as a compiler that produced binaries you would then deploy – much like Zend Guard or IonCube. HHVM is different in that it works in real time. There's still no manual compilation process per se, and the development cycle remains swift.

So what is it? A server? How does it actually execute my PHP? Does it work with Apache/Nginx/IIS? Does it replace them? Can it even compete with them? How do you use it? Do you install it as an extension? These types of questions can cause some proper headaches when it all sounds so abstract, I know. Let's try and demystify it a bit further.

How does HHVM actually work?

HHVM, when given a PHP file, compiles said file to HHVM bytecode. It literally rewrites it to a lower-level language, which is quicker to further rewrite into native code – something a CPU can execute directly. As you might imagine, doing this over and over again every time a file is accessed would be overkill and would actually make the whole application slower. This is why HHVM uses an SQLite database on disk to remember the compiled files.
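The compile-once-then-remember idea can be sketched in a few lines. This is only an illustration under stated assumptions, not HHVM's actual repo format: Python's built-in compile() stands in for the PHP-to-bytecode step, and the table name, the column names and the choice of a SHA-1 content hash as the key are all hypothetical.

```python
import hashlib
import marshal
import sqlite3


def open_repo(path=":memory:"):
    """Open (or create) a hypothetical bytecode repository in SQLite."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS units ("
        "source_hash TEXT PRIMARY KEY, bytecode BLOB NOT NULL)"
    )
    return db


def load_unit(db, filename, source):
    """Return (code, was_cached) for the given source text."""
    # Assumption: units are keyed by a hash of their source text.
    key = hashlib.sha1(source.encode("utf-8")).hexdigest()
    row = db.execute(
        "SELECT bytecode FROM units WHERE source_hash = ?", (key,)
    ).fetchone()
    if row is not None:
        # Cache hit: skip compilation and deserialize the stored bytecode.
        return marshal.loads(row[0]), True
    # Cache miss: compile once, then remember the result in the database.
    code = compile(source, filename, "exec")
    db.execute(
        "INSERT INTO units (source_hash, bytecode) VALUES (?, ?)",
        (key, marshal.dumps(code)),
    )
    db.commit()
    return code, False
```

Passing a file path instead of ":memory:" to open_repo() keeps the compiled units across process restarts, which is the property the on-disk database provides.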

If you're familiar with APC, this is more or less the same thing. When HHVM first starts up, the cache is empty, so everything seems slower in the first few requests. In time, however, things warm up and the app speeds up. There's one crucial difference between APC's way of working and HHVM's. APC stores the compiled code in memory, while HHVM does this on disk (the SQLite DB). This means the compiled results survive restarts – when you reboot your server, the cached compiled files are still there, thus not needing the warm-up period. While this is somewhat slower than reading from memory (every disk will always be slower than RAM), it also means you can further improve your app's speed by hosting it on an SSD server.

A further boost in performance is achieved through JIT. This is an option, but it's on by default for server and daemon mode. What JIT does is further compile the HHVM bytecode into native code as it learns about the most frequently used files. Again, this brings along a certain warmup period, but once the ball is rolling there's quite a bit of inertia to compete with.
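To make the warmup idea concrete, here is a toy sketch of "run the slow path until something proves hot, then switch to an optimized version". It is not how HHVM's JIT is implemented: the HOT_THRESHOLD value, the jit_after_warmup name and the optimize callback (a stand-in for native code generation) are assumptions made for illustration.

```python
import functools

HOT_THRESHOLD = 3  # hypothetical value; a real JIT tunes this carefully


def jit_after_warmup(optimize):
    """Decorator: call `optimize(func)` once func has been called often enough."""
    def decorator(func):
        state = {"calls": 0, "fast": None}

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if state["fast"] is not None:
                # Warmed up: every later call goes straight to the fast version.
                return state["fast"](*args, **kwargs)
            state["calls"] += 1
            if state["calls"] >= HOT_THRESHOLD:
                # The function is now considered hot; optimize it exactly once.
                state["fast"] = optimize(func)
            # Until then (and for this call), run the slow path.
            return func(*args, **kwargs)

        wrapper.state = state  # exposed so the warmup can be inspected
        return wrapper
    return decorator
```

The first few calls pay for both the slow path and the bookkeeping, which is why a freshly started process looks slower than a warmed-up one.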

HHVM compatibility with popular servers

HHVM is installed via a package manager or built from source. It is, in essence, just a program you install like any other – currently available on Ubuntu, Debian, CentOS and Fedora. Initially, HHVM replaced the entire PHP+server stack – it had its own server. It still does, in fact. HHVM's server is not unlike something you might see when working with NodeJS – you run it from the command line and tell it which port to accept requests on, along with everything else you would usually tell Apache, Nginx, IIS, Tomcat, Lighttpd or any other server. There is a plethora of configuration options for your perusal (https://github.com/facebook/hhvm/wiki/runtime-options), and most of them are self-explanatory and straightforward. In fact, if you follow HHVM's excellent WordPress tutorial from a while back, you'll see how simple a basic server configuration is.

However, on December 17th, 2013 the HHVM team announced FastCGI support. FastCGI is a protocol that a web server uses to communicate with an application server. This allows for a separation of responsibilities – HHVM runs PHP code, which is what it was meant for, and your server handles all the HTTP aspects, forwarding PHP processing to HHVM. While the original HHVM server isn't bad, it's good to know HHVM can now be used in tandem with your usual server of choice. It's important to note that it currently only supports communication through TCP – once Unix socket support is added, the network bottleneck that's currently plaguing it should disappear, and HHVM's performance should improve even further.
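For a feel of what the web server actually sends to the application server, here is a minimal sketch of FastCGI record framing. The 8-byte header layout, the record type numbers and the name-value length encoding follow the FastCGI 1.0 specification; the helper names (record, name_value, begin_request) and the sample script path are made up for illustration, and nothing here is specific to HHVM.

```python
import struct

# Constants from the FastCGI 1.0 specification.
FCGI_VERSION_1 = 1
FCGI_BEGIN_REQUEST = 1
FCGI_PARAMS = 4
FCGI_RESPONDER = 1


def record(rec_type, request_id, content=b""):
    """Frame `content` as one FastCGI record: 8-byte header plus body."""
    # Header: version, type, request id (16 bit), content length (16 bit),
    # padding length, one reserved byte. No padding is used here.
    header = struct.pack(
        ">BBHHBx", FCGI_VERSION_1, rec_type, request_id, len(content), 0
    )
    return header + content


def name_value(name, value):
    """Encode one parameter pair; lengths below 128 fit in a single byte."""
    def length(n):
        return bytes([n]) if n < 128 else struct.pack(">I", n | 0x80000000)
    return length(len(name)) + length(len(value)) + name + value


def begin_request(request_id):
    """Build the record that opens a request in the Responder role."""
    body = struct.pack(">HB5x", FCGI_RESPONDER, 0)  # role, flags, reserved
    return record(FCGI_BEGIN_REQUEST, request_id, body)


# Hypothetical example: announce request 1 and name the script to run.
opening = begin_request(1)
params = record(
    FCGI_PARAMS, 1, name_value(b"SCRIPT_FILENAME", b"/var/www/index.php")
)
```

In a real deployment the web server (Nginx, Apache and so on) builds these records for you; the sketch only shows that the protocol is a thin binary framing layer, which is why the PHP runtime behind it is easy to swap.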

Performance considerations

When considering speed, the actual performance of HHVM FastCGI vs HHVM alone depends on the number of static resources. While pure HHVM will almost always be faster at executing PHP scripts, when it comes to serving a large number of static resources, a server optimized for such a task just might do a better job – as the HHVM team has confirmed in a tweet.

Naturally, this is moot if you use a third-party CDN for static resources, so it's up to you. The good news is that both approaches are simple to implement, and A/B testing their performance shouldn't be too hard should you decide to find out which approach is better for your app.

Ways to improve general speed are:
– deploying on an SSD for faster cache reads/writes
– pre-analyzing
– using an authoritative cache
– applying micro-optimizations

Pre-analyzing

Pre-analyzing is a concept well covered on their official blog, but I'll try and sum it up here. When HHVM is compiled/installed, it has a binary called hhvm-analyze (or hphp if you built it from source). This binary is what's run when doing optimizations at runtime – it's executed before the server spins up to serve the request results. With HHVM, you have the option of running analyze before the app is live – you can pass it a list of files you want included in the cache, and pre-cache them. This avoids the warmup period and allows for more detailed, slower optimizations, because no one is waiting for the result but you. The assumption is that you ran it to pre-optimize, so it takes some extra time to do it super-right.

Authoritative Cache

When you set the cache as authoritative in the web server or daemon configuration like so…

Repo {
  Authoritative = true
}

…HHVM assumes the compiled code cache is the only source of code. When this is false (the default), HHVM checks whether a file was changed, and if it was, recompiles it. This means extra disk work (even if no recompile is needed, it still has to read the file header to see if it was changed at all), which means additional performance hits. With the authoritative cache set to true, HHVM never even looks for the original file. If it changes, you need to rebuild your cache manually. If you update HHVM to a newer version, the cache is invalidated, and you need to rebuild it manually again. Due to these extra steps, the authoritative flag is usually only used in production, where there is little to no chance of individual PHP files changing on a regular basis.
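The trade-off can be sketched as follows. This is a simplified model and not HHVM's implementation: the CodeCache class and its method names are hypothetical, Python's compile() stands in for HHVM's compiler, and a modification-time comparison stands in for whatever change detection HHVM really performs.

```python
import os


class CodeCache:
    """Toy code cache showing authoritative vs. non-authoritative lookups."""

    def __init__(self, authoritative=False):
        self.authoritative = authoritative
        self.units = {}      # path -> (mtime in ns, compiled code)
        self.stat_calls = 0  # how many freshness checks hit the disk

    def build(self, paths):
        """Pre-build the cache for a known list of files."""
        for path in paths:
            self._compile(path)

    def _compile(self, path):
        with open(path, encoding="utf-8") as fh:
            source = fh.read()
        code = compile(source, path, "exec")
        self.units[path] = (os.stat(path).st_mtime_ns, code)
        return code

    def get(self, path):
        if self.authoritative:
            # The cache is the only source of code: the file on disk is never
            # consulted, so edits go unnoticed until the cache is rebuilt.
            return self.units[path][1]
        # Non-authoritative: every lookup pays for a freshness check.
        self.stat_calls += 1
        mtime = os.stat(path).st_mtime_ns
        cached = self.units.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]
        return self._compile(path)
```

With authoritative=True, a lookup for a file that was never passed to build() raises KeyError, which mirrors the rule that you have to rebuild the cache yourself whenever the code changes.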

Conclusion

HHVM has evolved rapidly and significantly in the two years since we last covered it. The performance improvements are measured in factors, the usability has increased impressively, the installation learning curve has all but disappeared, and recent tests show it even supports most of today's popular frameworks and open source PHP apps, from PhpMyAdmin to Symfony2 and beyond.

With this level of maturity, SitePoint will be covering HHVM to a much greater extent. Follow the HHVM tag for further articles and tutorials, including but not limited to demo apps, custom framework installations, live deployments, and more.

If you'd like a special HHVM use case covered, or would like to talk about your own experiences with it, please do get in touch with me via +BrunoSkvorc and we'll talk more! Leave your comments and thoughts below!

  • Tu Hoang

    Has anyone benchmarked HHVM against the latest PHP with APC/Zend Opcache?

    • http://www.dev-metal.com/ Chris

      I’ve written a short summary of some benchmarks I’ve found: http://www.dev-metal.com/phps-hiphop-outperforms-php-5-5-zend-opcache-nginx-15-20-times/

      • http://www.bitfalls.com/ Bruno Skvorc

        Thank you for sharing this, extremely interesting stuff!

    • Kevin

      Also, there are things to consider when doing a benchmark:
      1) The DB will still be a bottleneck. (HHVM optimizes PHP down to the metal, so benchmark code that consumes more CPU/stack space in regular PHP and, if possible, does no DB operations.)
      2) The first 11-15 runs will be interpreted; the JIT kicks in afterwards!

  • http://codingcorner.tk/ Abhinandan

    Is there any benchmark of HHVM vs Python? because python has similar JIT compilation mechanism which its fanboys always show off!

  • Allan MacGregor

    I wrote a small post about getting started with HHVM

    http://coderoncode.com/2013/07/27/first-steps-on-hhvm.html

    • http://www.bitfalls.com/ Bruno Skvorc

      Brilliant, thanks for sharing!

  • http://www.bitfalls.com/ Bruno Skvorc

    I’ll definitely do my best, but I’m far from such an expert on HHVM. Let’s hope the HHVM team puts out some hints we can explore and explain here. If you come across any findings or optimizations, or just some interesting facts and features while working with HHVM, feel free to get in touch via bruno.skvorc@sitepoint.com or +BrunoSkvorc, and maybe we can arrange an article!

  • Kevin

    Break your code into more functions (statements at the top level of a PHP file won't be JITted; see: http://stackoverflow.com/questions/20859304/how-to-improve-poor-array-performance-with-hhvm/).

    There are more optimization tips for HHVM around the net (they are not organized, though!). I've bookmarked some URLs, but they are spread across 10 nodes (PCs, laptops, tablets :-P).