How PHP Executes - from Source Code to Render

This article was peer reviewed by Younes Rafie. Thanks to all of SitePoint’s peer reviewers for making SitePoint content the best it can be!

Inspired by a recent article on how Ruby code executes, this article covers the execution process for PHP code.


Key Takeaways

PHP code execution involves four stages: Lexing, Parsing, Compilation, and Interpretation. Together, these stages turn PHP source code into opcodes and then run those opcodes on the Zend Engine VM.
Lexing, or tokenizing, is the process of turning a string (PHP source code) into a sequence of tokens. Each token is a named identifier for the value it has matched. This stage also stores the lexeme and the line number of the matched token.
The Parsing stage verifies the validity of the token order and generates the abstract syntax tree (AST). The AST is a tree view of the source code used during the Compilation stage.
The Compilation stage emits opcodes by traversing the AST and performs optimizations like resolving function calls with literal arguments and folding constant mathematical expressions. The output of this stage can be inspected using OPcache, VLD, and PHPDBG.
The Interpretation stage is the final stage where the opcodes are run on the Zend Engine (ZE) VM. The output of this stage is what your PHP script outputs via commands such as echo, print, var_dump, etc.

Introduction

There’s a lot going on under the hood when we execute a piece of PHP code. Broadly speaking, the PHP interpreter goes through four stages when executing code:

Lexing
Parsing
Compilation
Interpretation

This article will skim through these stages and show how we can view the output from each stage to really see what is going on. Note that while some of the extensions used should already be a part of your PHP installation (such as tokenizer and OPcache), others will need to be manually installed and enabled (such as php-ast and VLD).
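Before following along, it can help to check which of these extensions are actually loaded. The snippet below is a minimal sketch: the `extensionReport()` helper and the `$needed` map are names made up for this illustration, and the extension names passed to `extension_loaded()` are the ones these extensions normally register under.

```php
<?php
// Hypothetical helper: maps a friendly label to whether the extension is loaded.
function extensionReport(array $extensions): array
{
    $report = [];
    foreach ($extensions as $label => $name) {
        $report[$label] = extension_loaded($name);
    }
    return $report;
}

// Label => name the extension registers under (assumed, see lead-in).
$needed = [
    'tokenizer' => 'tokenizer',
    'OPcache'   => 'Zend OPcache',
    'php-ast'   => 'ast',
    'VLD'       => 'vld',
];

foreach (extensionReport($needed) as $label => $loaded) {
    printf("%-10s %s\n", $label, $loaded ? 'loaded' : 'missing');
}
```

Running `php -m` on the command line gives a similar list without any code.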

Stage 1 – Lexing

Lexing (or tokenizing) is the process of turning a string (PHP source code, in this case) into a sequence of tokens. A token is simply a named identifier for the value it has matched. PHP uses re2c to generate its lexer from the zend_language_scanner.l definition file.

We can see the output of the lexing stage via the tokenizer extension:

$code = <<<'code'
<?php
$a = 1;
code;

$tokens = token_get_all($code);

foreach ($tokens as $token) {
    if (is_array($token)) {
        echo "Line {$token[2]}: ", token_name($token[0]), " ('{$token[1]}')", PHP_EOL;
    } else {
        var_dump($token);
    }
}

Outputs:

Line 1: T_OPEN_TAG ('<?php
')
Line 2: T_VARIABLE ('$a')
Line 2: T_WHITESPACE (' ')
string(1) "="
Line 2: T_WHITESPACE (' ')
Line 2: T_LNUMBER ('1')
string(1) ";"

There are a couple of noteworthy points from the above output. The first point is that not all pieces of the source code are named tokens. Instead, some symbols are considered tokens in and of themselves (such as =, ;, :, ?, etc). The second point is that the lexer actually does a little more than simply output a stream of tokens. It also, in most cases, stores the lexeme (the text matched by the token) and the line number of the matched token (which is used for things like stack traces).
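On PHP 8.0 and later, the tokenizer extension also offers an object-based interface, PhpToken, which stores the lexeme and line number for every token, including the single-character ones. The following is a small sketch of the same experiment using it; `describeTokens()` is a helper name invented for this example.

```php
<?php
// Requires PHP 8.0+ with the tokenizer extension.
// Hypothetical helper: one human-readable line per token.
function describeTokens(string $code): array
{
    $lines = [];
    foreach (PhpToken::tokenize($code) as $token) {
        $lines[] = sprintf(
            'Line %d: %s (%s)',
            $token->line,
            $token->getTokenName(),        // e.g. T_VARIABLE, or "=" for single-char tokens
            json_encode($token->text)      // the lexeme, with newlines escaped
        );
    }
    return $lines;
}

echo implode(PHP_EOL, describeTokens("<?php\n\$a = 1;")), PHP_EOL;
```

Because every token is an object here, there is no need for the is_array() check used with token_get_all().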

Stage 2 – Parsing

The parser is also generated, this time with Bison via a BNF grammar file (zend_language_parser.y). PHP uses an LALR(1) (look-ahead, left-to-right) context-free grammar. The look-ahead part simply means that the parser is able to look n tokens ahead (1, in this case) to resolve ambiguities it may encounter whilst parsing. The left-to-right part means that it parses the token stream from left to right.

The generated parser takes the token stream from the lexer as input and has two jobs. It firstly verifies the validity of the token order by attempting to match the tokens against the grammar rules defined in its BNF grammar file. This ensures that valid language constructs are being formed by the tokens in the token stream. The second job of the parser is to generate the abstract syntax tree (AST) – a tree view of the source code that will be used during the next stage (compilation).
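The difference between the lexer's and the parser's jobs can be seen with the tokenizer extension alone. As a rough sketch: token_get_all() happily tokenizes code whose token order is invalid, but passing the TOKEN_PARSE flag also runs the parser over it, which throws a ParseError when no grammar rule matches. The `parses()` helper is a name made up for this example.

```php
<?php
// Hypothetical helper: true if the code both lexes and parses.
function parses(string $code): bool
{
    try {
        token_get_all($code, TOKEN_PARSE); // lex + parse, without executing anything
        return true;
    } catch (ParseError $e) {
        return false;
    }
}

$valid   = '<?php $a = 1;';
$invalid = '<?php $a = = 1;'; // every token is fine on its own; the order is not

var_dump(count(token_get_all($invalid)) > 0); // the lexer alone does not complain
var_dump(parses($valid));
var_dump(parses($invalid));
```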

We can view a form of the AST produced by the parser using the php-ast extension. The internal AST is not directly exposed because it is not particularly “clean” to work with (in terms of consistency and general usability), and so the php-ast extension performs a few transformations upon it to make it nicer to work with.

Let’s have a look at the AST for a rudimentary piece of code:

$code = <<<'code'
<?php
$a = 1;
code;

print_r(ast\parse_code($code, 30));

Output:

ast\Node Object (
    [kind] => 132
    [flags] => 0
    [lineno] => 1
    [children] => Array (
        [0] => ast\Node Object (
            [kind] => 517
            [flags] => 0
            [lineno] => 2
            [children] => Array (
                [var] => ast\Node Object (
                    [kind] => 256
                    [flags] => 0
                    [lineno] => 2
                    [children] => Array (
                        [name] => a
                    )
                )
                [expr] => 1
            )
        )
    )
)

The tree nodes (which are typically of type ast\Node) have several properties:

kind – An integer value to depict the node type; each has a corresponding constant (e.g. AST_STMT_LIST => 132, AST_ASSIGN => 517, AST_VAR => 256)
flags – An integer that specifies overloaded behaviour (e.g. an ast\AST_BINARY_OP node will have flags to differentiate which binary operation is occurring)
lineno – The line number, as seen from the token information earlier
children – sub nodes, typically parts of the node broken down further (e.g. a function node will have the children: parameters, return type, body, etc)

The AST output of this stage is handy to work off of for tools such as static code analysers (e.g. Phan).
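To make the kind integers easier to read, the tree can be walked recursively and each kind translated back to its constant name. This is only a sketch under a few assumptions: it needs the php-ast extension, `kindTree()` is a helper invented here, and it picks the newest AST version reported by ast\get_supported_versions() rather than the version number used above, so the exact tree shape may differ between php-ast releases.

```php
<?php
// Hypothetical helper: renders a php-ast tree as indented kind names.
function kindTree($node, int $depth = 0): string
{
    $indent = str_repeat('  ', $depth);
    if (!$node instanceof ast\Node) {
        // Leaf values (ints, strings, null) are stored directly in the tree.
        return $indent . var_export($node, true) . PHP_EOL;
    }
    $out = $indent . ast\get_kind_name($node->kind) . " (line {$node->lineno})" . PHP_EOL;
    foreach ($node->children as $child) {
        $out .= kindTree($child, $depth + 1);
    }
    return $out;
}

if (extension_loaded('ast')) {
    $version = max(ast\get_supported_versions()); // assumption: use the newest format
    echo kindTree(ast\parse_code('<?php $a = 1;', $version));
} else {
    echo 'php-ast is not loaded', PHP_EOL;
}
```

A static analyser does essentially the same walk, only with checks attached to particular node kinds.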

Stage 3 – Compilation

The compilation stage consumes the AST and emits opcodes by recursively traversing the tree. This stage also performs a few optimizations. These include resolving some function calls with literal arguments (such as strlen("abc") to int(3)) and folding constant mathematical expressions (such as 60 * 60 * 24 to int(86400)).
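A tiny script containing both kinds of expression makes a handy test subject for an opcode dumper such as VLD. The sketch below is just that: the constant and function names are invented, and whether each expression appears pre-computed in the opcodes may depend on the PHP version and settings in use.

```php
<?php
// Candidate for constant folding: only literals and arithmetic operators.
const SECONDS_PER_DAY = 60 * 60 * 24;

// Candidate for compile-time resolution: a pure function with a literal argument.
function labelLength(): int
{
    return strlen("abc");
}

echo SECONDS_PER_DAY, PHP_EOL;
echo labelLength(), PHP_EOL;
```

The runtime result is the same either way; the difference only shows up in the emitted opcodes.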

We can inspect the opcode output at this stage in a number of ways, including with OPcache, VLD, and PHPDBG. I’m going to use VLD for this, since I feel the output is more friendly to look at.

Let’s see what the output is for the following file.php script:

if (PHP_VERSION === '7.1.0-dev') {
    echo 'Yay', PHP_EOL;
}

Executing the following command:

php -dopcache.enable_cli=1 -dopcache.optimization_level=0 -dvld.active=1 -dvld.execute=0 file.php

Our output is:

line     #* E I O op                           fetch          ext  return  operands
-------------------------------------------------------------------------------------
   3     0  E > > JMPZ                                                     <true>, ->3
   4     1    >   ECHO                                                     'Yay'
         2        ECHO                                                     '%0A'
   7     3    > > RETURN                                                   1

The opcodes sort of resemble the original source code, enough to follow along with the basic operations. (I’m not going to delve into the details of opcodes in this article, since that would take several entire articles in itself.) No optimizations were applied at the opcode level in the above script – but as we can see, the compilation phase has made some by resolving the constant condition (PHP_VERSION === '7.1.0-dev') to true.

OPcache does more than simply cache opcodes (thus bypassing the lexing, parsing, and compilation stages). It also comes with a number of optimization passes. The opcache.optimization_level setting is a bitmask in which each bit switches one pass on or off, so let’s enable several of them to see what comes out:

Command:

php -dopcache.enable_cli=1 -dopcache.optimization_level=1111 -dvld.active=-1 -dvld.execute=0 file.php

Output:

line     #* E I O op                           fetch          ext  return  operands
-------------------------------------------------------------------------------------
   4     0  E >   ECHO                                                     'Yay%0A'
   7     1      > RETURN                                                   1

We can see that the constant condition has been removed, and the two ECHO instructions have been compacted into a single instruction. These are just a taste of the many optimizations OPcache applies when performing passes over the opcodes of a script. I won’t go through the various optimization levels in this article though, since that would also be an article in itself.
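Since OPcache is not enabled for every SAPI by default (hence the opcache.enable_cli=1 flag in the commands above), it is worth confirming that it is actually active before relying on it. Here is a minimal sketch using opcache_get_status(); the `opcacheSummary()` helper is a made-up name.

```php
<?php
// Hypothetical helper: a one-line summary of OPcache's state.
function opcacheSummary(): string
{
    if (!function_exists('opcache_get_status')) {
        return 'OPcache extension is not loaded';
    }
    // false = do not include per-script details; @ hides the warning
    // emitted when the API is restricted by configuration.
    $status = @opcache_get_status(false);
    if ($status === false || empty($status['opcache_enabled'])) {
        return 'OPcache is loaded but not enabled here';
    }
    $stats = $status['opcache_statistics'];
    return sprintf(
        'OPcache enabled: %d cached scripts, %d hits, %d misses',
        $stats['num_cached_scripts'],
        $stats['hits'],
        $stats['misses']
    );
}

echo opcacheSummary(), PHP_EOL;
```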

Stage 4 – Interpretation

The final stage is the interpretation of the opcodes. This is where the opcodes are run on the Zend Engine (ZE) VM. There’s actually very little to say about this stage (from a high-level perspective, at least). The output is pretty much whatever your PHP script outputs via commands such as echo, print, var_dump, and so on.

So instead of digging into anything complex at this stage, here’s a fun fact: PHP requires itself as a dependency when generating its own VM. This is because the VM is generated by a PHP script, due to it being simpler to write and easier to maintain.

Conclusion

We’ve taken a brief look through the four stages that the PHP interpreter goes through when running PHP code. This has involved using various extensions (including tokenizer, php-ast, OPcache, and VLD) to manipulate and view the output of each stage.

I hope this article has helped to provide you with a better holistic understanding of PHP’s interpreter, as well as shown the importance of the OPcache extension (for both its caching and optimization abilities).

Frequently Asked Questions (FAQs) about PHP Execution Process

What is the role of the PHP interpreter in the execution process?

The PHP interpreter plays a crucial role in the PHP execution process. It is responsible for turning PHP source code into opcodes and executing them. It does not run a script line by line: each file is first lexed, parsed, and compiled in full, and only then are the resulting opcodes executed on the Zend Engine VM. The interpreter is also responsible for handling errors and exceptions during the execution process. It is a key component of the PHP runtime environment, which also includes the web server and the PHP extensions.

How does the PHP engine work?

The PHP engine (the Zend Engine) is the core of the PHP execution process. It is responsible for parsing the PHP script, compiling it into bytecode (opcodes), and then executing the bytecode. First, it tokenizes the PHP script and parses the tokens into an abstract syntax tree (AST). Then, it compiles the AST into bytecode and executes it on its virtual machine. The PHP engine also includes a memory manager and a garbage collector to manage memory usage during the execution process.

What is the difference between PHP’s command-line interface and web server interface?

PHP’s command-line interface (CLI) and web server interface are two different ways to run PHP scripts. The CLI is used for running PHP scripts from the command line, while the web server interface is used for running PHP scripts in response to web requests. The main difference between the two interfaces is the way they handle input and output. In the CLI, input is read from the command line and output is written to the console. In the web server interface, input is read from the HTTP request and output is written to the HTTP response.
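A script can find out which interface it is running under through the PHP_SAPI constant (or php_sapi_name()). The sketch below is illustrative only; `describeSapi()` and its wording are invented for this example.

```php
<?php
// Hypothetical helper: describes where input and output go for a given SAPI name.
function describeSapi(string $sapi): string
{
    if ($sapi === 'cli' || $sapi === 'phpdbg') {
        return 'command line: arguments arrive in $argv, output goes to the console';
    }
    return "web/server SAPI ($sapi): input arrives with the HTTP request, output goes to the HTTP response";
}

echo describeSapi(PHP_SAPI), PHP_EOL;
```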

How does PHP handle errors during the execution process?

PHP has a robust error handling mechanism that allows it to handle errors during the execution process. When an error occurs, PHP generates an error message and sends it to the error handler. The error handler can either display the error message, log it, or ignore it, depending on the error reporting settings. PHP also supports exception handling, which allows it to handle errors in a more structured and manageable way.
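The two mechanisms can be combined. In the sketch below, a custom error handler converts traditional errors (warnings, notices) into ErrorException objects so that everything can be handled with try/catch. `describeFailure()` is a hypothetical helper, not part of PHP.

```php
<?php
// Turn traditional PHP errors into exceptions.
set_error_handler(function (int $severity, string $message, string $file, int $line): bool {
    throw new ErrorException($message, 0, $severity, $file, $line);
});

// Hypothetical helper: runs a task and reports how it failed, if it did.
function describeFailure(callable $task): string
{
    try {
        $task();
        return 'ok';
    } catch (ErrorException $e) {
        return 'runtime warning: ' . $e->getMessage();
    } catch (Throwable $e) {
        return get_class($e) . ': ' . $e->getMessage();
    }
}

echo describeFailure(function () { return intdiv(1, 0); }), PHP_EOL;
echo describeFailure(function () { trigger_error('disk almost full', E_USER_WARNING); }), PHP_EOL;
echo describeFailure(function () { return 1 + 1; }), PHP_EOL;
```

Catching ErrorException before Throwable matters here, since the more specific catch block has to come first.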

What is the role of PHP extensions in the execution process?

PHP extensions are modules that add new features and functionality to the PHP language. They are loaded into the PHP runtime environment when PHP starts up and can be used to perform a wide range of tasks, from database access to image processing. PHP extensions are typically written in C and are compiled into machine code, which makes them very fast and efficient. They are a key component of the PHP ecosystem and contribute to its flexibility and power.

How does PHP optimize the execution process?

PHP uses several techniques to optimize the execution process. One of these techniques is opcode caching, which involves storing the bytecode generated by the PHP engine in memory so that it can be reused in subsequent executions. This eliminates the need to parse and compile the PHP script every time it is executed, resulting in significant performance improvements. Since PHP 8.0, OPcache also offers just-in-time (JIT) compilation, which, when enabled, compiles bytecode into machine code at runtime to further improve performance.

How does PHP handle memory management during the execution process?

PHP has a built-in memory manager that handles memory allocation and deallocation during the execution process. The memory manager allocates memory for variables and data structures as needed. Most values are freed through reference counting as soon as nothing refers to them any more. PHP also has a garbage collector that reclaims groups of values that reference each other in cycles and so would never reach a reference count of zero. This helps to prevent memory leaks and keep memory usage under control.
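Values that reference each other are the case the garbage collector exists for, and this can be observed from userland. The following is a small sketch; `countCollectedCycles()` is a name invented for this example, and the exact number it returns is an engine detail, so only "more than zero" should be read into it.

```php
<?php
// Hypothetical helper: creates $pairs pairs of objects that point at each
// other, drops all outside references, and asks the collector to clean up.
function countCollectedCycles(int $pairs): int
{
    gc_enable();
    gc_collect_cycles(); // start from a clean slate

    for ($i = 0; $i < $pairs; $i++) {
        $a = new stdClass();
        $b = new stdClass();
        $a->peer = $b; // $a -> $b
        $b->peer = $a; // $b -> $a: reference counts alone can never reach zero
    }
    unset($a, $b);

    return gc_collect_cycles(); // number of values the collector freed
}

$before = memory_get_usage();
echo 'Collected: ', countCollectedCycles(1000), PHP_EOL;
echo 'Memory delta in bytes: ', memory_get_usage() - $before, PHP_EOL;
```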

What is the role of the web server in the PHP execution process?

The web server plays a key role in the PHP execution process. It is responsible for handling HTTP requests, running PHP scripts in response to these requests, and sending HTTP responses back to the client. The web server works closely with the PHP interpreter and the PHP engine to execute PHP scripts and generate dynamic web pages. The most commonly used web servers for PHP are Apache and Nginx.

How does PHP handle database interactions during the execution process?

PHP has built-in support for a wide range of databases, including MySQL, PostgreSQL, and SQLite. It uses database-specific extensions to interact with these databases during the execution process. These extensions provide a set of functions that can be used to connect to the database, execute SQL queries, fetch results, and handle errors. PHP also supports the PDO (PHP Data Objects) extension, which provides a database-agnostic interface for database interactions.
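As a brief illustration of the PDO interface, the sketch below uses an in-memory SQLite database so that it needs no server. It assumes the pdo_sqlite driver is available; the table, the rows, and the `demoQuery()` helper are made up for the example.

```php
<?php
// Hypothetical helper: creates a throwaway table, inserts two rows,
// and fetches them back with a prepared statement.
function demoQuery(): array
{
    $pdo = new PDO('sqlite::memory:');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION); // errors become PDOException

    $pdo->exec('CREATE TABLE langs (name TEXT, year INTEGER)');

    $insert = $pdo->prepare('INSERT INTO langs (name, year) VALUES (?, ?)');
    $insert->execute(['PHP', 1995]);
    $insert->execute(['Ruby', 1995]);

    $select = $pdo->prepare('SELECT name FROM langs WHERE year = ? ORDER BY name');
    $select->execute([1995]);

    return $select->fetchAll(PDO::FETCH_COLUMN);
}

if (extension_loaded('pdo_sqlite')) {
    print_r(demoQuery());
} else {
    echo 'pdo_sqlite is not loaded', PHP_EOL;
}
```

Swapping the DSN string for a MySQL or PostgreSQL one would leave the rest of the calls unchanged, which is the point of a database-agnostic interface.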

How does PHP handle session management during the execution process?

PHP has built-in support for session management, which allows it to maintain state between different HTTP requests. When a session is started, PHP creates a unique session ID and stores it in a cookie on the client’s browser. This session ID is then sent back to the server with each subsequent request, allowing PHP to identify the client and retrieve the corresponding session data. PHP’s session management features make it easy to implement user authentication, shopping carts, and other stateful features in web applications.