I used to use an extension called XHP. It enables HTML-in-PHP syntax for generating front-end markup. I reached for it recently, and was surprised to find that it was no longer officially supported for modern PHP versions.
So, I decided to implement a user-land version of it, using a basic state-machine compiler. It seemed like it would be a fun project to do with you!
The code for this tutorial can be found on GitHub.
Creating Compilers
Many developers avoid writing their own compilers or interpreters, thinking that the topic is too complex or difficult to explore properly. I used to feel like that too. Compilers can be difficult to make well, and the topic can be incredibly complex. But that doesn’t mean you can’t make a compiler.
Making a compiler is like making a sandwich. Anyone can get the ingredients and put it together. You can make a sandwich. You can also go to chef school and learn how to make the best damn sandwich the world has ever seen. You can study the art of sandwich making for years, and people can talk about your sandwiches in other lands. You’re not going to let the breadth and complexity of sandwich-making prevent you from making your first sandwich, are you?
Compilers (and interpreters) begin with humble string manipulation and temporary variables. When they’re sufficiently popular (or sufficiently slow), the experts can step in to replace the string manipulation and temporary variables with unicorn tears and cynicism.
At a fundamental level, compilers take a string of code and run it through a couple of steps:
The code is split into tokens (meaningful characters and sub-strings) which the compiler uses to derive meaning. The statement
if (isEmergency) alert("there is an emergency")
could be considered to contain tokens like if, isEmergency, alert, and "there is an emergency", each of which means something to the compiler. The first step is to split the entire source code up into these meaningful bits, so that the compiler can organize them into a logical hierarchy and know what to do with the code.
The tokens are arranged into the logical hierarchy (sometimes called an Abstract Syntax Tree) which represents what needs to be done in the program. The previous statement could be understood as “Work out if the condition (isEmergency) evaluates to true. If it does, run the function (alert) with the parameter ("there is an emergency")”.
Using this hierarchy, the code can be executed immediately (in the case of an interpreter or virtual machine) or translated into other languages (in the case of languages like CoffeeScript and TypeScript, which both compile to JavaScript).
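To make the first two steps more concrete, here is a toy sketch in PHP. The toyTokens and toyParse names are hypothetical, and the parser only understands statements shaped exactly like the example above; it is nowhere near a real compiler front end, but it shows a flat list of tokens turning into a small hierarchy.

```php
<?php
// Toy tokenizer and parser for statements shaped exactly like:
//   if (condition) callee("string argument")
// Illustrative only: anything else is rejected.

function toyTokens(string $code): array
{
    // A token is a double-quoted string, an identifier, or a parenthesis.
    // Characters matching none of these (such as spaces) are skipped.
    preg_match_all('/"[^"]*"|[A-Za-z_][A-Za-z0-9_]*|[()]/', $code, $matches);
    return $matches[0];
}

function toyParse(array $tokens): array
{
    if (count($tokens) !== 8) {
        throw new InvalidArgumentException("unsupported statement");
    }
    [$if, $open1, $condition, $close1, $callee, $open2, $argument, $close2] = $tokens;
    if ($if !== "if" || $open1 !== "(" || $close1 !== ")"
        || $open2 !== "(" || $close2 !== ")") {
        throw new InvalidArgumentException("unsupported statement");
    }
    // The hierarchy: an "if" node whose body is a function-call node.
    return [
        "type" => "if",
        "condition" => $condition,
        "then" => [
            "type" => "call",
            "function" => $callee,
            "arguments" => [trim($argument, '"')],
        ],
    ];
}

print_r(toyParse(toyTokens('if (isEmergency) alert("there is an emergency")')));
```

A real tokenizer would also classify each token (keyword, identifier, string, punctuation) instead of returning bare strings; PHP’s own token_get_all() does exactly that for PHP source.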
In our case, we want to maintain most of the PHP syntax, but we also want to add our own little bit of syntax on top. We could create a whole new interpreter… or we could preprocess the new syntax, compiling it to syntactically valid PHP code.
I’ve written about preprocessing PHP before, and it’s my favorite approach to adding new syntax. In this case, we need to write a more complex script, so we’re going to deviate from how we’ve previously added new syntax.
Generating Tokens
Let’s create a function to split code into tokens. It begins like this:
function tokens($code) {
$tokens = [];
$length = strlen($code);
$cursor = 0;
while ($cursor < $length) {
if ($code[$cursor] === "{") {
print "ATTRIBUTE STARTED ({$cursor})" . PHP_EOL;
}
if ($code[$cursor] === "}") {
print "ATTRIBUTE ENDED ({$cursor})" . PHP_EOL;
}
if ($code[$cursor] === "<") {
print "ELEMENT STARTED ({$cursor})" . PHP_EOL;
}
if ($code[$cursor] === ">") {
print "ELEMENT ENDED ({$cursor})" . PHP_EOL;
}
$cursor++;
}
}
$code = '
<?php
$classNames = "foo bar";
$message = "hello world";
$thing = (
<div
className={() => { return "outer-div"; }}
nested={<span className={"nested-span"}>with text</span>}
>
a bit of text before
<span>
{$message} with a bit of extra text
</span>
a bit of text after
</div>
);
';
tokens($code);
// ELEMENT STARTED (5)
// ELEMENT STARTED (95)
// ATTRIBUTE STARTED (122)
// ELEMENT ENDED (127)
// ATTRIBUTE STARTED (129)
// ATTRIBUTE ENDED (151)
// ATTRIBUTE ENDED (152)
// ATTRIBUTE STARTED (173)
// ELEMENT STARTED (174)
// ATTRIBUTE STARTED (190)
// ATTRIBUTE ENDED (204)
// ELEMENT ENDED (205)
// ELEMENT STARTED (215)
// ELEMENT ENDED (221)
// ATTRIBUTE ENDED (222)
// ELEMENT ENDED (232)
// ELEMENT STARTED (279)
// ELEMENT ENDED (284)
// ATTRIBUTE STARTED (302)
// ATTRIBUTE ENDED (311)
// ELEMENT STARTED (350)
// ELEMENT ENDED (356)
// ELEMENT STARTED (398)
// ELEMENT ENDED (403)
This is from tokens-1.php
We’re off to a good start. By stepping through the code, we can check what each character is (and identify the ones that matter to us). We’re seeing, for instance, that the first element opens when we encounter a < character, at index 5. The first element closes at index 210.
Unfortunately, that first opening is being incorrectly matched to <?php. That’s not an element in our new syntax, so we have to stop the code from picking it out:
preg_match("#^</?[a-zA-Z]#", substr($code, $cursor, 3), $matchesStart);
if (count($matchesStart)) {
print "ELEMENT STARTED ({$cursor})" . PHP_EOL;
}
// ...
// ELEMENT STARTED (95)
// ATTRIBUTE STARTED (122)
// ELEMENT ENDED (127)
// ATTRIBUTE STARTED (129)
// ATTRIBUTE ENDED (151)
// ATTRIBUTE ENDED (152)
// ATTRIBUTE STARTED (173)
// ELEMENT STARTED (174)
// ...
This is from tokens-2.php
Instead of checking only the current character, our new code checks up to three characters at a time, and only reports an element start if they match the pattern <div or </div (a < or </ followed by a letter), but not <?php or $num1 < $num2.
There’s another problem: our example uses arrow function syntax, so => is being matched as an element closing sequence. Let’s refine how we match element closing sequences:
preg_match("#^=>#", substr($code, $cursor - 1, 2), $matchesEqualBefore);
preg_match("#^>=#", substr($code, $cursor, 2), $matchesEqualAfter);
if ($code[$cursor] === ">" && !$matchesEqualBefore && !$matchesEqualAfter) {
print "ELEMENT ENDED ({$cursor})" . PHP_EOL;
}
// ...
// ELEMENT STARTED (95)
// ATTRIBUTE STARTED (122)
// ATTRIBUTE STARTED (129)
// ATTRIBUTE ENDED (151)
// ATTRIBUTE ENDED (152)
// ATTRIBUTE STARTED (173)
// ELEMENT STARTED (174)
// ...
This is from tokens-3.php
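Both checks can be pulled into small helpers. This is a hypothetical refactoring rather than the code in tokens-3.php: the isElementStart and isElementEnd names are mine, and isElementEnd additionally ignores the -> object operator, which the snippets above would otherwise report as the end of a tag (just like =>).

```php
<?php
// Hypothetical helpers bundling the element start/end checks from above.

function isElementStart(string $code, int $cursor): bool
{
    // "<" or "</" followed by a letter: matches <div and </div,
    // but not <?php or $num1 < $num2.
    return preg_match('#^</?[a-zA-Z]#', substr($code, $cursor, 3)) === 1;
}

function isElementEnd(string $code, int $cursor): bool
{
    if ($code[$cursor] !== ">") {
        return false;
    }
    $before = $cursor > 0 ? $code[$cursor - 1] : "";
    $after = $code[$cursor + 1] ?? "";
    // Ignore "=>" (arrays, arrow functions), "->" (object operator) and ">=".
    return $before !== "=" && $before !== "-" && $after !== "=";
}
```

Checking single neighbouring characters is simpler than running two regular expressions per character, though it is still a heuristic: other operators containing > (such as >>) would need their own exclusions.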
As with JSX, it would be good for attributes to allow dynamic values (even if those values are nested JSX elements). There are a few ways we could do this, but the one I prefer is to treat all attributes as text, and tokenize them recursively. To do this, we need a kind of state machine which tracks how many levels deep we are in elements and attributes. If we’re inside an element tag, we should trap the top-level {…} as a string attribute value, and ignore the braces nested inside it. Similarly, if we’re inside an attribute, we should ignore nested element opening and closing sequences:
function tokens($code) {
$tokens = [];
$length = strlen($code);
$cursor = 0;
$elementLevel = 0;
$elementStarted = null;
$elementEnded = null;
$attributes = [];
$attributeLevel = 0;
$attributeStarted = null;
$attributeEnded = null;
while ($cursor < $length) {
$extract = trim(substr($code, $cursor, 5)) . "...";
if ($code[$cursor] === "{" && $elementStarted !== null) {
if ($attributeLevel === 0) {
print "ATTRIBUTE STARTED ({$cursor}, {$extract})" . PHP_EOL;
$attributeStarted = $cursor;
}
$attributeLevel++;
}
if ($code[$cursor] === "}" && $elementStarted !== null) {
$attributeLevel--;
if ($attributeLevel === 0) {
print "ATTRIBUTE ENDED ({$cursor})" . PHP_EOL;
$attributeEnded = $cursor;
}
}
preg_match("#^</?[a-zA-Z]#", substr($code, $cursor, 3), $matchesStart);
if (count($matchesStart) && $attributeLevel < 1) {
print "ELEMENT STARTED ({$cursor}, {$extract})" . PHP_EOL;
$elementLevel++;
$elementStarted = $cursor;
}
preg_match("#^=>#", substr($code, $cursor - 1, 2), $matchesEqualBefore);
preg_match("#^>=#", substr($code, $cursor, 2), $matchesEqualAfter);
if (
$code[$cursor] === ">"
&& !$matchesEqualBefore && !$matchesEqualAfter
&& $attributeLevel < 1
) {
print "ELEMENT ENDED ({$cursor})" . PHP_EOL;
$elementLevel--;
$elementEnded = $cursor;
}
if ($elementStarted !== null && $elementEnded !== null) {
// TODO
$elementStarted = null;
$elementEnded = null;
}
$cursor++;
}
}
// ...
// ELEMENT STARTED (95, <div...)
// ATTRIBUTE STARTED (122, {() =...)
// ATTRIBUTE ENDED (152)
// ATTRIBUTE STARTED (173, {<spa...)
// ATTRIBUTE ENDED (222)
// ELEMENT ENDED (232)
// ELEMENT STARTED (279, <span...)
// ELEMENT ENDED (284)
// ELEMENT STARTED (350, </spa...)
// ELEMENT ENDED (356)
// ELEMENT STARTED (398, </div...)
// ELEMENT ENDED (403)
This is from tokens-4.php
We’ve added new $attributeLevel, $attributeStarted, and $attributeEnded variables, to track how deep we are in the nesting of attributes, and where the top level starts and ends. Specifically, if we’re at the top level when an attribute’s value starts or ends, we capture the current cursor position. Later, we’ll use this to extract the string attribute value and replace it with a placeholder.
We’re also starting to capture $elementStarted and $elementEnded (with $elementLevel fulfilling a similar role to $attributeLevel), so that we can capture a full element opening or closing tag. In this case, $elementEnded doesn’t refer to a closing tag (like </div>), but to the > that ends the tag currently being read. Closing tags are treated as entirely separate tokens…
By printing a short extract of the code wherever an element or attribute starts, we can see them starting and ending exactly where we expect. The nested control structures and elements are captured as strings, leaving only the top-level elements, non-attribute nested elements, and attribute values.
Let’s package these tokens up, associating attributes with the tags in which they are defined:
function tokens($code) {
$tokens = [];
$length = strlen($code);
$cursor = 0;
$elementLevel = 0;
$elementStarted = null;
$elementEnded = null;
$attributes = [];
$attributeLevel = 0;
$attributeStarted = null;
$attributeEnded = null;
$carry = 0;
while ($cursor < $length) {
if ($code[$cursor] === "{" && $elementStarted !== null) {
if ($attributeLevel === 0) {
$attributeStarted = $cursor;
}
$attributeLevel++;
}
if ($code[$cursor] === "}" && $elementStarted !== null) {
$attributeLevel--;
if ($attributeLevel === 0) {
$attributeEnded = $cursor;
}
}
if ($attributeStarted !== null && $attributeEnded !== null) {
$position = (string) count($attributes);
$positionLength = strlen($position);
$attribute = substr(
$code, $attributeStarted + 1, $attributeEnded - $attributeStarted - 1
);
$attributes[$position] = $attribute;
$before = substr($code, 0, $attributeStarted + 1);
$after = substr($code, $attributeEnded);
$code = $before . $position . $after;
$cursor = $attributeStarted + $positionLength + 2 /* curlies */;
$length = strlen($code);
$attributeStarted = null;
$attributeEnded = null;
continue;
}
preg_match("#^</?[a-zA-Z]#", substr($code, $cursor, 3), $matchesStart);
if (count($matchesStart) && $attributeLevel < 1) {
$elementLevel++;
$elementStarted = $cursor;
}
preg_match("#^=>#", substr($code, $cursor - 1, 2), $matchesEqualBefore);
preg_match("#^>=#", substr($code, $cursor, 2), $matchesEqualAfter);
if (
$code[$cursor] === ">"
&& !$matchesEqualBefore && !$matchesEqualAfter
&& $attributeLevel < 1
) {
$elementLevel--;
$elementEnded = $cursor;
}
if ($elementStarted !== null && $elementEnded !== null) {
$distance = $elementEnded - $elementStarted;
$carry += $cursor;
$before = trim(substr($code, 0, $elementStarted));
$tag = trim(substr($code, $elementStarted, $distance + 1));
$after = trim(substr($code, $elementEnded + 1));
$token = ["tag" => $tag, "started" => $carry];
if (count($attributes)) {
$token["attributes"] = $attributes;
}
$tokens[] = $before;
$tokens[] = $token;
$attributes = [];
$code = $after;
$length = strlen($code);
$cursor = 0;
$elementStarted = null;
$elementEnded = null;
continue;
}
$cursor++;
}
return $tokens;
}
$code = '
<?php
$classNames = "foo bar";
$message = "hello world";
$thing = (
<div
className={() => { return "outer-div"; }}
nested={<span className={"nested-span"}>with text</span>}
>
a bit of text before
<span>
{$message} with a bit of extra text
</span>
a bit of text after
</div>
);
';
print_r(tokens($code));
// Array
// (
// [0] => <?php
//
// $classNames = "foo bar";
// $message = "hello world";
//
// $thing = (
// [1] => Array
// (
// [tag] => <div className={0} nested={1}>
// [started] => 157
// [attributes] => Array
// (
// [0] => () => { return "outer-div"; }
// [1] => <span className={"nested-span"}>with text</span>
// )
//
// )
//
// [2] => a bit of text before
// [3] => Array
// (
// [tag] => <span>
// [started] => 195
// )
//
// [4] => {$message} with a bit of extra text
// [5] => Array
// (
// [tag] => </span>
// [started] => 249
// )
//
// [6] => a bit of text after
// [7] => Array
// (
// [tag] => </div>
// [started] => 282
// )
//
// )
This is from tokens-5.php
There’s a lot going on here, but it’s all just a natural progression from the previous version. We use the captured attribute start and end positions to extract the entire attribute value as one big string. We then replace each captured attribute with a numeric placeholder and reset the code string and cursor positions.
As each element tag ends, we associate all the attributes captured since the tag was opened, and create a separate array token from the tag (with its placeholders), its attributes, and its starting position. The result may be a little harder to read, but it is spot on in terms of capturing the intent of the code.
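The depth tracking and placeholder swap are easier to follow on a single tag. The following is a simplified, hypothetical extractAttributes function rather than part of tokens-5.php: it walks one tag character by character, keeps a brace-depth counter, and replaces each top-level {…} with its numeric index. Like the tokenizer above, it assumes braces inside string literals are balanced.

```php
<?php
// Hypothetical sketch of the placeholder step for a single tag.
// Returns [tag with placeholders, list of captured attribute values].
function extractAttributes(string $tag): array
{
    $attributes = [];
    $result = "";
    $value = "";
    $depth = 0;
    $length = strlen($tag);

    for ($i = 0; $i < $length; $i++) {
        $char = $tag[$i];
        if ($char === "{") {
            $depth++;
            if ($depth === 1) {
                continue; // opening brace of a top-level value
            }
        } elseif ($char === "}") {
            $depth--;
            if ($depth === 0) {
                // Closing brace of a top-level value: emit its placeholder.
                $result .= "{" . count($attributes) . "}";
                $attributes[] = $value;
                $value = "";
                continue;
            }
        }
        // Inside a value, collect characters; outside, copy them through.
        if ($depth > 0) {
            $value .= $char;
        } else {
            $result .= $char;
        }
    }

    return [$result, $attributes];
}

[$tag, $values] = extractAttributes('<div className={() => { return "x"; }} nested={<span className={"y"}>z</span>}>');
print $tag . PHP_EOL; // the tag with {0} and {1} placeholders
print_r($values);
```

Each captured value can then be handed back to the tokenizer, which is the recursion the next step adds.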
So, what do we do about those nested element attributes?
function tokens($code) {
// ...
while ($cursor < $length) {
// ...
if ($elementStarted !== null && $elementEnded !== null) {
// ...
foreach ($attributes as $key => $value) {
$attributes[$key] = tokens($value);
}
if (count($attributes)) {
$token["attributes"] = $attributes;
}
// ...
}
$cursor++;
}
$tokens[] = trim($code);
return $tokens;
}
// ...
// Array
// (
// [0] => <?php
//
// $classNames = "foo bar";
// $message = "hello world";
//
// $thing = (
// [1] => Array
// (
// [tag] => <div className={0} nested={1}>
// [started] => 157
// [attributes] => Array
// (
// [0] => Array
// (
// [0] => () => { return "outer-div"; }
// )
//
// [1] => Array
// (
// [1] => Array
// (
// [tag] => <span className={0}>
// [started] => 19
// [attributes] => Array
// (
// [0] => Array
// (
// [0] => "nested-span"
// )
//
// )
//
// )
//
// [2] => with text
// [3] => Array
// (
// [tag] => </span>
// [started] => 34
// )
// )
//
// )
//
// )
//
// ...
This is from tokens-5.php (modified)
Before we associate the attributes, we loop through them and tokenize their values with a recursive function call. We also need to append any remaining text (not inside an attribute or element tag) to the tokens array or it will be ignored.
The result is a list of tokens which can have nested lists of tokens. It’s almost an AST already.
Organizing Tokens
Let’s transform this list of tokens into something more like an AST. The first step is to pair each closing tag with its opening tag (the closing tags won’t become nodes of their own). To do that, we need to identify which tokens are tags:
function nodes($tokens) {
$cursor = 0;
$length = count($tokens);
while ($cursor < $length) {
$token = $tokens[$cursor];
if (is_array($token)) {
print $token["tag"] . PHP_EOL;
}
$cursor++;
}
}
$tokens = [
0 => '<?php
$classNames = "foo bar";
$message = "hello world";
$thing = (',
1 => [
'tag' => '<div className={0} nested={1}>',
'started' => 157,
'attributes' => [
0 => [
0 => '() => { return "outer-div"; }',
],
1 => [
1 => [
'tag' => '<span className={0}>',
'started' => 19,
'attributes' => [
0 => [
0 => '"nested-span"',
],
],
],
2 => 'with text',
3 => [
'tag' => '</span>',
'started' => 34,
],
],
],
],
2 => 'a bit of text before',
3 => [
'tag' => '<span>',
'started' => 195,
],
4 => '{$message} with a bit of extra text',
5 => [
'tag' => '</span>',
'started' => 249,
],
6 => 'a bit of text after',
7 => [
'tag' => '</div>',
'started' => 282,
],
8 => ');',
];
nodes($tokens);
// <div className={0} nested={1}>
// <span>
// </span>
// </div>
This is from nodes-1.php
I’ve extracted a list of tokens from the last token script, so that I don’t have to run and debug that function anymore. Inside a loop, similar to the one we used during tokenization, we print just the non-attribute element tags. Let’s figure out if they’re opening or closing tags, and also whether the closing tags match the opening ones:
function nodes($tokens) {
$cursor = 0;
$length = count($tokens);
while ($cursor < $length) {
$token = $tokens[$cursor];
if (is_array($token) && $token["tag"][1] !== "/") {
preg_match("#^<([a-zA-Z]+)#", $token["tag"], $matches);
print "OPENING {$matches[1]}" . PHP_EOL;
}
if (is_array($token) && $token["tag"][1] === "/") {
preg_match("#^</([a-zA-Z]+)#", $token["tag"], $matches);
print "CLOSING {$matches[1]}" . PHP_EOL;
}
$cursor++;
}
return $tokens;
}
// ...
// OPENING div
// OPENING span
// CLOSING span
// CLOSING div
This is from nodes-1.php (modified)
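The AddTask component later in this article uses a self-closing <input />, and the checks above only distinguish opening from closing tags. A hypothetical tagKind helper (not part of nodes-1.php) could add that third case by looking for a trailing />:

```php
<?php
// Hypothetical helper: classify a tag token as opening, closing, or
// self-closing, and return its element name.
function tagKind(string $tag): array
{
    if (!preg_match('#^<(/?)([a-zA-Z][\w-]*)#', $tag, $matches)) {
        throw new InvalidArgumentException("not a tag: {$tag}");
    }
    if ($matches[1] === "/") {
        return ["closing", $matches[2]];
    }
    // Attribute values are already placeholders like {0}, so a "/" right
    // before the final ">" can only mean a self-closing tag.
    $kind = preg_match('#/\s*>$#', $tag) ? "self-closing" : "opening";
    return [$kind, $matches[2]];
}
```

A self-closing node would be attached to its parent like an opening tag, but $current would not step down into it.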
Now that we know which tags are opening tags and which are closing tags, we can use reference variables to construct a tree:
function nodes($tokens) {
$nodes = [];
$current = null;
$cursor = 0;
$length = count($tokens);
while ($cursor < $length) {
$token =& $tokens[$cursor];
if (is_array($token) && $token["tag"][1] !== "/") {
preg_match("#^<([a-zA-Z]+)#", $token["tag"], $matches);
if ($current !== null) {
$token["parent"] =& $current;
$current["children"][] =& $token;
} else {
$token["parent"] = null;
$nodes[] =& $token;
}
$current =& $token;
$current["name"] = $matches[1];
$current["children"] = [];
if (isset($current["attributes"])) {
foreach ($current["attributes"] as $key => $value) {
$current["attributes"][$key] = nodes($value);
}
$current["attributes"] = array_map(function($item) {
foreach ($item as $value) {
if (isset($value["tag"])) {
return $value;
}
}
foreach ($item as $value) {
if (!empty($value["token"])) {
return $value;
}
}
return null;
}, $current["attributes"]);
}
}
else if (is_array($token) && $token["tag"][1] === "/") {
preg_match("#^</([a-zA-Z]+)#", $token["tag"], $matches);
if ($current === null) {
throw new Exception("no open tag");
}
if ($matches[1] !== $current["name"]) {
throw new Exception("no matching open tag");
}
if ($current !== null) {
$current =& $current["parent"];
}
}
else if ($current !== null) {
array_push($current["children"], [
"parent" => &$current,
"token" => &$token,
]);
}
else {
array_push($nodes, [
"token" => $token,
]);
}
$cursor++;
}
return $nodes;
}
// ...
// Array
// (
// [0] => Array
// (
// [token] => <?php
//
// $classNames = "foo bar";
// $message = "hello world";
//
// $thing = (
// )
//
// [1] => Array
// (
// [tag] => <div className={0} nested={1}>
// [started] => 157
// [attributes] => Array
// (
// [0] => Array
// (
// [token] => () => { return "outer-div"; }
// )
//
// [1] => Array
// (
// [tag] => <span className={0}>
// [started] => 19
// [attributes] => Array
// (
// [0] => Array
// (
// [token] => "nested-span"
// )
//
// )
//
// [parent] =>
// [name] => span
// [children] => Array
// (
// [0] => Array
// (
// [parent] => *RECURSION*
// [token] => with text
// )
//
// )
//
// )
//
// )
//
// [parent] =>
// [name] => div
// [children] => Array
// (
// [0] => Array
// (
// [parent] => *RECURSION*
// [token] => a bit of text before
// )
//
// [1] => Array
// (
// [tag] => <span>
// [started] => 195
// [parent] => *RECURSION*
// [name] => span
// [children] => Array
// (
// [0] => Array
// (
// [parent] => *RECURSION*
// [token] => {$message} with ...
// )
//
// )
//
// )
//
// [2] => Array
// (
// [parent] => *RECURSION*
// [token] => a bit of text after
// )
//
// )
//
// )
//
// [2] => Array
// (
// [token] => );
// )
//
// )
This is from nodes-2.php
Take some time to study what’s going on here. We create a $nodes array in which to store the new, organized node structures. We also have a $current variable, to which we assign each opening tag node by reference. This way, we can step down into each element (opening tag, closing tag, and the tokens in between), as well as step back up when we encounter a closing tag.
The references are the trickiest part of this, but they’re essential to keeping the code relatively simple. I mean, it’s not that simple; but it is much simpler than a non-reference version.
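Stripped of attributes and most error handling, the reference trick looks something like the following hypothetical sketch. The buildWithReferences name and the simplified [type, value] event format are inventions for illustration; the key lines are the =& assignments that step down into a new node and back up to its parent.

```php
<?php
// Hypothetical, stripped-down version of the reference technique.
// Events are simplified: ["open", name], ["text", string], ["close", name].
function buildWithReferences(array $events): array
{
    $root = ["name" => null, "parent" => null, "children" => []];
    $current =& $root; // the node we are currently inside

    foreach ($events as [$type, $value]) {
        if ($type === "open") {
            $current["children"][] = ["name" => $value, "children" => []];
            // Bind $child to the new array element itself, not a copy.
            $child =& $current["children"][count($current["children"]) - 1];
            $child["parent"] =& $current; // remember where we came from
            $current =& $child;           // step down into the new node
            unset($child);                // drop the name, keep the element
        } elseif ($type === "close") {
            if ($current["name"] !== $value) {
                throw new Exception("no matching open tag for {$value}");
            }
            $current =& $current["parent"]; // step back up
        } else {
            $current["children"][] = ["text" => $value];
        }
    }

    return $root["children"];
}
```

Because every node holds a reference to its parent, the result is cyclic, which is why print_r reports *RECURSION* in the output above.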
We don’t have the cleanest function in terms of how it works recursively. So, when we pass the attributes through the nodes function, we sometimes get empty “token” attributes alongside nested tag attributes. Because of this, we need to filter the attributes, first trying to return a nested tag before returning a non-empty token attribute value. This could be cleaned up quite a bit…
Rewriting Code
Now that the code is neatly arranged in a hierarchy or AST, we can rewrite it into valid PHP code. Let’s begin by writing out just the string tokens (the ones which aren’t nested inside elements):
function parse($nodes) {
$code = "";
foreach ($nodes as $node) {
if (isset($node["token"])) {
$code .= $node["token"] . PHP_EOL;
}
}
return $code;
}
$nodes = [
0 => [
'token' => '<?php
$classNames = "foo bar";
$message = "hello world";
$thing = (',
],
1 => [
'tag' => '<div className={0} nested={1}>',
'started' => 157,
'attributes' => [
0 => [
'token' => '() => { return "outer-div"; }',
],
1 => [
'tag' => '<span className={0}>',
'started' => 19,
'attributes' => [
0 => [
'token' => '"nested-span"',
],
],
'name' => 'span',
'children' => [
0 => [
'token' => 'with text',
],
],
],
],
'name' => 'div',
'children' => [
0 => [
'token' => 'a bit of text before',
],
1 => [
'tag' => '<span>',
'started' => 195,
'name' => 'span',
'children' => [
0 => [
'token' => '{$message} with a bit of extra text',
],
],
],
2 => [
'token' => 'a bit of text after',
],
],
],
2 => [
'token' => ');',
],
];
print parse($nodes);
// <?php
//
// $classNames = "foo bar";
// $message = "hello world";
//
// $thing = (
// );
This is from parser-1.php
I’ve copied the nodes extracted from the previous script, so we don’t have to debug or reuse that function again. Let’s deal with the elements as well:
require __DIR__ . "/vendor/autoload.php";
function parse($nodes) {
$code = "";
foreach ($nodes as $node) {
if (isset($node["token"])) {
$code .= $node["token"] . PHP_EOL;
}
if (isset($node["tag"])) {
$props = [];
$attributes = [];
$elements = [];
if (isset($node["attributes"])) {
foreach ($node["attributes"] as $key => $value) {
if (isset($value["token"])) {
$attributes["attr_{$key}"] = $value["token"];
}
if (isset($value["tag"])) {
$elements[$key] = true;
$attributes["attr_{$key}"] = parse([$value]);
}
}
}
preg_match_all("#([a-zA-Z]+)={([^}]+)}#", $node["tag"], $dynamic);
preg_match_all("#([a-zA-Z]+)=[']([^']+)[']#", $node["tag"], $static);
if (count($dynamic[0])) {
foreach($dynamic[1] as $key => $value) {
$props["{$value}"] = $attributes["attr_{$key}"];
}
}
if (count($static[1])) {
foreach($static[1] as $key => $value) {
$props["{$value}"] = var_export($static[2][$key], true); // quote the literal as valid PHP
}
}
$code .= "pre_" . $node["name"] . "([" . PHP_EOL;
foreach ($props as $key => $value) {
$code .= "'{$key}' => {$value}," . PHP_EOL;
}
$code .= "])" . PHP_EOL;
}
}
$code = Pre\Plugin\expand($code);
$code = Pre\Plugin\formatCode($code);
return $code;
}
// ...
// <?php
//
// $classNames = "foo bar";
// $message = "hello world";
//
// $thing = (
// pre_div([
// 'className' => function () {
// return "outer-div";
// },
// 'nested' => pre_span([
// 'className' => "nested-span",
// ]),
// ])
// );
This is from parser-2.php
When we find a tag node, we loop through the attributes and build a new attributes array that is either just text from token nodes or parsed tags from tag nodes. This bit of recursion deals with the possibility of attributes that are nested elements. Our static-attribute regular expression only handles values quoted with single quotes (for the sake of simplicity). Feel free to write a more comprehensive expression, to handle more complex attribute syntax and values.
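As a starting point for that more comprehensive expression, here’s a hypothetical tagProps helper (not part of parser-2.php). It reads placeholder attributes and static attributes in either quote style with one pattern, and uses var_export() to turn static values into valid PHP string literals. It relies on the PREG_UNMATCHED_AS_NULL flag, available from PHP 7.2.

```php
<?php
// Hypothetical extension of parser-2.php's two regular expressions: one
// pattern reads placeholder attributes (name={0}) and static attributes in
// either quote style (name="x" or name='x').
function tagProps(string $tag, array $attributes): array
{
    $pattern = '#([a-zA-Z][\w-]*)=(?:\{(\d+)\}|"([^"]*)"|\'([^\']*)\')#';
    preg_match_all($pattern, $tag, $matches, PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL);

    $props = [];
    foreach ($matches as $match) {
        if (isset($match[2])) {
            // Dynamic: the placeholder indexes the captured attribute code.
            $props[$match[1]] = $attributes[(int) $match[2]];
        } else {
            // Static: var_export() turns the literal into a PHP string literal.
            $props[$match[1]] = var_export($match[3] ?? $match[4] ?? "", true);
        }
    }
    return $props;
}
```

Every value in the returned array is PHP source code, so parse() could write each one out as "'{$key}' => {$value}," without caring where it came from.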
I went ahead and installed pre/short-closures, so that the arrow function would be expanded to a regular function:
composer require pre/short-closures
There’s also a handy PSR-2 formatting function in there, so our code is formatted according to that standard.
Finally, we need to deal with children:
require __DIR__ . "/vendor/autoload.php";
function parse($nodes) {
$code = "";
foreach ($nodes as $node) {
if (isset($node["token"])) {
$code .= $node["token"] . PHP_EOL;
}
if (isset($node["tag"])) {
// ...
$children = [];
foreach ($node["children"] as $child) {
if (isset($child["tag"])) {
$children[] = parse([$child]);
}
else {
$children[] = "\"" . addcslashes($child["token"], "\"\\") . "\""; // escape quotes and backslashes; {$vars} still interpolate
}
}
$props["children"] = $children;
$code .= "pre_" . $node["name"] . "([" . PHP_EOL;
foreach ($props as $key => $value) {
if ($key === "children") {
$code .= "\"children\" => [" . PHP_EOL;
foreach ($children as $child) {
$code .= "{$child}," . PHP_EOL;
}
$code .= "]," . PHP_EOL;
}
else {
$code .= "\"{$key}\" => {$value}," . PHP_EOL;
}
}
$code .= "])" . PHP_EOL;
}
}
$code = Pre\Plugin\expand($code);
$code = Pre\Plugin\formatCode($code);
return $code;
}
// ...
// <?php
//
// $classNames = "foo bar";
// $message = "hello world";
//
// $thing = (
// pre_div([
// "className" => function () {
// return "outer-div";
// },
// "nested" => pre_span([
// "className" => "nested-span",
// "children" => [
// "with text",
// ],
// ]),
// "children" => [
// "a bit of text before",
// pre_span([
// "children" => [
// "{$message} with a bit of extra text",
// ],
// ]),
// "a bit of text after",
// ],
// ])
// );
This is from parser-3.php
We parse each tag child, and directly quote each token child (adding slashes to account for nested quotes). Then, when we’re building the parameter array, we loop over the children and add each to the string of code our parse function ultimately returns.
Each tag is converted to a call to an equivalent pre_div or pre_span function. This is a placeholder mechanism for a larger, underlying primitive element system. We can demonstrate this by stubbing those functions:
require __DIR__ . "/vendor/autoload.php";
function pre_div($props) {
$code = "<div";
if (isset($props["className"])) {
if (is_callable($props["className"])) {
$class = $props["className"]();
}
else {
$class = $props["className"];
}
$code .= " class='{$class}'";
}
$code .= ">";
foreach ($props["children"] as $child) {
$code .= $child;
}
$code .= "</div>";
return trim($code);
}
function pre_span($props) {
$code = pre_div($props);
$code = preg_replace("#^<div#", "<span", $code);
$code = preg_replace("#div>$#", "span>", $code);
return $code;
}
function parse($nodes) {
// ...
}
$nodes = [
0 => [
'token' => '<?php
$classNames = "foo bar";
$message = "hello world";
$thing = (',
],
1 => [
'tag' => '<div className={0} nested={1}>',
'started' => 157,
'attributes' => [
0 => [
'token' => '() => { return $classNames; }',
],
1 => [
'tag' => '<span className={0}>',
'started' => 19,
'attributes' => [
0 => [
'token' => '"nested-span"',
],
],
'name' => 'span',
'children' => [
0 => [
'token' => 'with text',
],
],
],
],
'name' => 'div',
'children' => [
0 => [
'token' => 'a bit of text before',
],
1 => [
'tag' => '<span>',
'started' => 195,
'name' => 'span',
'children' => [
0 => [
'token' => '{$message} with a bit of extra text',
],
],
],
2 => [
'token' => 'a bit of text after',
],
],
],
2 => [
'token' => ');',
],
3 => [
'token' => 'print $thing;',
],
];
eval(substr(parse($nodes), 5));
// <div class='foo bar'>
// a bit of text before
// <span>
// hello world with a bit of extra text
// </span>
// a bit of text after
// </div>
This is from parser-4.php
I’ve modified the input nodes, so that the className closure returns $classNames and $thing is printed. If we implement naive versions of pre_div and pre_span, this code executes successfully. It’s actually hard to believe, given how little code we’ve written…
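The two stubs duplicate each other (pre_span even rewrites pre_div’s output with regular expressions). A hypothetical pre_element helper shows one way to share the rendering. It is not the library’s real primitive system, and like the stubs it does not escape text children.

```php
<?php
// Hypothetical generalisation of the naive stubs: one function renders any
// element, so each tag function becomes a one-line wrapper.
function pre_element(string $name, array $props): string
{
    $html = "<{$name}";
    if (isset($props["className"])) {
        // className may be a closure returning the real value.
        $class = $props["className"] instanceof Closure
            ? $props["className"]()
            : $props["className"];
        $html .= " class='" . htmlspecialchars((string) $class, ENT_QUOTES) . "'";
    }
    $html .= ">";
    foreach ($props["children"] ?? [] as $child) {
        $html .= $child; // children are rendered HTML or trusted text
    }
    return $html . "</{$name}>";
}

function pre_div(array $props): string
{
    return pre_element("div", $props);
}

function pre_span(array $props): string
{
    return pre_element("span", $props);
}
```

Adding another primitive is then a matter of another three-line wrapper rather than another regular-expression rewrite.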
Integrating with Pre
The question is: what do we do with this?
It’s an interesting experiment, but it’s not very usable. What would be better is to have a way to drop this into an existing project, and experiment with component-based design in the real world. To this end, I extended Pre to allow for custom compilers (along with the custom macro definitions it already allows).
Then, I packaged the tokens, nodes, and parse functions into a reusable library. It took quite a while to do this and, between the time I first created the functions and built an example application using them, I improved them quite a bit. Some improvements were small (like creating a set of HTML component primitives), and some were big (like refactoring expressions and allowing custom component classes).
I’m not going to go over all these changes, but I’d like to show you what that example application looks like. It begins with a server script:
use Silex\Application;
use Silex\Provider\SessionServiceProvider;
use Symfony\Component\HttpFoundation\Request;
use App\Component\AddTask;
use App\Component\Page;
use App\Component\TaskList;
$app = new Application();
$app->register(new SessionServiceProvider());
$app->get("/", (Request $request) => {
$session = $request->getSession();
$tasks = $session->get("tasks", []);
return (
<Page>
<TaskList>{$tasks}</TaskList>
<AddTask></AddTask>
</Page>
);
});
$app->post("/add", (Request $request) => {
$session = $request->getSession();
$id = $session->get("id", 0);
$tasks = $session->get("tasks", []);
$tasks[] = [
"id" => $id++,
"text" => $request->get("text"),
];
$session->set("id", $id);
$session->set("tasks", $tasks);
return $app->redirect("/");
});
$app->get("/remove/{id}", (Request $request, $id) => {
$session = $request->getSession();
$tasks = $session->get("tasks", []);
$tasks = array_filter($tasks, ($task) => {
return $task["id"] !== (int) $id;
});
$session->set("tasks", $tasks);
return $app->redirect("/");
});
$app->run();
This is from server.pre
The application is built on top of Silex, which is a neat micro-framework. In order to load this server script, I have an index file:
require __DIR__ . "/../vendor/autoload.php";
Pre\Plugin\process(__DIR__ . "/../server.pre");
This is from public/index.php
…And I serve this with:
php -S localhost:8080 -t public public/index.php
I haven’t yet tried running this through a web server, like Apache or Nginx. I believe it would run in much the same way.
The server script begins by setting up the Silex application. I define a few routes, the first of which fetches an array of tasks from the current session. If that array hasn’t been defined, I default it to an empty array.
I pass these tasks directly, as children of the TaskList component. I’ve wrapped this, and the AddTask component, inside a Page component. The Page component looks like this:
namespace App\Component;
use InvalidArgumentException;
class Page
{
public function render($props)
{
assert($this->hasValid($props));
{ $children } = $props;
return (
"<!doctype html>".
<html lang="en">
<body>
{$children}
</body>
</html>
);
}
private function hasValid($props)
{
if (empty($props["children"])) {
throw new InvalidArgumentException("page needs content (children)");
}
return true;
}
}
This is from app/Component/Page.pre
This component isn’t strictly necessary, but I want to declare the doctype and make space for future header content (like stylesheets and meta tags). I destructure the $props associative array (using some pre/collections syntax) and pass the children into the <body> element.
Then there’s the TaskList component:
namespace App\Component;
class TaskList
{
public function render($props)
{
{ $children } = $props;
return (
<ul className={"task-list"}>
{$this->children($children)}
</ul>
);
}
private function children($children)
{
if (count($children)) {
return {$children}->map(($task) => {
return (
<Task id={$task["id"]}>{$task["text"]}</Task>
);
});
}
return (
<span>No tasks</span>
);
}
}
This is from app/Component/TaskList.pre
Elements can have dynamic attributes. In fact, this library doesn’t support literal (quoted) attribute values at all, because supporting them alongside dynamic values is complicated. I’m defining the className attribute, which supports a few different formats:
- A literal value expression, like "task-list"
- An array (or key-less pre/collection object), like ["first", "second"]
- An associative array (or keyed pre/collection object), like ["first" => true, "second" => false]
This is similar to the className attribute in ReactJS. The keyed or object form uses the truthiness of values to determine whether the keys are appended to the element’s class attribute.
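The three className formats listed above could be normalized into a single class string along these lines. This is a plain-PHP sketch with a hypothetical function name; the library's real implementation may differ, and the sketch only accepts strings and arrays (a pre/collection object would need converting to an array first):

```php
<?php

// Hypothetical helper: turns the supported className formats into the
// string that ends up in the element's `class` attribute.
function classNameToString($className): string
{
    // 1. A literal value expression, like "task-list".
    if (!is_array($className)) {
        return (string) $className;
    }

    $classes = [];

    foreach ($className as $key => $value) {
        if (is_int($key)) {
            // 2. List form: ["first", "second"] - every value is a class.
            $classes[] = (string) $value;
        } elseif ($value) {
            // 3. Keyed form: ["first" => true] - keep keys with truthy values.
            $classes[] = $key;
        }
    }

    return implode(' ', $classes);
}

echo classNameToString('task-list'), PHP_EOL;                          // task-list
echo classNameToString(['first', 'second']), PHP_EOL;                  // first second
echo classNameToString(['first' => true, 'second' => false]), PHP_EOL; // first
```

Checking `is_int($key)` lets one array mix both forms. One caveat: PHP turns numeric-string keys like "123" into integers, so such a key would be treated as a list item.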
All the default elements support non-deprecated and non-experimental attributes defined in the Mozilla Developer Network documentation. All elements support an associative array for their style attribute, which uses the kebab-case form of CSS style keys.
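A style attribute given as an associative array with kebab-case CSS property names could be serialized roughly like this. It is a hypothetical helper, not the library's code. Skipping null and false values is an assumption of this sketch, and escaping is left to whatever renders the attribute:

```php
<?php

// Hypothetical helper: ["font-size" => "12px"] becomes "font-size: 12px".
function styleToString(array $style): string
{
    $declarations = [];

    foreach ($style as $property => $value) {
        // Assumption: null/false switch a declaration off.
        if ($value === null || $value === false) {
            continue;
        }

        $declarations[] = $property . ': ' . $value;
    }

    return implode('; ', $declarations);
}

echo styleToString([
    'background-color' => 'tomato',
    'font-size' => '12px',
    'display' => null,
]);
// background-color: tomato; font-size: 12px
```

Keeping the keys in kebab-case means they can be emitted verbatim, with no camelCase conversion step like the one ReactJS style objects need.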
Finally, all elements support data- and aria- attributes, and all attribute values may be functions which return their true values (as a form of lazy loading).
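The data-/aria- rule and the lazy attribute values could both be handled when attributes are rendered. In this sketch, the function name, the `$known` whitelist parameter and the escaping choice are assumptions, not the library's API. It checks for `Closure` rather than calling `is_callable()`, because `is_callable()` also returns true for plain strings that happen to name a function, such as "date":

```php
<?php

// Hypothetical helper: builds an attribute string for an element.
function renderAttributes(array $attributes, array $known): string
{
    $parts = [];

    foreach ($attributes as $name => $value) {
        // data-* and aria-* names are always allowed.
        $isExtension = strncmp($name, 'data-', 5) === 0
            || strncmp($name, 'aria-', 5) === 0;

        if (!$isExtension && !in_array($name, $known, true)) {
            throw new InvalidArgumentException("unsupported attribute: {$name}");
        }

        // Lazy values: a closure is only called when the element renders.
        if ($value instanceof Closure) {
            $value = $value();
        }

        $parts[] = sprintf(
            '%s="%s"',
            $name,
            htmlspecialchars((string) $value, ENT_QUOTES)
        );
    }

    return implode(' ', $parts);
}

echo renderAttributes(
    [
        'href' => function () { return '/remove/7'; },
        'data-id' => 7,
        'aria-label' => 'Remove "milk"',
    ],
    ['href', 'id']
);
// href="/remove/7" data-id="7" aria-label="Remove &quot;milk&quot;"
```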
Let’s look at the Task component:
namespace App\Component;

use InvalidArgumentException;

class Task
{
    public function render($props)
    {
        assert($this->hasValid($props));

        { $children, $id } = $props;

        return (
            <li className={"task"}>
                {$children}
                <a href={"/remove/{$id}"}>remove</a>
            </li>
        );
    }

    private function hasValid($props)
    {
        if (!isset($props["id"])) {
            throw new InvalidArgumentException("task needs id (attribute)");
        }

        if (empty($props["children"])) {
            throw new InvalidArgumentException("task needs text (children)");
        }

        return true;
    }
}
This is from app/Component/Task.pre
Each task expects an id attribute (which server.pre defines) and some children. The children hold the task’s text, and are defined where the tasks are created, in the TaskList component.
Finally, let’s look at the AddTask component:
namespace App\Component;

class AddTask
{
    public function render($props)
    {
        return (
            <form method={"post"} action={"/add"} className={"add-task"}>
                <input name={"text"} type={"text"} />
                <button type={"submit"}>add</button>
            </form>
        );
    }
}
This is from app/Component/AddTask.pre
This component demonstrates a self-closing input element, and little else. Of course, the add and remove functionality needs to be defined (in the server script):
$app->post("/add", (Request $request) => {
    $session = $request->getSession();

    $id = $session->get("id", 1);
    $tasks = $session->get("tasks", []);

    $tasks[] = [
        "id" => $id++,
        "text" => $request->get("text"),
    ];

    $session->set("id", $id);
    $session->set("tasks", $tasks);

    return $app->redirect("/");
});

$app->get("/remove/{id}", (Request $request, $id) => {
    $session = $request->getSession();

    $tasks = $session->get("tasks", []);

    $tasks = array_filter($tasks, ($task) => {
        return $task["id"] !== (int) $id;
    });

    $session->set("tasks", $tasks);

    return $app->redirect("/");
});
This is from server.pre
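Stripped of Silex and the session, the add and remove routes boil down to two array operations. This plain-PHP sketch uses hypothetical function names. It also highlights two details: the route's `$id` arrives as a string, hence the `(int)` cast before the strict comparison, and `array_filter()` preserves keys, so removing a task leaves a gap in the indices. That gap is harmless for foreach-style iteration; the optional `array_values()` call re-indexes the list if anything relies on sequential keys:

```php
<?php

// Hypothetical plain-PHP version of the add route's logic.
function addTask(array $tasks, int &$nextId, string $text): array
{
    // Use the current id, then advance the counter for the next task.
    $tasks[] = ["id" => $nextId++, "text" => $text];

    return $tasks;
}

// Hypothetical plain-PHP version of the remove route's logic.
// $id is a string because it comes from the URL.
function removeTask(array $tasks, string $id): array
{
    $remaining = array_filter($tasks, function ($task) use ($id) {
        return $task["id"] !== (int) $id;
    });

    // array_filter() keeps the original keys; re-index to get 0, 1, 2...
    return array_values($remaining);
}

$nextId = 1;
$tasks = [];
$tasks = addTask($tasks, $nextId, "buy milk");
$tasks = addTask($tasks, $nextId, "walk dog");
$tasks = addTask($tasks, $nextId, "write compiler");

$tasks = removeTask($tasks, "2");

// Remaining texts: "buy milk" and "write compiler", at keys 0 and 1.
print_r(array_column($tasks, "text"));
```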
We’re not storing anything in a database, but we could. These components and scripts are all there is to the example application. It’s not a huge example, but it does demonstrate some important things, like component nesting and iterative component rendering.
It’s also a good example of how some of the different Pre macros work well together; particularly short closures, collections, and in certain cases async/await.
Here’s a gif of it in action.
Phack
While I was working on this project, I rediscovered a project called Phack, by Sara Golemon. It’s similar in spirit to Pre: it transpiles a PHP superset language (in this case, Hack) into regular PHP.
The readme lists the Hack features that Phack aims to support, and their status. One of those features is XHP. If you’ve always wanted to write Hack code but still use standard PHP tools, I recommend checking it out. I’m a huge fan of Sara and her work, so I’ll definitely be keeping an eye on Phack.
Summary
This has been a whirlwind tour of simple compiler creation. We learned how to build a basic state-machine compiler, and how to get it to support HTML-like syntax inside regular PHP syntax. We also looked at how that might work in an example application.
I’d like to encourage you to try this out. Perhaps you’d like to add your own syntax to PHP – which you could do with Pre. Perhaps you’d like to change PHP radically. I hope this tutorial has demonstrated one way to do that, well enough that you feel up to the challenge. Remember: creating compilers doesn’t take a huge amount of knowledge or training. Just simple string manipulation, and some trial and error.
Let us know what you come up with!
Frequently Asked Questions (FAQs) about ReactJS and PHP: Writing Compilers
What are the benefits of using ReactJS with PHP?
ReactJS and PHP are both powerful tools in web development. ReactJS is a JavaScript library that allows developers to build user interfaces, particularly for single-page applications. It provides a more efficient and flexible way of building web applications. On the other hand, PHP is a server-side scripting language designed for web development. It is used to manage dynamic content, databases, session tracking, and even build entire e-commerce sites. When used together, ReactJS can handle the frontend, while PHP takes care of the server-side backend. This combination allows for the creation of robust, full-stack web applications.
How can I integrate ReactJS with PHP?
Integrating ReactJS with PHP involves setting up a ReactJS frontend to interact with a PHP backend. This can be achieved by creating a RESTful API with PHP that will communicate with the ReactJS frontend. The frontend can make HTTP requests to the backend and receive responses. This setup allows for the separation of concerns, where the frontend and backend are developed and maintained separately, leading to cleaner and more manageable code.
Can I write PHP scripts in a ReactJS file?
ReactJS and PHP are fundamentally different and serve different purposes. ReactJS is a client-side JavaScript library, while PHP is a server-side scripting language. Therefore, you cannot directly write PHP scripts in a ReactJS file. However, you can set up a system where ReactJS communicates with a PHP backend through HTTP requests. This way, you can still use PHP logic in response to ReactJS actions.
What is a compiler in the context of PHP and ReactJS?
A compiler is a program that translates code written in one programming language (the source language) into another language (the target language). In the context of PHP and ReactJS, a compiler could be used to translate JSX (a syntax extension for JavaScript used in ReactJS) into regular JavaScript that the browser can understand. This is necessary because browsers cannot directly interpret JSX.
How can I create a JSX-like DSL that compiles into PHP?
Creating a JSX-like Domain Specific Language (DSL) that compiles into PHP involves writing a compiler that translates the DSL into regular PHP. It doesn’t have to be daunting: as this tutorial shows, a basic state-machine compiler built on simple string manipulation can handle HTML-like syntax, and a project like Pre lets you plug such a compiler into your PHP code.
What online compilers are available for PHP?
There are several online compilers available for PHP, including Repl.it, PHP Fiddle, and Paiza.IO. These platforms allow you to write, run, and debug PHP code directly in your web browser, without needing to install any software on your computer.
How can I use ReactJS with the Symfony PHP framework?
Symfony is a PHP framework used for building web applications. You can use ReactJS with Symfony by integrating it into your Symfony project as a frontend library. This can be done by installing ReactJS via npm or yarn, and then including it in your Symfony templates. You can also use the Webpack Encore bundle provided by Symfony to manage your JavaScript and CSS assets, including ReactJS.
What are the challenges of using ReactJS with PHP?
While using ReactJS with PHP can provide many benefits, it also comes with its own set of challenges. These include dealing with the asynchronous nature of JavaScript, managing state between the frontend and backend, and setting up communication between ReactJS and PHP. However, these challenges can be overcome with proper planning and understanding of both technologies.
Can I use PHP to render ReactJS components server-side?
PHP can’t execute JavaScript by itself, so rendering ReactJS components server-side from PHP requires an embedded JavaScript engine (such as the V8Js extension) or a separate Node.js process. For most projects it is simpler to use Node.js for server-side rendering of ReactJS components, since it runs JavaScript natively.
How can I learn more about using ReactJS with PHP?
There are many resources available for learning about using ReactJS with PHP. These include online tutorials, documentation, and courses. Websites like SitePoint, Programiz, and W3Schools offer many articles and tutorials on the subject. Additionally, communities like Stack Overflow and the ReactJS and PHP subreddits on Reddit can be great places to ask questions and learn from others’ experiences.
Christopher is a writer and coder, working at Over. He usually works on application architecture, though sometimes you'll find him building compilers or robots.