Phunkie is a library with functional structures for PHP. In this two-part tutorial, Phunkie’s creator Marcello Duarte, head of training at Inviqa, explains how to create Parser combinators using the functional library. This post first appeared on the Inviqa blog, and was republished here with their permission. For an exploration of PHP’s “non-exotic” features, we have great paid course.
In the first part of this series we explored parsers and combinators to help you start getting values from functional programming with PHP. We covered the basics using examples, and now we’ll move onto more sequencing and other strategies.
Let’s pick up where we left off!
Sequencing Combinators
Ok, let’s now try a more useful parser. A parser that, given a predicate, keeps the character if it satisfies the predicate and fails otherwise. We will call this parser sat
, from satisfies.
describe("sat", function() {
it("parses one character when it satisfies the predicate", function(){
expect(sat("is_numeric")->run("4L"))->toEqual(ImmList(Pair('4', 'L')));
});
it("returns Nil when the character does not satisfy the predicate", function(){
expect(sat("is_numeric")->run("L4"))->toEqual(Nil());
});
});
The implementation, using the primitive parsers item
, result
and zero
, looks like this:
function sat(callable $predicate): Parser
{
return item()->flatMap(function($x) use ($predicate) {
return $predicate($x) ? result($x) : zero();
});
}
You can see how the building blocks are now put to work. We call the item
parser, flatMap
on its result, and apply the predicate. If the predicate succeeds, we return the result
parser, which basically lifts the $x
into the parser context. Otherwise, zero
will just do the same, but with Nil
instead of any result.
We can quickly capitalize on sat
, by creating a few other sequencing combinators.
context("Sequencing combinators", function() {
describe("char", function() {
it("parses a character", function() {
expect(char('h')->run("hello"))->toEqual(ImmList(Pair('h', "ello")));
});
});
describe("digit", function() {
it("parses a digit", function() {
expect(digit()->run("42"))->toEqual(ImmList(Pair('4', '2')));
});
});
describe("lower", function() {
it("parses a lowercase character", function() {
expect(lower()->run("hello"))->toEqual(ImmList(Pair('h', "ello")));
});
});
describe("upper", function() {
it("parses an upper case character", function() {
expect(upper()->run("Hello"))->toEqual(ImmList(Pair('H', "ello")));
});
});
});
You will laugh at how simple the implementation is!
function char($c): Parser
{
return sat(function($input) use ($c) { return $input === $c; });
}
function digit(): Parser
{
return sat('is_numeric');
}
function lower(): Parser
{
return sat(function($c) { return ctype_lower($c); });
}
function upper(): Parser
{
return sat(function($c) { return ctype_upper($c); });
}
Choice Combinators
In the real world of grammars, if we had to return an empty list as soon as the first parser failed, we would not survive. Parsers have to deal with grammatical choice constructs. A very common scenario is either matching one pattern or another pattern. We do that by adding the plus
combinator to our arsenal of parsers.
use function concat;
function plus(Parser $p, Parser $q)): Parser
{
return new Parser(function(string $s) use ($p, $q) {
return concat($p->run($s), $q->run($s));
});
}
I like moving some combinator implementations into the Parser
class itself. For example, the plus
combinator becomes the or
method in the class, creating some syntax sugar for the parsers.
function plus(Parser $p, Parser $q)): Parser
{
return $p->or($q);
}
An example of choice parsers could be one that accepts either lowercase or uppercase characters. We could name that parser letter
. We can also create another parser that accepts digits or letters, which we can name alphanum
.
context("Choice combinators", function() {
describe("letter", function() {
it("can combine parsers to parse letters", function() {
expect(letter()->run("hello"))->toEqual(ImmList(Pair('h', "ello")));
expect(letter()->run("Hello"))->toEqual(ImmList(Pair('H', "ello")));
expect(letter()->run("5ello"))->toEqual(Nil());
});
});
describe("alphanum", function() {
it("can combine parsers to parse alphanum", function() {
expect(alphanum()->run("hello"))->toEqual(ImmList(Pair('h', "ello")));
expect(alphanum()->run("Hello"))->toEqual(ImmList(Pair('H', "ello")));
expect(alphanum()->run("5ello"))->toEqual(ImmList(Pair('5', "ello")));
expect(alphanum()->run("#ello"))->toEqual(Nil());
});
});
And the elegance of the implementation speaks for itself:
function letter(): Parser
{
return plus(lower(), upper());
}
function alphanum(): Parser
{
return plus(letter(), digit());
}
Recursive Combinators
One nice trick we can use is to pass the result
parser to plus
to create non-deterministic parsers. To illustrate this let’s build a word
parser that recognizes entire words from a string. The result may surprise you:
context("Recursive combinators", function() {
describe("word", function() {
it("recognises entire words out of a string", function() {
expect(word()->run("Yes!"))
->toEqual(ImmList(
Pair("Yes", '!'),
Pair("Ye", "s!"),
Pair('Y', "es!"),
Pair("", "Yes!")
));
});
});
//...
});
Before diving into the reason why we have so many pairs of results let’s look at the implementation:
function word(): Parser
{
$nonEmptyWord = letter()->flatMap(function($x) {
return word()->map(function($xs) use ($x) {
return $x . $xs;
});
});
return plus($nonEmptyWord, result(''));
}
As we can see the use of letter
means we are consuming a letter when one is available, however we may consume a letter but not reach the end of the parsing. This kind of choice combinator is non-deterministic. We got some kind of result; we have reached a letter. Now let’s see if there is more. Until we find something that is not a letter and we stop.
Also, note how we use recursion in this implementation to continue parsing, concatenating the results. By the way, that’s the reason I didn’t use the for notation
here. PHP is not a lazy language, thus implementing recursivity with the notation can result in stack overflow.
A very useful parser is one that can recognize an entire string (or token) inside another string. We will call it string
and implement it recursively.
describe("string", function() {
it("parses a string", function() {
expect(string("hello")->run("hello world"))
->toEqual(ImmList(Pair("hello", " world")));
});
expect(string("helicopter")->run("hello world"))
->toEqual(Nil());
});
Here’s the implementation:
function string($s): Parser
{
return strlen($s) ?
for_(
__($c)->_(char($s[0])),
__($cs)->_(string(substr($s, 1)))
)->call(concat, $c, $cs) :
result ('');
}
Simple Repetitions
You can probably imagine that the repetition pattern used in word and string is one that parsers often encounter. We can probably generalize that too. The beauty of creating parsers this way is how they can be combined very easily to create new parsers.
We will now define the simple repetition parser many
. We will make many non-deterministic choices, which means it will never fail, but will return an empty string instead.
context("Simple repetitions", function() {
describe("many", function() {
it("generalises repetition", function() {
expect(many(char('t'))->run("ttthat's all folks"))->toEqual(ImmList(
Pair("ttt", "hat's all folks"),
Pair("tt", "that's all folks"),
Pair('t', "tthat's all folks"),
Pair("", "ttthat's all folks")
));
});
it ("never produces errors", function() {
expect(many(char('x'))->run("ttthat's all folks"))->toEqual(ImmList(
Pair("", "ttthat's all folks")
));
});
});
//...
});
This implementation will look very similar to the word
implementation, only we ask for a parser instead of using letter
, so we can represent repetition of any kind.
function many(Parser $p): Parser
{
return plus($p->flatMap(function($x) use ($p) {
return many($p)->map(function($xs) use ($x) {
return $x . $xs;
});
}), result(''));
}
We can now define word
simply as many(letter())
. Similarly we could try and implement a parser for numbers with many(digit())
, but since many
is non-deterministic, we need a version of many
that matches at least one character. We will call it many1
.
describe("many1", function() {
it("does not have the empty result of many", function() {
expect(many1(char('t'))->run("ttthat's all folks"))->toEqual(ImmList(
Pair("ttt", "hat's all folks"),
Pair("tt", "that's all folks"),
Pair('t', "tthat's all folks")
));
});
it("may produce an error", function() {
expect(many1(char('x'))->run("ttthat's all folks"))->toEqual(Nil());
});
});
Implementation with the for notation
:
function many1(Parser $p): Parser
{
return for_(
__($x)->_($p),
__($xs)->_(many($p))
)->call(concat, $x, $xs);
}
We can then use this to implement our natural numbers parser, nat
. This time we will go a step further and evaluate the result by casting it into an integer. We can get the result by taking the head
of the list, which is a Pair, then use the _1
assessor to get the result. Here is the example:
describe("nat", function() {
it("can be defined with repetition", function() {
expect(nat()->run("34578748fff"))->toEqual(ImmList(
Pair(34578748, "fff"),
Pair(3457874, "8fff"),
Pair(345787, "48fff"),
Pair(34578, "748fff"),
Pair(3457, "8748fff"),
Pair(345, "78748fff"),
Pair(34, "578748fff"),
Pair(3, "4578748fff")
));
expect(nat()->run("34578748fff")->head->_1)->toEqual(34578748);
});
});
And here’s the implementation with the casting:
function nat(): Parser
{
return many1(digit())->map(function($xs) {
return (int) $xs;
});
}
If we want negative numbers as well we can use nat
to build an int
parser.
describe("int", function() {
it("can be defined from char('-') and nat", function() {
expect(int()->run("-251")->head->_1)->toEqual(-251);
expect(int()->run("251")->head->_1)->toEqual(251);
});
});
Implemented simply it looks like:
function int()
{
return plus(for_(
__($_)->_(char('-')),
__($n)->_(nat())
)->call(negate, $n), nat());
}
Whatever falls into the $_
gets ignored but matched. The negate function from the Phunkie library will then convert that number into a negative integer. nat()
makes sure we pick up the positive numbers too.
Repetition with Separators
At this point we can create very interesting parsers already. We could, for example, parse a list of integers written in the PHP array notation: [1,-42,500]
. With what we have already we can write it like this:
for_(
__($open) ->_ (char('[')),
__($n ) ->_ (int() ),
__($ns ) ->_ (many(
for_(
__($comma ) ->_ (char(',')),
__($m ) ->_ (int() )
) -> call (concat, $comma, $m))),
__($close) ->_ (char(']'))
) -> call (concat, $open , $n , $ns , $close);
In fact, we can simplify this by generalizing the usage of the separator — in this case, the character comma (,). Next we will implement sepBy1
, a parser that is applied many times, separated by the application of another parser. Here is an example:
describe("sepby1", function() {
it("applies a parser several times separated by another parser", function() {
expect(sepby1(letter(), digit())->run("a1b2c3d4"))
->toEqual(ImmList(
Pair("abcd", '4'),
Pair("abc", "3d4"),
Pair("ab", "2c3d4"),
Pair('a', "1b2c3d4"))
);
});
});
We can quickly implement it like this:
function sepBy1(Parser $p, Parser $sep)
{
return for_(
__($x)->_($p),
__($xs)->_(many(for_(
__($_)->_($sep),
__($y)->_($p)
)->yields($y)))
)->call(concat, $x, $xs);
}
After moving the implementation of sepBy1
into the Parser
class, we can now re-implement our parser for the list of integers using sepBy1
an infix operator, which makes it more readable.
function ints()
{
return for_( __
($_) ->_ (char('[') ) ,__
($ns ) ->_ (int()->sepBy1(char(','))) ,__
($_) ->_ (char(']'))
) -> yields ($ns);
}
Let’s apply one more generalization here extracting the surrounding pattern.
function surrounded(Parser $open, Parser $p, Parser $close)
{
return for_(
__($_)->_($open),
__($ns)->_($p),
__($_)->_($close)
)->yields($ns);
}
In such a way we can then re-write ints
:
function ints()
{
return surrounded(
char('['),
int()->sepBy1(char(',')),
char(']')
);
}
A JSON Parser
Using what we have now at our disposal, let’s build a JSON parser. I don’t think we’ll need a step-by-step description for this, and I will link the github repo from the post. Here I’ll also talk you through some of the less intuitive aspects.
The parser should basically construct a JsonObject
object based on the specification. This is an almost full implementation. I will leave-out the empty spaces, fancy numbers and string issues for simplicity and brevity.
So here is the example:
context("Json parser", function() {
describe("json_object", function() {
it("parses json objects with no space or fancy digits", function() {
expect((string)json_object()->run(
'{a:1,b:null,c:true,d:{x:"a",b:[1,2,3]}}'
)->head->_1)->toEqual((string)
new JsonObject(
ImmMap([
"a" => new JsonNumber(1),
"b" => new JsonNull(),
"c" => new JsonBoolean(true),
"d" => new JsonObject(
ImmMap([
"x" => new JsonString("a"),
"b" => new JsonArray([
new JsonNumber(1),
new JsonNumber(2),
new JsonNumber(3)
])
])
)
])
));
});
});
});
The json_value
parser is a choice parser that sits on top. Its implementation is very straightforward:
function json_value(): Parser
{
return json_string()
->or(json_boolean())
->or(json_null())
->or(json_number())
->or(json_array())
->or(json_object());
}
An aspect worth mentioning is the conversion from the parsed value into the JSON objects. This is usually achieved with a map
at the end of the parser combinator. See, for example, the json_null
parser:
function json_null(): Parser
{
return string("null")->map(function($_) {
return new JsonNull();
});
}
I create a sepBy1array
combinator that, instead of returning the concatenated values at the end, returns an array of the values parsed. This is very handy for the json_array
parser.
function json_array(): Parser
{
return char('[')->flatMap(function($_) {
return sepBy1array(json_value(), char(','))->flatMap(function($elements) {
return char(']')->map(function($_) use ($elements) {
return new JsonArray(
array_filter($elements, function($a){ return $a != ''; })
);
});
});
});
}
And here’s a similar arrangement using Phunkie’s immutable maps for the objects.
function json_object(): Parser
{
return char('{')->flatMap(function($_) {
return sepBy1Map(word()->flatMap(function($key) {
return char(':')->flatMap(function($colon) use ($key) {
return json_value()->map(function($value) use ($key) {
return ImmMap($key , $value);
});
});
}), char(','))->flatMap(function($pairs) {
return char('}')->map(function() use ($pairs) {
return new JsonObject($pairs);
});
});
});
}
That’s it for now! I hope this has served to show you how powerful functional programming is and how composable parser combinators are. I hope you’ve enjoyed this tutorial and are inspired to try some Phunkie for yourself!
Questions? Comments? Find me on Twitter at @_md or drop a comment below!
Useful Links
[1] – Phunkie repository https://github.com/phunkie/phunkie
[2] – Marcello Duarte’s parsers combinators repository https://github.com/MarcelloDuarte/ParserCombinators
Marcello is head of training at web and software development company Inviqa, and an active contributor to open-source projects. He is the co-creator of PhpSpec, a popular BDD-flavoured testing and design tool for PHP developers, and author of Phunkie. – a functional library containing most common functional structures and patterns found in Haskell (for free), based on the abstract algebra theory of categories.