PHP DOM: Using XPath

In a recent article I discussed PHP’s implementation of the DOM and introduced various functions to pull data from and manipulate an XML structure. I also briefly mentioned XPath, but didn’t have much space to discuss it. In this article, we’ll look closer at XPath, how it functions, and how it is implemented in PHP. You’ll find that XPath can greatly reduce the amount of code you have to write to query and filter XML data, and will often yield better performance as well.

I’ll use the same DTD and XML from the previous article to demonstrate the PHP DOM XPath functionality. To quickly refresh your memory, here’s what the DTD and XML look like:

<!ELEMENT library (book*)> 
<!ELEMENT book (title, author, genre, chapter*)> 
  <!ATTLIST book isbn ID #REQUIRED> 
<!ELEMENT title (#PCDATA)> 
<!ELEMENT author (#PCDATA)> 
<!ELEMENT genre (#PCDATA)> 
<!ELEMENT chapter (chaptitle,text)> 
  <!ATTLIST chapter position NMTOKEN #REQUIRED> 
<!ELEMENT chaptitle (#PCDATA)> 
<!ELEMENT text (#PCDATA)>

<?xml version="1.0" encoding="utf-8"?> 
<!DOCTYPE library SYSTEM "library.dtd"> 
<library> 
  <book isbn="isbn1234"> 
    <title>A Book</title> 
    <author>An Author</author> 
    <genre>Horror</genre> 
    <chapter position="first"> 
      <chaptitle>chapter one</chaptitle> 
      <text><![CDATA[Lorem Ipsum...]]></text> 
    </chapter> 
  </book> 
  <book isbn="isbn1235"> 
    <title>Another Book</title> 
    <author>Another Author</author> 
    <genre>Science Fiction</genre> 
    <chapter position="first"> 
      <chaptitle>chapter one</chaptitle> 
      <text><![CDATA[<i>Sit Dolor Amet...</i>]]></text> 
    </chapter> 
  </book> 
</library>

Basic XPath Queries

XPath is a syntax for querying an XML document. In its simplest form, you define a path to the element you want. Using the XML document above, the following XPath query will return a collection of all the book elements present:

//library/book

That’s it. The leading double slash tells XPath to look for library anywhere in the document (here it happens to be the root element), and the single slash indicates book is a child of library. It’s pretty straightforward, no?

But what if you want to specify a particular book? Let’s say you want to return any books written by “An Author”. The XPath for that would be:

//library/book/author[text() = "An Author"]/..

You can use text() here in square brackets to perform a comparison against the value of a node, and the trailing “/..” indicates we want the parent element (i.e. move back up the tree one node).
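To see both queries run, here is a minimal sketch. It assumes a trimmed, inline copy of the sample library (no DOCTYPE, no chapters) loaded with DOMDocument::loadXML(); the variable names are only illustrative.

```php
<?php
// Trimmed copy of the article's sample library (DOCTYPE and chapters omitted).
$xml = <<<XML
<library>
  <book isbn="isbn1234">
    <title>A Book</title>
    <author>An Author</author>
    <genre>Horror</genre>
  </book>
  <book isbn="isbn1235">
    <title>Another Book</title>
    <author>Another Author</author>
    <genre>Science Fiction</genre>
  </book>
</library>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Every book element that is a child of library.
$allBooks = $xpath->query("//library/book");

// Only the books whose author text is exactly "An Author".
$byAuthor = $xpath->query('//library/book/author[text() = "An Author"]/..');

echo $allBooks->length, " books in total\n";
foreach ($byAuthor as $book) {
    // Each $book is the DOMElement for a <book>; print its isbn attribute.
    echo $book->getAttribute("isbn"), "\n";
}
```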

XPath queries can be executed using one of two functions: query() and evaluate(). Both perform the query, but the difference lies in the type of result they return. query() will always return a DOMNodeList whereas evaluate() will return a typed result if possible. For example, if your XPath query is to return the number of books written by a certain author rather than the actual books themselves, then query() will return an empty DOMNodeList. evaluate() will simply return the number so you can use it immediately instead of having to pull the data from a node.
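The difference in return types can be sketched as follows, assuming a tiny inline document rather than the full library. Note that XPath numbers come back from evaluate() as PHP floats.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML(
    '<library>'
    . '<book><author>An Author</author></book>'
    . '<book><author>Another Author</author></book>'
    . '</library>'
);
$xpath = new DOMXPath($doc);

// query() hands back a DOMNodeList of matching nodes.
$nodes = $xpath->query("//library/book");

// evaluate() returns a typed value when the expression is not a node-set.
$count = $xpath->evaluate("count(//library/book)");                  // float
$has   = $xpath->evaluate("boolean(//book[author = 'An Author'])");  // bool
$name  = $xpath->evaluate("string(//book[1]/author)");               // string

var_dump($nodes->length, $count, $has, $name);
```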

Code and Speed Benefits with XPath

Let’s do a quick demonstration that returns the number of books written by an author. The first method we’ll look at will work, but doesn’t make use of XPath. This is to show you how it can be done without XPath and why XPath is so powerful.

<?php
public function getNumberOfBooksByAuthor($author) { 
    $total = 0;
    $elements = $this->domDocument->getElementsByTagName("author");
    foreach ($elements as $element) {
        if ($element->nodeValue == $author) {
            $total++;
        }
    }
    return $total;
}

The next method achieves the same result, but uses XPath to select just those books that are written by a specific author:

<?php
public function getNumberOfBooksByAuthor($author)  { 
    $query = "//library/book/author[text() = '$author']/..";
    $xpath = new DOMXPath($this->domDocument);
    $result = $xpath->query($query); 
    return $result->length;
}

Notice how this time we have removed the need for PHP to test against the value of the author. But we can go one step further still and use the XPath function count() to count the occurrences of this path.

<?php
public function getNumberOfBooksByAuthor($author)  { 
    $query = "count(//library/book/author[text() = '$author']/..)";
    $xpath = new DOMXPath($this->domDocument);
    return $xpath->evaluate($query);
}

We’re able to retrieve the information we needed with only one line of XPath and there is no need to perform laborious filtering with PHP. Indeed, this is a much simpler and more succinct way to write this functionality!

Notice that evaluate() was used in the last example. This is because the function count() returns a typed result. Using query() will return a DOMNodeList but you will find that it is an empty list.

Not only does this make your code cleaner, but it also comes with speed benefits. I found that version 1 was 30% faster on average than version 2 but version 3 was about 10 percent faster than version 2 (about 15% faster than version 1). While these measurements will vary depending on your server and query, using XPath in its purest form will generally yield a considerable speed benefit as well as making your code easier to read and maintain.
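If you want to measure this on your own setup, a rough timing harness might look like the sketch below. The synthetic document, its size of 2000 books, the run count, and the timeIt() helper are all assumptions made for illustration; the numbers it prints will differ from machine to machine.

```php
<?php
// Build a synthetic library; the size (2000 books) is an arbitrary assumption.
$doc = new DOMDocument();
$library = $doc->appendChild($doc->createElement("library"));
for ($i = 0; $i < 2000; $i++) {
    $book = $library->appendChild($doc->createElement("book"));
    $name = ($i % 2) ? "An Author" : "Another Author";
    $book->appendChild($doc->createElement("author", $name));
}
$xpath = new DOMXPath($doc);

// Hypothetical helper: average seconds per call over $runs repetitions.
function timeIt(callable $fn, int $runs = 50): float {
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $fn();
    }
    return (microtime(true) - $start) / $runs;
}

// Version 1: plain DOM iteration with the comparison done in PHP.
$viaDom = function () use ($doc) {
    $total = 0;
    foreach ($doc->getElementsByTagName("author") as $element) {
        if ($element->nodeValue == "An Author") {
            $total++;
        }
    }
    return $total;
};

// Version 2: XPath selects the books, PHP reads the list length.
$viaQuery = function () use ($xpath) {
    return $xpath->query("//library/book/author[text() = 'An Author']/..")->length;
};

// Version 3: XPath does the counting as well.
$viaCount = function () use ($xpath) {
    return (int) $xpath->evaluate("count(//library/book/author[text() = 'An Author']/..)");
};

foreach (array("dom" => $viaDom, "query" => $viaQuery, "count" => $viaCount) as $label => $fn) {
    printf("%-6s %.6f s/call\n", $label, timeIt($fn));
}
```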

XPath Functions

There are quite a few functions that can be used with XPath and there are many excellent resources which detail what functions are available. If you find that you are iterating over DOMNodeLists or comparing nodeValues, you will probably find an XPath function that can eliminate a lot of the PHP coding.

You’ve already seen how count() works. Let’s use the id() function to return the titles of the books with the given ISBNs. The XPath expression you will need to use is:

id("isbn1234 isbn1235")/title

Notice here that the values you are searching for are enclosed within quotes and delimited with a space; there is no need for a comma to delimit the terms.

<?php
public function findBooksByISBNs(array $isbns) { 
    $ids = join(" ", $isbns);
    $query = "id('$ids')/title"; 

    $xpath = new DOMXPath($this->domDocument); 
    $result = $xpath->query($query); 

    $books = array();
    foreach ($result as $node) {
        $book = array("title" => $node->nodeValue);
        $books[] = $book;
    }
    return $books; 
}
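One practical caveat: id() only matches attributes that the parser knows to be of type ID, which normally means the DTD has to be loaded along with the document. As a minimal sketch, assuming the DTD is not available, you can mark isbn as an ID attribute yourself with DOMElement::setIdAttribute(). The inline document here is a trimmed stand-in for the sample library.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML(
    '<library>'
    . '<book isbn="isbn1234"><title>A Book</title></book>'
    . '<book isbn="isbn1235"><title>Another Book</title></book>'
    . '</library>'
);

// No DTD was loaded, so tell the DOM explicitly that isbn is an ID attribute.
foreach ($doc->getElementsByTagName("book") as $book) {
    $book->setIdAttribute("isbn", true);
}

$xpath = new DOMXPath($doc);

// id() takes a space-delimited list of ID values.
$titles = array();
foreach ($xpath->query('id("isbn1234 isbn1235")/title') as $node) {
    $titles[] = $node->nodeValue;
}

print_r($titles);
```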

Executing complex functions in XPath is relatively simple; the trick is to become familiar with the functions that are available.
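The methods in this article splice PHP variables straight into the query string, which breaks as soon as a value contains a quote character. XPath 1.0 has no escape sequence inside string literals, so one possible workaround is a small quoting helper. The xpathLiteral() function below is a hypothetical helper, not part of the DOM extension, and the author name is made-up sample data.

```php
<?php
// Hypothetical helper: turn any PHP string into a valid XPath 1.0 string literal.
function xpathLiteral(string $value): string {
    if (strpos($value, "'") === false) {
        return "'" . $value . "'";
    }
    if (strpos($value, '"') === false) {
        return '"' . $value . '"';
    }
    // Both quote types present: split on apostrophes and rejoin with concat().
    return "concat('" . str_replace("'", "', \"'\", '", $value) . "')";
}

$doc = new DOMDocument();
$doc->loadXML(
    '<library>'
    . '<book><author>Flann O\'Brien</author></book>'
    . '<book><author>An Author</author></book>'
    . '</library>'
);
$xpath = new DOMXPath($doc);

$author = "Flann O'Brien";
$query = "count(//library/book/author[text() = " . xpathLiteral($author) . "]/..)";
$total = (int) $xpath->evaluate($query);

echo $total, "\n";
```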

Using PHP Functions With XPath

Sometimes you may find that you need some greater functionality that the standard XPath functions cannot deliver. Luckily, PHP DOM also allows you to incorporate PHP’s own functions into an XPath query.

Let’s consider returning the number of words in the title of a book. In its simplest form, we could write the method as follows:

<?php
public function getNumberOfWords($isbn) {
    $query = "//library/book[@isbn = '$isbn']"; 

    $xpath = new DOMXPath($this->domDocument); 
    $result = $xpath->query($query); 

    $title = $result->item(0)->getElementsByTagName("title")
        ->item(0)->nodeValue; 

    return str_word_count($title); 
}

But we can also incorporate the function str_word_count() directly into the XPath query. There are a few steps that need to be completed to do this. First of all, we have to register a namespace with the XPath object. PHP functions in XPath queries are called through “php:functionString”, with the name of the function you want to use passed as the first argument inside the parentheses. The namespace to register is http://php.net/xpath; it must be exactly this value, and any other will result in errors. We then need to call registerPHPFunctions(), which tells PHP that whenever it comes across a function namespaced with “php:”, it is PHP that should handle it.

The actual syntax for calling the function is:

php:functionString("nameoffunction", arg, arg...)

Putting this all together results in the following reimplementation of getNumberOfWords():

<?php
public function getNumberOfWords($isbn) {
    $xpath = new DOMXPath($this->domDocument);

    //register the php namespace
    $xpath->registerNamespace("php", "http://php.net/xpath"); 

    //ensure php functions can be called within xpath
    $xpath->registerPHPFunctions();

    $query = "php:functionString('str_word_count',(//library/book[@isbn = '$isbn']/title))"; 

    return $xpath->evaluate($query); 
}

Notice that you don’t need to call the XPath function text() to provide the text of the node. php:functionString converts its node arguments to their string values automatically. However, the following is just as valid:

php:functionString('str_word_count',(//library/book[@isbn = '$isbn']/title[text()]))
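For a version you can run outside of a class, here is a minimal sketch. It assumes a one-book inline document and passes a function name to registerPHPFunctions(), which restricts the XPath object to calling only that function. The value PHP returns is converted to an XPath number, so evaluate() gives back a float.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML(
    '<library><book isbn="isbn1235"><title>Another Book</title></book></library>'
);

$xpath = new DOMXPath($doc);
$xpath->registerNamespace("php", "http://php.net/xpath");

// Passing a name restricts callbacks to that one PHP function.
$xpath->registerPHPFunctions("str_word_count");

$isbn = "isbn1235";
$words = $xpath->evaluate(
    "php:functionString('str_word_count', //library/book[@isbn = '$isbn']/title)"
);

var_dump($words);
```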

Registering PHP functions is not restricted to the functions that come with PHP. You can define your own functions and call them from within the XPath. The difference in the next example is that the function is called with “php:function” rather than “php:functionString”, so it receives the matched nodes themselves rather than their string value. Also, it is only possible to call either standalone functions or static methods. Calling instance methods is not supported.

Let’s use a regular function that is outside the scope of the class to demonstrate the basic functionality. The function we will use will return only books by “George Orwell”. It must return true for every node you wish to include in the query.

<?php
function compare($node) {
    return $node[0]->nodeValue == "George Orwell";
}

The argument passed to the function is an array of DOMElements. It is up to the function to iterate through the array and determine whether the node being tested should be returned in the DOMNodeList. In this example, the node being tested is /book and we are using /author to make the determination.

Now we can create the method getGeorgeOrwellBooks():

<?php
public function getGeorgeOrwellBooks() { 
    $xpath = new DOMXPath($this->domDocument); 
    $xpath->registerNamespace("php", "http://php.net/xpath"); 
    $xpath->registerPHPFunctions(); 

    $query = "//library/book[php:function('compare', author)]";
    $result = $xpath->query($query); 

    $books = array(); 
    foreach($result as $node) { 
        $books[] = $node->getElementsByTagName("title")
            ->item(0)->nodeValue; 
    } 

    return $books;
}

If compare() were a static method, then you would need to amend the XPath query so that it reads:

//library/book[php:function('Library::compare', author)]
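A self-contained sketch of the static-method variant could look like this. The Library class and the two-book document are assumptions for illustration, since the sample library contains no George Orwell titles.

```php
<?php
// Hypothetical class, named to match the 'Library::compare' call in the query.
class Library {
    // php:function passes the matched <author> nodes as an array of DOMElements.
    public static function compare(array $nodes): bool {
        return isset($nodes[0]) && $nodes[0]->nodeValue === "George Orwell";
    }
}

// Made-up sample data for the demonstration.
$doc = new DOMDocument();
$doc->loadXML(
    '<library>'
    . '<book><title>1984</title><author>George Orwell</author></book>'
    . '<book><title>A Book</title><author>An Author</author></book>'
    . '</library>'
);

$xpath = new DOMXPath($doc);
$xpath->registerNamespace("php", "http://php.net/xpath");
$xpath->registerPHPFunctions();

// Keep only the books for which Library::compare() returns true.
$titles = array();
$query = "//library/book[php:function('Library::compare', author)]";
foreach ($xpath->query($query) as $book) {
    $titles[] = $book->getElementsByTagName("title")->item(0)->nodeValue;
}

print_r($titles);
```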

In truth, all of this functionality can be easily coded up with just XPath, but the example shows how you can extend XPath queries to become more complex.
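For comparison, the XPath-only equivalent needs no registered functions at all. This sketch uses a made-up three-book document and compares the author element directly inside the predicate.

```php
<?php
// Made-up sample data for the demonstration.
$doc = new DOMDocument();
$doc->loadXML(
    '<library>'
    . '<book><title>Animal Farm</title><author>George Orwell</author></book>'
    . '<book><title>A Book</title><author>An Author</author></book>'
    . '<book><title>1984</title><author>George Orwell</author></book>'
    . '</library>'
);
$xpath = new DOMXPath($doc);

// The predicate filters the books; the final step selects their titles.
$titles = array();
foreach ($xpath->query("//library/book[author = 'George Orwell']/title") as $title) {
    $titles[] = $title->nodeValue;
}

print_r($titles);
```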

Calling an object method is not possible within XPath. If you find you need to access some object properties or methods to complete the XPath query, the best solution would be to do what you can with XPath and then work on the resulting DOMNodeList with any object methods or properties as necessary.
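That two-step approach might be sketched as follows. The AuthorFilter class, its accepts() method, and the sample data are hypothetical; the point is only that XPath narrows the node set first and the instance method finishes the filtering in PHP.

```php
<?php
// Hypothetical class holding per-instance state that XPath cannot see.
class AuthorFilter {
    private $wanted;

    public function __construct(array $wanted) {
        $this->wanted = $wanted;
    }

    // Instance method applied to each node after the XPath query has run.
    public function accepts(DOMElement $book): bool {
        $author = $book->getElementsByTagName("author")->item(0);
        return $author !== null
            && in_array($author->nodeValue, $this->wanted, true);
    }
}

// Made-up sample data; the last book has no title and is dropped by XPath.
$doc = new DOMDocument();
$doc->loadXML(
    '<library>'
    . '<book><title>1984</title><author>George Orwell</author></book>'
    . '<book><title>A Book</title><author>An Author</author></book>'
    . '<book><author>George Orwell</author></book>'
    . '</library>'
);
$xpath = new DOMXPath($doc);
$filter = new AuthorFilter(array("George Orwell"));

// Step 1: XPath keeps only books that have a title.
// Step 2: the object method decides which of those to return.
$titles = array();
foreach ($xpath->query("//library/book[title]") as $book) {
    if ($filter->accepts($book)) {
        $titles[] = $book->getElementsByTagName("title")->item(0)->nodeValue;
    }
}

print_r($titles);
```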

Summary

XPath is a great way of cutting down the amount of code you have to write and of speeding up the execution of that code when working with XML data. Although not part of the official DOM specification, the additional functionality that the PHP DOM provides allows you to extend the normal XPath functions with custom functionality. This is a very powerful feature, although as your familiarity with XPath functions increases you may find that you come to rely on it less and less.


  • Chris Walsh

    Hi,
    You said “I found that version 1 was 30% faster on average than version 2 but version 3 was about 10 percent faster than version 2 (about 15% faster than version 1).”. As version 1 is the non-XPath approach, how is XPath faster?
    Thanks for the article however. I really like XPath but not used it in PHP.

  • http://augustowebd.blogspot.com augustowebd

    very nice job.
    Thanks so much.