Bending XML to Your Will

If you’ve ever worked with the Twitter or Facebook APIs, looked at RSS feeds from a website, or made use of some type of RPC call, you’ve undoubtedly worked with XML. Extensible Markup Language (XML) is a big building block of today’s web, and hundreds of XML-based languages have been developed, including XHTML, Atom, and SOAP, to name just a few. I myself have to work with quite a few third-party systems to send and receive data, and the preferred method for all of them is XML.

Knowing how to process XML data is a crucial programming skill today, and thankfully, PHP offers multiple ways to read, filter, and even generate XML. In this article I’ll explain what exactly XML is, in case you haven’t had any experience with it yet, and then dive into a few ways you can use PHP to bend XML to your will.

What does XML do?

The short answer to the question “What does XML do?” is nothing. It does nothing at all. XML is simply a markup language, similar to HTML. Whereas HTML was designed to display data, however, XML was designed to provide a structured way to transport and store data.

Let’s take a look at a simple XML example that contains information on particular sports teams:

<?xml version="1.0" encoding="UTF-8" ?>
<roster>
 <team>
   <name>Bengals</name>
   <division>AFC North</division>
   <colors>Black and Orange</colors>
   <stadium location="Cincinnati">Paul Brown Stadium</stadium>
   <coach>Marvin Lewis</coach>
 </team>
 <team>
  <name>Titans</name>
  <division>AFC South</division>
  <colors>Blue and White</colors>
  <stadium location="Tennessee">LP Field</stadium>
  <coach>Mike Munchak</coach>
 </team>
</roster>

As you can see from the example, XML is human-readable and self-descriptive. Unlike HTML, XML has no predefined tags, allowing you to invent your own. Anyone, whether they are a programmer or not, can look at this example and understand the data. The software that you create is responsible for writing or parsing the information in the XML document.

Sharing information between various platforms, databases, and programming languages can be a frustrating endeavor, but since XML is just a plain text file, it allows your data to be independent from the software in use. Because XML is such a wide-spread standard, it also gives you the freedom to develop your application without worrying about incompatibility on the other end.

If you’re still a bit shaky on XML and what its place in web development is, take a look at this great introduction to XML, A Really, Really, Really Good Introduction to XML.

Types of XML Parsers

There are two basic types of XML parsers: tree-based parsers and event-based parsers (sometimes called stream parsers). Tree-based parsers read the entire XML document into memory, structure the data into a tree-like format, and allow you to access the tree’s elements. Event-based parsers, on the other hand, read the XML as a stream and raise an event every time they reach a new start or end tag. This allows you to apply a function pertinent to your application when an event occurs for a specific element. Since they do not store the entire XML document in memory, event-based parsers are generally faster and less resource-intensive than tree-based ones. Tree-based parsers, in turn, are generally easier to use and require less code.

PHP 5 has a plethora of tools to choose from that work with XML, including the XML Parser (a.k.a. SAX or Expat Parser), DOM, SimpleXML, XMLReader, XMLWriter, and the XSL extensions. For the sake of brevity I’ll look at just two of the most widely used parsers, the XML Parser and SimpleXML extensions, which coincidentally represent one of each type of parser.
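
To make the distinction concrete, here is a minimal sketch that counts the team elements in a document both ways. The inline XML string and the variable names are assumptions made for illustration; the two extensions themselves are covered in detail in the sections that follow.

```php
<?php
// Hypothetical inline document so the sketch is self-contained.
$xml = '<roster><team><name>Bengals</name></team>'
     . '<team><name>Titans</name></team></roster>';

// Tree-based: the whole document is loaded first, then navigated.
$tree = simplexml_load_string($xml);
$treeCount = count($tree->team);

// Event-based: a callback fires for each start tag as the parser
// streams through the document.
$eventCount = 0;
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$eventCount) {
        if ($name === "team") {
            $eventCount++;
        }
    },
    function ($parser, $name) {
        // Nothing to do on closing tags in this sketch.
    }
);
xml_parse($parser, $xml, true);

echo "tree: $treeCount, events: $eventCount\n";
```

Both approaches arrive at the same count. The difference is that the tree-based version holds the whole document in memory, while the event-based version only ever sees one tag at a time.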

Using the XML Parser Extension

The first example I’ll show you involves using the XML Parser extension, an event-based parser. To start, let’s use the same XML example from earlier and parse it with the extension. Imagine you have been given the task of parsing the XML into a simple list to display on a web page. Create the file nfl.xml with the example XML as its contents.

Create another file called xmlParserExample.php with the following code:

<?php
$xmlFile = "nfl.xml";

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
// The class name must be a quoted string for the callbacks to be valid.
xml_set_element_handler($parser, array("NFLParser", "openTag"),
    array("NFLParser", "closeTag"));
xml_set_character_data_handler($parser,
    array("NFLParser", "characterData"));

$fp = fopen($xmlFile, "r");
while ($data = fread($fp, 4096)) {
    xml_parse($parser, $data, feof($fp))
        or die (sprintf("XML Error: %s at line %d",
            xml_error_string(xml_get_error_code($parser)), 
            xml_get_current_line_number($parser)));
}
xml_parser_free($parser);

class NFLParser {
    protected static $element;
    protected static $attrs;

    public static function openTag($parser, $elementName, $elementAttrs) {
        self::$element = $elementName;
        self::$attrs = $elementAttrs;

        switch($elementName) {
            case "team":
                echo "<ul>";
                break;
            case "division":
                echo "<li>Division: ";
                break;
            case "name":
                echo "<li>Team Name: ";
                break;
            case "colors":
                echo "<li>Team Colors: ";
                break;
            case "stadium":
                echo "<li>Stadium: ";
                break;
            case "coach":
                echo "<li>Head Coach: ";
        }
    }

    public static function closeTag($parser, $elementName) {
        self::$element = null;
        self::$attrs = null;
    
        if ($elementName == "team") {
            echo "</ul>";
        }
        elseif($elementName != "roster") {
            echo "</li>";
        }
    }

    public static function characterData($parser, $data) {
        echo $data;
        if (self::$element == "stadium") {
            echo " (" . self::$attrs["location"] . ")";
        }
    }
}

The xml_parser_create() function creates a new XML parser handle that is used throughout the code. The next function, xml_parser_set_option(), is used to set options for the parser. In this case, the XML_OPTION_CASE_FOLDING option is set to false (it is enabled by default). Case folding is the process of converting every character in a sequence to uppercase. By setting this option to false I can preserve the case of tags exactly as they appear in the XML file.
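
The effect of case folding is easy to see in isolation. The sketch below is only an illustration: collectNames() is a hypothetical helper and the inline XML is an assumption, not part of the example script.

```php
<?php
// Hypothetical helper: returns the element names the parser reports
// for the given case-folding setting.
function collectNames($xml, $caseFolding) {
    $names = array();
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$names) {
            $names[] = $name;
        },
        function ($parser, $name) {
            // Closing tags are ignored here.
        }
    );
    xml_parse($parser, $xml, true);
    return $names;
}

$xml = '<roster><team><name>Bengals</name></team></roster>';

$folded = collectNames($xml, true);     // names arrive in uppercase
$preserved = collectNames($xml, false); // names arrive as written

echo implode(", ", $folded), "\n";
echo implode(", ", $preserved), "\n";
```

With folding enabled, the switch statement in openTag() would have to compare against "TEAM", "NAME", and so on, which is why the example turns it off.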

The xml_set_element_handler() function sets the parser’s start and end element handlers. This function accepts three parameters: the first parameter is the parser reference, the second parameter is the callback function that will handle opening tags (the static openTag() method of the NFLParser class in the example), and the third parameter is the callback that will handle closing tags (the closeTag() method).

PHP passes three parameters to openTag(): the parser, the name of the element for which this handler is called, and an associative array of any attributes for the element. Two parameters are provided to closeTag(): the parser and the name of the element.

The xml_set_character_data_handler() function specifies the function that will handle character data for an element. The function accepts two parameters: the parser and the name of the callback function which, in this example, is the static characterData() method. The characterData() method is passed two parameters: the parser, and the character data from the element.

The remaining bit of code in the example reads in the XML file and calls the xml_parse() function which starts the parsing process. xml_parse() accepts three parameters: the parser, a chunk of data to parse, and a boolean parameter which indicates whether it is the last piece of data.
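
The chunked calls to xml_parse() can be sketched without a file by splitting a string into pieces. Everything here is an assumption for illustration: the inline XML, the 16-byte chunk size, and the variable names. Note that the character data handler may be called more than once for a single element, so this sketch buffers the text instead of assuming it arrives in one piece.

```php
<?php
$xml = '<roster><team><name>Bengals</name></team>'
     . '<team><name>Titans</name></team></roster>';

// Tiny chunks stand in for successive fread() calls.
$chunks = str_split($xml, 16);

$names = array();   // collected team names
$current = null;    // element currently open
$buffer = "";       // text gathered for the current element

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$current, &$buffer) {
        $current = $name;
        $buffer = "";
    },
    function ($parser, $name) use (&$current, &$buffer, &$names) {
        if ($name === "name") {
            $names[] = $buffer;
        }
        $current = null;
    }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$current, &$buffer) {
        // Append rather than assign: text can arrive in several pieces.
        if ($current === "name") {
            $buffer .= $data;
        }
    }
);

$last = count($chunks) - 1;
foreach ($chunks as $i => $chunk) {
    // The third argument is true only for the final chunk.
    if (!xml_parse($parser, $chunk, $i === $last)) {
        die(sprintf("XML Error: %s at line %d",
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)));
    }
}

echo implode(", ", $names), "\n";
```

The parser keeps its state between calls, so a tag or a run of text that is split across two chunks is still handled correctly.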

The last function called is xml_parser_free(); just like in file handling, it is always a good idea to free up the reference handle when you’re finished.

I chose to encapsulate the methods in the NFLParser class so I could track the current element and its attributes in $element and $attrs, making them available to the characterData() method without polluting the global namespace.

Execute your script and you should have a nice HTML list of all the data from the XML.

<ul>
 <li>Team Name: Bengals</li>
 <li>Division: AFC North</li>
 <li>Team Colors: Black and Orange</li>
 <li>Stadium: Paul Brown Stadium (Cincinnati)</li>
 <li>Head Coach: Marvin Lewis</li>
</ul>
<ul>
 <li>Team Name: Titans</li>
 <li>Division: AFC South</li>
 <li>Team Colors: Blue and White</li>
 <li>Stadium: LP Field (Tennessee)</li>
 <li>Head Coach: Mike Munchak</li>
</ul>

Well, interpreting XML with PHP using the event-driven parser wasn’t too bad, but what if there were an even easier way to slice up XML, a simpler way if you will?

Using SimpleXML

The SimpleXML extension was introduced in PHP 5 and takes a lot of the tedium of XML manipulation away. SimpleXML is a tree-based object-oriented parser, so it’s a slower and more resource-intensive way to parse XML, but any speed lost using this extension will be long forgotten once you see how “simple” it truly is to use.

Create a file called simpleXMLExample.php and enter the code below:

<?php
$xmlFile = "nfl.xml";

$xml = simplexml_load_file($xmlFile);

foreach ($xml->team as $element) {
    // attributes() exposes the <stadium> element's attributes as properties
    $attr = $element->stadium->attributes();
    $location = $attr->location;

    echo "<ul>\n";
    echo " <li>Team Name: " . $element->name . "</li>\n";
    echo " <li>Division: " . $element->division . "</li>\n";
    echo " <li>Team Colors: " . $element->colors . "</li>\n";
    echo " <li>Stadium: " . $element->stadium . " (" . $location . ")</li>\n";
    echo " <li>Head Coach: " . $element->coach . "</li>\n";
    echo "</ul>\n";
}

Executing this script will produce the same output but without the need to write much of the parsing code.
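
SimpleXML offers a couple of other conveniences beyond property access. The sketch below is illustrative only and assumes a trimmed-down inline copy of the roster: attributes can be read with array-style syntax, and xpath() can select nodes by a condition.

```php
<?php
// Trimmed-down inline copy of the roster, assumed for illustration.
$xml = simplexml_load_string(
    '<roster>'
    . '<team><name>Bengals</name>'
    . '<stadium location="Cincinnati">Paul Brown Stadium</stadium></team>'
    . '<team><name>Titans</name>'
    . '<stadium location="Tennessee">LP Field</stadium></team>'
    . '</roster>'
);

// Array-style access reads an attribute; the cast turns the
// SimpleXMLElement into a plain string.
$firstLocation = (string) $xml->team[0]->stadium["location"];

// xpath() returns an array of matching SimpleXMLElement objects.
$matches = $xml->xpath('//team[stadium/@location="Tennessee"]/name');
$teamName = (string) $matches[0];

echo $firstLocation, "\n";
echo $teamName, "\n";
```

Casting to string matters whenever a value is compared or stored, because SimpleXML hands back objects rather than plain strings.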

You might be wondering why you would use an extension like XML Parser if SimpleXML is so… well, simple. I liken this question to a construction worker who goes to his job with only a hammer in his belt. Sure, he’ll get by hammering nails for a while, but eventually he’ll be faced with a screw. Even though one tool might be easier to use, that doesn’t make it the ideal choice for every situation.

Summary

In this article you learned a little bit about XML and how it’s used around the web. More importantly, though, you learned about the two basic types of XML parsers, tree-based and event-based parsers. PHP offers several different XML parsing extensions, two of which are XML Parser and SimpleXML. Each offers trade-offs with performance, ease of use, and the amount of code the programmer needs to write. Hopefully seeing how both extensions are used will help you confidently choose the best approach the next time you need to consume XML.

Image via Ken Durden/Shutterstock

  • http://www.softxml.com/SoftXPathDemo.htm Gregory

    SoftXPath – small cross browser JavaScript/XML library written for web developers who deals with XML parsing/querying on client side. With the help of SoftXPath you will be able to query complex XML documents using powerful Xpath expressions. Now you can focus on building effective Xpath expressions instead of wasting time on browser compatibility issues.

  • Justino

    Thank you for the article. When parsing XML with colons in the namespace, how does that work to parse that with the two PHP methods you’re showing? Thank you.

  • http://www.farinspace.com Dimas

    I think the biggest issue I’ve had with XML has been the use of namespaces … I’ve had complex project with multiple namespaces. All communication was done via XML, so not only did I need to read the XML but also write it. Many XML tools fall down when it comes to dealing with namespaces, in my current experience I have not found a tool yet that does a great job dealing with namespaces in a easy fashion. However QueryPath has come very close in helping to deal with both reading and writing woes when it comes to XML.

    • Sandeep.C.R

      This is a class I ve wrote to deal with namespaces in xml. I have explained in details in this thread.
      http://forums.devnetwork.net/viewtopic.php?f=50&t=127475
      Basically this class contains a search method that can generate php statements that should be used to access a node in XML.

  • http://meta-blogger.com Michael Hall

    many thanks for this article, i think it came right in the nick of time for a project i’m working on now.

  • http://popsypedia.org Barida Popsana

    This is the Best Tutorial I’ve seen this year..
    Nice Exposition to XML.

  • http://meta-blogger.com Michael Hall

    I’ve found a sweet tutorial that shows you how to use php to extract the source of
    pretty much any web page and then echo the results over to jquery, so you can essentially
    copy the source of a page that you don’t own via php and use jquery to select just the images or
    h2 headings or links, etc as if it was on your own server – i.e. cross domain access.
    http://www.sitegrind.nl/jquery/jquery-load-function-get-content-from-other-websites/

    what’s really cool is it uses very little code and because my webserver is essentially pulling
    the source from their server i can actually use it to access sites that would otherwise be blocked
    at my place of work based on the domain.

    don’t use it illegally

    • http://WebsiteURL Michael Hall

      oops, i accidentally posted the info above, please remove it was meant for another article i was reading