Parsing XML With SimpleXML

    Sandeep Panda

    Parsing XML essentially means navigating through an XML document and returning the relevant data. An increasing number of web services return data in JSON format, but a large number still return XML, so you need to master parsing XML if you really want to consume the full breadth of APIs available. PHP’s SimpleXML extension, introduced back in PHP 5.0, makes working with XML very easy. In this article I’ll show you how.

    Basic Usage

    Let’s start with the following sample as languages.xml:
    <?xml version="1.0" encoding="utf-8"?>
    <languages>
     <lang name="C">
      <appeared>1972</appeared>
      <creator>Dennis Ritchie</creator>
     </lang>
     <lang name="PHP">
      <appeared>1995</appeared>
      <creator>Rasmus Lerdorf</creator>
     </lang>
     <lang name="Java">
      <appeared>1995</appeared>
      <creator>James Gosling</creator>
     </lang>
    </languages>
    The above XML document encodes a list of programming languages, giving two details about each language: the year it appeared and the name of its creator. The first step is to load the XML using either simplexml_load_file() or simplexml_load_string(). As you might expect, the former loads the XML from a file and the latter loads it from a given string.
    <?php
    $languages = simplexml_load_file("languages.xml");
    Both functions read the entire DOM tree into memory and return a SimpleXMLElement object representation of it. In the above example, the object is stored in the $languages variable. You can then use var_dump() or print_r() to get the details of the returned object if you like.
    SimpleXMLElement Object
    (
        [lang] => Array
            (
                [0] => SimpleXMLElement Object
                    (
                        [@attributes] => Array
                            (
                                [name] => C
                            )
                        [appeared] => 1972
                        [creator] => Dennis Ritchie
                    )
                [1] => SimpleXMLElement Object
                    (
                        [@attributes] => Array
                            (
                                [name] => PHP
                            )
                        [appeared] => 1995
                        [creator] => Rasmus Lerdorf
                    )
                [2] => SimpleXMLElement Object
                    (
                        [@attributes] => Array
                            (
                                [name] => Java
                            )
                        [appeared] => 1995
                        [creator] => James Gosling
                    )
            )
    )
    The XML contained a root languages element which wrapped three lang elements, which is why the SimpleXMLElement has the public property lang, an array of three SimpleXMLElements. Each element of the array corresponds to a lang element in the XML document. You can access the properties of the object in the usual way with the -> operator. For example, $languages->lang[0] will give you a SimpleXMLElement object which corresponds to the first lang element. This object then has two public properties: appeared and creator.
    <?php
    echo $languages->lang[0]->appeared; // 1972
    echo $languages->lang[0]->creator;  // Dennis Ritchie
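
    Each property access returns another SimpleXMLElement rather than a plain string or number, so values usually need a cast before they are compared or used in arithmetic. The following is a minimal sketch of that idea; it assumes the first lang entry is supplied inline through simplexml_load_string() so that it runs without languages.xml.

```php
<?php
// Inline copy of the first <lang> entry so the sketch is self-contained.
$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<languages>
 <lang name="C">
  <appeared>1972</appeared>
  <creator>Dennis Ritchie</creator>
 </lang>
</languages>
XML;

$languages = simplexml_load_string($xml);
$first = $languages->lang[0];

// Property access yields SimpleXMLElement objects, not scalars.
var_dump($first->appeared instanceof SimpleXMLElement); // bool(true)

// Cast explicitly to obtain plain PHP values.
$year    = (int) $first->appeared;
$creator = (string) $first->creator;

echo $creator, " created C in ", $year, ".\n";
```

    Casting early keeps strict comparisons such as $year === 1972 behaving as expected.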
    Iterating through the list of languages and showing their details can be done very easily with standard looping methods, such as foreach.
    <?php
    foreach ($languages->lang as $lang) {
        printf(
            "<p>%s appeared in %d and was created by %s.</p>",
            $lang["name"],
            $lang->appeared,
            $lang->creator
        );
    }
    Notice that I accessed the lang element’s name attribute to retrieve the name of the language. You can access any attribute of an element represented as a SimpleXMLElement object using array notation like this.
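
    Array notation reads one attribute at a time, while the attributes() method returns all of an element’s attributes for iteration. The sketch below illustrates both; the paradigm attribute is a hypothetical addition that does not appear in languages.xml.

```php
<?php
// Hypothetical sample: the "paradigm" attribute is invented for illustration.
$lang = simplexml_load_string('<lang name="PHP" paradigm="multi-paradigm"/>');

// Single attribute via array notation; cast to string for a plain value.
$name = (string) $lang["name"];

// All attributes via attributes(), collected into an ordinary array.
$attrs = [];
foreach ($lang->attributes() as $key => $value) {
    $attrs[$key] = (string) $value;
}

// isset() reports false for an attribute that is not present.
$hasYear = isset($lang["year"]);

print_r($attrs);
```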

    Dealing With Namespaces

    Many times you’ll encounter namespaced elements while working with XML from different web services. Let’s modify our languages.xml example to reflect the usage of namespaces:
    <?xml version="1.0" encoding="utf-8"?>
    <languages
     xmlns:dc="http://purl.org/dc/elements/1.1/">
     <lang name="C">
      <appeared>1972</appeared>
      <dc:creator>Dennis Ritchie</dc:creator>
     </lang>
     <lang name="PHP">
      <appeared>1995</appeared>
      <dc:creator>Rasmus Lerdorf</dc:creator>
     </lang>
     <lang name="Java">
      <appeared>1995</appeared>
      <dc:creator>James Gosling</dc:creator>
     </lang>
    </languages>
    Now the creator element is placed under the namespace prefix dc, which is bound to the URI http://purl.org/dc/elements/1.1/. If you try to print the creator of a language using our previous technique, it won’t work: $lang->creator comes back empty because plain property access does not see elements that carry the dc prefix. To read namespaced elements like this you can use one of two approaches. The first approach is to use the namespace URI directly in your code when accessing namespaced elements. The following example demonstrates how:
    <?php
    $dc = $languages->lang[1]->children("http://purl.org/dc/elements/1.1/");
    echo $dc->creator;
    The children() method takes a namespace and returns the children of the element that belong to it. It accepts two arguments: the first is the XML namespace, and the second is an optional Boolean that defaults to false. If you pass true, the first argument is treated as a prefix rather than the actual namespace URI. The second approach is to read the namespace URI from the document and use it when accessing namespaced elements. This is a cleaner way of accessing elements because you don’t have to hardcode the URI.
    <?php
    $namespaces = $languages->getNamespaces(true);
    $dc = $languages->lang[1]->children($namespaces["dc"]);
    
    echo $dc->creator;
    The getNamespaces() method returns an array that maps namespace prefixes to their associated URIs. It accepts an optional Boolean parameter that defaults to false. If you set it to true, the method returns the namespaces used in the node and all of its child nodes. Otherwise, it returns only the namespaces used in the node itself. That matters here because the dc prefix is only used on the child creator elements. Now you can iterate through the list of languages like so:
    <?php
    $languages = simplexml_load_file("languages.xml");
    $ns = $languages->getNamespaces(true);
    
    foreach($languages->lang as $lang) {
        $dc = $lang->children($ns["dc"]);
        printf(
            "<p>%s appeared in %d and was created by %s.</p>",
            $lang["name"],
            $lang->appeared,
            $dc->creator
        );
    }
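
    The prefix form of children() mentioned earlier can be sketched as follows. This is an illustration under stated assumptions rather than a recommended pattern: it uses a one-entry inline copy of the namespaced document and compares plain property access, lookup by prefix, and lookup by URI.

```php
<?php
// One-entry inline copy of the namespaced languages.xml.
$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<languages xmlns:dc="http://purl.org/dc/elements/1.1/">
 <lang name="PHP">
  <appeared>1995</appeared>
  <dc:creator>Rasmus Lerdorf</dc:creator>
 </lang>
</languages>
XML;

$languages = simplexml_load_string($xml);
$lang = $languages->lang[0];

// Plain property access does not see the prefixed element.
$plain = (string) $lang->creator;

// Second argument true: treat "dc" as a prefix instead of a URI.
$byPrefix = (string) $lang->children("dc", true)->creator;

// Equivalent lookup by namespace URI.
$byUri = (string) $lang->children("http://purl.org/dc/elements/1.1/")->creator;

var_dump($plain === "", $byPrefix === $byUri);
echo $byPrefix, "\n";
```

    Looking up by prefix is shorter, but the prefix is chosen by whoever wrote the document and may change, whereas the URI is the stable identifier of the namespace.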

    A Practical Example – Parsing YouTube Video Feed

    Let’s walk through an example that retrieves the RSS feed from a YouTube channel and displays links to all of the videos from it. For this we need to make a call to http://gdata.youtube.com/feeds/api/users//uploads, with the channel name inserted between users/ and /uploads. The URL returns a list of the latest videos from the given channel in XML format. We’ll parse the XML and get the following pieces of information for each video:
    • Video URL
    • Thumbnail
    • Title
    We’ll start out by retrieving and loading the XML:
    <?php
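    $channel = "channelName";
    $url = "http://gdata.youtube.com/feeds/api/users/" . $channel . "/uploads";
    $xml = file_get_contents($url);
    if ($xml === false) {
        // file_get_contents() returns false when the request fails.
        die("Could not retrieve the feed.");
    }

    $feed = simplexml_load_string($xml);
    $ns = $feed->getNamespaces(true);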
    If you take a look at the XML feed you can see there are several entry elements, each of which stores the details of a specific video from the channel. We are concerned only with the thumbnail image, video URL, and title. These three elements are children of group, which is a child of entry:
    <entry>
       …
       <media:group>
          …
          <media:player url="video url"/>
          <media:thumbnail url="thumbnail url" height="height" width="width"/>
          <media:title type="plain">Title…</media:title>
          …
       </media:group>
       …
    </entry>
    We simply loop through all the entry elements, and for each one we extract the relevant information. Note that player, thumbnail, and title are all under the media namespace, so we need to proceed as in the earlier example: we get the namespaces from the document and use the media namespace when accessing the elements.
    <?php
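    foreach ($feed->entry as $entry) {
        // Switch to the media namespace and descend into <media:group>.
        $group = $entry->children($ns["media"])->group;

        // thumbnail[1] is the second <media:thumbnail> listed for the video.
        $thumbnail_attrs = $group->thumbnail[1]->attributes();
        $image = $thumbnail_attrs["url"];

        $player = $group->player->attributes();
        $link = $player["url"];

        $title = $group->title;

        // Print $link (the url attribute), not the $player attribute list.
        printf('<p><a href="%s"><img src="%s" alt="%s"></a></p>',
            $link, $image, $title);
    }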
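
    To try the loop without a network call, the same steps can be run against an inline feed. The sketch below rests on several assumptions: the two-entry feed is hypothetical and only mimics the entry fragment shown earlier, the example.com URLs and titles are placeholders, it reads the first thumbnail ([0]) because the sample has one per entry, and it adds htmlspecialchars() so that titles containing quotes do not break the generated markup.

```php
<?php
// Hypothetical two-entry feed; URLs and titles are placeholders.
$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:media="http://search.yahoo.com/mrss/">
 <entry>
  <media:group>
   <media:player url="http://example.com/watch?v=1"/>
   <media:thumbnail url="http://example.com/1.jpg" height="90" width="120"/>
   <media:title type="plain">First "sample" video</media:title>
  </media:group>
 </entry>
 <entry>
  <media:group>
   <media:player url="http://example.com/watch?v=2"/>
   <media:thumbnail url="http://example.com/2.jpg" height="90" width="120"/>
   <media:title type="plain">Second sample video</media:title>
  </media:group>
 </entry>
</feed>
XML;

$feed = simplexml_load_string($xml);
$ns = $feed->getNamespaces(true);

// Collect plain strings first, then render them.
$items = [];
foreach ($feed->entry as $entry) {
    $group = $entry->children($ns["media"])->group;
    $items[] = [
        "link"  => (string) $group->player->attributes()["url"],
        "image" => (string) $group->thumbnail[0]->attributes()["url"],
        "title" => (string) $group->title,
    ];
}

foreach ($items as $item) {
    printf(
        '<p><a href="%s"><img src="%s" alt="%s"></a></p>' . "\n",
        htmlspecialchars($item["link"], ENT_QUOTES),
        htmlspecialchars($item["image"], ENT_QUOTES),
        htmlspecialchars($item["title"], ENT_QUOTES)
    );
}
```

    Separating extraction from rendering also makes it easy to inspect $items with print_r() while experimenting.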

    Conclusion

    Now that you know how to use SimpleXML to parse XML data, you can improve your skills by parsing different XML feeds from various APIs. An important point to consider is that SimpleXML reads the entire DOM into memory, so parsing large data sets may cause memory issues. In those cases it’s advisable to use something other than SimpleXML, preferably an event-based parser such as XML Parser. To learn more about SimpleXML, check out its documentation.

    Frequently Asked Questions (FAQs) about Parsing XML with SimpleXML

    What is SimpleXML in PHP?

    SimpleXML is a PHP extension that lets you read and manipulate XML data easily. It converts the structure of an XML document into an object that can be processed with normal property selectors and array iterators, which makes XML files easier to read, parse, and modify. SimpleXML is bundled with PHP and enabled by default, so in most cases there’s nothing extra to install.

    How do I install SimpleXML in PHP?

    SimpleXML is enabled by default in PHP versions 5.1.0 and later. If you’re using an older version of PHP, you may need to enable it manually by recompiling PHP with the --enable-simplexml configure option. However, it’s generally recommended to use a more recent version of PHP if possible.
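
    Before troubleshooting an installation, it can help to confirm whether the extension is actually loaded. A minimal sketch using PHP’s extension_loaded() function; the messages are illustrative only.

```php
<?php
// Report whether the SimpleXML extension is available in this PHP build.
$available = extension_loaded("simplexml");

echo $available
    ? "SimpleXML is available.\n"
    : "SimpleXML is missing; enable or install the extension first.\n";
```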

    How do I read an XML file with SimpleXML?

    To read an XML file with SimpleXML, you can use the simplexml_load_file() function. This function takes the path to your XML file as an argument and returns an object that represents the XML document. You can then access the elements of the XML file as properties of this object.

    How do I access attributes with SimpleXML?

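    Attributes of an XML element are accessed with array notation on the SimpleXMLElement object, not as properties (properties refer to child elements). For example, if a book element has a title attribute, you can read it with $book["title"]. The attributes() method returns all of an element’s attributes.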
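
    The difference between attribute access and child-element access is easiest to see when both share a name. The book element below is a hypothetical example, invented so that it has a title attribute and a title child at the same time.

```php
<?php
// Hypothetical element with a "title" attribute AND a <title> child.
$book = simplexml_load_string(
    '<book title="Attribute title"><title>Element title</title></book>'
);

// Array notation reads the attribute.
$fromAttribute = (string) $book["title"];

// Property notation reads the child element.
$fromElement = (string) $book->title;

echo $fromAttribute, " / ", $fromElement, "\n";
```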

    How do I convert a SimpleXML object to a string?

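    You can serialize a SimpleXML object with the asXML() method. Called on the root element, it returns the whole document as a well-formed XML string, including the XML declaration; called on a child element, it returns only that element’s markup. To get just an element’s text content, cast it to a string, as in (string) $element.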
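
    A short sketch of the two kinds of string conversion, serializing markup with asXML() versus casting to get text content. The inline document is a trimmed copy of the languages example.

```php
<?php
$xml = simplexml_load_string(
    '<languages><lang name="C"><creator>Dennis Ritchie</creator></lang></languages>'
);

// Serialize the whole document; the result starts with an XML declaration.
$document = $xml->asXML();

// Serialize a single element; only that element's markup is returned.
$fragment = $xml->lang[0]->asXML();

// Cast to string to get the text content rather than markup.
$text = (string) $xml->lang[0]->creator;

echo $document, "\n", $fragment, "\n", $text, "\n";
```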

    How do I handle namespaces with SimpleXML?

    SimpleXML can handle namespaces using the children() and attributes() methods. These methods take the namespace URI as an argument and return the child elements or attributes in that namespace.
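
    The attributes() method works the same way for namespaced attributes. In the sketch below, the item element, the x prefix, and the urn:example:x URI are all hypothetical names chosen for illustration.

```php
<?php
// Hypothetical element with a plain "id" and a namespaced "x:id" attribute.
$item = simplexml_load_string(
    '<item xmlns:x="urn:example:x" id="1" x:id="7"/>'
);

// Array notation reads the attribute that has no namespace.
$plainId = (string) $item["id"];

// attributes() with a namespace URI reads the namespaced attribute.
$byUri = (string) $item->attributes("urn:example:x")->id;

// Passing true as the second argument looks it up by prefix instead.
$byPrefix = (string) $item->attributes("x", true)->id;

echo $plainId, " ", $byUri, " ", $byPrefix, "\n";
```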

    How do I add elements to a SimpleXML object?

    You can add elements to a SimpleXML object using the addChild() method. This method takes the name and value of the new element as arguments and adds it to the object.
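
    A minimal sketch of building a document with addChild(), together with addAttribute() for the name attribute. The Python entry is an illustrative addition and is not part of the article’s languages.xml.

```php
<?php
$languages = simplexml_load_string("<languages/>");

// addChild() returns the new element, so more nodes can be attached to it.
$lang = $languages->addChild("lang");
$lang->addAttribute("name", "Python");   // illustrative entry
$lang->addChild("appeared", "1991");
$lang->addChild("creator", "Guido van Rossum");

echo $languages->asXML();
```

    Values are passed as strings; cast them back with (int) or (string) when reading, as with parsed documents.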

    How do I remove elements from a SimpleXML object?

    Removing elements from a SimpleXML object is a bit trickier, as there’s no dedicated removal method. Instead, apply PHP’s unset() to the element as reached through its parent, for example unset($languages->lang[0]).
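
    Removal with unset() can be sketched like this, using a compact inline version of the languages list.

```php
<?php
$languages = simplexml_load_string(
    '<languages><lang name="C"/><lang name="PHP"/><lang name="Java"/></languages>'
);

// Remove the second <lang> element (index 1) from the tree.
unset($languages->lang[1]);

// Collect the names that remain.
$names = [];
foreach ($languages->lang as $lang) {
    $names[] = (string) $lang["name"];
}

print_r($names);
```

    Note that unset() has to be applied through the parent expression; calling unset($lang) on a foreach loop variable only discards the variable and leaves the document unchanged.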

    How do I handle errors with SimpleXML?

    When parsing fails, simplexml_load_file() and simplexml_load_string() return false and, by default, libxml emits PHP warnings. Call libxml_use_internal_errors(true) to suppress those warnings and collect the errors internally, then use libxml_get_errors() to get them as an array.
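
    The error-handling flow can be sketched as follows. The malformed input string is deliberate, and the message format is just one possible choice.

```php
<?php
// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$broken = "<languages><lang>PHP</languages>"; // deliberately malformed
$xml = simplexml_load_string($broken);

$messages = [];
if ($xml === false) {
    foreach (libxml_get_errors() as $error) {
        // Each entry is a LibXMLError object with message, line, and more.
        $messages[] = sprintf("Line %d: %s", $error->line, trim($error->message));
    }
    // Empty the error buffer so later parses start clean.
    libxml_clear_errors();
}

// Restore whatever setting was active before.
libxml_use_internal_errors($previous);

print_r($messages);
```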

    Can I use XPath with SimpleXML?

    Yes, you can use XPath with SimpleXML. The xpath() method allows you to run XPath queries on a SimpleXML object and returns an array of matching elements.
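
    As a small illustration, the query below selects the languages from the article’s sample data that appeared in 1995. The data is inlined so that the sketch is self-contained.

```php
<?php
$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<languages>
 <lang name="C"><appeared>1972</appeared></lang>
 <lang name="PHP"><appeared>1995</appeared></lang>
 <lang name="Java"><appeared>1995</appeared></lang>
</languages>
XML;

$languages = simplexml_load_string($xml);

// Select every <lang> whose <appeared> child equals 1995.
$matches = $languages->xpath("/languages/lang[appeared=1995]");

$names = [];
foreach ($matches as $lang) {
    $names[] = (string) $lang["name"];
}

print_r($names);
```

    For elements that carry a namespace prefix, register the prefix with registerXPathNamespace() before using it in a query.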