Instant XML with PHP and PEAR::XML_Serializer

These days, XML has become part of the landscape in almost all areas of software development, and nowhere more so than on the Web. Those using common XML applications, such as RSS and XML-RPC, will probably find public domain libraries geared specifically to help them work with the formats, eliminating the need for wheel re-invention.

But for "ad-hoc" XML documents, you may be on your own, and you may well wind up spending valuable time building code to parse them. You may also find yourself needing to expose data as XML, in order to make it available to some other system or application. While XML, in the end, is just text, generating a document that obeys XML’s rules for well-formedness can be trickier than it seems. Enter: PEAR::XML_Serializer, the "Swiss Army Knife" for XML.

If you stay in touch with SitePoint, you’ve already had a taste of PEAR::XML_Serializer while reading Getting Started with PEAR. In this article, I’ll be looking at XML_Serializer in depth and showing you how it can make working with XML a snap. If you’re in any doubt about XML in general, try an Introduction to XML.

Today’s tag hierarchy:

  • Introduction: what PEAR::XML_Serializer does and how to install it
  • The XML_Serializer API: overview of the serialization class with simple examples
  • The XML_Unserializer API: overview of the unserialization class with more examples
  • Managing Configuration Information: PEAR::XML_Serializer applied to manage an XML configuration file
  • Web Services with PEAR::XML_Serializer: System to system data exchange

Note that the version of PEAR::XML_Serializer used for this article was 0.91. To make life easy, I’ve saved all the code from these examples into an archive you can download here.

Introduction

PEAR::XML_Serializer is a result of the hard work of Stephan Schmidt, one of Germany’s most prolific PHP developers. There’s a reasonable chance you’ve already run into a PHP project that Stephan has worked on, if you’ve ever looked at PHP Application Tools (PAT) (as in patTemplate, patUser and many more). In fact, if you look at the publications and presentations, you may find yourself wondering if Stephan has somehow managed to clone himself.

PEAR::XML_Serializer works on the principle that XML can be represented as native PHP types (variables). In other words, you can build some array in PHP, pass it to XML_Serializer, and it will give you back an XML document that represents the array. It’s also capable of the reverse transformation — give it an XML document and it can unserialize it for you, returning a PHP data structure representing the document.

The magic behind the scenes is PHP’s reflection functions, such as is_array(), get_class() and get_object_vars(). According to Wikipedia, "reflection is the ability of a program to examine and possibly modify its high level structure at runtime".

What this means is that PEAR::XML_Serializer, given some arbitrary PHP data structure, does a pretty good job of turning it into a useful XML representation and vice versa. Of course, this is based on "guesswork" and you may find the resulting transformations aren’t always quite what you expected. To give you more control, PEAR::XML_Serializer has a number of runtime options that affect how it makes transformations. I’ll look at the class APIs and summarize the available options in a moment.
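To make the reflection idea concrete, here’s a minimal sketch, using only built-in PHP functions, of how code can inspect an arbitrary value at runtime. The describe() helper and the Point class are hypothetical names of my own; this is not how PEAR::XML_Serializer itself is written, just an illustration of the kind of runtime inspection involved.

```php
<?php
// Hypothetical class used only as something to inspect
class Point {
    public $x = 1;
    public $y = 2;
}

// Hypothetical helper: reports what kind of value it was given
function describe($value) {
    if (is_array($value)) {
        return 'array with keys: ' . implode(', ', array_keys($value));
    }
    if (is_object($value)) {
        // get_object_vars() returns the properties visible from this scope
        $props = array_keys(get_object_vars($value));
        return 'object of class ' . get_class($value)
             . ' with properties: ' . implode(', ', $props);
    }
    // Scalars and anything else: fall back to the type name
    return gettype($value);
}

echo describe(array('red' => 45, 'green' => 240)), "\n";
echo describe(new Point()), "\n";
echo describe('Hello World!'), "\n";
```

A serializer can walk a data structure with checks like these, choosing tag names from array keys and property names as it goes.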

Some likely problems to which you might apply PEAR::XML_Serializer include managing application configuration with an XML document (config.xml), building REST-based Web services, storing data in XML for later recall by your applications, general system-to-system data exchange and pretty much any "quick and dirty" parsing you need to do at short notice.

Where you might want to avoid using PEAR::XML_Serializer is in parsing large XML documents (on the order of megabytes), or when you’re dealing with complex, possibly arbitrary XML documents (such as XHTML). Like the DOM API, XML_Serializer parses the entire XML document and builds a PHP data structure from it, in memory. Large documents may result in you hitting PHP’s memory_limit (see php.ini), and operations like looping through the data structure will be expensive. PHP’s native SAX parser is generally a better choice in such cases, allowing you to work with small "chunks" and keep memory use under control.
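As a rough illustration of the chunked approach, here’s a small sketch using PHP’s built-in SAX functions. The short string stands in for a much larger document, and the 16-byte chunk size is an arbitrary choice for the demonstration; in real code you would read chunks from a file.

```php
<?php
// A small document standing in for a much larger one
$xml = '<palette><color>red</color><color>green</color><color>blue</color></palette>';

$colorCount = 0;

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    // Called for every opening tag; names arrive upper-cased by default
    function ($parser, $name, $attrs) use (&$colorCount) {
        if ($name === 'COLOR') {
            $colorCount++;
        }
    },
    // Called for every closing tag; nothing to do in this sketch
    function ($parser, $name) {
    }
);

// Feed the parser 16 bytes at a time, as if reading a file in chunks
$chunks = str_split($xml, 16);
$last = count($chunks) - 1;
foreach ($chunks as $i => $chunk) {
    // The third argument tells the parser when the final chunk arrives
    if (!xml_parse($parser, $chunk, $i === $last)) {
        die(xml_error_string(xml_get_error_code($parser)));
    }
}

echo $colorCount, "\n"; // 3
```

Only the counter is kept between chunks, so memory use stays flat no matter how long the document is.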

Meanwhile, for documents such as XHTML, the PEAR::XML_Serializer API is too simplistic to give you the degree of fine-grained control you’ll require. Once you’re familiar with how to use it, try unserializing SitePoint’s homepage, or generating XHTML by serializing a PHP data structure, and you’ll quickly see what I mean. DOM is generally a better choice for manipulating XML.

To use PEAR::XML_Serializer, you also need to have PEAR::XML_Parser (a wrapper on PHP’s SAX extension) and PEAR::XML_Util (provides a number of handy methods for working with XML) installed. XML_Parser is frequently installed with PEAR itself but, assuming you have neither, type the following, from the command line, to get everything installed:

$ pear install XML_Parser 
$ pear install XML_Util
$ pear install XML_Serializer

Of course this assumes you have PEAR installed — see Getting Started with PEAR for instructions on installing PEAR.

PEAR::XML_Serializer provides two APIs through the classes XML_Serializer and XML_Unserializer. The first, XML_Serializer, is used to transform PHP data structures into XML, while XML_Unserializer performs the reverse operation, transforming XML into a PHP data structure. In both cases, only a few public class methods are exposed, which makes simple transformations quick to code. Further control over the behaviour of the classes requires setting "options", typically by passing an associative PHP array to the constructor of the class you’re working with. I have to confess I’m less than enamoured with handling configuration this way, as I’ve blogged before here, and PEAR::XML_Serializer perhaps proves the point: finding the supported options requires trawling the source code (to make your life easy, a complete list is coming right up). Anyway, griping aside, PEAR::XML_Serializer remains an excellent tool for working with XML.

The XML_Serializer API

I’ll begin with the XML_Serializer class, used to transform PHP data structures into XML, first summarizing the API, then illustrating with some basic examples. To describe the API, I’ll be using the function signature notation common to the PHP manual:

return_type function_name(type param_name, [type optional_param_name])

The main public methods available from the XML_Serializer class are:

  • object XML_Serializer([array options])
    The constructor accepts an optional array of options (see below).
  • mixed serialize(mixed data, [array options])
    Pass this method a PHP data structure and it performs the serialization into XML. The returned value is either TRUE on success, or a PEAR Error object if problems were encountered. Further options can also be passed as a second argument (see below).
  • mixed getSerializedData()
    This method returns the serialized XML document as a string, or as a PEAR error object if there’s no serialized XML available.
  • void setOption(string name, mixed value)
    This method sets an individual option.
  • void resetOptions()
    Use this method to reset all options to their default states.

The available options for XML_Serializer are:

  • addDecl (default = FALSE):
    whether to add the opening XML declaration, <?xml version="1.0"?>
  • encoding (default = ""):
    the XML character encoding that will be added to the opening XML declaration e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
  • addDoctype (default = FALSE):
    whether to add a DOCTYPE declaration to the document
  • doctype (default = null):
    specify the URIs to be used in the DOCTYPE declaration (see examples below)
  • indent (default = ""):
    a string used to indent the XML tags, to make it friendlier to the human eye
  • linebreak (default = "\n"):
    also used for formatting, this character being inserted after each opening and closing tag
  • indentAttributes (default = FALSE):
    a string used to indent the attributes of generated XML tags. If set to the special value, "_auto", it will line up all the attributes below the same column, inserting a linefeed character between each attribute.
  • defaultTagName (default = "XML_Serializer_Tag"):
    the tag name used to serialize the values in an indexed array
  • mode (default = "default"):
    if set to ‘simplexml’, the elements of indexed arrays will be placed in tags with the same name as their parent. More on this below.
  • rootName (default = ""):
    the name to assign to the root tag of the XML document. If not specified, the type of the root element in the PHP data structure will be used for the root name (e.g. "array").
  • rootAttributes (default = array()):
    an associative array of values to be transformed into the attributes of the root tag, the keys becoming the attribute names. Be careful when using this, as it’s your responsibility to make sure the keys and values will make legal XML attributes.
  • scalarAsAttributes (default = FALSE):
    for associative arrays, if the values are scalar types (e.g. strings, integers), they will be assigned to their parent node as attributes, using the array key as the attribute name.
  • prependAttributes (default = ""):
    a string to be prepended to the names of any generated tag attributes.
  • typeHints (default = FALSE):
    determines whether the original variable type of the PHP value that a tag represents should be stored as an attribute in the serialized XML document. See below for an example.
  • typeAttribute (default = "_type"):
    if typeHints are being used, the types will be stored in the XML tag using an attribute with the name of this option. If you have a PHP variable like $myVariable = 'Hello World!', the default serialized XML representation would be <myVariable _type="string">Hello World!</myVariable> if typeHints are being used.
  • keyAttribute (default = "_originalKey"):
    attribute used to store the original key of indexed array elements. Used only when typeHints are on.
  • classAttribute (default = "_class"):
    when serializing objects (with typeHints on), this attribute will be used to store the name of the class the object was created from.

One further special option exists. ‘overrideOptions’ is used when passing options to the serialize() method. If assigned the value ‘TRUE’, the options passed to the constructor will be ignored in favour of the default option values and any further options passed to the serialize() method.

A simple example of serializing a PHP data structure with XML_Serializer is as follows:

<?php 
// Set error reporting to ignore notices
error_reporting(E_ALL ^ E_NOTICE);

// Include XML_Serializer
require_once 'XML/Serializer.php';

// Some data to transform
$palette = array('red', 'green', 'blue');

// An array of serializer options
$serializer_options = array (
   'addDecl' => TRUE,
   'encoding' => 'ISO-8859-1',
   'indent' => '  ',
   'rootName' => 'palette',
   'defaultTagName' => 'color',
);

// Instantiate the serializer with the options
$Serializer = &new XML_Serializer($serializer_options);

// Serialize the data structure
$status = $Serializer->serialize($palette);

// Check whether serialization worked
if (PEAR::isError($status)) {
   die($status->getMessage());
}

// Display the XML document
header('Content-type: text/xml');
echo $Serializer->getSerializedData();
?>

Filename: palette1.php

You can see here how the options are typically used. I need to build an array, $serializer_options, and pass it to the constructor of XML_Serializer.

Note that changing the error reporting is required if you usually work with full error reporting turned on. The current version of PEAR::XML_Serializer triggers PHP notices such as "array to string conversion"; none of them is serious, but they will show up in your output unless notices are suppressed.

The resulting XML looks like this:

<?xml version="1.0" encoding="ISO-8859-1"?> 
<palette>
 <color>red</color>
 <color>green</color>
 <color>blue</color>
</palette>

Because the data structure is an indexed array, I used the 'defaultTagName' option to give a name to the tags representing the elements of the array.

Now, let’s use an associative array instead:

<?php 
// Set error reporting to ignore notices
error_reporting(E_ALL ^ E_NOTICE);

// Include XML_Serializer
require_once 'XML/Serializer.php';

// Some data to transform
$palette = array(
   'red' => 45,
   'green' => 240,
   'blue' => 120
   );

// An array of serializer options
$serializer_options = array (
   'addDecl' => TRUE,
   'encoding' => 'ISO-8859-1',
   'indent' => '  ',
   'rootName' => 'palette',
);

// Instantiate the serializer with the options
$Serializer = &new XML_Serializer($serializer_options);

// Serialize the data structure
$status = $Serializer->serialize($palette);

// Check whether serialization worked
if (PEAR::isError($status)) {
   die($status->getMessage());
}

// Display the XML document
header('Content-type: text/xml');
echo $Serializer->getSerializedData();
?>

Filename: palette2.php

And the resulting XML is as follows:

<?xml version="1.0" encoding="ISO-8859-1"?>  
<palette>  
 <red>45</red>  
 <green>240</green>  
 <blue>120</blue>  
</palette>

Notice that the tag names now correspond to the keys of the $palette variable. If I use this example again, adding the 'scalarAsAttributes' option (see example file palette3.php), here’s the XML I get back:

<?xml version="1.0" encoding="ISO-8859-1"?>  
<palette blue="120" green="240" red="45" />

The scalar integer values now become attributes of the root tag rather than being represented as separate tags.

Another example shows what happens when you serialize objects and take advantage of the ‘typeHints’ option:

<?php  
// Set error reporting to ignore notices  
error_reporting(E_ALL ^ E_NOTICE);  
 
// Include XML_Serializer  
require_once 'XML/Serializer.php';  
 
// A class to store color information  
class ColorInformation {  
   var $hue;  
   var $value;  
   function ColorInformation($hue = NULL, $value = NULL) {  
       $this->hue = $hue;  
       $this->value = $value;  
   }  
}  
 
// Some data to transform  
$palette = array();  
$palette[] = &new ColorInformation('red', 45);  
$palette[] = &new ColorInformation('green', 240);  
$palette[] = &new ColorInformation('blue', 120);  
 
// An array of serializer options  
$serializer_options = array (  
   'addDecl' => TRUE,  
   'encoding' => 'ISO-8859-1',  
   'indent' => '  ',  
   'indentAttributes' => '_auto',  
   'rootName'  => 'palette',  
   'defaultTagName' => 'color',  
   'typeHints' => TRUE,  
);  
 
// Instantiate the serializer with the options  
$Serializer = &new XML_Serializer($serializer_options);  
 
// Serialize the data structure  
$status = $Serializer->serialize($palette);  
 
// Check whether serialization worked  
if (PEAR::isError($status)) {  
   die($status->getMessage());  
}  
 
// Display the XML document  
header('Content-type: text/xml');  
echo $Serializer->getSerializedData();  
?>

Filename: palette4.php

The corresponding XML looks like this:

<?xml version="1.0" encoding="ISO-8859-1"?>  
<palette _type="array">  
 <color _class="colorinformation"  
        _originalKey="0"  
        _type="object">  
   <hue _type="string">red</hue>  
   <value _type="integer">45</value>  
 </color>  
 <color _class="colorinformation"  
        _originalKey="1"  
        _type="object">
   <hue _type="string">green</hue>
   <value _type="integer">240</value>
 </color>  
 <color _class="colorinformation"  
        _originalKey="2"  
        _type="object">  
   <hue _type="string">blue</hue>  
   <value _type="integer">120</value>  
 </color>  
</palette>

With 'typeHints' switched on, attributes are added that describe the original data structure in some detail: '_type' refers to the original PHP variable type, '_class' stores the class name of any serialized objects, and '_originalKey' is the key of the indexed array in which this element was found, if applicable.

The 'typeHints' functionality is useful when you need to make sure that when you unserialize the document, you get back exactly what you started with. You might want to use 'typeHints' if you’re using PEAR::XML_Serializer to store persistent data that your code will retrieve later. Note that when you need a precise representation of a PHP data structure using typeHints, it’s a good idea to avoid using the 'scalarAsAttributes' option, which loses information about scalar types.

Finally, the following example shows how you can add DOCTYPE declarations, in this example, to render XHTML:

<?php  
// Set error reporting to ignore notices  
error_reporting(E_ALL ^ E_NOTICE);  
 
// Include XML_Serializer  
require_once 'XML/Serializer.php';  
 
// PHP data structure representing an XHTML document  
$xhtml = array  
   (  
       'head' => array (  
           'title' => 'XHTML with XML_Serializer',  
       ),  
       'body' => array (  
           'h1' => 'XHTML with XML_Serializer',  
           'p' => 'It\'s possible but not recommended',
       ),  
   );  
 
// XML_Serializer options  
$serializer_options = array (  
   'addDecl' => TRUE,  
   'encoding' => 'ISO-8859-1',  
   'indent' => '  ',  
   'rootName' => 'html',  
   'rootAttributes' => array (  
       'xmlns' => 'http://www.w3.org/1999/xhtml',  
       'lang' => 'en',  
       'xml:lang' => 'en'  
       ),  
   'addDoctype' => TRUE,  
   'doctype' => array (  
       'id' => '-//W3C//DTD XHTML 1.0 Strict//EN',  
       'uri' => 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'  
   ),  
);  
 
// Create the serializer  
$Serializer = &new XML_Serializer($serializer_options);  
 
// Serialize the XHTML  
$status = $Serializer->serialize($xhtml);  
 
// Check for errors  
if (PEAR::isError($status)) {  
   die($status->getMessage());  
}  
 
// Send the right HTTP header  
// See http://www.juicystudio.com/tutorial/xhtml/mime.asp for more info  
if (stristr($_SERVER['HTTP_ACCEPT'], 'application/xhtml+xml')) {
   header('Content-Type: application/xhtml+xml; charset=ISO-8859-1');  
} else {  
   header('Content-Type: text/html; charset=ISO-8859-1');  
}  
 
// Display the XML document  
echo $Serializer->getSerializedData();  
?>

Filename: xhtml.php

Notice the 'doctype' option. The structure of this array (the 'id' and 'uri' keys) is determined by the getDocTypeDeclaration() method of PEAR::XML_Util. Notice also that I used the 'rootAttributes' option to add attributes to the root html tag. Here’s the resulting XHTML:

<?xml version="1.0" encoding="ISO-8859-1"?>  
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"  
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">  
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">  
 <head>  
   <title>XHTML with XML_Serializer</title>  
 </head>  
 <body>  
   <h1>XHTML with XML_Serializer</h1>  
   <p>It's possible but not recommended</p>
 </body>
</html>

This example is meant purely to show how DOCTYPE declarations can be used. In practice, I don’t recommend you generate XHTML with XML_Serializer, as the API does not provide you with fine-grained control over the transformation, and you’re likely to spend a lot of time fiddling with giant PHP arrays and generally losing hair.

The XML_Unserializer API

You’ve seen the essentials of the XML_Serializer API; XML_Unserializer, which is used to transform XML to PHP data structures, is essentially its "mirror image".

The main public methods are:

  • object XML_Unserializer([array options])
    This constructor accepts an optional array of options (described below).
  • mixed unserialize(mixed data, [boolean isFile], [array options])
    The first argument, data, can be a string containing the XML document, the path to a file containing the XML (in which case the second argument, isFile, must be set to TRUE) or a PHP resource implementing the streams API, such as a file you’ve already opened or an instance of one of PEAR’s stream wrappers. The third optional argument, options, is an array of options as described below. The value returned from unserialize() will either be TRUE on success or a PEAR Error object.
  • mixed getUnserializedData()
    This returns either the PHP data structure represented by the XML document or a PEAR Error object.
  • mixed getRootName()
    This method returns the name of the root element (typically invisible in the data structure returned from getUnserializedData()) or a PEAR Error object.
  • void setOption(string name, mixed value)
    Use this to set an individual option.
  • void resetOptions()
    This method resets all options to their default state.

The options available for use with XML_Unserializer are:

  • parseAttributes (default = FALSE)
    whether tag attributes should be turned into arrays; if set to false, attributes are ignored
  • attributesArray (default = FALSE)
    the name to use to identify the array into which attributes will be placed
  • prependAttributes (default = "")
    a string to prepend to the keys of the array into which attributes will be placed
  • complexType (default = "array")
    if no typeHint is found, complex types (tags containing a mix of CDATA and other child tags) will be converted to arrays
  • contentName (default = "_content")
    for complex types, the array key under which the CDATA found inside the tag will be placed in the resulting array
  • tagMap (default = array())
    allows you to map an XML tag name to the name of a PHP class (see below for example); overrides the value of the classAttribute if there are typeHints in the XML document being unserialized
  • keyAttribute (default = "_originalKey")
    identifies the attribute used as the typeHint for the indexed array key (see the XML_Serializer option of the same name)
  • typeAttribute (default = "_type")
    identifies the PHP data type, used as the typeHint (see the XML_Serializer option of the same name)
  • classAttribute (default = "_class")
    identifies the name of the PHP class that this tag represents an object of (see the XML_Serializer option of the same name)

As with the XML_Serializer class, the ‘overrideOptions’ option can be passed to the unserialize() method to ignore values already passed to the constructor.

To provide a basic illustration of XML_Unserializer, I’ll take the output from the XML_Serializer examples you saw above and see what we get back when we unserialize them.

Here’s the reverse of the first example:

<?php   
// Set error reporting to ignore notices  
error_reporting(E_ALL ^ E_NOTICE);  
 
// Include XML_Unserializer  
require_once 'XML/Unserializer.php';  
 
// The XML document  
$doc = <<<EOD  
<?xml version="1.0" encoding="ISO-8859-1"?>  
<palette>  
 <color>red</color>  
 <color>green</color>  
 <color>blue</color>  
</palette>  
EOD;  
 
// Instantiate the unserializer
$Unserializer = &new XML_Unserializer();

// Unserialize the XML document
$status = $Unserializer->unserialize($doc);

// Check whether unserialization worked
if (PEAR::isError($status)) {  
   die($status->getMessage());  
}  
 
// Display the PHP data structure  
echo '<pre>';  
print_r($Unserializer->getUnserializedData());  
echo '</pre>';  
?>

Filename: upalette1.php

The resulting data structure looks like this:

Array   
(  
   [color] => Array  
       (  
           [0] => red  
           [1] => green  
           [2] => blue  
       )  
 
)

You see here the need for caution when working with XML_Serializer — the original PHP data structure was as follows:

$palette = array('red', 'green', 'blue');

If you want to make sure you get back exactly the data structure you started with when dealing with indexed arrays, you need to switch ‘typeHints’ on when serializing them to XML.

Moving onto the second example, the code is essentially the same:

   
<?php
// Set error reporting to ignore notices
error_reporting(E_ALL ^ E_NOTICE);  
 
// Include XML_Unserializer  
require_once 'XML/Unserializer.php';  
 
// The XML document  
$doc = <<<EOD  
<?xml version="1.0" encoding="ISO-8859-1"?>  
<palette>  
 <red>45</red>  
 <green>240</green>  
 <blue>120</blue>  
</palette>  
EOD;  
 
// Instantiate the unserializer
$Unserializer = &new XML_Unserializer();

// Unserialize the XML document
$status = $Unserializer->unserialize($doc);

// Check whether unserialization worked
if (PEAR::isError($status)) {  
   die($status->getMessage());  
}  
 
// Display the PHP data structure  
echo '<pre>';  
print_r($Unserializer->getUnserializedData());  
echo '</pre>';  
?>

Filename: upalette2.php

The resulting PHP data structure is now:

Array   
(  
   [red] => 45  
   [green] => 240  
   [blue] => 120  
)

This time, it matches the original data structure, which also used an associative array (but note that the values of the array will be of type 'string', not of type 'integer'):

$palette = array(   
   'red' => 45,  
   'green' => 240,  
   'blue' => 120  
   );
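If the integer types matter and you’d rather not rely on typeHints, one option is simply to cast the values back after unserializing. The sketch below uses a hand-written array as a stand-in for the value returned by getUnserializedData(), so it runs without PEAR installed.

```php
<?php
// Stand-in for the unserialized data: every value comes back as a string
$unserialized = array('red' => '45', 'green' => '240', 'blue' => '120');

// Cast each value back to an integer; array_map() keeps the string keys
$palette = array_map('intval', $unserialized);

var_dump($palette['red']); // int(45)
```

This only works when you already know every value should be an integer; for mixed data, typeHints are the safer route.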

The third example requires that I tell the unserializer to parse attributes:

<?php   
// Set error reporting to ignore notices  
error_reporting(E_ALL ^ E_NOTICE);  
 
// Include XML_Unserializer  
require_once 'XML/Unserializer.php';  
 
// The XML document  
$doc = <<<EOD  
<?xml version="1.0" encoding="ISO-8859-1"?>  
<palette blue="120" green="240" red="45" />  
EOD;  
 
// Array of options  
$unserializer_options = array (  
   'parseAttributes' => TRUE  
);  
 
// Instantiate the unserializer
$Unserializer = &new XML_Unserializer($unserializer_options);

// Unserialize the XML document
$status = $Unserializer->unserialize($doc);

// Check whether unserialization worked
if (PEAR::isError($status)) {  
   die($status->getMessage());  
}  
 
// Display the PHP data structure  
echo '<pre>';  
print_r($Unserializer->getUnserializedData());  
echo '</pre>';  
?>

The resulting PHP data structure is again more or less what I started with, although integers are now strings:

Array   
(  
   [blue] => 120  
   [green] => 240  
   [red] => 45  
)

Finally, when I unserialize the document containing type hints, I need to make sure the class ColorInformation is defined. In the original example, the ColorInformation class used its constructor to pass external variables into its fields. Like PHP's unserialize() function, XML_Unserializer doesn't provide a mechanism for calling class methods when unserializing objects; rather, it sets the properties of an object directly (so, be careful):

<?php   
// Set error reporting to ignore notices  
error_reporting(E_ALL ^ E_NOTICE);  
 
// Include XML_Unserializer  
require_once 'XML/Unserializer.php';  
 
// A class to store color information  
class ColorInformation {  
   var $hue;  
   var $value;  
   function ColorInformation($hue = NULL, $value = NULL) {  
       $this->hue = $hue;  
       $this->value = $value;  
   }  
}  
 
// The XML document  
$doc = <<<EOD  
<?xml version="1.0" encoding="ISO-8859-1"?>  
<palette _type="array">  
 <color _class="colorinformation"  
        _originalKey="0"  
        _type="object">  
   <hue _type="string">red</hue>  
   <value _type="integer">45</value>  
 </color>  
 <color _class="colorinformation"  
        _originalKey="1"  
        _type="object">
   <hue _type="string">green</hue>
   <value _type="integer">240</value>
 </color>  
 <color _class="colorinformation"  
        _originalKey="2"  
        _type="object">  
   <hue _type="string">blue</hue>  
   <value _type="integer">120</value>  
 </color>  
</palette>  
EOD;  
 
// Instantiate the unserializer
$Unserializer = &new XML_Unserializer();

// Unserialize the XML document
$status = $Unserializer->unserialize($doc);

// Check whether unserialization worked
if (PEAR::isError($status)) {  
   die($status->getMessage());  
}  
 
// Display the PHP data structure  
echo '<pre>';  
print_r($Unserializer->getUnserializedData());  
echo '</pre>';  
?>

The resulting data structure is:

Array   
(  
   [0] => colorinformation Object  
       (  
           [hue] => red  
           [value] => 45  
       )  
 
   [1] => colorinformation Object  
       (  
           [hue] => green  
           [value] => 240  
       )  
 
   [2] => colorinformation Object  
       (  
           [hue] => blue  
           [value] => 120  
       )  
 
)

This matches the original data structure, ignoring the issue with the constructor:

$palette = array();   
$palette[] = &new ColorInformation('red', 45);  
$palette[] = &new ColorInformation('green', 240);  
$palette[] = &new ColorInformation('blue', 120);
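To show what "setting the properties directly" means in practice, here's a minimal sketch in plain PHP. The hydrate() function is a hypothetical helper of my own, not the code XML_Unserializer actually uses, and the class is written with a __construct() method so the sketch runs on current PHP versions.

```php
<?php
class ColorInformation {
    public $hue;
    public $value;
    public function __construct($hue = null, $value = null) {
        $this->hue = $hue;
        $this->value = $value;
    }
}

// Hypothetical helper: build an object, then assign its properties directly
function hydrate($className, array $properties) {
    // The constructor runs here, but only with its default arguments
    $object = new $className();
    foreach ($properties as $name => $value) {
        // No setter or validation is involved: the value is simply assigned
        $object->$name = $value;
    }
    return $object;
}

$color = hydrate('ColorInformation', array('hue' => 'red', 'value' => 45));
echo $color->hue, ' ', $color->value, "\n";
```

Any checks a constructor or setter method would normally perform are bypassed with this approach, which is why the constructor arguments in this example are all optional.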

Notes on Serializing Objects:

As with PHP's in-built serialize() and unserialize() functions, when PEAR::XML_Serializer transforms to XML and back, it will attempt to call the __sleep() and __wakeup() functions on those objects, if they have been defined. This gives you a chance to perform operations, such as connecting or disconnecting from a database, by defining them within these methods. See the PHP manual on __sleep() and __wakeup() for more details.
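Here's a short sketch of the two hooks, using PHP's native serialize() and unserialize() so that it runs without PEAR. The Connection class and its DSN string are made up for the illustration, and no real database is involved.

```php
<?php
// Hypothetical class that pretends to hold a database connection
class Connection {
    public $dsn;
    public $connected = false;

    public function __construct($dsn) {
        $this->dsn = $dsn;
        $this->connected = true; // pretend a connection was opened
    }

    // Called before serialization: names the properties worth keeping
    public function __sleep() {
        return array('dsn');
    }

    // Called after unserialization: restores the runtime state
    public function __wakeup() {
        $this->connected = true; // pretend to reconnect
    }
}

$original = new Connection('mysql://localhost/test');
$copy = unserialize(serialize($original));

var_dump($copy->dsn);       // string(22) "mysql://localhost/test"
var_dump($copy->connected); // bool(true), set again by __wakeup()
```

Because __sleep() leaves 'connected' out of the serialized data, the copy only ends up connected because __wakeup() ran.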

When objects are unserialized by XML_Unserializer, it first attempts to rebuild them using the original class, but to do so your code must make sure the class is available. If it fails to find the class definition, assuming you're using PHP4, it will use PHP's built-in stdClass definition instead. stdClass is still available in PHP5, so the same fallback should apply there too.

Currently XML_Serializer has no way to represent objects that contain references to each other. The TODO list that comes with the package indicates support is planned in the near future.

Managing Configuration Information

Now that you've had a dry overview of what PEAR::XML_Serializer offers, and have seen some basic examples, it's time to do something useful with it.

Most PHP applications require some form of configuration to enable them to "understand" the environment in which they're being used, such as the domain name of the Web server, the administrator's email address, database connection settings and so on. There are a number of common approaches to handling this in PHP, from simply having a PHP script with a list of variables that need editing, to using the parse_ini_file() function (note that PHP can parse an ini file faster than it can include and parse the equivalent PHP script).
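For comparison with the XML approach that follows, here's a minimal sketch of the ini route. It uses parse_ini_string(), which accepts the same syntax as parse_ini_file() but reads from a string, so the example is self-contained. The section and setting names are made up.

```php
<?php
// Hypothetical settings in ini syntax
$ini = <<<INI
[database]
host = localhost
user = webapp

[mail]
admin = "admin@example.com"
INI;

// TRUE as the second argument groups the settings by section
$config = parse_ini_string($ini, true);

echo $config['database']['host'], "\n";
echo $config['mail']['admin'], "\n";
```

The result is a two-level array of strings, which is about as much structure as an ini file can express.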

XML is another option, being relatively friendly to edit manually, relatively easy to parse and generate, and able to represent more complex data structures than an ini file. On the downside, retrieving configuration data from an XML file is liable to be slow compared to the alternatives (although some tricks with PHP code generation can help you get around this, but that's another story).

Performance issues aside, here's one approach using PEAR::XML_Serializer and a class that allows you to retrieve and modify configuration settings.

First, I define two classes: one in which to store configuration data, and a second to manage access to it:

<?php    
/**    
* The name of file used to store config data    
*/    
define ('CONFIG_FILE', 'config.xml');    
   
/**    
* Stores configuration data    
*/    
class Config {    
   /**    
    * Array of configuration options    
    * @var array    
    * @access private    
    */    
   var $options = array();    
   
   /**    
    * Returns the value of a configuration option, if found    
    * @param string name of option    
    * @return mixed value if found or void if not    
    * @access public    
    */    
   function get($name) {    
       if (isset($this->options[$name])) {    
           return $this->options[$name];    
       }    
   }    
   
   /**    
    * Sets a configuration option    
    * @param string name of option    
    * @param mixed value of option    
    * @return void    
    * @access public    
    */    
   function set($name, $value) {    
       $this->options[$name] = $value;    
   }    
}

The Config class acts as a simple store for values, allowing access via the get() and set() methods.

/**    
* Provides a gateway to the Config class, managing its serialization    
*/    
class ConfigManager {    
   /**    
    * Returns a singleton instance of Config    
    * @return Config    
    * @access public    
    * @static    
    */    
   function &instance() {    
       static $Config = NULL;    
       if (!$Config) {    
           $Config = ConfigManager::load();    
       }    
       return $Config;    
   }    
   
   /**    
    * Loads the Config instance from its XML representation    
    * @return Config    
    * @access private    
    * @static    
    */    
   function load() {    
       error_reporting(E_ALL ^ E_NOTICE);    
       require_once 'XML/Unserializer.php';    
       $Unserializer = &new XML_Unserializer();    
       if (file_exists(CONFIG_FILE)) {    
           $status = $Unserializer->unserialize(CONFIG_FILE, TRUE);    
           if (PEAR::isError($status)) {    
               trigger_error ($status->getMessage(), E_USER_WARNING);    
           }    
           $Config = $Unserializer->getUnserializedData();    
       } else {    
           $Config = new Config();    
       }    
       return $Config;    
   }    
   
   /**    
    * Stores the Config instance, serializing it to an XML file    
    * @return boolean TRUE on success    
    * @access public    
    * @static    
    */    
   function store() {    
       error_reporting(E_ALL ^ E_NOTICE);    
       require_once 'XML/Serializer.php';    
       $Config = &ConfigManager::instance();    
       $serializer_options = array (    
           'addDecl' => TRUE,    
           'encoding' => 'ISO-8859-1',    
           'indent' => '  ',    
           'typeHints' => TRUE,    
       );    
       $Serializer = &new XML_Serializer($serializer_options);    
       $status = $Serializer->serialize($Config);    
       $success = FALSE;    
       if (PEAR::isError($status)) {    
           trigger_error($status->getMessage(), E_USER_WARNING);    
       }    
       $data = $Serializer->getSerializedData();    
       if (!$fp = fopen(CONFIG_FILE, 'wb')) {    
           trigger_error('Cannot open ' . CONFIG_FILE, E_USER_WARNING);    
       } else {    
           if (!fwrite($fp, $data, strlen($data))){    
               trigger_error(    
                   'Cannot write to ' . CONFIG_FILE, E_USER_WARNING    
                   );    
           } else {    
               $success = TRUE;    
           }    
           fclose($fp);    
       }    
       return $success;    
   }    
}    
?>

Filename: configmanager.php

The ConfigManager class is a bit more complex. I'll explain the key points here, but if you have any specific questions, feel free to drop them into the discussion at the end of this article.

The static instance() method uses the PHP4 trick for creating a Singleton instance of an object. Whether you're aware of the Singleton design pattern or not, what the instance() method allows me to do is fetch the same instance of Config from anywhere in my code, simply by calling ConfigManager::instance(). This ensures that any changes made to the Config object are visible wherever it's used.
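The static-variable trick can be seen in isolation in the following sketch. Registry and registry_instance() are hypothetical stand-ins for Config and ConfigManager::instance(), used only so the snippet runs on its own.

```php
<?php
// Hypothetical stand-in for Config, just to show the static-variable trick
class Registry {
    var $values = array();
}

function &registry_instance() {
    // A static variable keeps its value between calls to this function
    static $Registry = NULL;
    if (!$Registry) {
        // Only reached on the first call
        $Registry = new Registry();
    }
    return $Registry;
}

$a = &registry_instance();
$a->values['domain'] = 'www.example.com';

// A second call hands back the same object, so the change is visible
$b = &registry_instance();
echo $b->values['domain'], "\n";
```

Because the object lives in a static variable inside the function, no global variable is needed to share it.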

The load() and store() methods handle serializing and unserializing the Config object to XML. The load() method is intended only to be called by the instance() method, while the store() method should be called at the end of my application's execution if any changes were made to the Config object. Note that I'm using Lazy Includes inside these methods to keep the amount of parsing the PHP engine needs to do to a minimum. There may be instances where load() is called but not store(), when the code using it only needs to retrieve configuration values, not modify them. In these instances, including the XML_Serializer class on every request would add needless overhead.

Now, using PEAR::HTML_QuickForm, I can build a form for editing the configuration file. This time, I'm going to skip explaining HTML_QuickForm (you can find further examples in the package and tutorials in The PHP Anthology). Make sure you have it installed by typing:

$ pear install HTML_QuickForm

The HTML_QuickForm version used here was 3.2.2.

The form code:

<?php    
require_once 'configmanager.php';    
require_once 'HTML/QuickForm.php';    
   
// Fetch the singleton instance of Config    
$Config = &ConfigManager::instance();    
   
// Build a form with PEAR::HTML_QuickForm    
$Form = new HTML_QuickForm('labels_example', 'post');    
$Form->addElement('text', 'domain', 'Domain');    
$Form->addRule('domain', 'Please enter a domain name', 'required',    
   NULL, 'client');    
$Form->addRule('domain', 'Please enter a valid domain name',    
   'regex', '/^(www\.)?.+\.(com|net|org)$/', 'client');    
$Form->addElement('text', 'email', 'Email');    
$Form->addRule('email', 'Please enter an email address', 'required',    
   NULL, 'client');    
$Form->addRule('email', 'Please enter a valid email address',    
   'email', NULL, 'client');    
$Form->addElement('text', 'docroot', 'Document Root');    
$Form->addRule('docroot', 'Please enter the document root',    
   'required', NULL, 'client');    
$Form->addRule('docroot', 'Please enter a valid document root',    
   'callback', 'is_dir');    
$Form->addElement('text', 'tmp', 'Tmp Dir');    
$Form->addRule('tmp', 'Please enter the tmp dir', 'required',    
   NULL, 'client');    
$Form->addRule('tmp', 'Please enter a valid tmp dir', 'callback',    
   'is_dir');    
$Form->addElement('text', 'db_host', 'DB Host');    
$Form->addRule('db_host', 'Please enter a value for DB Host', 'required',    
   NULL, 'client');    
$Form->addRule('db_host', 'Please enter a valid value for DB Host',    
   'regex', '/^[a-zA-Z0-9.]+$/', 'client');    
$Form->addElement('text', 'db_user', 'DB User');    
$Form->addRule('db_user', 'Please enter a value for DB User', 'required',    
   NULL, 'client');    
$Form->addRule('db_user', 'Please enter a valid value for DB User',    
   'regex', '/^[a-zA-Z0-9]+$/', 'client');    
$Form->addElement('text', 'db_pass', 'DB Password');    
$Form->addRule('db_pass', 'Please enter a value for DB Password', 'required',    
   NULL, 'client');    
$Form->addRule('db_pass', 'Please enter a valid value for DB Password',    
   'regex', '/^[a-zA-Z0-9]+$/', 'client');    
$Form->addElement('text', 'db_name', 'DB Name');    
$Form->addRule('db_name', 'Please enter a value for DB Name', 'required',    
   NULL, 'client');    
$Form->addRule('db_name', 'Please enter a valid value for DB Name', 'regex',    
   '/^[a-zA-Z0-9]+$/', 'client');    
$Form->addElement('submit', null, 'Update');    
   
// Initialize $db array as needed    
$db = $Config->get('db');    
if (!is_array($db)) $db = array();    
if (!isset($db['db_host'])) $db['db_host'] = '';    
if (!isset($db['db_user'])) $db['db_user'] = '';    
if (!isset($db['db_pass'])) $db['db_pass'] = '';    
if (!isset($db['db_name'])) $db['db_name'] = '';    
   
// Set initial form values from Config    
$Form->setDefaults(array(    
   'domain' => $Config->get('domain'),    
   'email' => $Config->get('email'),    
   'docroot' => $Config->get('docroot'),    
   'tmp' => $Config->get('tmp'),    
   'db_host' => $db['db_host'],    
   'db_user' => $db['db_user'],    
   'db_pass' => $db['db_pass'],    
   'db_name' => $db['db_name'],    
   ));    
   
// If the form is valid update the configuration file    
if ($Form->validate()) {    
   $result = $Form->getSubmitValues();    
   $Config->set('domain',$result['domain']);    
   $Config->set('email',$result['email']);    
   $Config->set('docroot',$result['docroot']);    
   $Config->set('tmp',$result['tmp']);    
   $db['db_host'] = $result['db_host'];    
   $db['db_user'] = $result['db_user'];    
   $db['db_pass'] = $result['db_pass'];    
   $db['db_name'] = $result['db_name'];    
   $Config->set('db', $db);    
   if (ConfigManager::store()) {    
       echo "Config updated successfully";    
   } else {    
       echo "Error updating configuration";    
   }    
} else {    
   echo '<h1>Edit ' . CONFIG_FILE . '</h1>';    
   $Form->display();    
}    
?>

Filename: configedit.php

At the start, I fetch the instance of Config from ConfigManager and use it to populate the default form values. I need to initialise the $db array in case this is the first time the config.xml file has been edited (i.e. it doesn't yet exist) and Config contains empty values.
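As an aside, the run of isset() checks used to initialise $db could also be written by merging the stored values over an array of defaults. This is only a sketch of an alternative, not what configedit.php does; $stored is a hypothetical stand-in for the result of $Config->get('db').

```php
<?php
// Empty defaults for every key the form expects
$defaults = array(
    'db_host' => '',
    'db_user' => '',
    'db_pass' => '',
    'db_name' => '',
);

// Hypothetical stand-in for $Config->get('db'); NULL on the very first run
$stored = NULL;
$db = is_array($stored) ? array_merge($defaults, $stored) : $defaults;

// With a partially filled array, stored values win and the gaps stay empty
$partial = array_merge($defaults, array('db_host' => 'localhost'));

echo count($db), ' keys, db_host is "', $partial['db_host'], '"', "\n";
```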

Once the form has been submitted and successfully validated, I place the values back in the Config instance, then call ConfigManager::store() to update the config.xml document with the latest values.

Here's how the form looks in a browser:

[Screenshot: the configuration form rendered by configedit.php]

The config.xml file stored looks like this:

<?xml version="1.0" encoding="ISO-8859-1"?>    
<config _class="config" _type="object">    
 <options _type="array">    
   <domain _type="string">www.sitepoint.com</domain>    
   <email _type="string">info@sitepoint.com</email>    
   <docroot _type="string">/www</docroot>    
   <tmp _type="string">/tmp</tmp>    
   <db _type="array">    
     <db_host _type="string">db.sitepoint.com</db_host>    
     <db_user _type="string">phpclient</db_user>    
     <db_pass _type="string">secret</db_pass>    
     <db_name _type="string">sitepointdb</db_name>    
   </db>    
 </options>    
</config>

Filename: config.xml

The use of type hints makes it a little unfriendly to the human eye, but it's still possible to edit this file manually, should it be necessary.
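To see what the _type attributes are for, consider this rough illustration. It is not how XML_Unserializer is implemented; it uses PHP 5's SimpleXML extension and settype() simply to show that a type hint lets a value come back as something other than a string. The port element is made up for the example.

```php
<?php
// A fragment in the style of config.xml; <port> is a hypothetical option
$xml = '<options _type="array">
  <domain _type="string">www.example.com</domain>
  <port _type="integer">8080</port>
</options>';

$options = array();
foreach (simplexml_load_string($xml)->children() as $name => $node) {
    // Element content always arrives as text...
    $value = (string) $node;
    // ...and the hint says which PHP type to convert it to
    settype($value, (string) $node['_type']);
    $options[$name] = $value;
}

var_dump($options);
```

Without the hints, every value would be restored as a string, so if you edit the file by hand it's worth keeping the _type attributes in step with the values.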

Now, using another script, I can access the values in config.xml:

<?php    
require_once 'configmanager.php';    
   
// Fetch the singleton instance of Config    
$Config = &ConfigManager::instance();    
?>    
<h1><?php echo CONFIG_FILE; ?></h1>    
<table>    
   <tr>    
       <td>Domain:</td><td><?php echo $Config->get('domain'); ?></td>    
   </tr>    
   <tr>    
       <td>Email:</td><td><?php echo $Config->get('email'); ?></td>    
   </tr>    
   <tr>    
       <td>Docroot:</td><td><?php echo $Config->get('docroot'); ?></td>    
   </tr>    
   <tr>    
       <td>Tmp Dir:</td><td><?php echo $Config->get('tmp'); ?></td>    
   </tr>    
   <?php $db = $Config->get('db'); ?>    
   <tr>    
       <td>DB Host:</td><td><?php echo $db['db_host']; ?></td>    
   </tr>    
   <tr>    
       <td>DB User:</td><td><?php echo $db['db_user']; ?></td>    
   </tr>    
   <tr>    
       <td>DB Pass:</td><td><?php echo $db['db_pass']; ?></td>    
   </tr>    
   <tr>    
       <td>DB Name:</td><td><?php echo $db['db_name']; ?></td>    
   </tr>    
</table>

Filename: configview.php

Here, I'm simply displaying them in a table, so you can see how it works. Because I can call ConfigManager::instance() from anywhere in my code, and receive an up-to-date reference to the Config object, it's easy to retrieve the values stored in it when I need to configure the behaviour of my application. Also, because I'm working with a Singleton instance of Config, the overhead of unserializing the underlying XML document only needs to be incurred once, the first time I fetch an instance of Config.

Using PEAR::XML_Serializer in this example helps me avoid getting involved with the nitty gritty of XML, allowing me to focus my efforts on code that builds on it and has direct value to my application.

Web Services with PEAR::XML_Serializer

Another area where PEAR::XML_Serializer can prove valuable is in system to system or application to application data exchange. Packages like PEAR::SOAP and PEAR::XML_RPC provide implementations of the respective Web services protocols, but SOAP and XML-RPC are not the only ways to move data from A to B.

Amazon, for example, provides what is commonly referred to as a RESTful interface to their Website (for a short overview of REST Web services see Building Web Services the REST Way). What this means is that you can access the data about the products Amazon sells using nothing more than a URL. The result you get back from a URL like this is an XML document containing the data you'd normally find wrapped up in HTML on a page like this. By exposing the data as XML, Amazon makes it very easy to parse from a remote Website and display using your own HTML. Full details can be found at amazon.com/webservices (you'll need to sign up as an associate).

Where PEAR::XML_Serializer is concerned, it's very easy to parse the XML Amazon provides and turn it into a Web page:

<?php     
// Include PEAR::HTTP_Request    
require_once 'HTTP/Request.php';    
   
// Include PEAR::XML_Unserializer    
require_once 'XML/Unserializer.php';    
   
// Your Amazon associate ID    
$assoc_id = 'sitepoint';    
   
// Allow the Amazon book search keyword to be entered via the URL    
if (!isset($_GET['keyword']) ||    
       !preg_match('/^[a-zA-Z]+$/', $_GET['keyword'])) {    
   $_GET['keyword'] = 'php';    
}    
   
// Build the URL to access the Amazon XML    
$amazon_url = 'http://rcm.amazon.com/e/cm?t=' . $assoc_id .    
             '&l=st1&search=' . $_GET['keyword'] .    
             '&mode=books&p=102&o=1&f=xml';    
   
// Create the HTTP_Request object, specifying the URL    
$Request = &new HTTP_Request($amazon_url);    
   
// Set proxy server as necessary    
// $Request->setProxy('proxy.myisp.com', '8080', 'harryf', 'secret');    
   
// Send the request for the feed to the remote server    
$status = $Request->sendRequest();    
   
// Check for errors    
if (PEAR::isError($status)) {    
  die("Connection problem: " . $status->toString());    
}    
   
// Check we got an HTTP 200 status code (if not there's a problem)    
if ($Request->getResponseCode() != '200') {    
  die("Request failed: " . $Request->getResponseCode());    
}    
   
// Get the XML from Amazon    
$amazon_xml = $Request->getResponseBody();      
   
// Create an instance of XML_Unserializer    
$Unserializer = new XML_Unserializer();    
   
// Unserialize the XML    
$status = $Unserializer->unserialize($amazon_xml);    
   
// Check for errors    
if (PEAR::isError($status)) {    
   die($status->getMessage());    
}    
   
// Get the PHP data structure from the XML    
$amazon_data = $Unserializer->getUnserializedData();    
   
?>    
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"    
       "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">    
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">    
<head>    
   <title>Amazon Search for <?php echo $_GET['keyword']; ?></title>    
</head>    
<body>    
   <h1>Amazon Search for <?php echo $_GET['keyword']; ?></h1>    
   <p>    
       <a href="?keyword=Linux">Search for Linux</a> |    
       <a href="?keyword=Apache">Search for Apache</a> |    
       <a href="?keyword=MySQL">Search for MySQL</a> |    
       <a href="?keyword=PHP">Search for PHP</a>    
   </p>    
   <table>    
       <tr>    
           <td>    
       <?php    
       foreach ($amazon_data['product'] as $product) {    
       ?>    
       <table width="600">    
           <tr>    
               <th><?php echo nl2br(wordwrap($product['title'], 40)); ?></th>    
               <td rowspan="2" align="right">    
                   <a href="<?php echo $product['tagged_url']; ?>">    
                       <img src="<?php echo $product['small_image']; ?>"    
                            border="0" />    
                   </a>    
               </td>    
           </tr>    
           <tr>    
               <td>    
                   Author: <?php echo $product['author']; ?><br />    
                   ISBN: <?php echo $product['asin']; ?><br />    
                   Price: <?php echo $product['our_price']; ?><br />    
               </td>    
           </tr>    
       </table>    
       <?php    
       }    
       ?>    
           </td>    
       </tr>    
   </table>    
</body>    
</html>

Filename: amazon.php

The code here is essentially the same as you've seen before, at the end of Getting Started with PEAR, for parsing an RSS feed. I've used PEAR::HTTP_Request (version 1.2) as an HTTP client, to give me more detailed error reporting. The rest is simply unserializing Amazon's data.
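The query string in amazon.php is assembled by hand, which is safe there only because the keyword is restricted to letters. As a sketch of an alternative, later PHP versions (5 and up) offer http_build_query(), which URL-encodes each value for you. The parameter names and values below are copied from the listing above; nothing else about Amazon's interface is assumed.

```php
<?php
// Parameters copied from amazon.php; 'php' stands in for the validated keyword
$params = array(
    't'      => 'sitepoint',
    'l'      => 'st1',
    'search' => 'php',
    'mode'   => 'books',
    'p'      => 102,
    'o'      => 1,
    'f'      => 'xml',
);

// Pass the separator explicitly so the result doesn't depend on php.ini
$amazon_url = 'http://rcm.amazon.com/e/cm?' . http_build_query($params, '', '&');

echo $amazon_url, "\n";
```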

Here's what the (somewhat crude) HTML looks like in a browser:

[Screenshot: the Amazon search results rendered by amazon.php]

Of course, it doesn't stop with parsing someone else's XML. How about doing the same on your own site? Here's a simple example:

<?php     
// An array simulating a database result set    
$products = array(    
   array('code' => '000325', 'item' => 'Hamster', 'price' => 13.99),    
   array('code' => '005523', 'item' => 'Parrot', 'price' => 76.99),    
   array('code' => '000153', 'item' => 'Snake', 'price' => 49.99),    
);    
   
// If ?mime=xml is in the URL, display XML    
if (isset($_GET['mime']) && $_GET['mime'] == 'xml') {    
   
   error_reporting(E_ALL ^ E_NOTICE);    
   require_once 'XML/Serializer.php'; // Lazy include    
   
   $serializer_options = array (    
       'addDecl' => TRUE,    
       'encoding' => 'ISO-8859-1',    
       'indent' => '  ',    
       'rootName' => 'products',    
       'defaultTagName' => 'product',    
   );    
   
   $Serializer = &new XML_Serializer($serializer_options);    
   
   $status = $Serializer->serialize($products);    
   
   if (PEAR::isError($status)) {    
       die($status->getMessage());    
   }    
   
   // Display the XML    
   header('Content-type: text/xml');    
   echo $Serializer->getSerializedData();    
   
} else {    
   // Otherwise the HTML equivalent    
?>    
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"    
       "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">    
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">    
<head>    
   <title>Product Catalog</title>    
</head>    
<body>    
   <h1>Product Catalog</h1>    
   <table>    
       <tr>    
           <th>Product Code</th>    
           <th>Item</th>    
           <th>Price</th>    
       </tr>    
       <?php    
       foreach ($products as $product) {    
       ?>    
       <tr>    
           <td><?php echo $product['code']; ?></td>    
           <td><?php echo $product['item']; ?></td>    
           <td><?php echo $product['price']; ?></td>    
       </tr>    
       <?php    
       }    
       ?>    
   </table>    
</body>    
</html>    
<?php    
}    
?>

Filename: products.php

If someone adds ?mime=xml to the URL used to view this script, instead of receiving the page as HTML, they get the following XML:

<?xml version="1.0" encoding="ISO-8859-1"?>     
<products>    
 <product>    
   <code>000325</code>    
   <item>Hamster</item>    
   <price>13.99</price>    
 </product>    
 <product>    
   <code>005523</code>    
   <item>Parrot</item>    
   <price>76.99</price>    
 </product>    
 <product>    
   <code>000153</code>    
   <item>Snake</item>    
   <price>49.99</price>    
 </product>    
</products>

This makes it very easy to display your data on a remote affiliate Website, should you so desire. So long as you keep the code that deals with accessing and manipulating data separate from the code that deals with presenting it to an end user, it should be no problem to provide an "alternate XML view" using XML_Serializer.

Throw into the mix a library like JOX, which provides a similar XML serializer for Java Beans, and you've got a convenient mechanism for getting PHP and Java talking.

Wrap Up

As you've seen in this article, PEAR::XML_Serializer provides a very handy tool for working with XML. You've seen how to use it, and now have some idea of the types of problems to which it's suited. There are still a few minor glitches to be ironed out (the version 0.9.1 used here has beta status) but, in general, PEAR::XML_Serializer performs reliably and I've yet to find any show-stopping bugs.

Most importantly, PEAR::XML_Serializer provides an approach to parsing XML that saves you from messing with XML's details. As described in A Survey of APIs and Techniques for Processing XML, PEAR::XML_Serializer provides an "Object to XML Mapping API". Although there are limitations to this approach, for solving the types of problems you've seen in this article, an Object to XML Mapping API makes life a lot easier.

With PHP5 packing vastly improved XML support, including support for XML Schema and Relax NG, new doors may open to PEAR::XML_Serializer for handling what it currently achieves with "typeHints". And with that come further possibilities of interop with Java (via JAXB) and .NET (via its XmlSerializer).
