Introduction to XML

XML: It’s just text!

Welcome to the kickoff of SitePoint’s XML week! This week a few of us will be doing our best to dissolve the hype surrounding XML, showing you how it works and how you can use it to enhance your Website in ways you may never have imagined.

Ask a Web builder "What is XML?" and you’ll get answers that range from "It’s the greatest thing since spliced silicon!" to "Who cares?"; from passion to paranoia. Frequently the answer you’ll get is simply "I don’t understand XML, but I wish I did".

As with all technologies, finding answers to simple questions is often difficult, and perhaps with no technology more so than XML. So in this article I’ll be putting XML under a microscope, showing you what it is and how it works. What’s more, I’ll dare to make the bold claim that by the time you’ve finished reading this article you’ll be saying "Is that all it is? That’s so easy!"

So here’s what’s in store:

  • The Who, What and Why of XML: It’s just text — honest!
  • Rolling your Own: The rules for making XML
  • Parsing in the Night: When XML comes alive
  • XML in Action: Things to do with XML

Warning: I use PHP to illustrate some of the examples here, but if you don’t know PHP, you should be able to skip these examples while still grasping the overall concepts. OK, let’s get started.

The Who, What and Why of XML?

Q. Who needs to know about XML?

A. Anyone involved in building the Web or working in the IT business!

A bold claim perhaps, but XML is a technology that can be put to good use by everyone from Web designers to programmers. XML is more or less a mature technology these days. It has already begun creeping into all our lives, and over the next few years, it’s highly likely to become the norm for countless tasks — from rendering a single Web page, to exchanging business data worldwide. The big software vendors like Sun, Microsoft, IBM and Hewlett Packard continually add more and more XML-related functionality to their products, and in general, life online is migrating to XML.

I should make it clear that when I said "XML" just now, I was actually talking about a whole range of "add-on" technologies that build on the basic XML standard. For Web designers, for example, that means XHTML and XSLT, while developers should be concerned with technologies like SOAP, XML Schema and many more. We’ll be sticking with basic XML for most of this article.

In short, if you’ve read this far, you probably need to know about XML. The good news is it couldn’t be simpler…

What is XML?

Ever created a file with Notepad? If so, you’ve already worked with the raw material of XML: text! That’s right; it’s nothing but good old ASCII.

XML is simply a set of rules for laying out text in order to make that text easier to "navigate". If you’ve ever edited an HTML file with a text editor, you practically know XML already.

So, let’s first ask "What is HTML?" One way you might describe HTML is a set of rules for laying out text so that it’s easy for human beings to read. If I put a word in bold, your eye is drawn to it because it stands out against the rest of the text. HTML allows us to exchange information between people in a way that’s easy for them to read. Where would we be without formatting? The Internet would be one giant README file!

Well, XML is the equivalent of HTML, designed primarily for computers to read. Computers, not being as smart as human beings, need to be given more detail to help them find their way around a piece of information. XML is there to tell a computer not just how a document is formatted but also exactly what information that document contains.

Why XML?

That’s a very general description, but a far easier way to understand XML is in terms of the problem it is there to solve…

Say we have a text file that contains some information about customers of our Website. It’s come from a database and every "row" of the file contains four pieces of information (let’s call them "elements") – the person’s first name, their last name, their email and the city they live in. For example:

Joe Bloggs jbloggs@yahoo.com Washington 

Mary Woods mwoods@hotmail.com London

We have a computer program with which we want to read this file, and extract the customer data, so we can email them with our latest special offer (spam them, in other words).

Looking at the above file, each "element" is separated from the next with a space character. Also, each row in the list is separated from the next by a new line character (each row starts on a new line). Using those characters, we can write a computer program to break the text file into smaller pieces we can use.

Using PHP by way of example, here’s how we could do it (bear with me if you don’t know PHP):

/* We begin with our customer list stored 
  in a variable called $customer_list */

/* Break up the customer list using PHP's  
  explode() function and the newline character
  at the end of each row */

$customers = explode ( "\n" , $customer_list );

/* Now $customers is an array variable
  (basically a list of variables) so
  we can loop through it like so; */

foreach ( $customers as $customer ) {

   /* Explode each "row" stored in $customer using
      the space character to create another array
      called $element */

   $element = explode ( " ", $customer );

   /* Send an email using $element */

   mail ( $element[2],"Special Offer",
       "Hi $element[0] $element[1],\nHow's it going in $element[3]?" );
}

The above code would send out two emails. The first goes to jbloggs@yahoo.com containing the message:

"Hi Joe Bloggs,
How's it going in Washington?"

The second email goes to mwoods@hotmail.com and contains:

"Hi Mary Woods,
How's it going in London?"

So, using some basic PHP formatting functions, we were able to break up the text file into pieces with which we could work, using the space characters and newlines to find the start and end points of the "chunks" of data we want.

Now that’s all fine until our customers list gets some new records (look carefully at rows three and four):

Joe Bloggs jbloggs@yahoo.com Washington  
Mary Woods mwoods@hotmail.com London  
Jean Du Vin jduvin@wanadoo.fr Paris  
Mike Macey mmacey@nyonline.com New York

First we have "Jean Du Vin". Now because we’re smart people, we can guess that "Du Vin" is probably the last name of this person, but how’s our computer program going to know that? It’s been told that after the first space it will find the person’s last name, then after the second space it should find the email address. But with this name it’s going to decide that the last name is "Du", the email address is "Vin" and the city is "jduvin@wanadoo.fr".

And we’ve got another problem with the city "New York". It also contains a space, so our program is probably going to decide that the city is called just "New" rather than "New York".
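To see the failure for yourself, here’s a minimal sketch that runs the same explode() call on the third row. The variable names are only for illustration:

```php
<?php
// The third row from the list above
$row = "Jean Du Vin jduvin@wanadoo.fr Paris";

// Split on spaces, exactly as the mailing script does
$element = explode(" ", $row);

// The script assumes index 1 = last name, 2 = email, 3 = city
echo "Last name: $element[1]\n";
echo "Email:     $element[2]\n";
echo "City:      $element[3]\n";
```

The row breaks into five pieces instead of four, so every field after the first name lands in the wrong slot.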

Now we could update our program so that it understands "Jean Du Vin" and "New York" as special cases. But imagine we have a list of 10,000 customers. How many special cases are we going to have to deal with? Instantly we have a nightmare on our hands.

So we need some kind of special character to separate the "elements", right? How about a comma (as you might find in a CSV file)?

Joe,Bloggs,jbloggs@yahoo.com,Washington  
Mary,Woods,mwoods@hotmail.com,London  
Jean,Du Vin,jduvin@wanadoo.fr,Paris  
Mike,Macey,mmacey@nyonline.com,New York

Now we can look for the commas instead of spaces, and we’ve solved the problem! Well… we have until someone enters "Paris, Texas" to distinguish it from just "Paris". Although the commas are a step forward, there may also be special cases we need to be prepared for. Also we rely on the "elements" to appear in a row in the right order – it would be nice, for instance, if we could identify an email as an email wherever it appears in the list. And what if elements are missing in some rows, or we have extra elements we weren’t expecting?
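The "Paris, Texas" problem is easy to reproduce. This is a small sketch, assuming a made-up row in which Jean has moved to Texas:

```php
<?php
// Hypothetical row: the city itself contains a comma
$row = "Jean,Du Vin,jduvin@wanadoo.fr,Paris, Texas";

// Split on commas this time
$element = explode(",", $row);

echo count($element) . " fields found\n";
echo "City: $element[3]\n";
```

We expected four fields and got five, and the city has been cut down to plain "Paris" again.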

Enter: XML

How about we lay out our list like this:

<?xml version="1.0"?>  
<customer_list>  
 <customer>  
   <first_name>Joe</first_name>  
   <email>jbloggs@yahoo.com</email>  
   <last_name>Bloggs</last_name>  
   <city>Washington</city>  
 </customer>  
 <customer>  
   <last_name>Woods</last_name>  
   <first_name>Mary</first_name>  
   <city>London</city>  
   <email>mwoods@hotmail.com</email>  
 </customer>  
 <customer>  
   <last_name>Du Vin</last_name>  
   <first_name>Jean</first_name>  
   <email>jduvin@wanadoo.fr</email>  
   <city>Paris</city>  
   <country>France</country>  
 </customer>    
 <customer>  
   <city>New York</city>  
   <last_name>Macey</last_name>  
   <email>mmacey@nyonline.com</email>  
 </customer>      
 
</customer_list>

Now our problem really is solved! Every "element" of data is neatly wrapped up in "tags" which make it clear exactly where the data begins and ends.

We also have a description of what each piece of data actually is (such as an "email" or a "city") and, as such, the data can appear in any order without worrying our program.

Another gain is that we also have a data "hierarchy" – the "customer_list" tag contains elements called "customer" which, in turn, contain the "first_name", "last_name", "email" and "city" elements.

I’ve also slipped in a couple more surprises: removing an element from the last customer and, for the one above, adding a new element <country /> – and with XML, this is no problem. That’s what the X stands for: eXtensible (imagine what would have happened if I’d removed or added an element to the comma-separated file). So, extracting the data from this file is now a simple job for any system.
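As a rough illustration of how simple that job becomes, here’s a sketch using PHP’s SimpleXML extension (my choice for this example; any XML parser would do) on a shortened copy of the list. The $greetings array is just a stand-in for the mail() call we used earlier:

```php
<?php
// A shortened copy of the customer list above
$xml = <<<XML
<?xml version="1.0"?>
<customer_list>
 <customer>
   <first_name>Joe</first_name>
   <email>jbloggs@yahoo.com</email>
   <last_name>Bloggs</last_name>
   <city>Washington</city>
 </customer>
 <customer>
   <city>New York</city>
   <last_name>Macey</last_name>
   <email>mmacey@nyonline.com</email>
 </customer>
</customer_list>
XML;

$list = simplexml_load_string($xml);

$greetings = array();
foreach ($list->customer as $customer) {
    // Elements are looked up by name, so their order does not matter.
    // A missing element simply comes back as an empty string.
    $first = (string) $customer->first_name;
    $name  = trim($first . ' ' . $customer->last_name);

    $greetings[(string) $customer->email] =
        "Hi $name,\nHow's it going in {$customer->city}?";
}

print_r($greetings);
```

Notice that "New York" survives intact, and the customer with no first name doesn’t knock the other fields out of place.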

And that captures the essence of XML: it’s a technology for transferring data between systems in a platform-independent manner.

For example, a Windows workstation can fetch XML data from a mainframe, do something to it, and then pass it on to a Linux server, without any of them batting an eyelid.

The same goes for the exchange of data between applications on the same system – with MySQL for example, the mysqldump utility can be used to deliver data in XML form to a file (e.g. mysqldump -X mydatabase), which can then be delivered (perhaps with the aid of PHP) to a Web browser for viewing.

Because all modern operating systems support the ASCII text standard, XML makes the perfect choice for data exchange from anywhere to anywhere.

All the applications of XML that you may have come across (XSLT, XML-RPC, SOAP, XML Schema) are just mechanisms that are used to enhance the ability of XML to exchange data in some way.

One final thing to be aware of when you think about XML is that, although XML is at heart just a standard — a set of rules for formatting ASCII text — bit by bit it’s reaching the point where it could be regarded as a programming language, when you take into account "add-on" technologies like XSLT.

Rolling Your Own

So, now that you know the what and why of XML, let’s get down to creating an XML document.

I mentioned above that you probably, more or less, know XML already, having worked with HTML. Here’s why:

<html>   
 <head>  
   <title> Almost an XML Page </title>  
 </head>  
 <body>  
   <table>  
     <tr>  
       <td>This is almost XML!</td>  
     </tr>  
   </table>  
 </body>  
</html>

Compare that with what we saw above. Notice how we’ve got tags nested within tags? To turn the above HTML into XML all we need is to add the following at the top (note that this applies only to the above document — most HTML is not XML-compliant):

<?xml version="1.0"?>

The reason HTML and XML are so similar is that both were derived from an older standard, SGML, whose origins go back to the 1960s. Comparing the two, XML is more pedantic than HTML. For example, you’re probably used to using the following in HTML:

<img src="myimage.png">

In XML, all tags must be closed, so the above would have to be re-written as it is here (notice the additional forward slash at the end):

<img src="myimage.png" />

There are a few features of XML that you need to be aware of:

Attributes vs. Elements

XML has two mechanisms for placing data (referred to as character data) in tags. Elements are placed between the tags:

<tag>This is an element</tag>

Attributes are placed in the opening tag (much like <a href="http://www.sitepoint.com" />):

<tag myattribute="This is an attribute">This is an element</tag>

Whether you use elements or attributes is up to you (it’s a subject almost as hotly argued as PHP vs ASP!). With a little experience, you’ll know the answer intuitively.
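Whichever you choose, both are equally easy to get at once the document has been read. Here’s a small sketch using PHP’s SimpleXML extension (an assumption on my part; other parsers offer the same thing in their own way):

```php
<?php
$tag = simplexml_load_string(
    '<tag myattribute="This is an attribute">This is an element</tag>'
);

// Attributes are read with array-style access...
$attribute = (string) $tag['myattribute'];

// ...while the element's content is the tag itself cast to a string
$content = (string) $tag;

echo "Attribute: $attribute\n";
echo "Content:   $content\n";
```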

Commenting XML

XML comments are the same as HTML comments:

<!-- this is a comment and it can contain <tags /> which    
will be ignored -->

Entities

Entities are a way to replace character data with something else. There are effectively two types of entity – those you have to have, and those that you define yourself. Make sure you know which entities are required by looking at the rules below.

You may have come across entities in HTML, for example &copy;, which tells the browser to display a ‘©’. Defining your own entities requires what’s known as a Document Type Definition (DTD), discussed briefly at the end of this article.

In general, apart from the entities you have to use to create well-formed XML, you shouldn’t need to worry about them too much.

Processing Instructions

Abbreviated to ‘PI’, processing instructions represent a way to insert special messages that will be recognised by the application that will read the document — much like placing JavaScript within an HTML document. The <? and ?> tags are used to mark the start and end of a PI. An example using PHP might be:

<?xml version="1.0"?>   
<myscript>  
 <authorised><? echo ( 'Welcome back!' ); ?></authorised>  
 <unauthorised><? echo ( 'Please log in' ); ?></unauthorised>  
</myscript>

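Strictly speaking, the XML standard expects a PI to name its target application immediately after the opening <?, so the following form is the one to prefer: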

<?xml version="1.0"?>   
<myscript>  
 <authorised><?php echo ( 'Welcome back!' ); ?></authorised>  
 <unauthorised><?php echo ( 'Please log in' ); ?></unauthorised>  
</myscript>

…understand now why the PHP group chose to mark up PHP that way?

CDATA

CDATA blocks are a way to tell any application reading your XML document to treat the contents as normal characters (i.e. that any XML tags it should happen to find within the CDATA block should be ignored). CDATA blocks are marked up using <![CDATA[ and ]]>. For example:

<?xml version="1.0"?>   
<root>  
 <tag>  
 <![CDATA[  
   This <xml_tag /> will be treated as normal text.  
 ]]>  
 </tag>  
</root>
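To check that a parser really does treat the contents as plain text, here’s a quick sketch using PHP’s DOM extension on a condensed version of the document above:

```php
<?php
$xml = <<<XML
<?xml version="1.0"?>
<root>
 <tag><![CDATA[This <xml_tag /> will be treated as normal text.]]></tag>
</root>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

// The CDATA contents come back as ordinary character data
$text = $doc->getElementsByTagName('tag')->item(0)->textContent;

// No element called xml_tag was created while parsing
$found = $doc->getElementsByTagName('xml_tag')->length;

echo $text . "\n";
echo "xml_tag elements found: $found\n";
```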

The Rules of XML

We need to look next at the rules that govern XML documents. The rules can get a little tedious so if you’re in a hurry, just have a quick glance through and refer back later. You’ll find that, once you get into writing your own XML documents, most of these rules will be pretty obvious.

The XML standard itself is available at http://www.w3.org/TR/2000/REC-xml-20001006. To save you a long read, the key rules are explained below. Note that if an XML document obeys these rules, it is said to be well formed (the word "valid" has another meaning in XML, which we’ll look at later):

These are the most important rules any XML document must obey.

1. XML Version Required

All XML documents should begin with a declaration that states the version of the XML standard being used (XML 1.0 strictly only recommends it, but it’s best treated as required):

<?xml version="1.0"?>

The above looks just like a processing instruction, although the standard formally calls it the XML declaration.

2. Close your Tags!

Every XML tag must be properly closed. HTML is more relaxed here, allowing you to use tags like <img> and <br> without closing them. In XML these should be <br></br> or just <br /> if the tag contains no data.

3. XML Tags Must be Nested in the Correct Order

In HTML, a browser will allow you to have <i> <b> Hello World! </i> </b>. In XML this would have to be either <i> <b> Hello World! </b> </i> or <b> <i> Hello World! </i> </b>.

4. XML is Sensitive to UPPERCASE/lowercase

In XML <mytag /> is not the same as <MYTAG />! In HTML you can get away with this — a browser will (generally) treat <BODY></body> as being the same thing.

5. And I Quote…

XML attributes must have quotes around them. In HTML you can get away with <a href=mypage.html>It's a Link!</a>. In XML that has to be <a href="mypage.html">It's a Link!</a>.
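You don’t have to take my word for rules 2 to 5; a parser will tell you. Below is a minimal sketch built on PHP’s SimpleXML and libxml functions. The helper is_well_formed() is a name I’ve made up for this example, not a built-in function:

```php
<?php
// Hypothetical helper: returns true if $xml parses as well-formed XML
function is_well_formed($xml)
{
    // Collect parse errors quietly instead of printing warnings
    $previous = libxml_use_internal_errors(true);
    $result   = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $result !== false;
}

var_dump(is_well_formed('<i><b>Hello World!</b></i>'));   // nested correctly
var_dump(is_well_formed('<i><b>Hello World!</i></b>'));   // overlapping tags
var_dump(is_well_formed('<p>line<br></p>'));              // <br> never closed
var_dump(is_well_formed('<BODY></body>'));                // case mismatch
var_dump(is_well_formed('<a href=mypage.html>Link</a>')); // unquoted attribute
```

Only the first string passes; each of the others breaks one of the rules above.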

6. An XML Document Must have at Least One Element

At least one element, known as the root element, must exist for an XML document to be well formed. This tag doesn’t have to contain anything, though, so the example below is acceptable:

<?xml version="1.0"?>    
<root />

7. Naming your Tags

The way you name your XML tags is governed by the following rules:

  • tag names can contain letters, numbers, and other characters (e.g. <mytag3></mytag3> is fine)
  • tag names cannot contain spaces ( e.g. <my tag></my tag> is wrong)
  • tag names cannot start with the letters xml (including UPPER or mIXeDcase)
  • tag names cannot start with a number or punctuation character (e.g. <3mytag></3mytag> and <.mytag></.mytag> are both wrong).

8. Special Characters

Within the data you place in a tag or attribute, certain characters must be replaced with entities to prevent them from being mixed up with XML tags and syntax. These characters are:

Character : Entity : Example

" : &quot; : <tag entity="Here is a quote &quot;" />
' : &apos; : <tag entity="Here is an apostrophe &apos;" />
< : &lt; : <tag>1 &lt; 2</tag>
> : &gt; : <tag>2 &gt; 1</tag>
& : &amp; : <tag>Kramer &amp; Kramer</tag>

In PHP, the function htmlspecialchars() will achieve this (pass it the ENT_QUOTES flag if you need apostrophes replaced as well as double quotes).
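Here’s a short sketch of that function at work. The sample text is made up, and I’m assuming the ENT_XML1 flag is available (it arrived in PHP 5.4) so that apostrophes come out as &apos; rather than a numeric entity:

```php
<?php
// Made-up text containing all five special characters
$raw = 'Kramer & Kramer say "1 < 2" and "2 > 1" isn\'t news';

// ENT_QUOTES replaces both kinds of quote; ENT_XML1 selects XML entity names
$escaped = htmlspecialchars($raw, ENT_QUOTES | ENT_XML1, 'UTF-8');

echo "<tag>$escaped</tag>\n";

// Reading it back with a parser returns the original text
$roundTrip = (string) simplexml_load_string("<tag>$escaped</tag>");
```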

9. New Lines and White Space

For new lines in XML, the XML standard supports carriage returns and linefeeds (i.e. \r\n, \r and \n, as in most programming languages, are acceptable). Having said that, XML processors are expected to ‘normalize’ these to \n during processing.

Whitespace in XML is regarded as space characters, new lines (above), and tab characters. If a document has no DTD (see below), all whitespace must be preserved. If a DTD is provided with the XML document and an element contains nothing but white space or other elements, the whitespace can be removed in processing the document – it’s down to the DTD (or XML Schema) to specify which elements should have their whitespace preserved.

In most cases you shouldn’t need to worry about this, but in particular where XSLT is concerned, to generate output for humans to read, you may need to be careful. You can find out more online in What’s the diff? and Controlling Whitespace.

Your First XML Document

So, now you know the boring stuff, you’re equipped to write your own XML documents. And the good news is, as with HTML, all you need is a text editor to create it! For viewing the XML in a nice format (including checking that it’s well formed), Internet Explorer is a good choice.

Save your file with the extension .xml and you should be able to open it with Internet Explorer to view the document as a collapsible tree. Here’s an XML document that demonstrates most of what we’ve seen:

<?xml version="1.0" ?>     
<!-- My first XML document -->    
<articles>    
 <article author="harryf" date="13 Oct 2002">    
   <title>XML is so easy!</title>    
   <body>XML really is nothing complicated</body>    
 </article>    
 <article author="harryf" date="13 Oct 2002">    
   <title>A Program Instruction</title>    
   <body>Here's a PI for PHP: <?php phpinfo(); ?></body>    
 </article>    
 <article author="harryf" date="13 Oct 2002">    
   <title>An Entity</title>    
   <body>Mathematics: x &lt; y &gt; z</body>    
 </article>    
</articles>

Told you it was easy!

If you really get into the job of editing XML, there are a few "professional tools" you may want to consider (most of which are either Open Source or have evaluation versions). These become particularly valuable when you start to work with complicated XML documents or some of the "advanced" XML technologies like XSLT and XML Schema.

Two well worth a look are:

  • Cooktop (http://www.xmlcooktop.com/) – An excellent Open Source editor providing support for plenty of additional XML technologies like XSLT and Web services.
  • XML Spy (http://www.xmlspy.com) – A commercial editor with support for most of the important XML technologies (XML Schema, XSLT, web services et al.), with a nice display to help visualise XML documents.

More are discussed here at the SitePointForums.

In generating XML from your applications (be they PHP, Java, C++, Python, C# etc.), be aware that sometimes (especially for simple documents) it’s best simply to "hard code" XML into your code which you can echo() (or print(), System.out.println() etc.) directly to output. For more complicated tasks, you may want to consider a DOM parser, which we’ll look at next…
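To give you a taste of the second option, here’s a minimal sketch that builds a small document with PHP’s DOM extension. The element names are borrowed from the example above, while the title text is made up to show the escaping:

```php
<?php
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true; // indent the output so it is easier to read

// The root element
$articles = $doc->createElement('articles');
$doc->appendChild($articles);

// An <article> element with an attribute
$article = $doc->createElement('article');
$article->setAttribute('author', 'harryf');
$articles->appendChild($article);

// createTextNode() replaces special characters such as & for us
$title = $doc->createElement('title');
$title->appendChild($doc->createTextNode('Kramer & Kramer'));
$article->appendChild($title);

$output = $doc->saveXML();
echo $output;
```

The trade-off is more code for a small document, in return for output that should always be well formed.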

Parsing in the Night

HTML is only pleasant to look at when it comes into contact with a Web browser – otherwise it’s just a boring ASCII text file. The same principle applies to XML but the "target" application for XML doesn’t have to be a Web browser. XML only "comes to life" when some application "reads" it.

When an application reads an XML document, it’s described as having parsed the document. That means it searched through the document, found all the character data placed within the XML tags, and has them available in some form that’s ready for us to use.

The subject of parsing is one that causes a lot of confusion to those getting started with XML. You may come across people who talk about things like SAX and DOM and wonder how musical instruments and cleaning fluids relate to XML. Again, the thing to remember is XML is at heart very simple.

If you’ve had any experience with programming, ask yourself "How do I extract the data from this piece of XML?":

<tag>My element</tag>

In PHP you might use a regular expression like:

<?php      
$xml="<tag>My element</tag>";    
preg_match ( "/<tag>(.*)<\/tag>/",$xml,$output );    
echo ($output[1]);    
?>

This is fine for a single tag. But what if we throw in some more elements, plus some attributes, PI’s, comments and character data? Do you really want to write a program to be able to parse the XML document we wrote earlier?

<?xml version="1.0" ?>     
<!-- My first XML document -->    
<articles>    
 <article author="harryf" date="13 Oct 2002">    
   <title>XML is so easy!</title>    
   <body>XML really is nothing complicated</body>    
 </article>    
 <article author="harryf" date="13 Oct 2002">    
   <title>A Program Instruction</title>    
   <body>Here's a PI for PHP: <?php phpinfo(); ?></body>    
 </article>    
 <article author="harryf" date="13 Oct 2002">    
   <title>An Entity</title>    
   <body>Mathematics: x &lt; y &gt; z</body>    
 </article>    
</articles>

It’s perfectly possible to do so if you’re blessed with infinite time, but thankfully most programming languages and XML tools come with their own parsers to do this for you.

SAX and DOM

SAX and DOM are effectively two strategies for parsing XML (usually referred to as APIs – Application Programming Interfaces). So you know, SAX stands for "Simple API for XML", while DOM is for "Document Object Model".

The SAX approach says "Give me a list of XML tags with information on what I should do with them. I’ll read through the XML document from start to finish and every time I find a tag that was in your list, I’ll do what you told me to." In other words, the SAX approach is event driven. A SAX parser will read an XML document sequentially from the beginning. Each XML tag it encounters is regarded as an event, and with every event it encounters, it will consult a "list" it has been provided (by you the programmer) for this particular job, and take whatever action is necessary. A common example of SAX in action is in parsing an RSS news feed (see Kevin Yank’s PHP and XML: Parsing RSS 1.0 article – there will be plenty of other examples in most programming languages used for work online).
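Here’s roughly what that looks like with PHP’s SAX-style XML parser functions. It’s only a sketch: the sample document is cut down from the one we wrote earlier, and the "list" is three small handler functions that collect the text of every title tag:

```php
<?php
$xml = '<articles>'
     . '<article><title>XML is so easy!</title></article>'
     . '<article><title>An Entity</title></article>'
     . '</articles>';

$titles  = array(); // the title text collected so far
$inTitle = false;   // true while the parser is inside a <title> element

$parser = xml_parser_create();

// Keep tag names as written instead of folding them to UPPERCASE
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

// Event handlers: one for opening tags, one for closing tags
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attributes) use (&$titles, &$inTitle) {
        if ($name === 'title') {
            $inTitle  = true;
            $titles[] = '';
        }
    },
    function ($parser, $name) use (&$inTitle) {
        if ($name === 'title') {
            $inTitle = false;
        }
    }
);

// Event handler for the character data between tags
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$titles, &$inTitle) {
        if ($inTitle) {
            $titles[count($titles) - 1] .= $data;
        }
    }
);

// Read the document from start to finish, firing the handlers as we go
xml_parse($parser, $xml, true);

print_r($titles);
```

Notice that the parser never hands us the whole document; we only see one event at a time and have to keep track of where we are ourselves.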

The DOM approach says "I’m going to load the entire XML document in one go, then make it available to you in a "hierarchical" form, building a "tree" from the document where individual elements can be accessed directly, without having to "re-read" from the beginning." The DOM approach is object oriented, hence the name. If you have experience with object oriented programming, the DOM API will be instantly appealing. For those unacquainted with object orientation, it may take some getting used to. A loose analogy might be an online directory like dmoz. Dmoz is organised around a tree structure. It’s possible to access parts of the tree directly, using URLs; for example, http://dmoz.org/News/ gives you the News branch and http://dmoz.org/Science/ gives you Science. If dmoz were only a single page, a giant list organised under headings, you’d need to read through the whole page to find the section you’re looking for.

In general, the SAX approach is usually faster, arguably easier to use, and better suited to large documents, while the DOM approach provides you with a more powerful way to manipulate XML, and can be very useful in creating XML from within an application. But DOM loads the entire document into memory and is therefore slower and suited only to small to medium sized documents.
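For comparison, here’s a DOM sketch using PHP’s DOM extension on the same kind of cut-down document. Once the tree is loaded we can jump straight to the second article, read from it and even change it:

```php
<?php
$xml = '<articles>'
     . '<article author="harryf"><title>XML is so easy!</title></article>'
     . '<article author="harryf"><title>An Entity</title></article>'
     . '</articles>';

// Load the whole document into memory as a tree
$doc = new DOMDocument();
$doc->loadXML($xml);

// Go directly to the second <article>, with no re-reading from the start
$second = $doc->getElementsByTagName('article')->item(1);

$title  = $second->getElementsByTagName('title')->item(0)->textContent;
$author = $second->getAttribute('author');

echo "$title by $author\n";

// The tree can be modified in place and written back out as XML
$second->setAttribute('date', '13 Oct 2002');
$changed = $doc->saveXML();
```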

Neither SAX nor DOM is perfect so it’s worth mentioning that a third approach for XML parsing called XOM is in progress in an attempt to make the perfect API. It’s only a few months old, so there’s obviously a way to go yet.

To add to the confusion, a fair few parsers that implement the SAX or DOM APIs are available, such as James Clark’s now famous Expat Parser (a SAX parser) and the Gnome libxml parser (a DOM parser). Microsoft implement SAX and DOM, along with a number of other important XML technologies, within a toolkit known simply as MSXML, while Sun have XML encoding and decoding classes in the Java library, which you’ll find out more about here.

Whichever XML parsing API you use, your programming language of choice will need to implement the API in some form (otherwise, prepare to write a lot of code or pray someone has done it for you such as Luis Argerich and PHP XML Classes). PHP provides you with SAX functions and DOM functions as extensions, which need to be added to your base PHP install.

XML in Action

Now you’re up to speed with XML in general, what are these acronyms that keep popping up like XML Schema and XSLT? And what can you use XML for, anyway?

I’ll give you a quick run down of the important "add-ons" and applications XML is now used for, some of which will be discussed in detail in other articles, as part of SitePoint’s XML Week.

XML Validation

We’ve seen how to create well formed XML documents by obeying the basic rules of the W3 standard. What we haven’t seen is how to validate the structure of an XML document and the data within it. If you receive an XML document from someone else, it may well be well formed — but is the document actually relevant to the situation at hand? In HTML terms, the example below is well formed but it won’t amuse many Web browsers.

<head>      
 <body>      
   <title>      
     <tr>      
       <td>This is almost XML!</td>      
     </tr>      
   </title>      
 </body>      
 <html>      
   <table> Almost an XML Page </table>      
 </html>      
</head>

The first technology to deal with this was the DTD, part of the original XML specification. DTDs provide a means to validate the structure of an XML document, as well as providing information as to how the document should be processed and the ability to define custom entities. What DTDs don’t provide is a means to validate the data stored in an XML tag. DTDs are also not themselves XML. As a result, they are gradually being superseded by a new approach called Schemas.
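To make the difference between "well formed" and "valid" concrete, here’s a small sketch using PHP’s DOM extension. The DTD is one I’ve invented for the example (a customer must contain a first_name followed by an email), and validates_against_dtd() is a made-up helper name:

```php
<?php
// Hypothetical DTD: <customer> must contain first_name, then email
$dtd = <<<DTD
<!DOCTYPE customer [
  <!ELEMENT customer (first_name, email)>
  <!ELEMENT first_name (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>
DTD;

// Hypothetical helper: true only if $xml is well formed AND obeys its DTD
function validates_against_dtd($xml)
{
    $previous = libxml_use_internal_errors(true); // keep warnings quiet
    $doc = new DOMDocument();
    $ok  = $doc->loadXML($xml) && $doc->validate();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok;
}

$good = $dtd . '<customer><first_name>Joe</first_name>'
             . '<email>jbloggs@yahoo.com</email></customer>';

// Well formed, but the required <email> element is missing
$bad  = $dtd . '<customer><first_name>Joe</first_name></customer>';

var_dump(validates_against_dtd($good));
var_dump(validates_against_dtd($bad));
```

Both documents are well formed, but only the first is valid according to the DTD.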

There are two main (alternative) standards for validating XML: the W3’s XML Schema and Relax NG. Although Relax NG is arguably easier to use, the W3 XML Schema standard seems to be gaining greater support as the de facto XML validation technology. Both of these provide the ability to validate not just the structure of an XML document, but also the data stored in it, and go so far as to provide regular expressions for the purpose.

XHTML

Simply put, XHTML is just HTML that conforms to the rules of well formed XML. In other words, it’s <img src="myimage.png" /> from now on.

XML Namespaces

XML namespaces are used to create a single document with multiple (and potentially overlapping) XML tag sets. They allow an XML parser to identify which set a tag belongs to, and as a technology, this is particularly important to SOAP-based Web services, XML Schema and XSLT (see below).
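Here’s a sketch of what that means in practice, using PHP’s DOM extension. The document, its two namespace URIs and the prefixes are all invented for the example; the point is that two tags both called title can live side by side without being confused:

```php
<?php
// Hypothetical document mixing two tag sets that both define <title>
$xml = <<<XML
<?xml version="1.0"?>
<catalogue xmlns:book="urn:example:books" xmlns:web="urn:example:pages">
  <book:title>Learning XML</book:title>
  <web:title>Introduction to XML</web:title>
</catalogue>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

// Ask for <title> by namespace URI rather than by prefix
$bookTitle = $doc->getElementsByTagNameNS('urn:example:books', 'title')
                 ->item(0)->textContent;
$pageTitle = $doc->getElementsByTagNameNS('urn:example:pages', 'title')
                 ->item(0)->textContent;

echo "Book: $bookTitle\n";
echo "Page: $pageTitle\n";
```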

XSL(T)

XML Stylesheets are a mechanism that allows an XML document to have a set of "rules" applied to it, based on tag name, in much the same way as Cascading Stylesheets allow you to specify a particular "look" for a particular HTML tag. XML Stylesheets get most interesting when used in transformations (XSLT). XSLT can be used to transform an XML document into another XML document, and represents a very important technology for Web designers and developers to learn.

Given some XML, you could use XSLT to convert it to any of XHTML, WML (Wireless Markup Language), SOAP, XML-RPC, SVG (see below) or even Flash and PDF (with the aid of a little PHP or otherwise) files. That means you can deliver multiple user interfaces from a single data source…

You could also use XSLT to transform between SOAP and XML-RPC (two of the Web services standards).

XPath

XPath is a mechanism for navigating an XML tree using a syntax that’s similar to the command line access of a file system in DOS or a UNIX shell. XPath is important for XSLT.
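You don’t need XSLT to try XPath out. Here’s a minimal sketch using the xpath() method of PHP’s SimpleXML extension; the second article and its author are made up so there’s something to filter on:

```php
<?php
$xml = '<articles>'
     . '<article author="harryf"><title>XML is so easy!</title></article>'
     . '<article author="someone_else"><title>Another Article</title></article>'
     . '</articles>';

$articles = simplexml_load_string($xml);

// Walk down the tree much as you would a directory path
$allTitles = array();
foreach ($articles->xpath('/articles/article/title') as $node) {
    $allTitles[] = (string) $node;
}

// The part in square brackets filters on the author attribute
$harryTitles = array();
foreach ($articles->xpath('/articles/article[@author="harryf"]/title') as $node) {
    $harryTitles[] = (string) $node;
}

print_r($allTitles);
print_r($harryTitles);
```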

XLink

XLink allows you to place HTML-like links in XML documents that take advantage of XML namespaces. XLink may have flopped as a technology, perhaps given that most people would rather transform XML into XHTML than read the raw material itself. XLink is still significant though, as a step towards XPointer…

XPointer

XPointer takes advantage of XPath and XLink to provide a mechanism to point to a specific tag or range within another XML document. As XML evolves from being simply a markup language to something that resembles a programming language, and for use in XML storage (XML databases) XPointer is likely to become more important. It’s one for the developers to watch.

Web Services

Web services have been examined at SitePoint in Web Services Demystified and Build your own Web Service with PHP and XML-RPC. XML-RPC was the "original" XML messaging standard, with another early alternative being WDDX. The current king of Web services is the SOAP XML standard.

WML

Wireless Markup Language is an XML standard for wireless devices developed by a consortium of mobile device vendors. More information at http://www.wapforum.org/ (basically the XHTML equivalent for WAP/3rd Generation mobile devices).

SVG

Scalable Vector Graphics allow you to draw graphics using XML, relying on the Web browser to display the end result, including animation. There’s a collection of demos here, including a map of Vienna and an animated chess game. For PHP developers there’s an excellent SVG class library here, along with an SVG Graphing Tool.

XML Elsewhere

Other XML applications have been built for all kinds of things, from B2B ecommerce with ebXML to Mind Reading Markup Language (I kid you not). The most comprehensive list I’ve come across is at Oasis: http://xml.coverpages.org/xmlApplications.html (prepare to fall over backwards reading that), or if you don’t like lists, try this poster.

A particularly interesting area of development right now is in the arena of XML data storage (databases). The "traditional" database vendors like Microsoft and Oracle have already climbed aboard the bandwagon, adding XML support to their products. Meanwhile a whole new generation of databases is in progress, such as Apache’s Xindice. This area of database development is still "work in progress" but, in theory, XML presents a better alternative to SQL for "exploring" and relating data, as well as data types you won’t see in most databases. It also offers potential benefits such as native database Web services and "document based" data storage, perhaps making user interface generation an easier task. There’s a more in-depth discussion here.

Otherwise, a good place to get un-hyped and accurate XML news is xmlhack.

Tagged and Bagged

Hopefully you now have some idea of what XML is about and why it’s important to all of us building the Web. Do I hear you saying "It’s so easy!"? If not, please feel free to drop by the SitePointForums discussion below with any questions. For the rest of SitePoint’s XML week you’ll be treated to an in-depth view of some of the advanced XML technologies and applications. Enjoy!
