XML – An Introduction

Today, we’re deeply ensnared in the World Wide Web. Forests of techspeak, along with metaphors like "information superhighway", "global village" and "Infobahn" litter our conversations. Offline, you’re off-life, standing still as the information age hurtles past. And so the moral of the day is "Get online: there’s gold in those nets. You never know what you might catch!"

This pounding rain of hype has been a good incentive. Starting from a few academics and hardcore hobbyists, Internet growth has gone nuclear. 300 million have moved online; 55 million use the Web regularly. And HTML – HyperText Markup Language – is responsible. HTML spins ordinary data into a World Wide Web, touching anyone, anywhere, anytime. So now the global reach that was once accessible only through expensive media channels is available to the masses.

Today’s HTML has advanced far beyond its first humble steps. But its virtues now shackle it; HTML can no longer move forward. So its makers have created a new language: XML.

What is XML? To answer this, we must know:

  • Why HTML worked
  • Why HTML no longer works
  • Why XML will pick up where HTML left off

Why HTML Worked

HTML outlines hypertext structure. Ideally, hypertext is data that follows a path imposed by user whim, linked and experienced independent of where it, and its user, are located. Though the Web hasn’t reached this ideal, and perhaps never will, HTML’s design grasps for it, addressing three key concerns in data delivery:

  1. Linking: Data is linked in HTML – one piece carries you to another.

  2. Simplicity: HTML is simple and easy to learn.

  3. Portability: HTML is stripped down, so it’s portable – especially over networks.

Each of these elements is hard-wired into data by commands called markup, and together they make hypertext. Markup says this is a paragraph <P>, this is a picture <IMG SRC="picture.gif">, and this is a link <A HREF="link.html">Link</A>.

For example, take a line like:

Dick likes Jane. Run, Jane, run.

Then add markup:

<A HREF="dick.html">Dick</A> likes <A HREF="jane.html">Jane</A>. <A HREF="run_jane.html">Run, Jane, run.</A>

And you get the same line, with each marked-up phrase displayed as a link:

Dick likes Jane. Run, Jane, run.
[Note: these particular links don't actually go anywhere]

Anyone interested in Dick and Jane can then follow a link to more information on Dick, or Jane, or on where Jane might run.
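To see what the markup adds, here is a minimal sketch in Python that reads the marked-up line and lists each link's target and text. It uses the standard library's html.parser module; the LinkCollector class is a name made up for this illustration, not part of HTML or of any browser.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects (target, text) pairs for each <A HREF> link."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # target of the link currently open, if any
        self._text = []     # text seen inside the open link

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lowercase.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside a link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


line = ('<A HREF="dick.html">Dick</A> likes <A HREF="jane.html">Jane</A>. '
        '<A HREF="run_jane.html">Run, Jane, run.</A>')

collector = LinkCollector()
collector.feed(line)
collector.close()
links = collector.links

for target, text in links:
    print(text, "->", target)
```

The words "likes" and the full stops are skipped because they sit outside any link: the markup, not the wording, tells the program where each link starts and stops.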

HTML was the first means of delivering targeted data to so many people so easily. Radio and TV unleashed a lot of data, but the flood was indiscriminate. Computers allowed greater user interaction, but were limited by their location and multi-platform inconsistencies. Only HTML allowed data to transcend the twin tyrannies of distance and incompatibility.

But then, the cracks began to show…

Why HTML No Longer Works

HTML markup is fixed. The need for linkage, simplicity, and portability imposes limits on markup. And that’s fine, if all you want is linkage, simplicity, and portability. However, if you want more, you have a problem. HTML is limited in:

  • Intelligence – How well data knows itself
  • Adaptation – How well data changes in response to changing times
  • Maintenance – How easily data is cared for

Some intelligence is present in HTML. It knows that this is a paragraph, and this is a picture. But it doesn’t know the paragraph is about Dick and Jane and the picture is of Brown Puppy. It focuses on basics, not specifics. Dick and Jane can’t be torn from Brown Puppy if Brown Puppy is all you want. You get everything in one swallow: Power over intelligence lost.

You want more than HTML’s limited markup. You want <DONUT> markup. You want <FILLED>, <SPRINKLES>, <FROSTED>, and <GLAZED> tags. But HTML won’t give them to you. You could submit <DONUT> markup for exhaustive standards approval. Maybe you’ll get it. If not, you’re out of luck: Power over adaptation lost.

HTML throws everything in one bundle. It’s hard to find and change the exact markup you want. Markup for look and linking gets mixed in with data, without a clear division. Change the look, and your links may be lost. Change the links, and you might lose the look. Separate markup for links, look, and data doesn’t exist. To change something you change everything: Power over maintenance lost.

Is there a solution? Yes. XML.

Why XML Will Pick Up Where HTML Left Off

Extensible Markup Language allows specific markup to be created for specific data. It has the virtues of HTML without any of its limitations. XML is strong in:

  • Intelligence
  • Adaptation
  • Maintenance
  • Linking
  • Simplicity
  • Portability

XML is intelligent to any level of complexity. Markup can be wrapped within markup, from general markup such as:

<DOG>Lassie</DOG>

to more specific markup like:

<DOG><COME_HOME><SCOTTISH>Lassie</SCOTTISH></COME_HOME></DOG>

Data can be so finely marked up that:

<SEEING_TWO>double</SEEING_TWO> and:
<MORE_LIQUOR>double</MORE_LIQUOR>

become infinitely separate values. The information knows itself.
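As a rough sketch of what "the information knows itself" means to a program, the Python below parses the examples above with the standard library's xml.etree.ElementTree module. The <NOTES> wrapper is an assumption added here only because an XML document needs a single outermost element.

```python
import xml.etree.ElementTree as ET

# The nested Lassie example: list the markup from outermost to innermost.
nested = ET.fromstring(
    "<DOG><COME_HOME><SCOTTISH>Lassie</SCOTTISH></COME_HOME></DOG>"
)
path = [element.tag for element in nested.iter()]
print(path)

# The two "double" values, wrapped in a hypothetical <NOTES> element.
doc = ET.fromstring(
    "<NOTES>"
    "<SEEING_TWO>double</SEEING_TWO>"
    "<MORE_LIQUOR>double</MORE_LIQUOR>"
    "</NOTES>"
)
meanings = {child.tag: child.text for child in doc}
print(meanings)
```

Both values have the same text, but a program can tell them apart by the markup that surrounds them.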

XML is also a mother tongue for other languages, so languages like DickML and JaneML become possible. Adaptation is infinite. Custom markup can be created for any need. If markup describing the varying degrees of lumpiness in gravy is required, it can be made. No more fixed markup that limits the categorizing instincts of the masses.
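Custom markup needs no one's approval before a program can use it. The sketch below builds a small document out of the donut tags wished for earlier, using Python's standard xml.etree.ElementTree module. The flavour values are invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# Invent the markup on the spot: the tag names are ours to choose.
donut = ET.Element("DONUT")
ET.SubElement(donut, "FILLED").text = "custard"       # illustrative value
ET.SubElement(donut, "FROSTED").text = "chocolate"    # illustrative value
ET.SubElement(donut, "SPRINKLES").text = "rainbow"    # illustrative value

# Serialize the tree back to XML text.
xml_text = ET.tostring(donut, encoding="unicode")
print(xml_text)
```

Nothing in the library knows what a donut is. It only requires that the markup be well-formed, which is what leaves the vocabulary open.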

XML is easy to maintain. It contains only data and markup. Look comes from its own stylesheet, and links are also separate, not buried in the document. Each can be maintained independently – no more wading through a markup mess.
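The separation of data and look can be imitated in a few lines of Python. This is only a stand-in for a real stylesheet language: the two dictionaries, the render function, and the <PETS> wrapper are all hypothetical names invented for this sketch.

```python
import xml.etree.ElementTree as ET

# The data: no hint of how it should look.
data = ET.fromstring("<PETS><DOG>Lassie</DOG><DOG>Brown Puppy</DOG></PETS>")

# Two hypothetical "stylesheets": each maps a tag name to a text template.
plain_style = {"DOG": "{}"}
loud_style = {"DOG": "*** {} ***"}


def render(root, style):
    """Apply a tag-to-template mapping to each child element of root."""
    return [style[child.tag].format(child.text) for child in root]


plain = render(data, plain_style)
loud = render(data, loud_style)
print(plain)
print(loud)
```

Changing the look means swapping one dictionary for another. The data is never touched.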

XML uses one way to link, which embraces all ways to link. Not only that, it links in ways that HTML can’t. HTML can do simple, one-way links inside or outside of data. In addition to this capability, XML can link two or more points inside or outside of data. There are even super-links intertwining all data within itself. Any link between any data can be handled.

XML is simple. The average user may disagree: compared to HTML, XML is more complex. But compared to other languages that achieve the same results, XML is simplicity itself. Unnecessary overhead has been stripped out, leaving only the essentials. XML gets to the point.

XML carries well. Its reasons for existence are power and portability. All a browser needs in order to view XML is the data itself, and the stylesheet that controls its look. If stricter validation is required, a document type definition (DTD) that lists exactly which markup is allowed can be used, with only slightly more overhead.

The solution is here…

The XML Vision

The vision for XML, in the words of its creators, includes the following goals:

"XML shall be straightforwardly usable over the Internet."
It’s designed to use proven methods, and to require little retooling to make it a vital part of the Web. Adoption should be as painless as possible.

"XML shall support a wide variety of applications."
HTML’s reach is widespread. It’s independent of hardware and software. People with different platforms can access the same HTML using hundreds of different programs. XML must support a similar, or greater, range of uses and programs to be successful.

"It shall be easy to write programs which process XML documents."
If you don’t build it, they won’t come. The test of a computer’s usefulness is how much you can do with it, and this rule applies to XML too. If no one uses a language, it dies. Making it as easy as possible for programs to use XML is essential for its success.

"XML documents should be human-legible and reasonably clear."
You should be able to read over a raw XML document and understand it. It may even be easier to understand than normal English. "Jack and Jill went up the hill" marked up as <NAME>Jack</NAME> and <NAME>Jill</NAME> went <DIRECTION>up</DIRECTION> the <GEOLOGIC_FORMATION>hill</GEOLOGIC_FORMATION> may be even more self-explanatory than its unmarked form.
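The same markup that helps a human reader also helps a program. As a small sketch, the Python below pulls the names and the direction out of the Jack and Jill sentence with the standard xml.etree.ElementTree module. The <SENTENCE> wrapper is an assumption added so the line forms a complete XML document.

```python
import xml.etree.ElementTree as ET

sentence = ET.fromstring(
    "<SENTENCE><NAME>Jack</NAME> and <NAME>Jill</NAME> went "
    "<DIRECTION>up</DIRECTION> the "
    "<GEOLOGIC_FORMATION>hill</GEOLOGIC_FORMATION></SENTENCE>"
)

# Ask for pieces of the sentence by what they are, not where they sit.
names = [name.text for name in sentence.findall("NAME")]
direction = sentence.findtext("DIRECTION")

# Strip the markup away again to recover the plain sentence.
plain_text = "".join(sentence.itertext())

print(names, direction)
print(plain_text)
```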

"The XML design should be prepared quickly."
The XML 1.0 specification was released in February 1998, with the second edition appearing in October 2000.

"The design of XML shall be formal and concise."
Instead of often-obscure markup like <P>, <BODY VLINK="#0000ff">, or <TR>, XML markup can be spelled out, because you choose the names yourself:

<P> would be <PARAGRAPH>
<BODY VLINK="#0000ff"> would be <BODY VISITEDLINK="#0000ff">
<TR> would be <TABLEROW>

This makes XML easier to understand and compose.

"XML documents shall be easy to create."
XML documents are easy. If you can type <, >, and /, and you can remember that for every opening <MARKUP> there needs to be a closing </MARKUP>, with inner markup closed before outer markup, you can write "well-formed" XML. Complex or "valid" XML is more difficult, but it’s still not overwhelming.
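One way to check whether a document is well-formed is to hand it to a parser and see whether it complains. The sketch below does that with Python's standard xml.etree.ElementTree module; is_well_formed is a helper name made up for this example. It checks well-formedness only, not validity against a description of the allowed markup.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


ok = is_well_formed("<MARKUP>data</MARKUP>")       # opened and closed
unclosed = is_well_formed("<MARKUP>data")          # closing tag missing
crossed = is_well_formed("<A><B>data</A></B>")     # closed in the wrong order

print(ok, unclosed, crossed)
```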

"Terseness in XML markup is of minimal importance."
HTML has a lot of shortened markup. The meaning of <IMG SRC> may be recognizable, but if it’s spelled out as <IMAGE SOURCE> it’s even clearer. XML seeks to make its markup clear, as opposed to short, across the board.

This is the vision for XML. But XML won’t just supplement the Web: it will be the Web. Visions often fail when they try to be all things to all people, but the vision for XML won’t. Its job is allowing visions to be all things to all people.
