XHTML – An Introduction

HTML, as we know and love it, has for many reasons become rather lax and unruly. If you’ve diligently tested your Web pages on different browsers only to find that your carefully crafted masterpiece looks great in IE 5.x but becomes an illegible monster in Netscape 4.x, then welcome to the club.

What can we do?

Well, we could spend all of our time whining about browser conformity, proprietary tags and standards. Or we could take a proactive stance and support the World Wide Web Consortium’s first recommendation for XHTML: XHTML 1.0.

This article takes a "quick start" approach aimed at the HTML author who wants to further their skills, and it concludes with links to more detailed information.

What is XHTML?

According to the W3C:

"XHTML 1.0 is a reformulation of HTML 4.01 in XML, and combines the strength of HTML 4 with the power of XML.

"XHTML 1.0 is the first major change to HTML since HTML 4.0 was released in 1997. It brings the rigor of XML to Web pages and is the keystone in W3C’s work to create standards that provide richer Web pages on an ever increasing range of browser platforms including cell phones, televisions, cars, wallet sized wireless communicators, kiosks, and desktops."

Sound good so far? Then read on…

Where do you start?

If you’ve no experience with XML you could be forgiven for being a little intimidated by it. But if you can code your pages in HTML, you’ll be pleased to know that learning XHTML will be extremely easy. It will also provide you with a superb introduction to XML along the way.

Essentially, XHTML is just a stricter version of HTML 4.01, with a few considerations that you should be aware of as you mark up your pages.

Three flavors to choose from!

As you may know, the eXtensible Markup Language (XML) is not a markup language at all, but a way of defining markup languages by use of a Document Type Definition, or DTD. XHTML is one such language, and there are three different DTDs to choose from.

  • Strict – disallows all deprecated tags and attributes, such as the <font> tag.
  • Transitional – is far more forgiving and supports all those deprecated yet browser-supported tags you most likely use every day.
  • Frameset – is exactly the same as the transitional DTD, but replaces the document body with a <frameset> element.

You’ll probably want to use the transitional DTD as it provides the most forgiving environment for an introduction to XML and XHTML.
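
For reference, here is how each flavor is declared at the top of a document. Treat it as a quick sketch: only the transitional declaration appears in this article's own template, and the other two are the matching public identifiers and DTD locations from the W3C's XHTML 1.0 Recommendation. A real page uses exactly one of the three.

```html
<!-- Strict -->
<!DOCTYPE html PUBLIC
         "-//W3C//DTD XHTML 1.0 Strict//EN"
         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<!-- Transitional -->
<!DOCTYPE html PUBLIC
         "-//W3C//DTD XHTML 1.0 Transitional//EN"
         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<!-- Frameset -->
<!DOCTYPE html PUBLIC
         "-//W3C//DTD XHTML 1.0 Frameset//EN"
         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
```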

The main differences between HTML and XHTML

The specification requires that your documents be "well formed", which means that you have to pay special attention to certain aspects of your code. Below are the key points you need to be aware of.

1. Nested elements

Firstly you need to tidy up the way you treat your page elements. XHTML does not tolerate incorrect nesting so something like this:

  <b><p>I'd probably have gotten away with it too if it weren't for
  you pesky W3C folks</b></p>

won’t pass muster at the W3C’s Validation service but

  <p><b>Buffy rules!</b></p>

…will be just fine. The same applies to all your markup tags.

2. Case Sensitivity

Both tags and their attributes are case sensitive in XHTML. The simple and strict rule is that all element and attribute names must be written in lower case. For example,

<A HREF="myPage.html">Some page</a>

will get you roasted alive by the XHTML Validator, but

<a href="wellFormed.html">Well formed page</a>

will work perfectly.
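
Event handler attributes are an easy place to slip up, since many of us habitually write them in mixed case. Here is a small before-and-after sketch; the init() function is made up purely for illustration.

```html
<!-- HTML habit: upper and mixed-case names -->
<BODY BGCOLOR="#FFFFFF" onLoad="init()">

<!-- XHTML: element and attribute names in lower case -->
<body bgcolor="#FFFFFF" onload="init()">
```

Note that only the names change. Attribute values, such as the colour code here, keep whatever case they had.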

3. End Tags

Most HTML designers leave out the end tags of certain elements, such as </p>. If you didn’t know <p> even had an end tag, you’re not alone. In XHTML, every element you open must be closed. Here are the other tags most likely to catch you out: <th>, <tr>, <td> and <li>.
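
As a quick illustration, here is a list and a small table with every end tag written out. The content is invented for the example; the point is simply that each <li>, <tr>, <th> and <td> is closed explicitly.

```html
<ul>
<li>First item</li>
<li>Second item</li>
</ul>

<table>
<tr>
<th>Browser</th>
<th>Version</th>
</tr>
<tr>
<td>Netscape</td>
<td>4.x</td>
</tr>
</table>
```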

What about images and line breaks?

Good question. Empty elements such as these are all treated the same way: every one of them must be closed. That's the way XML works, and of course XHTML is no exception, even if there is no end tag in the HTML equivalent. You deal with this by closing the element inside its own opening tag, with a slash just before the final angle bracket. Here's an example:

  <p>XHTML is strict but not really hard</p>
  <img src="somePic.gif" alt="Some picture" /><br />
  <p>See what I mean?</p>
  <hr />

The trick is to leave a space before the closing slash (<br /> rather than <br/>) so as not to confuse non-XHTML browsers.
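
The same treatment applies to the other empty elements you're likely to use, such as <meta>, <link> and <input>. The sketch below assumes a stylesheet called styles.css and a form handler called subscribe.cgi, both of which are made-up names.

```html
<head>
<title>Empty elements</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<link rel="stylesheet" type="text/css" href="styles.css" />
</head>
<body>
<form action="subscribe.cgi" method="post">
<p><input type="text" name="email" /></p>
</form>
</body>
```

Elements that normally hold content, such as <p> or <script>, are best written with a separate end tag even when they happen to be empty, as older browsers can stumble over the shorthand form.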

4. Attributes

There are a couple of things you should be aware of when you're dealing with attributes. The first is that all your attribute values must be enclosed in quotes, numeric values included. Double quotes are the usual convention, although XML accepts single quotes too.

The second is that for those attributes that in HTML have no value, such as <ul compact>, you must specify one. It's done like this:

<ul compact="compact">

Other attributes to watch for are:

ismap="ismap"  
declare="declare"  
nowrap="nowrap"  
compact="compact"  
noshade="noshade"  
checked="checked"
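
Putting both rules together, here is a hedged before-and-after sketch. The cell content and the field name are invented for illustration.

```html
<!-- HTML habit: unquoted values and minimized attributes -->
<td width=100 nowrap>Price</td>
<input type=checkbox name=news checked>

<!-- XHTML: every value quoted, every attribute given a value -->
<td width="100" nowrap="nowrap">Price</td>
<input type="checkbox" name="news" checked="checked" />
```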

5. Special characters

I hate to say it, but this is the point where XHTML becomes a bit of a pain. Most of the above is just a matter of disciplining yourself and developing good coding habits, but there are a few problems here that require special mention. They'll almost certainly cause you trouble if you're unprepared!

  1. XHTML can be a little problematic in its handling of <, > and & characters inside embedded CSS and JavaScript, because an XML parser treats < and & there as markup. XML browsers may also discard comments, so the old trick of hiding styles and scripts inside comments can make them disappear altogether. Use external stylesheets and script files to be certain (although I've had no problems so far and do not do this on my own site).
  2. Ampersands can be a problem within attribute values as well, for example in URLs that carry a query string. As a general rule of thumb, use the corresponding entities (&amp;, &lt; and &gt;) for the &, < and > characters, and make sure that you validate your pages properly.
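
Here is a minimal sketch of both points in practice. The file names styles.css, scripts.js and search.cgi are assumptions made for the example, not files this article provides.

```html
<head>
<title>Special characters</title>
<!-- Styles and scripts live in external files, so their < and &
     characters never pass through the XML parser -->
<link rel="stylesheet" type="text/css" href="styles.css" />
<script type="text/javascript" src="scripts.js"></script>
</head>
<body>
<!-- Entities in ordinary text -->
<p>Fish &amp; chips for &lt; 5 pounds</p>
<!-- The ampersand in a query string is escaped too -->
<p><a href="search.cgi?term=xhtml&amp;page=2">Next page of results</a></p>
</body>
```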

6. Use id instead of name

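For identifying elements such as <a>, <img>, <form>, <frame> and <map>, the name attribute is now deprecated in favour of the preferred id attribute. Although name is still supported, the validator will complain if, for instance, you give a map tag a name but no id. The name attribute on form controls such as <input> is not affected.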
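
A common compromise, suggested in the compatibility guidelines of the XHTML 1.0 Recommendation, is to supply both attributes with the same value so that older browsers still find the target. A short sketch, with made-up file names and values:

```html
<!-- Anchor target: id for XHTML, name for older browsers -->
<h2><a id="resources" name="resources">Further resources</a></h2>
<p><a href="#resources">Jump to the resources</a></p>

<!-- Image map: same idea -->
<p><img src="nav.gif" alt="Site navigation" usemap="#nav" /></p>
<map id="nav" name="nav">
<area shape="rect" coords="0,0,80,20" href="home.html" alt="Home" />
</map>
```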

A simple XHTML document

Okay, enough with the do's and don'ts. If you're eager to get going, here's a simple XHTML document to get you started.

<?xml version="1.0" encoding="UTF-8"?>  
<!DOCTYPE html PUBLIC  
         "-//W3C//DTD XHTML 1.0 Transitional//EN"  
         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">  
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">  
<head>  
<title>XHTML is easier than you thought!</title>  
</head>  
<body>  
<hr />  
<p>As long as you remember the rules and guidelines  
above<br />  
you'll soon be writing well formed documents.  
No really, you will! </p>  
<hr />  
</body>  
</html>

A detailed explanation of the declarations at the top of the document is beyond the intended scope of this "quick start" guide (and quite unnecessary for most designers), but here's the simple version.

Line 1 tells the browser that we're using XML 1.0 and gives the document's character encoding as UTF-8, the 8-bit encoding of Unicode.

Lines 2-4 state the DTD we're using, which in this case is the transitional version.

Line 5 declares the XHTML namespace and the language attributes in the <html> tag.

And there you have it: you're all set to start writing well-formed, standards-compliant XHTML pages! All you need to do is use the code above as your basic template and get into good coding habits from the outset. If you find that you can't validate every page on your site, don't worry; it's a pretty tough call. As long as you're making an effort to validate as much as possible, you're doing a good job.

Further information and resources

Official sources

XHTML 1.0: The Extensible HyperText Markup Language

W3C HTML Validation Service

HTML Home Page

Tutorials and articles

w3schools.com: XHTML School

An XHTML Roadmap for Designers

Webmonkey.com: XHTML Overview

Good luck!
