HTML 4 Considered Harmful

There are times when I feel like I’m banging my head against a brick wall. Or maybe I should say, times when I feel like the drum I’m banging is being pounded with bricks!

I’ve been advocating the use of XHTML for years, and although I’m not at all sorry that XHTML 2 is dead (because it was utterly divorced from reality), I am extremely sorry that so many developers have regressed to using HTML 4. I’m equally sorry that a good proportion of those forward-thinking developers who have already started marking up their content with HTML 5 are doing so using HTML 4 syntax.

There are many benefits to XHTML which are equally true of Strict HTML, like the removal of presentational markup and the consistent quoting of attributes, to give two examples. But there’s one benefit of XHTML which belongs to it alone, and that’s XML syntax. The benefit of using XML syntax seems to me so significant that I’m frankly staggered at anyone who disputes it.

Don’t get me wrong here, I’m not dismissing as ignorant anyone who doesn’t agree with me. What I’m remarking on is just how incredible it seems to me that such an obviously useful thing as XML well-formedness could pass anyone by. XHTML served as text/html does have advantages over HTML 4, simply because it looks like XML.

Okay, so it isn’t really XML. So the self-closing syntax only works in current browsers at all because of their tendency toward error correction. So XHTML as text/html is a fudge, and technically incorrect. But that doesn’t matter. What matters is that it looks like XML, and to any XML parser that can parse from a string, there’s no difference at all.

Here’s a case in point — recently I needed to find a way of creating a DOM from responseText HTML. I couldn’t get responseXML because the markup was out of my control (and anyway, it wasn’t XML), and I couldn’t use the “document.write to an iframe” trick because the implementation didn’t support that. So I used DOMParser (which works in Firefox, Opera and Safari):

var dom = new DOMParser().parseFromString(request.responseText, "text/xml");

And that worked fine. But it only worked for well-formed XHTML documents. Why? Because they look like XML! It failed on HTML 4 documents, because they don’t.
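
To make that success-or-failure distinction concrete, here is a minimal sketch of how the check might look. The function name parseAsXml and its optional Parser argument are hypothetical, introduced only for illustration, and the sketch assumes the parser reports a well-formedness error by returning a document that contains a parsererror element rather than by throwing, which may not hold in every implementation.

```javascript
// Hypothetical helper: parse a markup string as XML and report whether it
// was well-formed. `Parser` is an optional constructor exposing a
// DOMParser-compatible parseFromString(); it defaults to the browser's
// DOMParser and is a parameter only so the function can be exercised
// outside a browser.
function parseAsXml(markup, Parser) {
  var ParserCtor = Parser || DOMParser;
  var doc = new ParserCtor().parseFromString(markup, "text/xml");

  // Assumption: a well-formedness error is signalled by a <parsererror>
  // element somewhere in the returned document, not by an exception.
  if (doc.getElementsByTagName("parsererror").length > 0) {
    return null;
  }
  return doc;
}

// Usage in a browser:
// parseAsXml('<p>Line one<br />Line two</p>')  -> a Document
// parseAsXml('<p>Line one<br>Line two</p>')    -> null (the <br> is never closed)
```

The only difference between the two usage lines is the closing slash on the br element, which is exactly the difference between markup that looks like XML and markup that doesn’t.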

So there’s a simple example of how beneficial it is to develop web pages using well-formed XHTML, irrespective of the MIME type used to deliver it. Markup that looks like XML can be parsed as XML, whether or not it actually is.

I mean really. What more convincing could anyone need? I just don’t get it.