HTML 4 Considered Harmful

[Image: a cat in a washing machine]

There are times when I feel like I’m banging my head against a brick wall. Or maybe I should say, times when I feel like the drum I’m banging is being pounded with bricks!

I’ve been advocating the use of XHTML for years, and although I’m not at all sorry that XHTML 2 is dead (because it was utterly divorced from reality), I am extremely sorry that so many developers have regressed to using HTML 4. I’m equally sorry that a good proportion of those forward-thinking developers who have already started marking up their content with HTML 5 are doing so using HTML 4 syntax.

There are many benefits to XHTML which are equally true of Strict HTML, like the removal of presentational markup and the consistent quoting of attributes, to give two examples. But there’s one benefit of XHTML which belongs to it alone, and that’s XML syntax. The benefit of using XML syntax seems to me so significant that I’m frankly staggered that anyone disputes it.

Don’t get me wrong here, I’m not dismissing as ignorant anyone who doesn’t agree with me. What I’m remarking on is just how incredible it seems to me that such an obviously useful thing as XML well-formedness could pass anyone by. XHTML served as text/html does have advantages over HTML 4, simply because it looks like XML.

Okay, so it isn’t really XML. So the self-closing syntax only works in current browsers at all because of their error-correcting behaviour. So XHTML as text/html is a fudge, and technically incorrect. But that doesn’t matter. What matters is that it looks like XML, and to any XML parser that can parse from a string, there’s no difference at all.

Here’s a case in point — recently I needed to find a way of creating a DOM from responseText HTML. I couldn’t get responseXML because the markup was out of my control (and anyway, it wasn’t XML), and I couldn’t use the “document.write to an iframe” trick because the implementation didn’t support that. So I used DOMParser (which works in Firefox, Opera and Safari):

var dom = new DOMParser().parseFromString(request.responseText, 'text/xml');

And that worked fine. But it only worked for well-formed XHTML documents. Why? Because they look like XML! It failed on HTML 4 documents, because they don’t.

So there’s a simple example of how beneficial it is to develop web pages using well-formed XHTML, irrespective of the MIME type used to deliver it. Markup that looks like XML can be parsed as XML, whether or not it actually is.
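That claim is easy to demonstrate without any browser at all. Below is a deliberately naive well-formedness check in JavaScript (an illustrative sketch, not a real XML parser: it ignores comments, CDATA, processing instructions, and attribute values containing slashes, and the function name is invented for this example):

```javascript
// Naive well-formedness sketch: push open tags, pop on close tags,
// and accept self-closing tags as balancing themselves.
function looksLikeXML(markup) {
  var stack = [];
  var tag = /<(\/?)([a-zA-Z][\w-]*)[^>]*?(\/?)>/g;
  var match;
  while ((match = tag.exec(markup)) !== null) {
    var closing = match[1] === '/';
    var selfClosing = match[3] === '/';
    if (selfClosing) continue;                      // <br /> balances itself
    if (closing) {
      if (stack.pop() !== match[2]) return false;   // mismatched close tag
    } else {
      stack.push(match[2]);                         // open tag awaits its close
    }
  }
  return stack.length === 0;                        // everything must be closed
}

console.log(looksLikeXML('<p>Hello<br /></p>')); // true: XHTML style
console.log(looksLikeXML('<p>Hello<br></p>'));   // false: HTML 4 style
```

The unclosed `<br>` is exactly what makes an HTML 4 document fail in any XML parser, while the same document written XHTML-style sails through.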

I mean really. What more convincing could anyone need? I just don’t get it.

  • Russ Weakley

    Hey James,
    Some interesting phrases used here to state your position on using XHTML:
    1. “because it LOOKS like XML”
    2. “XHTML as text/html is a fudge, and technically incorrect. But that doesn’t matter…”

    What I really want to know is… it may look like XML but does it taste and smell like XML? :)

  • http://www.tyssendesign.com.au Tyssen

    What more convincing could anyone need?

    Well, for me personally, I’d need better arguments for why having markup that looks like XML is important. You’ve given one, but I’ve never come across the need to do what you’ve described in my work, so for me it’s a non-issue.

  • Anonymous

    var html = '<html><head><title>html4</title></head><body><p>considered just fine</p></body></html>';
    var dom = new DOMParser().parseFromString(html, 'text/xml'); // no errors…

    console.log(dom.childNodes[0].childNodes); // output: [head, body]

    console.log(dom.getElementsByTagName('p')[0].childNodes[0]); // output: <TextNode textContent="considered just fine">

    Works great in FF3.5, can’t say how widespread the success is because I just tested it right now. The only issue I personally have with it is you have to specify “text/xml” even though you’re clearly not passing XML. Certainly those who don’t mind serving XHTML as HTML should be fine with this method, though.

  • http://bitdepth.wordpress.com/ mmj

    So correct me if I’m wrong – the short version of this blog post is:

    XHTML is more useful than HTML, even when served as text/html, because it can be parsed using DOMParser (which only works in Firefox, Opera and Safari).

  • http://www.magain.com/ Matthew Magain

    I’m with James, but not for the same reasons. I just think that the symmetry of XHTML means it’s easier for beginners to learn. I’m not a beginner anymore, so maybe this is not true. It just feels like it would be.

  • http://bitdepth.wordpress.com/ mmj

    I’m with James, but not for the same reasons. I just think that the symmetry of XHTML means it’s easier for beginners to learn. I’m not a beginner anymore, so maybe this is not true. It just feels like it would be.

    That is certainly true, and this inconsistency was the inspiration for XHTML in the first place – what makes it easier to learn also makes it easier to write a parser for.

    Unfortunately when serving as text/html, you still must self-close EMPTY elements (<img /> not <img></img>), and must not self-close non-EMPTY elements (<p></p> not <p />), but if the world had gone the way of XHTML and particularly XHTML > 1.0, then even this inconsistency would not have been a problem anymore. Unfortunately, this didn’t happen and XHTML on the web still has the ‘must look sort of like HTML’ problem.
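The compatibility rule being described here (self-close EMPTY elements, never self-close anything else) can be sketched in a few lines of JavaScript. The function name and the element list are illustrative; the full HTML 4 DTD also lists basefont, frame and isindex among the EMPTY elements:

```javascript
// The commonly used EMPTY elements of HTML 4, which must be written
// self-closing in text/html-compatible XHTML.
var VOID_ELEMENTS = ['area', 'base', 'br', 'col', 'hr', 'img', 'input', 'link', 'meta', 'param'];

// Serialise a tag according to the compatibility rule: self-close EMPTY
// elements, and give everything else an explicit end tag.
function serialize(tagName, content) {
  if (VOID_ELEMENTS.indexOf(tagName) !== -1) {
    return '<' + tagName + ' />';                                       // <img />, never <img></img>
  }
  return '<' + tagName + '>' + (content || '') + '</' + tagName + '>'; // <p></p>, never <p />
}

console.log(serialize('img'));      // <img />
console.log(serialize('p', 'hi')); // <p>hi</p>
```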

    Certainly, XHTML removes the confusion which may be caused by elements with optional opening or closing tags, though of course it is possible to code HTML and treat all tags as necessary – HTML still gives you this option.

  • http://www.dmgx.com Michael Morris

    Sorry but no – xhtml remains useless in my mind as long as IE needs text/html headers to display it correctly. Besides, XHTML is not supposed to support the innerHTML property, and most of the major javascript libraries (prototype, jQuery) rely heavily on innerHTML to drive their DOM manipulations.

    I’d rather have a valid HTML 4.01 Strict document than an XHTML tag-soup document served with invalid headers. Maybe I’m just pedantic though.

  • http://www.mikehealy.com.au cranial-bore

    Would a “well formed” HTML 4 Strict document (e.g. quoted attributes, closing <p> and <li> tags) work the way you wanted, James? (parseXML-able)

  • yukster

    I love how all the xhtml bashers poo-poo the merits of xml while trumpeting what a travesty it is that xhtml (which is valid xml) could be served as text/html. WHO ****ING CARES? The browser manufacturers have consistently ****ed up working html, xhtml, css, and javascript. To this day, there are still plenty of gotchas to fritter my days away on.

    The single greatest mistake of the short, painful history of the world wide web was giving people any flexibility. There should be one way to do it and it should simply not work (preferably with a helpful error message) if you don’t do it that way. New features or abilities should be ruthlessly rejected by client writers until they are agreed on and standardized (yeah, I know the standards process is horribly slow, but don’t get me started about greedy, control-freak humans).

    Anyway, all anyone can talk about in the xhtml vs html thing is how nice it is to close tags (though I agree with that… it’s a ****ing TREE… get over it) or that it is simply unacceptable that we make our servers lie about what we’re serving because M$ ****heads are too stupid (or too evil) to handle it correctly. What no one ever talks about is the great promised land that was lost when the browser nazis destroyed the W3C. Ok, ok, I have to say that I didn’t agree with the direction of XHTML2. Too cerebral; too scientific.

    But 10 years ago we were all a-twitter (no, not that kind of twitter) with the possibility of serving our xhtml documents with SVG and SMIL and other XML dialects right in the document…. and it would all just work!! We were also sure that we would be completely done with this browser stuff by now. Our cell phones, our mobile devices, our cars, our refrigerators, hell our fricking houses would all be sending and receiving wonderful, easy to read, universally parseable XM-****ing-L.

    But one little multi-billion-dollar corporation ruined that wonderful dream. Most who actually understand this say it was out of incompetence. I think there was malice (probably in cahoots behind the scenes with the other rabid XML haters, Google). But whatever, it’s done. <voice type="robot">I FOR ONE WELCOME MY GREAT HTML5 OVERLORDS… ALL PRAISE ULTIMATE FLEXIBILITY [more work for the browser writers; ed.] ALL PRAISE COOL MEDIA FEATURES WE COULD HAVE DONE TEN YEARS AGO IF THE BROWSER OVERLORDS HAD GOTTEN OFF THEIR FAT ASSES… YOU WILL BE ASSIMILATED</voice>.

    Carry on……..

  • jacksonk0608

    There are many benefits to XHTML … like the removal of presentational markup, and the consistent quoting of attributes, … one benefit of XHTML which belongs to it alone, and that’s XML syntax.

    Is this to say that, mostly, one forms good habits in learning to write XHTML?

  • http://autisticcuckoo.net/ AutisticCuckoo

    I won’t reiterate my arguments for why real XHTML is inappropriate for most web documents and why pretend-XHTML is silly, because I know I won’t sway James any more than his arguments sway me. :)

    But as mmj said, it appears as if the gist of this article is that pretend-XHTML is a Good Thing™ because it can be parsed as XML by built-in browser functions in ajax-type applications. (That would be in violation of RFC 2854, but never mind.)

    Since browsers have to be able to parse HTML, and even horrible tag soup that bears only a minor resemblance to HTML, it shouldn’t be any harder for them to provide a JavaScript object for parsing HTML than one for parsing XML.

    So instead of flogging the pretend-XHTML horse, which isn’t only dead but has even started to smell a bit, why not lobby the browser vendors to provide an HTML-to-DOM API?

  • http://www.brianswebdesign.com skunkbad

    I am an XHTML guy myself, but I’m starting not to care anymore. I mean, how many hours have I wasted trying to validate code, follow “web standards”, and for what? Customers don’t care. All they want is a website, and if it looks like what they want, and it functions how they want it to function, they really don’t care if we impress them with our fancy talk of XHTML, HTML, accessibility, usability, etc., etc. I think if you are passionate about making websites, then these things matter to some degree, but if you just want to make money, and you’re sick of being the only one that cares, then it’s time to just do whatever works. Yes, some customers will always prefer quality over price, but in my experience, these are few in number. Maybe I just attract poor people?

  • http://www.thinkcolony.com Richard Conyard

    I see this thread splitting in two: between those who have had this argument time and again, have no venom left, and now agree to disagree; and those who are going to go through it all again, throwing in plenty of expletives to boot.

    Part of the success of the web is that it was made easy to publish; hell’s bells, even my mum could knock together an HTML page in the late 90s. Well-formedness? Nah, she wouldn’t have a clue about that; character escaping, no chance. But knocking up a quick page, probably using some of the browser quirks as additional design features? No worries.

    Now I’m the other side of the fence. The DOM I’m using is almost certainly in C# or PHP. I don’t care whether it’s served as text/html or application/xhtml+xml or even text/plain, but I do like well formedness and it does make my life easier (XML does in general).

    HTML was opened up so that the lowest common denominator could take part; that’s part of its success. I’d rather have a web of XML, but that is above the LCD skill set, and I can’t see any point moaning about it now.

  • http://keryx.se itpastorn

    This article in one line:

    Fake XHTML makes perfect sense used as a coding convention.

    I concur, especially in teaching situations. Disclosure: I am a teacher.

  • palgrave

    @sitepoint Don’t want to sound like a prude, but one of the things I like about this site is the respectability of the arguments presented. Any chance you can **** out the effing in future?

  • http://www.cemerson.co.uk Stormrider

    The title of this article is a bit misleading… I know it’s a reference to the ‘XHTML considered harmful’ document, but this says nothing about why HTML 4 is bad, but only gives one (very marginal) benefit of XHTML.

    Would a “well formed” HTML 4 Strict document (e.g. quoted attributes, closing <p> and <li> tags) work the way you wanted, James? (parseXML-able)

    No, because tags like <img> would not be accounted for, and would not be well-formed in the XML sense.

    I just don’t see why being able to parse a document as XML is any use to anything but a screen scraper stealing data from other sites really. There is nothing stopping you returning and using XML in the response to AJAX calls still, so what’s the benefit? Why would I want to parse my page as XML?

    I used to like the strictness of XHTML and the good habits it got me into, but there is no reason you can’t do this with HTML4 either if you want. It’s not like browsers complain about a tag in an XHTML document being uppercase, for example, so there’s no extra strictness in any practical situation, only in the spec.

  • http://www.optimalworks.net/ Craig Buckler

    Hear, hear James! The last few years have seen a backlash against XHTML, primarily because of the MIME-type and IE compatibility issues. Developers started to switch back to HTML, changed tag cases, and dropped closing tags. Were there any real-world benefits to doing that?

    At least XML syntax is neat, well-formed, easier to maintain, and easier to read (by humans and machines). Would JavaScript coders drop indentation and end-of-line semi-colons just because they’re not strictly necessary? Browser and standards support may be messy, but it doesn’t mean we have to be!

  • http://www.cemerson.co.uk Stormrider

    Developers started to switch back to HTML, changed tag cases, and dropped closing tags

    Says who? Anyone I know who switched back to HTML keeps the closing tags and lowercase tags, and proper attribute quoting etc. Why can’t you be not-messy using HTML? In fact, what stops you being messy when using XHTML? Absolutely nothing, the browser will still parse uppercase tags in (fake) XHTML, and non-quoted attributes etc. The strictness is only in the spec, and has absolutely no basis or advantage in any practical situation.

    Here, here James!

    Do you mean ‘hear, hear’? Unless you actually want James to come to you :P

  • Stevie D

    I use HTML Strict, and while I follow most XHTML rules – I always close optional tags like <p> and <li>, I write all tags in lowercase, I quote all attribute values – I draw the line at mongrel abominations like <br />. Elements that have no content – br, img, link and so on – do not need to be closed, and the syntax that allows you to close them in both HTML and XHTML is ugly as heck and no more than a dirty hack that abuses a loophole in the HTML spec.

  • http://xslt2processor.sourceforge.net boen_robot

    I completely agree with James on this. This is the one reason for which I won’t switch back to HTML 4 Strict.

    Perhaps the browser – an HTML-aware environment – was a bad example to illustrate the point with, though. So consider other places – ColdFusion, Java (including JSP), .NET (including ASP.NET), PHP… – all of those environments have XML parsers. None has an HTML parser (except PHP, but its HTML parser is far from reliable).

    Imagine the following scenario – you need to fetch an external page whose markup is out of your control. You want to find (for the sake of example) the first link in it, and get its address. If the remote document is XHTML (regardless of whether it’s served as text/html or application/xhtml+xml), and you have PHP, you can do it like this:
    <?php
    $dom = new DOMDocument;
    $dom->load('http://example.com/page.html');
    $xpath = new DOMXPath($dom);
    echo $xpath->query('//*[local-name() = "a"][1]')->item(0)->nodeValue;
    ?>

    or (for this particular case) this:
    <?php
    $dom = new DOMDocument;
    $dom->load('http://example.com/page.html');
    echo $dom->getElementsByTagNameNS('*', 'a')->item(0)->nodeValue;
    ?>

    Try doing that with an HTML document (and Anonymous… make sure it includes a self-closing element like “link” or “img”, please, while still being valid HTML 4). Try fetching something more specific. Try fetching multiple things (e.g. all images, the first link, the location of the first screen stylesheet, etc.)… The only way you can possibly do it is with a very long, complicated, and therefore error-prone regular expression. Actually, that’s what parsers are for – avoiding manual searching with regular expressions, and replacing them with a program that knows its stuff.
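The fragility being described can be made concrete with a small JavaScript sketch (the markup and URLs below are made-up examples):

```javascript
// The kind of manual extraction being warned against: grab the first link's
// address with a regular expression instead of a parser.
var markup = '<p>See <a href="http://example.com/">one</a> and ' +
             '<a href="http://example.org/">two</a>.</p>';

// Works for this simple, well-behaved case...
var match = /<a\s[^>]*href="([^"]*)"/.exec(markup);
console.log(match[1]); // http://example.com/

// ...but it silently breaks on single-quoted or unquoted href values, on
// commented-out links, on attribute values that contain ">", and so on.
// An XML parser handles all of those uniformly, which is the point.
```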

    The point is not, as mmj put it, “XHTML is more useful than HTML, even when served as text/html, because it can be parsed using DOMParser (which only works in Firefox, Opera and Safari)”. It’s that any XML parser in any environment can (even today) read XHTML, even if it’s served with the wrong MIME type, because parsers only read the text – they don’t care about MIME types. Only rendering engines do.

    If you use XHTML on your sites, you make them more easily scannable by other programs, which means it will be easier to get into feeds (that is, “custom feed readers”… public ones create their own error-prone HTML parsers for those not using XHTML), potentially higher search engine ranks (though in the case of Google, they index valid HTML pages just as well, because they have their own parser), better behaviour with proxies (assuming they use XML parsers along with the HTML ones – some use only plain regular expression replacements), etc.

  • http://xslt2processor.sourceforge.net boen_robot

    Oops… it may be hard for you to believe this was an accident (as what I’m about to say only reinforces my point), but I accidentally selected the anchor itself when I was supposed to show selecting its address. The two samples should instead use:
    query('//*[local-name() = "a"][1]/@href')
    and
    getElementsByTagNameNS('*', 'a')->item(0)->getAttribute('href');
    respectively for the first and second example.

    If I were using regular expressions, I probably wouldn’t have found this error so soon.

  • Anonymous

    Weak blog post, with really weak argument for xhtml.

  • Jonny Axelsson

    Is HTML4 harmful, or the HTML serialisation? The title and the entry don’t seem to talk about the same thing.

    I would say that HTML4 was ahead of its time and that it wasn’t fully worked through. Better than most specs I would give the verdict “not half-bad”, even “pretty good, really”. Over time it has changed the HTML format for the better, given that the alternatives were HTML 3.2, Netscape HTML and IE HTML.

    HTML5 offers two serialisations mapping to the same DOM. If XML has more to offer, the XHTML one will win out; otherwise the HTML one will. Making XHTML-compatible HTML, or XHTML, comes at minimal cost for computer programs, so we can expect them to do so. This means that HTML is getting closer to the fundamental Internet principle “Be liberal in what you accept, and conservative in what you send”, and that is a very good thing.

    I think part of the backlash is that XHTML has often been mismarketed. The benefit of XHTML1 isn’t that it is stricter than HTML4 (it is not). The benefit of XHTML is that it is XML. That hasn’t been such a big boon in practice until now (an XML system can use HTML as input, or for that matter output).

  • http://www.clearwind.nl peach

    Wow, I was starting to think I was the only one hanging on to XHTML Strict. I totally agree with the well-formedness argument. I only use and validate XHTML Strict because it enforces stronger rules on the readability of my code. It’s a benefit to myself and to others editing my code.

  • Anonymous

    With a little discipline you can write HTML code so that the only thing you need to do to make it parse as XHTML is run a quick regular expression over it to close tags like <br>, <img> and <link>. Then you can use your XML parser.

    You really sound like someone who picked the losing side and keeps trying to pretend he didn’t back the wrong team.

    The idea of including other XHTML documents directly into the web page is one of the few attractive ideas of XHTML to me. It’s just that the need to do that is very low, and it could also make pages messy and hard to read.

  • http://www.cemerson.co.uk Stormrider

    Wow, I was starting to think I was the only one hanging on to XHTML Strict. I totally agree with the well-formedness argument. I only use and validate XHTML Strict because it enforces stronger rules on the readability of my code.

    What enforces it? Browsers certainly don’t.

  • http://david.us-lot.org/ dorward

    There are many benefits to XHTML which are equally true of Strict HTML

    Really?

    like the removal of presentational markup

    XHTML 1.0 Transitional doesn’t exist then?

    and the consistent quoting of attributes, to give two examples.

    No version of HTML 4 enforces attribute quoting beyond “use single or double quotes, and even they are optional if the value doesn’t include certain characters”. Strict certainly isn’t different to Transitional here.

  • Can Dederholm

    I think people who obsess over XHTML 1.0 Strict need to go for a walk outside and consider the birds’ way of life.

  • http://www.accessify.com/default.asp lloydi

    Is it too late for me to open the popcorn?

  • http://xslt2processor.sourceforge.net boen_robot

    @dorward
    I think you’re misunderstanding the point being made. HTML 4 doesn’t force you to quote attributes, but if you do, you’re compatible with both HTML 4 and XHTML 1.0, and can thus do the procedure Anonymous describes. As for the XHTML Transitional point… the point was that if you’re a standards-aware web developer, you’d use CSS instead of presentational attributes, and that is true regardless of whether your page uses XHTML Transitional, XHTML Strict, XHTML Frameset, HTML 4 Strict or HTML 4 Loose. Yes, Transitional allows you to use presentational attributes, but you may use CSS instead even with Transitional.

    @Anonymous
    Why the replacement overhead to begin with? If it (XHTML that is) works in browsers AND with XML parsers without any particular adjustments, why bother translating one to the other?

    @Stormrider
    What enforces what? Validity and/or well-formedness? The same thing that “enforces” HTML 4 validity – the W3C validator. If you want, you could also validate it yourself before you even output it, though doing that would obviously be costly in performance terms. Or you could simply not validate, not follow the rules, and keep using tag soup… those people are referred to as “not standards-aware developers”. Even if XHTML validity were enforced, HTML would still exist, and those same people would still not use it. So the point of “XHTML validity not being enforced” is silly IMHO – if you want to validate, you’d do it with or without enforcement. The fact that newcomers to XHTML won’t have those rules enforced by browsers isn’t a reason for YOU to stop using XHTML.

    @Can Dederholm
    Who says we haven’t done that already and come back ;-)

  • http://www.cemerson.co.uk Stormrider

    Or you could simply not validate, not follow the rules, and keep using tag soup…

    Or you could not validate, but still follow the rules, and not use tag soup.

    The only thing that makes XHTML ‘stricter’ is some writing in a spec somewhere that says ‘you must use lowercase, you must quote attributes’, etc. Why not write these rules on a post-it as they apply to HTML? Then the rules will be enforced there too, and you’ll have stricter enforcement of standards there too, because a piece of paper said so!

    There is NOTHING stopping you using HTML exactly like XHTML (barring the self-closing tags); the ‘strictness’ of it is just as well enforced, and you can easily enforce these standards yourself.

  • http://xslt2processor.sourceforge.net boen_robot

    @Stormrider
    If your pages don’t validate, how can you at the same time “follow the rules, and not use tag soup”? Following your own (combination of existing) rules is one thing. Following the standards’ rules is another thing.

    The HTML spec never specified how user agents should behave when encountering invalid or malformed code. That’s where the whole mess started in the first place, as enforcing such rules today would break millions of web pages worldwide. XHTML was created with XML in mind, and XML specifies how user agents should behave on errors. The new MIME type was created to make it easy for user agents to differentiate between HTML and XHTML document types, and switch error handlers (and display engines) accordingly. New pages using this standard would be handled the new way… it was planned like that, at least. As far as developers are concerned, if you use XHTML, it is expected that you’re aware of the rules you are supposed to follow, and which would be enforced if you use an XML parser and/or the right MIME type.

    Yes, there is nothing stopping you from using HTML as strictly as XHTML, with the same conventions and everything. But why use the HTML 4 DTD and XHTML 1.0 syntax, and not validate? Where’s the benefit in THAT? Why not simply stick to XHTML 1.0 syntax, and use the XHTML 1.0 DTD? The code would be the same, the display would be the same, XML parsers would read it… AND it would validate.

  • http://www.cemerson.co.uk Stormrider

    I didn’t say the pages wouldn’t validate, I just said you could skip running them through a validator. I thought that is what you meant. Of course the pages should validate if checked.

    All the stuff about error checking and the MIME type is true… except you can’t use the MIME type, because it breaks IE. So that whole argument is useless really.

  • Anonymous

    In response to those who disagree that parseFromString(html, ‘text/xml’) works with self-closing tags, you just have to add one line before parseFromString:

    html = html.replace(/(<img .*?|<link .*?)>/g, '$1/>');

    Any other self-closing tags? Add ‘em in. Put the above line in your library of reusable functions and you can forget it even exists. No extra work.
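For the record, a slightly fuller version of that idea, covering the whole HTML 4 void-element list rather than just img and link, might look something like this (a sketch under the same assumptions, not battle-tested; the function name is invented):

```javascript
// Match any HTML 4 void element, with or without attributes, whether or not
// it is already self-closed.
var voidTag = new RegExp(
  '<(area|base|br|col|hr|img|input|link|meta|param)(\\s[^>]*?)?\\s*/?>',
  'gi'
);

// Rewrite void tags XML-style so the string can be fed to an XML parser.
function closeVoidTags(html) {
  return html.replace(voidTag, function (match, name, attrs) {
    attrs = (attrs || '').replace(/\s+$/, ''); // drop any trailing space
    return '<' + name + attrs + ' />';         // emit a self-closing tag
  });
}

console.log(closeVoidTags('<p>a<br>b<img src="x"></p>'));
// <p>a<br />b<img src="x" /></p>
```

Already self-closed tags pass through unchanged, so the function is safe to run over mixed HTML/XHTML input.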

    Be resourceful, JavaScript is flexible. If something doesn’t work because you’re trying to shoehorn the wrong data into it, adapt or convert the data first. Anyone could write a 3-line parse(html) function which would work 100% of the time and return a DOM document for valid HTML or XHTML, and this “argument” would be over.

    If you want, you could even adapt the regex to allow for INvalid HTML as well. Then you wouldn’t have to worry whether the source you’re pulling from cares as much about validation as you do. Robustness is key, what happens if you’re pulling data from a site that’s not yours? Going to force them to convert everything to XML?

    “Considered harmful” articles, and most of the comments that typically ensue, discourage and demotivate people from finding actual solutions, or even believing those solutions exist.

  • http://www.brothercake.com/ brothercake

    It was just an example to springboard a discussion; I never meant it as the be-all and end-all reason why XHTML is useful, just one example. And as always, the title was deliberately contentious in order to draw people in (which does of course mean it’s rather inaccurate, but that doesn’t matter – it’s just a title).

    But thanks Anonymous, that’s a good idea.

    If this were me commenting on someone else’s post, I’d probably say “Good thing I didn’t mention the dirty knife”. As it is, I’ll just sit back and let it carry on. See, I’m not married to my opinions, and I’m perfectly willing to change them as soon as I see something that convinces me. Hypothetical arguments don’t; practical arguments do. So in that sense, what Jonny Axelsson said is the closest thing I’ve seen to convincing me that HTML 5 is A Good Thing.

  • http://www.lunadesign.org awasson

    Wow… I’ve been happily working away at my xhtml/css and didn’t realize there was so much resentment from the html 4 (or non xhtml) camp.

    Ummmm….. I’m sorry….

    I switched to xhtml because I find xhtml markup much more elegant than what came before it, and regardless of what the browser can or can’t do to correct crap markup, I do care whether my work validates.

    When it comes right down to it, I am a validation Nazi. I like XML and I do see a point in being able to write a parser without having to resort to regular expressions first. If I write an XML parser there is an amount of server overhead to consider; add a regular expression engine into the mix and the overhead goes up (or way up) depending on the backend. I have parsed xml on the client, but I prefer to do it on the backend because there are fewer problems to consider, like browser capabilities.

    Also, I find it easier to write well-formed and easily maintainable xhtml/css than with any of the html variants. I’m sorry xhtml isn’t going to continue to evolve, and I’m mystified at some of the arguments that seem to say that validating code or following the standards is too difficult.

  • http://www.reich-consulting.net/userproof/ coffee_ninja

    I try not to take up a dogmatic or religious position when it comes to any web development or coding issue. Clinging too steadfastly to any particular methodology is a good way to guarantee your irrelevance when technology changes.

    Having said that, personally I always code in XHTML simply because it feels more predictable and its strict structure prevents mistakes (or at least notifies me when I make them). If I have a strange rendering issue I run my markup through the W3C Validator, and chances are good that I missed a closing tag somewhere.

  • http://stommepoes.nl Stomme poes

    I’m one of those who learned on XHTML and then “downgraded” to HTML4.01 Strict. I don’t write XML parsers. I don’t write scrapers in Javascript. I’m not unhappy that someone else has a bit of trouble scraping my sites (I don’t think they would, really). I don’t see the point. Woulda, coulda, shoulda. So long as I have to deal with the garbage getting pumped out of Redmond, I’m sticking with HTML4. Similarly, I’m not going to Javascript my way into getting Redmond’s garbage to work with HTML5.

    It’s the future already, where’s my flying car??

  • AndrewCooper

    I won’t say anything on the XML / DOM Parsing front because I don’t know anything about it =/.

    What I will say, though, which is fairly stupid and silly, is about the title of the article. When I saw “HTML 4 Considered Harmful” I thought to myself “Wow! HTML4 harmful? What’s wrong with it? I hope my Web pages are all safe and secure! Oh god, what is it?!”

    I read the article and there isn’t anything wrong with HTML4 at all. It’s just preferences and shorter coding. So? In what way is that harmful? Does it open up a security loophole? JavaScript with HTML4 may be harmful, yes. But HTML4 on its own certainly is not, despite what the title of this article implied.

    Please, don’t worry me like that again. ¬_¬

    Andrew Cooper

  • yukster

    @Andrew Cooper

    The name is a play on the seminal computer science paper “Go To Statement Considered Harmful” by Edsger Dijkstra [1]… and many, many other papers playing off that title since. The point of that wasn’t a security error, or some failure that was going to happen because of using the functionality under debate. The point was that the goto statement led to all sorts of bad programming decisions and produced unmaintainable code.

    With that in mind, the title of the post is completely appropriate: the early versions of HTML produced all sorts of inconsistent and sloppy coding. The creation of XHTML was an attempt to bring the insanity under control, establish some rules, and open the door for moving away from the HTML mistake into the wide-open vista of XML. *That* is, I think, what the people saying “xhtml just feels better” are really feeling.

    This brings up an interesting side point that I’ve wondered about off and on: maybe the XML/HTML split tends to fall along programmer/non-programmer lines? I dunno. I’m a programmer and I think the derailment of the XML dream was one of the greatest tragedies of computer history.

    [1] http://en.wikipedia.org/wiki/Considered_harmful

  • http://www.optimalworks.net/ Craig Buckler

    @awasson

    I’m sorry xhtml isn’t going to continue to evolve…

    Don’t worry — it is. XHTML5 is an XML serialization of HTML5 correctly served with the application/xhtml+xml MIME type.

    Also, XML notation (well-formed, lower case tags, closing brackets, etc.) is valid in HTML5 served with the text/html MIME type.

    XHTML is not dead. It will simply evolve along the same path as HTML5 rather than being a separate specification.
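
    As a sketch of what that looks like in practice, markup written this way parses both as HTML5 and as well-formed XML (this document is illustrative, not taken from any spec, and the file names are made up):

    ```html
    <!DOCTYPE html>
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <meta charset="utf-8" />
        <title>Polyglot sketch</title>
      </head>
      <body>
        <!-- lower-case tags, quoted attributes, explicit closing tags -->
        <p>This parses as HTML5 served as text/html, and as XML.</p>
        <img src="cat.jpg" alt="A cat" />
      </body>
    </html>
    ```

    Whether a given browser will accept it as application/xhtml+xml still depends on its XML support, of course.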

    @brothercake
    Nice to see you using a “deliberately contentious” title that is not backed up with hard data! Now what would you have said had I done the same thing?!… ;^)

  • http://simon.html5.org/ zcorpan

    AutisticCuckoo said:

    why not lobby the browser vendors to provide an HTML-to-DOM API?

    I think some browsers are already adding support for text/html to DOMParser and XMLSerializer. Someone still needs to write a specification for them, though.

  • http://simon.html5.org/ zcorpan

    BTW, you can parse an HTML string with innerHTML, which works in IE too.

    // Let the browser’s own HTML parser build the DOM for you
    var div = document.createElement('div');
    div.innerHTML = htmlstring;
    var dom = div.firstChild; // the first parsed node, not a #document

  • Ryan

    @awasson

    What do you find more elegant about XHTML compared to HTML? The only real difference in syntax is the slash on self-closing tags required by XHTML.

    And what do you mean by “I find it easier to write well formed and easily maintainable xhtml/css than with any of the html variants”? That makes no sense. As I point out below, the differences in syntax and form are absolutely minimal, so any claim that XHTML is superior on this count is nonsense.

    @coffee_ninja

    How does XHTML’s strict structure prevent mistakes? Unless you’re serving it as application/xhtml+xml, keeping the markup well formed and clean is entirely optional and it won’t give you feedback. Of course, if you do serve it as XML then you’ll be preventing any IE visitors from viewing the site.
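
    For what it’s worth, the usual workaround at the time was content negotiation: send application/xhtml+xml only to browsers that advertise support for it in their Accept header, and text/html to everyone else. A server-side sketch, assuming Apache with mod_rewrite (the .xhtml extension is just an example):

    ```apache
    # Serve .xhtml as application/xhtml+xml only when the browser
    # advertises support for it in its Accept header; otherwise it
    # falls through as text/html so IE still renders the page.
    RewriteEngine On
    RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml
    RewriteRule \.xhtml$ - [T=application/xhtml+xml]
    ```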

  • http://www.lunadesign.org awasson

    @Craig Buckler

    Don’t worry — it is. XHTML5 is an XML serialization of HTML5 correctly served with the application/xhtml+xml MIME type.

    Also, XML notation (well-formed, lower case tags, closing brackets, etc.) is valid in HTML5 served with the text/html MIME type.

    Thanks for pointing that out Craig… I had looked on W3C a while back after HTML5 was announced as the direction for the next standard but didn’t see anything about it until I searched for XHTML5. More importantly, we can still strive for well-formed markup in HTML5 and XHTML5.

  • http://ryanroberts.co.uk RyanR

    You can strive for well formed markup in HTML 4.01, I have absolutely no problem doing this.

  • http://www.lunadesign.org awasson

    @Ryan

    What do you find more elegant about XHTML compared to HTML? The only real difference in syntax is the slash on self-closing tags required by XHTML.

    And what do you mean by “I find it easier to write well formed and easily maintainable xhtml/css than with any of the html variants”? That makes no sense. As I point out below, the differences in syntax and form are absolutely minimal, so any claim that XHTML is superior on this count is nonsense.

    The beauty of my statement is that it’s my opinion. You don’t have to agree with me but saying it’s nonsense is…. Well, nonsense.

    I suppose it’s the sheer volume of non-xhtml sites I’ve looked at that has influenced my opinion. I look at the markup of pretty much every interesting site I come across and I find a lot of tag soup, inline presentational markup and garbage in non-xhtml sites. Often they don’t even approach validation and often again they don’t look consistent across browsers & platforms.

    Furthermore, I do find it easier to write well formed and easily maintainable xhtml/css than HTML4 (or 3.2, etc…). I could write HTML4 all lower case and close all of my tags but what’s the point? It’s an old standard and the time to move on was six or seven years ago. When HTML5 becomes the standard, will you still cling to your HTML4?

  • Dave Keays

    @Ryan, The difference in my mind is consistency and the ability to spot errors before they happen. Hungarian Notation helped in the same way and VB’s Option Explicit has the same goals (but achieves them in a different manner).

    @everybody else, there was some talk about using CSS to avoid using presentation tags. But CSS can result in code that is just as convoluted and it can negate all the principles it tries to achieve (sans CSS/HTML/JS hacks). For example: Positioning text depends on the container and not the text itself. DIVs and SPANs need to be wrapped in the same manner as TABLEs, TRs, and TDs. Therefore nothing is gained when the ability to do a job is lost.

  • http://www.brothercake.com/ brothercake

    @zcorpan – although you can get a DOM that way, you can’t get a #document, and that’s what I needed.

    I’m getting some good ideas for solutions to my original problems though – thanks :) But none of it changes my mind, because it was just one example.

    The gist of the article is that one reason why pretend-XHTML is a Good Thing™ is that it can be parsed as XML by any XML parser. And as Richard Conyard indirectly pointed out, in most environments there’s absolutely no difference between “real” and “pretend” XHTML – it’s just text with a bunch of delimiters.
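
    For completeness, one browser route from an HTML string to a real #document is document.implementation.createHTMLDocument, from DOM Level 2 HTML. This is a browser-only sketch, and support varied between browsers at the time, so treat it as an option to test rather than a guaranteed fix:

    ```javascript
    // Build a standalone #document, then let innerHTML populate it,
    // yielding a document node rather than a bare element.
    var htmlstring = '<p>Hello</p>'; // stand-in for the fetched responseText
    var doc = document.implementation.createHTMLDocument('untitled');
    doc.body.innerHTML = htmlstring;
    // doc.nodeType is 9, i.e. a true #document
    ```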

  • http://www.brothercake.com/ brothercake

    @Craig … er, well, hmm … hey look, a squirrel!

  • Zapf Dingbat

    What is the cat’s role in this allegory? Is it the HTML 4, preparing to shed black fur on the clean white fabric of semantic mark-up?? Or does it represent the message your site was designed to convey, in danger of someone shutting the door and turning on the dryer?

  • Dave Keays

    Why is HTML called sloppy? IIRC, it was very compliant with the standard it was built on: SGML. The problem is that people decided to change the rules in the middle of the game. Blame the XML-heads for the current confusion and the need for something between SGML and XML.

  • Gordon French

    What effect will this have on CSS? I would like to explain more about how HTML 4 and CSS interact.
    CSS Tutorials

  • http://www.lunadesign.org awasson

    Dave Keays:

    Why is HTML called sloppy? IIRC, it was very compliant with the standard it was built on: SGML. The problem is that people decided to change the rules in the middle of the game. Blame the XML-heads for the current confusion and the need for something between SGML and XML.

    Actually, I would say HTML is called sloppy as a result of lax browser implementation and wysiwyg dependence.

    Why blame the xml-heads for the trouble? I think that’s an over-generalization, or perhaps you could define an xml-head. It’s obviously a result of politics and bickering in the W3C, and who says it’s a mess anyway? In a short time HTML5 will be the standard and we’ll all quietly get on with it : )

  • http://icoland.com/ glenngould

    What a misleading title!

    But never mind, everyone is already using XHTML, from professionals to newbies.

    Come on cool XHTML guys at least leave us alone.

    Now let me close this comment :D />

  • Stevie D

    @awasson:

    Actually, I would say HTML is called sloppy as a result of lax browser implementation and wysiwyg dependence.

    And in what way does XHTML address that? XHTML 1.0 Transitional allows the same cruft and rubbish that HTML 4.01 Transitional does. I have seen plenty of sites that claim to use XHTML and have been generated by WYSIWYG editors and are full of tag soup and unsemantic trash, and are riddled with errors.

  • http://ryanroberts.co.uk RyanR

    @awasson

    The beauty of my statement is that it’s my opinion. You don’t have to agree with me but saying it’s nonsense is…. Well, nonsense.

    You find XHTML easier, yet the only real difference is that it requires one additional character (the slash in a self-closing element) and a different doctype… that makes no sense. There is nothing that makes XHTML easier for you. I swap between HTML 4.01 and XHTML 1.0 on a daily basis depending on the project, and neither is easier or more difficult than the other.

    I suppose it’s the sheer volume of non-xhtml sites I’ve looked at that has influenced my opinion. I look at the markup of pretty much every interesting site I come across and I find a lot of tag soup, inline presentational markup and garbage in non-xhtml sites. Often they don’t even approach validation and often again they don’t look consistent across browsers & platforms.

    The exact same can be said about many XHTML sites. XHTML does nothing to prevent any of these complaints of yours unless you serve it as XML, and we all know what that means (no IE support).

    Furthermore, I do find it easier to write well formed and easily maintainable xhtml/css than HTML4 (or 3.2, etc…). I could write HTML4 all lower case and close all of my tags but what’s the point? It’s an old standard and the time to move on was six or seven years ago.

    It’s an old yet very solid standard, and as I pointed out, the differences between HTML 4.01 and XHTML 1.0 are absolutely minimal. I could ask what’s the point in writing all your XHTML in lowercase and closing tags, since it won’t make the slightest bit of difference in the browser? The point is exactly the same as doing it with HTML.

    When HTML5 becomes the standard, will you still cling to your HTML4?

    I quite happily use HTML 4.01 (strict), XHTML 1.0 (strict) and HTML 5 at the moment. Maybe you should rethink your fanboy-like attachment to XHTML.

  • http://www.lunadesign.org awasson

    @Stevie D:

    And in what way does XHTML address that? XHTML 1.0 Transitional allows the same cruft and rubbish that HTML 4.01 Transitional does. I have seen plenty of sites that claim to use XHTML and have been generated by WYSIWYG editors and are full of tag soup and unsemantic trash, and are riddled with errors.

    Perhaps there are lots of xhtml sites that are built with wysiwyg programs, but I doubt that they’re full of tag soup and unsemantic garbage… xhtml as a standard was a step away from markup that promoted that type of behaviour, and if you were to produce a page in a wysiwyg program, you would have to go into code view and add all the garbage that you claim exists.

    As I mentioned earlier, I have observed more garbage in html4 marked up pages. That’s my observation. It’s an earlier standard and because it had a lower barrier to entry, there are many sites created using it as a standard. Have you seriously observed differently?

    I quite happily use HTML 4.01 (strict), XHTML 1.0 (strict) and HTML 5 at the moment. Maybe you should rethink your fanboy-like attachment to XHTML.

    Oh Ryan…. How long ago was HTML 4.01? Oh that’s right: 1999, then updated in 2001. Maybe you should rethink your fanboy-like attachment to an old standard.

    I’ve already mentioned, “In a short time HTML5 will be the standard and we’ll all quietly get on with it.”

  • http://ryanroberts.co.uk RyanR

    Oh Ryan…. How long ago was HTML4.01? Oh that’s right 1999 and then updated in 2001.

    XHTML became a recommendation in 2000, a whole one year later. It was a reformulation of HTML 4 as XML 1.0; there are no differences in the available elements or semantics.

    Maybe you should rethink your fanboy-like attachment to an old standard.

    What fanboyism? Please read my comment again and take note of what I say regarding XHTML (twice in fact).

  • http://www.lunadesign.org awasson

    Ryan,
    Don’t you think this is getting just a little silly?

    I mentioned that I’m disappointed that xhtml wasn’t more widely adopted but I’ll move on to the next standard. Apparently as a result, (in your words) I have a “fanboy-like attachment to XHTML”.

    I wouldn’t go that far but ok, if I have a fanboy attachment to xhtml, you certainly have a fanboy attachment to html4.

    I’ve never been called a fanboy… I feel like I need to go out and get a poster for my room or something : )

  • http://ryanroberts.co.uk RyanR

    My comment about your “fanboy-like attachment” was in regard to your one-sided, mistaken statements about XHTML while mocking HTML and the use of it as inferior.

    I’ve simply stated the semantic/markup differences between the two are very much minimal. That HTML is far from inferior when compared to XHTML delivered as text/html and that I use both on a daily basis.

    So much for being a fanboy :/

  • http://www.brothercake.com/ brothercake

    @Dave Keays – but you’re thinking of it in terms of the benefit of removing presentational markup being a reduction in code size. But that’s not it at all (evidently!)

    The benefit of removing presentational markup is that your markup has better semantics – you’re left with tags that have mode-independent meaning, rather than tags which only mean something visual.

    Granted, you do end up then with a lot more wrapper elements like DIV and SPAN that you didn’t have before (although that’s only true because we’re still stuck with CSS1 for the most part – given ubiquitous CSS2 and CSS3 support, with a greater range of display styles, the need for wrapper elements diminishes and eventually disappears), but at least those elements are semantically neutral, and no semantics is always better than the wrong semantics.

  • Stevie D

    @awasson:

    Perhaps there are lots of xhtml sites that are built with wysiwyg programs but I doubt that they re full of tag soup and unsemantic garbage… xhtml as a standard was a step away from a markup that promoted that type of behaviour and if you were to produce a page in a wysiwyg program, you would have to go into code view and add all the garbage that you claim exists.

    As I mentioned earlier, I have observed more garbage in html4 marked up pages. That’s my observation. It’s an earlier standard and because it had a lower barrier to entry, there are many sites created using it as a standard. Have you seriously observed differently?

    I have to use a CMS (Sitekit) for one website, on a template that other people have set up. There are navigation lists set up as pipe-separated inline links, there are layout tables, there are inline styles and deprecated elements and attributes, and there are dozens of validation errors – yet the site purports to be XHTML. It is not alone in this. Just because an editor claims to be able to output XHTML doesn’t mean it will do it properly! The supposed strictness of XHTML hasn’t happened, because it has to be served to the browsers as HTML (because of IE), so one of the putative advantages is lost. (I’m not actually sure that this would be the right way to go anyway)

    Yes, the quality of HTML4 pages is probably worse, on average, than that of XHTML pages. The reason for that is that most dodgy editors default to HTML4, whereas generally people who use XHTML are more likely to be web-savvy and interested in standards. It’s not a binary relationship, though, and there are plenty of spot-on sites written in HTML and cruddy ones written in XHTML. Either way, the choice of language used does nothing to enforce standards or quality.