A Minimal HTML Document (HTML5 Edition)

Back in 2008, I posted a detailed breakdown of the set of tags you should include, at the bare minimum, in every HTML document. There was a lot to take in at the time.

At the time, the needs of accessibility, search engine optimization, document consistency for JavaScript manipulation, and support for international characters all combined to require more of our HTML. The minimal HTML document had gotten a lot bigger.

Since then, the HTML5 working group has put a lot of thought into slimming down that minimal set of tags. It turns out all major browsers agree on several shortcuts that can cut down on the code, and the HTML5 specification now allows for these shortcuts to be used in valid code.

Because all browsers (even old ones like IE6) fully support the shortcuts that HTML5 is standardizing, we can use them today, even though most of HTML5’s new features remain off-limits until browsers catch up.

With those shortcuts in play, here’s the very minimum that an HTML document should now contain, assuming it has CSS and JavaScript linked to it:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>title</title>
    <link rel="stylesheet" href="style.css">
    <script src="script.js"></script>
  </head>
  <body>
    <!-- page content -->
  </body>
</html>
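
If you want to check your own pages against this template, here is a minimal sketch in Python using the standard library’s html.parser module. It is not a validator: it only reports which of four pieces (the short doctype, a lang attribute on <html>, a <meta charset> declaration and a <title>) it could not find. The names MinimalDocChecker and missing_parts are made up for this example.

```python
from html.parser import HTMLParser


class MinimalDocChecker(HTMLParser):
    """Records which parts of the minimal HTML5 template appear in a document."""

    def __init__(self):
        super().__init__()
        self.found = {
            "doctype": False,
            "html lang": False,
            "meta charset": False,
            "title": False,
        }

    def handle_decl(self, decl):
        # For "<!DOCTYPE html>", html.parser passes in the text "DOCTYPE html".
        if " ".join(decl.split()).lower() == "doctype html":
            self.found["doctype"] = True

    def handle_starttag(self, tag, attrs):
        # html.parser has already lowercased tag and attribute names.
        names = {name for name, _value in attrs}
        if tag == "html" and "lang" in names:
            self.found["html lang"] = True
        elif tag == "meta" and "charset" in names:
            self.found["meta charset"] = True
        elif tag == "title":
            self.found["title"] = True


def missing_parts(markup):
    """Return the names of the template parts that were not found."""
    checker = MinimalDocChecker()
    checker.feed(markup)
    checker.close()
    return [part for part, present in checker.found.items() if not present]
```

For a page built from the template above, missing_parts returns an empty list; an old page with a long HTML4 doctype would come back with ["doctype"] (plus anything else it lacks).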

Read on below for a full description of each line of this minimal document.

The Breakdown

Every HTML document should start with a <!DOCTYPE> declaration that tells the browser what version of HTML the document is written in. The <!DOCTYPE> required by HTML5 documents is much shorter than those that came before:

<!DOCTYPE html>

Like all these shortcuts, this code has been specifically designed to “fool” current browsers (which don’t yet support HTML5) into treating the document as a full-blooded HTML4 document. Browser versions as far back as Internet Explorer 6 will render the page in their most standards-compliant rendering mode.
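
Browsers only care about the doctype because of “doctype switching”: when parsing begins, the doctype decides whether the page gets quirks mode, limited-quirks (“almost standards”) mode, or no-quirks (full standards) mode. The Python sketch below approximates the rules the HTML5 spec’s tree-construction algorithm applies, but it checks only a handful of the legacy public identifiers the spec lists (XHTML 1.0 Transitional, for example, is not covered), so treat it as an illustration rather than a reference. The function and constant names are hypothetical.

```python
import re

# Only a few of the legacy public identifiers the HTML spec maps to quirks
# mode; the real list has dozens of entries.
QUIRKS_PUBLIC_PREFIXES = (
    "-//w3c//dtd html 3.2",
    "-//w3c//dtd html 4.0 transitional//",
)

# HTML 4.01 Transitional/Frameset: quirks mode without a system identifier,
# limited-quirks ("almost standards") mode with one.
TRANSITIONAL_PREFIXES = (
    "-//w3c//dtd html 4.01 frameset//",
    "-//w3c//dtd html 4.01 transitional//",
)

DOCTYPE_RE = re.compile(
    r'<!doctype\s+(\w+)'
    r'(?:\s+public\s+"([^"]*)"(?:\s+"([^"]*)")?'  # PUBLIC "public id" ["system id"]
    r'|\s+system\s+"([^"]*)")?'                   # or SYSTEM "system id"
    r'\s*>',
    re.IGNORECASE,
)


def rendering_mode(markup):
    """Guess which rendering mode a document's doctype would select."""
    match = DOCTYPE_RE.match(markup.lstrip())
    if match is None:
        return "quirks"  # no doctype (or one this sketch can't read)
    name, public_id, system_id, _system_only = match.groups()
    if name.lower() != "html":
        return "quirks"
    public_id = (public_id or "").lower()
    if public_id.startswith(QUIRKS_PUBLIC_PREFIXES):
        return "quirks"
    if public_id.startswith(TRANSITIONAL_PREFIXES):
        return "limited-quirks" if system_id is not None else "quirks"
    return "no-quirks"
```

The short <!DOCTYPE html> lands in no-quirks mode precisely because it has no public identifier for any of the legacy checks to match, while leaving the doctype out altogether drops the page into quirks mode.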

Next, we mark the start of the document with the opening <html> tag. This tag should specify the primary language for the document’s content, using the lang attribute:

<html lang="en">

Next comes the <head> tag, which starts the document header:

  <head>

The first bit in the header should be a <meta> tag that specifies the character encoding of the page. Usually, the character encoding is declared by the web server that sends the page to the browser, but many servers are not configured to send this information. Specifying it here ensures the document is displayed correctly even when it’s loaded directly from disk, without consulting a server.

Once again, HTML5 significantly shortens this tag compared to its HTML4 equivalent, but, as before, this shortcut takes advantage of the existing error-handling behavior of all current browsers, so it is safe to use today:

    <meta charset="utf-8">
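
The declaration belongs near the top of the head because browsers don’t search the whole page for it: they first check for a byte order mark and the server’s Content-Type header, then prescan only the start of the document (the HTML5 spec encourages them to stop after the first 1024 bytes). Here’s a rough Python sketch of that order of precedence. The regular expressions stand in for the spec’s much more careful byte-level prescan, sniff_encoding is a hypothetical name, and the windows-1252 fallback is an assumption (real browsers choose a default based on the user’s locale).

```python
import re

# A crude stand-in for the spec's byte-level "prescan": find charset=... inside
# a <meta> tag. It also catches the older http-equiv/content form.
META_CHARSET_RE = re.compile(rb'<meta[^>]*?charset\s*=\s*[\'"]?([-\w:.]+)', re.IGNORECASE)
HEADER_CHARSET_RE = re.compile(r'charset\s*=\s*"?([-\w:.]+)', re.IGNORECASE)

UTF8_BOM = b"\xef\xbb\xbf"


def sniff_encoding(body, content_type=None, default="windows-1252"):
    """Guess a page's character encoding from its bytes and Content-Type header."""
    # 1. A UTF-8 byte order mark at the very start wins outright.
    if body.startswith(UTF8_BOM):
        return "utf-8"
    # 2. Next comes a charset parameter sent by the server.
    if content_type:
        match = HEADER_CHARSET_RE.search(content_type)
        if match:
            return match.group(1).lower()
    # 3. Then a <meta> declaration, but only within the first 1024 bytes.
    match = META_CHARSET_RE.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. Otherwise fall back to a default (browsers pick one based on locale).
    return default
```

This is also why the <meta charset> tag should come before the <title>: a declaration pushed past the first kilobyte by a long head may simply never be seen.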

With the encoding established, we can safely write the first piece of actual content in the page—the page title:

    <title>title</title>

If you want to link a CSS file to the page to control its appearance (which you usually will), a <link> tag at this point will do the trick:

    <link rel="stylesheet" href="style.css">

The type="text/css" attribute that was required in HTML4 is now optional in HTML5, and all current browsers know what to do if you leave the attribute out.

If you want to link a JavaScript file to the page, and the script is designed to be invoked from the header, insert a <script> tag at this point. Unlike the <link> tag, the <script> tag must be paired with a full </script> closing tag:

    <script src="script.js"></script>

The type="text/javascript" attribute, once again, is now optional in HTML5, and all current browsers behave correctly when you leave it out.
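
Since both type attributes now just restate the defaults, an existing page can be tidied mechanically. The following Python sketch, built on the standard library’s html.parser (the class and function names are hypothetical), lists the <link rel="stylesheet"> and <script> tags whose type attribute could be dropped. It only flags the exact values text/css and text/javascript; any other type value is left alone, since a non-default script type, for instance, stops the browser from running the script as JavaScript.

```python
from html.parser import HTMLParser

# type values that only restate HTML5's defaults for these elements.
DEFAULT_TYPES = {
    "link": "text/css",          # for rel="stylesheet"
    "script": "text/javascript",
}


class RedundantTypeFinder(HTMLParser):
    """Collects (tag, line) pairs for type attributes that could be dropped."""

    def __init__(self):
        super().__init__()
        self.redundant = []

    def handle_starttag(self, tag, attrs):
        if tag not in DEFAULT_TYPES:
            return
        attrs = dict(attrs)
        # Valueless attributes (e.g. a bare "async") come through as None.
        type_value = (attrs.get("type") or "").strip().lower()
        if type_value != DEFAULT_TYPES[tag]:
            return  # no type attribute, or a non-default one
        if tag == "link" and "stylesheet" not in (attrs.get("rel") or "").lower().split():
            return  # only stylesheet links are covered here
        line, _offset = self.getpos()
        self.redundant.append((tag, line))


def find_redundant_types(markup):
    finder = RedundantTypeFinder()
    finder.feed(markup)
    finder.close()
    return finder.redundant
```

The sketch only reports; deciding whether to rewrite the markup is left to you.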

That just about does it. You can end the header, then start the body of the page with a <body> tag. The content of the page is up to you, but since we’re talking about a minimal document, there need not be any body content at all:

  </head>
  <body>
    <!-- page content -->
  </body>
</html>

How’s that look to you? Any surprises?

If you’re like me, some of the shortcuts presented here make you feel a little uneasy at first blush. Is it really safe to use an HTML5 <!DOCTYPE> declaration when current browsers are yet to support most of HTML5?

Strange as it may seem to adopt code from an as-yet-unsupported specification, HTML5 was designed to be adopted in exactly this way. Shortcuts like those presented above are not features of a new standard, but a more efficient use of the HTML parsing features that have already been built into browsers for years.

Now that the W3C HTML Validator supports HTML5, it will validate documents that contain these shortcuts; there really is no reason to do it the long way anymore.


  • http://www.brothercake.com/ brothercake

    Hmm, the thing that bothers me is that, just like HTML5, XHTML1 was “designed to be adopted in exactly this way” – ie. to take advantage of error-handling to implement unsupported syntax.

    And yet, many of the purists who are now advocating HTML5 are saying that this was a bad thing – that relying on broken parsing models to produce a document is no better than producing invalid code. And yet HTML5 now works on exactly the same principle.

    I wonder whether, in a few years’ time, the same people who advocate this will be decrying it.

    I’m going to stick to XHTML until HTML5 is implemented for real. We have no idea how the specification will change between now and then, and can’t have any real confidence that anything you’ve said here today will continue to be true, however reasonable it seems now.

    And even then, I’m not going to let my markup suffer with HTML4 syntax that fails the most basic well-formedness test and fundamentally undermines the development of the semantic web. Frankly I’m a little surprised that you are.

    But I won’t second-guess — you tell me — why have you abandoned XHTML-style self-closing syntax, and gone back to what you’re demonstrating here?

    • http://fvsch.com Florent V.

      “just like HTML5, XHTML1 was ‘designed to be adopted in exactly this way’ – ie. to take advantage of error-handling to implement unsupported syntax.”

      How so? What do you mean by “unsupported syntax” (not supported by the spec, by the browsers…)? It seems to me that XHTML1 was a pretty straightforward adaptation of HTML4 to an XML syntax, and that unlike HTML5 it was not designed for backwards compatibility, as serving it as “application/xhtml+xml” or using an XML declaration and a few other things would result in messy browser behavior.

      XHTML1 was designed as something “pure”, and then they added Appendix C to say, “the pure approach won’t work in browsers, so you may want to bastardize your XHTML1 documents in such or such way, and we’ll just decide it’s okay to do so.”
      http://www.w3.org/TR/xhtml1/#guidelines

      HTML5 is designed as backwards-compatible from the start, not in an appendix. Only new features, not adaptations of HTML4 features, may break in browsers.

      Oh, and when you serve XHTML1 documents as text/html, the slash at the end of empty tags (like in <img />) is seen as an error by the HTML parser, but that error is ignored. Relying on permissive parsing for <meta charset="…">, for instance, is no worse than that. ;)

      • http://www.lunadesign.org awasson

        Actually the pure approach worked in browsers… Unfortunately some browsers decided not to support the spec and unfortunately one particular brand of browser that didn’t support it had deep market penetration. The spec was fine, it just wasn’t implemented.

        I’m with Brothercake with this. At least until HTML5 gets out of draft and becomes a spec, I won’t be going back to HTML98.

  • derekerdmann

    Are there any notable performance consequences of not using the “type” attributes in script and link tags? I would assume the browser would need to figure out what type the file is before parsing it.

  • mrhoo

    I’ve used the techniques described here for a year or more, with no ill effects, on several sites.

    html5 allows empty tags (link, img, input, etc.) to be ‘self-closed’ or not.

    I never do, but I always used html as html, and never felt the need to pretend that it was really xml.

    On the other hand, if an element’s end tag is optional, I always include it, and I always quote attribute values, even if they don’t need the quotes.

    html5 served as text/html allows you to be about as verbose or as terse as you are comfortable with.

  • guilherme

    I published my site recently and was wondering if I should be “cutting-edge” and code it in HTML5 or in XHTML 1.0 strict, like I always do.
    Further research showed that I would gain nothing from coding it in HTML5, and then I ended up using XHTML.
    I don’t know, HTML5 just doesn’t seem solid enough yet. Of course there’s an amount of “fooling” the browser in XHTML too, but these shortcuts don’t make me want to switch, nor do the semantic tags, which are also debatable…
    There are cool things like the video and audio support and the canvas tag, but HTML5 still hasn’t convinced me. And I agree with brothercake that the XML syntax is more organized, that’s why I began using XHTML in the first place.

  • http://www.tyssendesign.com.au Tyssen

    Is it really safe to use an HTML5 <!DOCTYPE> declaration when current browsers are yet to support most of HTML5?

    I’ve been using it with no problems for a while now. And just in the last month or two I’ve been removing the type attributes too with no problem.

  • http://www.yellowshoe.com.au/ markbrown4

    I didn’t think the was understood by IE correctly so I was sticking to the extended version.
    I too like the XHTML style self-closing It makes sense and keeps things consistent.

  • http://simon.html5.org/ zcorpan

    brothercake, the difference is that XHTML1 didn’t define the error handling that it relied on, which HTML5 does.

  • cleverclick

    I think we are still at testing mode as none of the above standards are complete. We need to test and try new things in order to get some better standards and more developers to follow the standards.

    Learning by doing is a very good option in our business…

  • WebKarnage

    I haven’t coded one page as HTML 5 yet, as I know I think XHTML. The closing of tags always made sense as opposed to closing some and not others. Any clear consistent logic you can stick to I only see as good.

    These basic document parts are fine, and maybe I should go this way, but when what I’m doing is XHTML 1 strict and not HTML 5 in really any way other than where the 2 cross, I don’t get the point. When I feel I’m doing something that actually is HTML 5 and not XHTML then the doctype will make sense.

    Maybe I don’t get the whole point. Seems like a dumb argument for me right now.

    with best regards,
    Karn.

    • http://fvsch.com Florent V.

      You could say exactly the same thing about XHTML 1.0. XHTML 1.0 was created as an XML-capable version of HTML 4.01. The point of making it XML was to be able to embed content described in other specs in your HTML-like documents. So SVG and MathML and other stuff in your XHTML. The XHTML 1.0 spec is called “XHTML 1.0 The Extensible HyperText Markup Language”. Note the EXTENSIBLE part in the name, that’s the X in XHTML; yet virtually nobody is using the XML extensibility that was the whole point of creating XHTML. The lowercase tags and attribute quotes and self-closing tags were syntactic sugar, nothing more.

      So if you’re not embedding MathML or SVG or other XML-based stuff in your XHTML 1.0 documents, all you’re doing is glorified HTML 4.01. You could do that glorified HTML4 with an HTML5 doctype and some HTML5 syntactic sugar, and it would be exactly the same as what you are doing with XHTML1. :)

  • Tatsh

    Does the HTML5 specification, in terms of interpreting rules, say anything about where <script> should go? I refuse to place scripts in the header due to the way Firefox handles them (tries to execute all the script while loading the page if it’s in the header, which is particularly useless if you depend on an element existing that you are not rendering dynamically with JavaScript).

    I would suggest putting scripts in the bottom of the body (very last item(s)). Speed, and you don’t have to worry about elements existing or not. You can avoid those checks essentially. You can avoid the check to make sure it’s safe to execute scripts yet (which is almost the same thing).

    HTML5 should say something along the lines of: if script is in the header, execute it last. And all developers should be aware of this behaviour of course. For me it causes no problems as I already do this manually. But many developers are used to event handling with onload, etc. (i.e. checking for a ‘safe window’ before executing critical code).

  • http://fvsch.com Florent V.

    HTML4 has the defer attribute. With <script defer src="…"> you can tell the browser to only execute the script once it is done parsing the document (so after it has set up the initial DOM). This is currently implemented in all browsers, and has been in IE since version 4. Since the spec was not very precise and there was not a strong focus on client-side performance in the past, implementations might not be totally reliable, but you should look it up and get an idea of whether using the defer attribute is enough or if you need to put your script tags at the very end of the document for maximum compatibility.
    Now for HTML5: it still has the defer attribute. If anything, the spec is more precise on how script elements should be handled (a big focus of HTML5 is reducing discrepancies in how HTML is understood by user agents). Then it adds the async attribute, to tell the browser to execute the script immediately without blocking content rendering (which they tend to do right now when you call a script from the document’s HEAD).
    What i understand is that for a typical website, you would call all your libraries that don’t interact with the page, but still need to be executed (this consumes some processing time), with <script async src="mylib.js">. And you would call all your scripts that do something on the page with <script defer src="dosomething.js">. All of this in the document’s HEAD, while getting maximum performance. But until all browsers have good support for the async attribute, it might be better to defer everything, or to put everything at the end of the document.

  • http://www.lunadesign.org awasson

    Kevin,
    Maybe you can answer a question that has been bugging me since I first saw these minimal tags. … Older browsers will think the document is HTML4 and render the document to the best of their abilities, and modern browsers will recognize it is an HTML5 document and work their magic, but is this it? Is HTML5 the final hypertext markup language for all time?

    The reason for my concern is that if there is an HTML5.2 or HTML5.5 how will the browser discern between versions when the doctype has no remarkable features?

    Cheers,
    Andrew

    • http://www.lunadesign.org awasson

      Ok, so does anyone have an answer to the Doctype question I posted above last week?
      With the minimal tag, HTML5 has abandoned the SGML-based Document Type Declaration, which provides a reference to the version of HTML that the document conforms to. I have a couple of concerns with this:

      1) How do you future proof this so that when the next standard comes out, browsers know that it is HTML5 or HTML6 (or whatever the next one will be). How will this impact our ability to produce websites that act the way we want them to (consistently).

      2) Is it less flexible not being able to reference the DTD? We used to be able to write our own Doctype, base it on whatever we wanted to and then add functionality that wasn’t available. For instance we could use Strict which doesn’t allow Target as our base and then add Target to our custom DTD so that the document was still valid Strict but had some additional functions outside of the Strict feature set.

      I’m not an HTML purist and I think we’ll benefit from some pretty cool additions with HTML5 but this is a fundamental change in the way rendering engines interact with HTML documents so there must have been some serious debate amongst the HTML5 group about the ramifications. I’m surprised that there is really no information or discussion about this change.

      The information I have read states that: HTML 5 is not based on SGML and therefore does not require a reference to a DTD. Ok… but what about the future and the next version? Is HTML5 the last HTML specification?

      Does anyone have further information about this?

      • http://fvsch.com Florent V.

        Andrew, i believe the original intent was to drop the doctype altogether, as user agents (browsers) rendering HTML have never used the version number in a doctype to alter the way they render a HTML document, and browser vendors apparently don’t plan to do this ever. Of course in the past (and it still holds today) browsers have used doctype switching to preserve backwards compatibility. But they didn’t do so to preserve compatibility with older versions of HTML… they wanted to be compatible with pages that relied on wrong implementations of HTML and CSS.

        The only reason HTML5 has a doctype is because browser vendors are not ready to drop Quirks Mode, and there are browser versions out there that still have a Quirks Mode of sorts and rely on doctype switching to activate it or not. I think IE9 will still have Quirks Mode (rendering that mimics IE5.5). When the oldest version of IE out there is IE11, my guess is that we will all be able to drop the doctypes from our HTML5/6 documents, if we still do HTML.

        XHTML5 has no doctype.

        “We used to be able to write our own Doctype, base it on whatever we wanted to and then add functionality that wasn’t available.”

        You can still do that if you want to. Especially for XML dialects of your own (if you don’t go with schema instead). But for HTML rendering, browsers never ever downloaded the DTD and used it in any way. Their HTML parser is built to deal with “HTML”, and that’s it. By the way, Firefox 4 will use a new, “HTML5” parser for all HTML documents. It’s already available in Firefox 3.6, but is not the default parser (you have to activate it in about:config). This new parser is dubbed “HTML5” because it enables support for some HTML5 features and, most importantly, it implements the error-handling algorithms in the HTML5 spec, which were designed to be compatible with HTML4 docs and find a common ground between the different error-handling rules of the existing HTML parsers (it mostly takes from Trident and Webkit).

        “For instance we could use Strict which doesn’t allow Target as our base and then add Target to our custom DTD so that the document was still valid Strict but had some additional functions outside of the Strict feature set.”

        I fail to see what the point of this would be. If you need HTML4 Transitional features, you can either:
        - declare HTML4 Transitional, and it will be valid;
        - or declare HTML4 Strict and you will have validation errors but who cares?
        In reply to that “who cares”: the HTML parsers of user agents certainly don’t. Not one bit.

        The deal with HTML5, if you do care about validity (as a teaching tool, as an internal validation tool for your team or with your client, as a quality assurance tool with some automated testing in place, etc.), is: write valid HTML5. Is the code valid? Good. Is it invalid? Not good. Is it invalid because you used a HTML4 feature that was ruled out in HTML5 (there’s not a lot of them, mind you)? Then use HTML4, not HTML5. Are you still using this HTML4 feature not in HTML5 because you really need to, and still declaring HTML5 overall because you’re using new HTML5 features and you need them too? Then there will be validation errors, but hopefully you can document them in the project’s documentation for your team, throw in some exceptions in your automated testing, etc.

        The only issue i can think of is: how will validators know that they’re dealing with HTML6 if the doctype stays the same? If there is an HTML6 some day (not sure), and it doesn’t provide a way to say in the markup that it’s HTML6 and not 5 or 4, then you will have to tell the validator yourself. I doubt this will be a big problem, though.

      • http://www.lunadesign.org awasson

        Thanks Florent,
        I appreciate your reply and I figured someone, somewhere had put some thought into it.

        I’m not so certain about browsers not altering rendering based on Doctype as I do recall a conversation about that some 6 or 8 years ago where either IE or one of the other browsers were definitely rendering differently when the doctype was changed but it could be malformed HTML that was causing it. It was just an alignment issue but it could have been an IE quirks mode thing (long time ago…).

        I still have my concerns about future proofing our documents though because if history does nothing else, it reinforces the point that change is inevitable… As long as the Internet is kicking there will be newer improved versions long after HTML5 has been declared obsolete. Hopefully it won’t leave a legacy like the IE6 debacle.

  • flashmac

    Just a question... shouldn’t the meta tag have a closing slash?

    <meta charset="utf-8" />

    and not

    <meta charset="utf-8">

    Call me pedantic!

    • http://www.lunadesign.org awasson

      Nope, not any more… Those were the old rules of well formed documents. The brave new HTML5 world doesn’t seem to care whether you close or leave tags open.

    • http://fvsch.com Florent V.

      In HTML5 both syntaxes are valid: <br> and <br />.
      In XHTML5 only <br /> is valid.
      Closing empty tags is the rule for well-formed XML documents. Those documents (including XHTML1 or XHTML5 documents served as “application/xhtml+xml”) must be valid XML because they are destined to XML parsers. XML parsers are simple affairs not built for dealing with faulty or ambiguous markup structure, and their focus is on handling any XML dialect you throw at them, even if it’s a custom XML syntax you created for your own use.
      Now HTML parsers are much more complex, and deal with errors and ambiguous markup and markup mistakes you couldn’t even imagine. As a result, they only deal with one dialect, HTML. For these tools, knowing that <br> is an empty element is a no-brainer. Saying that you SHOULD close empty tags in HTML for the sake of HTML parsers is like saying you should remind English linguists that A, E and I are vowels when writing to them: there’s no need, they bloody well know already. So when awasson writes “the brave new HTML5 world doesn’t seem to care whether you close or leave tags open”, i agree, but i will add: it doesn’t care because there are very few reasons to care, and from the browser’s perspective there is absolutely no reason to care.
      But if you want a reminder for yourself or your team that BR or META or even the SOURCE elements are “empty”, you can go ahead and use <meta … /> in your HTML5 code.
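
The point made in this last exchange, that an HTML parser already knows which elements are empty, can be made concrete. Python’s html.parser only reports tags and knows nothing about void elements, so the sketch below (hypothetical names, and far simpler than a real HTML5 tree builder) keeps its own list and tracks which elements are still open. Void elements such as meta and br are complete because of their name, so <meta charset="utf-8"> and <meta charset="utf-8" /> leave the parser in the same state, while the slash in <div/> does not close the div.

```python
from html.parser import HTMLParser

# Void elements as listed in the current WHATWG HTML standard.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class OpenElementTracker(HTMLParser):
    """Keeps a stack of open elements, ignoring any self-closing slash."""

    def __init__(self):
        super().__init__()
        self.stack = []

    def handle_starttag(self, tag, attrs):
        # A void element is complete as soon as its start tag is read,
        # because of its name alone.
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # In text/html the "/" in "<meta ... />" or "<div/>" carries no
        # meaning, so treat it exactly like a plain start tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        # Close everything up to and including the matching open element.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def still_open(markup):
    tracker = OpenElementTracker()
    tracker.feed(markup)
    tracker.close()
    return tracker.stack
```

Both spellings of the meta tag leave nothing open, which is the sense in which the slash is optional in HTML5 served as text/html; an XML parser, by contrast, would reject the unslashed form outright.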