A Minimal HTML Document

I am often surprised by just how many professionally-designed sites are delivered in the form of incomplete HTML documents. To be fair, however, the amount of code required for even an empty HTML document has grown significantly over the years.

Once upon a time, an HTML document only had to contain a <!DOCTYPE> declaration and a <title> tag. From the HTML 3.2 recommendation:

In practice, the HTML, HEAD and BODY start and end tags can be omitted from the markup […]

Every HTML 3.2 document must also include the descriptive title element. A minimal HTML 3.2 document thus looks like:


  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
  <TITLE>A study of population dynamics</TITLE>

At the time HTML 3.2 was the recommended spec, very few web designers bothered with a <!DOCTYPE>, or with valid code at all, so in practice an HTML document could be any text file containing any combination of text and HTML tags.

These days, the needs of accessibility, search engine optimization, document consistency for JavaScript manipulation, and support for international characters all combine to require more of our HTML. The minimal HTML document has gotten a lot bigger.

Here’s the very minimum that an HTML 4 document should contain, assuming it has CSS and JavaScript linked to it:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
  <head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8">
    <title>title</title>
    <link rel="stylesheet" type="text/css" href="style.css">
    <script type="text/javascript" src="script.js"></script>
  </head>
  <body>
		
  </body>
</html>

If you want to be able to process your document as XML, then this minimal XHTML 1 document should be your starting point instead:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8"/>
    <title>title</title>
    <link rel="stylesheet" type="text/css" href="style.css"/>
    <script type="text/javascript" src="script.js"></script>
  </head>
  <body>
		
  </body>
</html>
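
To illustrate what "process your document as XML" can mean in practice, here is a minimal sketch that feeds the XHTML document above to a generic XML parser. It assumes Python's standard xml.etree.ElementTree module as the non-browser XML consumer; the names read_title and read_language are hypothetical helpers made up for this example, and the body whitespace has been trimmed.

```python
import xml.etree.ElementTree as ET

# Namespace declared by the xmlns attribute on the <html> tag.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# The minimal XHTML 1 document from the article (body left empty).
MINIMAL_XHTML = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8"/>
    <title>title</title>
    <link rel="stylesheet" type="text/css" href="style.css"/>
    <script type="text/javascript" src="script.js"></script>
  </head>
  <body>
  </body>
</html>"""


def read_title(xhtml_source):
    """Parse the source as XML and return the text of its <title> element."""
    root = ET.fromstring(xhtml_source)
    # Every element lives in the XHTML namespace, so the path must say so.
    title = root.find("{%s}head/{%s}title" % (XHTML_NS, XHTML_NS))
    return title.text


def read_language(xhtml_source):
    """Return the value of the plain lang attribute on the root element."""
    root = ET.fromstring(xhtml_source)
    return root.get("lang")


print(read_title(MINIMAL_XHTML))
print(read_language(MINIMAL_XHTML))
```

The same parser would reject the HTML 4 version, because its unclosed <meta> and <link> tags are not well-formed XML.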

Read on below for a description of each line of these minimal documents.

The Breakdown

Every (X)HTML document should start with a <!DOCTYPE> declaration that tells the browser what version of (X)HTML the document is written in. In practical terms, this tells browsers like Internet Explorer and Firefox to use their most standards-compliant (and therefore cross-browser-compatible) rendering mode. The exact form of the <!DOCTYPE> declaration depends on whether your document is HTML 4 (fine for most purposes) or XHTML 1 (enables the page to be processed as XML):

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

Next, we mark the start of the document with the opening <html> tag. This tag should specify the primary language for the document’s content, with the lang attribute:

<html lang="en">

In an XHTML document, you should also specify the document’s default XML namespace (using the xmlns attribute) and re-specify the language using XML’s standard xml:lang attribute:

<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">

Next comes the <head> tag, which starts the document header:

  <head>

The first thing in the header should be a <meta> tag that specifies the character encoding of the page. Usually, the character encoding is declared by the web server that sends the page to the browser, but many servers are not configured to send this information, and specifying it here ensures the document is displayed correctly even when it is loaded directly from disk, without consulting a server:

    <meta http-equiv="content-type" content="text/html; charset=utf-8">

In an XHTML document, <meta> tags should end with a slash to indicate they are empty:

    <meta http-equiv="content-type" content="text/html; charset=utf-8"/>
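
To see why the declared encoding matters, consider this rough sketch of what happens when UTF-8 bytes are decoded with the wrong character set. The decode_page function is a hypothetical stand-in for a browser's decoding step, and the Latin-1 fallback is an assumption made for illustration; real browsers choose their fallback encoding in more elaborate ways.

```python
def decode_page(raw_bytes, declared_charset=None):
    """Decode page bytes using the declared charset, if there is one.

    Assumption for this sketch: with no declaration, fall back to Latin-1.
    """
    charset = declared_charset or "latin-1"
    return raw_bytes.decode(charset)


# A page saved to disk as UTF-8, containing one non-ASCII character.
page_bytes = "<title>Café</title>".encode("utf-8")

# With the charset declared, the text survives intact.
print(decode_page(page_bytes, "utf-8"))   # <title>Café</title>

# Without it, the two UTF-8 bytes of "é" are read as two Latin-1 characters.
print(decode_page(page_bytes))            # <title>CafÃ©</title>
```

Pure ASCII text decodes identically either way, which is why a missing declaration often goes unnoticed until the first accented character or curly quote appears.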

With the encoding established, we can safely write the first piece of actual content in the page—the page title:

    <title>title</title>

If you want to link a CSS file to the page to control its appearance (which you usually will), a <link> tag at this point will do the trick:

    <link rel="stylesheet" type="text/css" href="style.css">

Again, the XHTML version of this tag needs a trailing slash to indicate it is empty:

    <link rel="stylesheet" type="text/css" href="style.css"/>

If you want to link a JavaScript script to the page, and the script is designed to be invoked from the header, insert a <script> tag at this point. Whether the document is HTML or XHTML, you should include a full </script> closing tag for backwards compatibility:

    <script type="text/javascript" src="script.js"></script>

That just about does it. You can end the header, then start the body of the page with a <body> tag. The content of the page is up to you, but since we’re talking about a minimal document, there need not be any body content at all:

</head>
<body>
		
</body>
</html>

So, how does your most recent work hold up? Have you included all the elements discussed above? Common omissions like the lang attribute and the content-type <meta> tag may seem unnecessary, but they add that final layer of polish that ensures your site holds up with the best on the Web.
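
If you would rather check a page mechanically than by eye, the checklist can be sketched as a small script. This is only an illustration built on Python's standard html.parser module; MinimalDocumentChecker and missing_parts are hypothetical names, and the script covers just four of the items discussed above (the <!DOCTYPE>, the lang attribute, the content-type <meta> tag, and the <title>). It is not a validator.

```python
from html.parser import HTMLParser


class MinimalDocumentChecker(HTMLParser):
    """Record which parts of the minimal document appear in a page."""

    def __init__(self):
        super().__init__()
        self.found = {"doctype": False, "lang": False,
                      "charset": False, "title": False}

    def handle_decl(self, decl):
        # Called for <!DOCTYPE ...> with the text between "<!" and ">".
        if decl.lower().startswith("doctype"):
            self.found["doctype"] = True

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; values may be None.
        attributes = dict(attrs)
        if tag == "html" and attributes.get("lang"):
            self.found["lang"] = True
        elif tag == "title":
            self.found["title"] = True
        elif tag == "meta":
            http_equiv = (attributes.get("http-equiv") or "").lower()
            content = (attributes.get("content") or "").lower()
            if http_equiv == "content-type" and "charset=" in content:
                self.found["charset"] = True


def missing_parts(html_source):
    """Return the names of the checked parts that the page lacks."""
    checker = MinimalDocumentChecker()
    checker.feed(html_source)
    checker.close()
    return [name for name, present in checker.found.items() if not present]


# The minimal HTML 3.2 document quoted earlier in the article.
HTML_32 = """<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<TITLE>A study of population dynamics</TITLE>"""

print(missing_parts(HTML_32))  # ['lang', 'charset']
```

Run against the HTML 3.2 example, it reports the two omissions singled out above: the lang attribute and the content-type <meta> tag.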

I’d love to hear your thoughts on the basic HTML elements discussed above, or any other elements you absolutely always include in your pages. Leave a comment and let me know!

  • http://autisticcuckoo.net/ AutisticCuckoo

    The Content-Type HTTP equivalent in your XHTML example is wrong. It states that the content type is text/html, which doesn’t allow the document to be parsed as XML. And if it’s to be parsed as XML you don’t need the lang attribute; xml:lang will suffice.

    Instead, for real XHTML you should have an XML declaration at the top of the document, specifying the XML version (1.0) and the character encoding.

    Browsing through long-forgotten documents on an old laptop the other day, I found some of my old HTML stuff. They did contain a doctype declaration … for HTML 2.0! :)

  • http://www.sitepoint.com/ Kevin Yank

    Thanks for the quick feedback, Tommy!

    The Content-Type HTTP equivalent in your XHTML example is wrong. It states that the content type is text/html, which doesn’t allow the document to be parsed as XML. And if it’s to be parsed as XML you don’t need the lang attribute; xml:lang will suffice.

    The intent was to build a document that was parsable as XML by a non-browser system expecting generic XML input, but which would be treated as HTML by current browsers. Such a scenario seems to be the most practical application of XHTML at the moment.

    Thus the “text/html” meta tag (which has no special meaning to a generic XML parser), and the two lang attributes.

    Instead, for real XHTML you should have an XML declaration at the top of the document, specifying the XML version (1.0) and the character encoding.

    As you know, such a declaration fouls up the box model in Internet Explorer 6. To an increasing number of sites this is no longer a concern, but for now the defaults of XML 1.0 encoded as UTF-8 work fine, so there is no pressing need to include the declaration.

    Browsing through long-forgotten documents on an old laptop the other day, I found some of my old HTML stuff. They did contain a doctype declaration … for HTML 2.0! :)

    I wish I still had documents that old. I’d love to do an article on my greatest past sins against web standards. :)

  • http://autisticcuckoo.net/ AutisticCuckoo

    As you know, such a declaration fouls up the box model in Internet Explorer 6.

    No version of IE supports XHTML anyway, and the text says ‘If you want to be able to process your document as XML’.

    I know all about the widespread propensity for using pretend-XHTML, but there’s no reason for not explaining how it should be done, right? :)

  • Ryan

    If you want to be able to process your document as XML, then this minimal XHTML 1 document should be your starting point instead:

    Except that isn’t XHTML.

  • http://www.sitepoint.com/ Kevin Yank

    AutisticCuckoo,

    No version of IE supports XHTML anyway, and the text says ‘If you want to be able to process your document as XML’.

    Right, and there’s nothing about the text/html <meta> tag that prevents you from processing your document as XML. Browsers won’t process it as XML by default, but that isn’t a typical use case in practice. You could, however, make an XMLHttpRequest call from a script elsewhere on your site to load the page, parse it as XML, and extract some information from the resulting XML DOM.

    I know all about the widespread propensity for using pretend-XHTML, but there’s no reason for not explaining how it should be done, right? :)

    I’ve never liked the term “pretend XHTML”, as it implies we are dressing up HTML as XHTML. Rather, what we have here is an XHTML document that advertises itself to browsers as HTML. As I see it, the term “pretend HTML” would be more accurate. Or maybe “XHTML in HTML’s clothing”. :)

    As for explaining how it should be done, if my intent were to assist the reader in achieving the most standards compliant markup possible, I would certainly have made room to mention application/xhtml+xml. My intent, however, was to make a considered, practical recommendation, which is only to parse XHTML as XML when making active use of its XML features.

    Ryan,

    Except that isn’t XHTML.

    It is. Under Section 5.1 of the XHTML 1.0 Recommendation, a valid XHTML document may advertise itself as text/html for compatibility reasons. This will prevent all major XHTML-capable browsers from recognizing it as XHTML, but if you’re going to be a stickler for the spec, it is still an XHTML document, and it will validate as such.

    In summary, my minimal XHTML document…

    • is valid XHTML
    • can be parsed as XML (e.g. by JavaScript code on your site that requests it with XMLHttpRequest)
    • will be treated as HTML in all current browsers
  • http://www.digitalgreenlight.com busy

    One small gripe. A minimal html document doesn’t have css and javascript in it. Those tags aren’t required.

    It would more likely have an h1 and a p before that, but then it’s still not a totally minimal document anymore.

  • arts-multimedia

    It depends what you want to do with a minimal html document. If you use it as an include with php, you can leave out everything since all the required stuff is already in the page that calls the minimal page.
    As such, a page called example.html can have content like this:

    No service before 11 am.

    Nothing else. Then you call it in your page that has all the required elements:
    and it will insert the line No service before 11 am.

    Another function for a minimal html document can be an index file in a folder of a cms where you do not want hackers to be able to read the folder content. In that situation, index.html doesn’t have to contain anything, it is just a blank page without any text or code.

    I agree, this is not exactly within the scope of this article, but I just wanted to show that there are instances in which all the required code is not needed.

  • JVLB

    Although it obviously isn’t needed for a minimal document and browsers default to CSS anyway, to dot the “i’s” and “t’s” I include <meta http-equiv="Content-Style-Type" content="text/css"> in my document template, (in the excellent Quanta Plus editor on Linux).

  • Daniel

    Would it be trollish to point out Anne’s Weblog? View source if you don’t know what I mean.

  • http://www.digitalgreenlight.com busy

    @arts-multimedia
    Neither of those are html documents.

    One is an include file that ends up as part of your html document, and the other is an empty file that happens to have the extension .html

    Boy do I feel pedantic today. Sorry guys.

  • http://www.sitepoint.com/ Kevin Yank

    Would it be trollish to point out Anne’s Weblog? View source if you don’t know what I mean.

    Anne’s code leaves out the <html>, <head>, and <body> tags, which is perfectly legal to do in an HTML 4 document, much as it was for HTML 3.2 (as mentioned at the top of my article). This “ultra-minimalism” does sacrifice a couple of benefits that I would not be comfortable with on most sites, however.

    The most important element she has left out is the lang attribute on the <html> tag, which identifies the primary language in use in the page. Search engines and other spidering systems make good use of this information.

    The fact that she has no content-type <meta> tag will cause the files that make up her site to be interpreted as Latin-1 encoded text, instead of UTF-8, when opened without a web server. But if that is not an issue for her, then she can do without it.

  • http://dyersweb.com/ dyer85

    Hi Kevin, nice article.

    I just had one concern. You specifically state in the section, The Breakdown:

    The exact form of the declaration depends on whether your document is HTML 4 (fine for most purposes) or XHTML 1 (enables the page to be processed as XML)…

    Although it may be correct XHTML 1.0 to serve as text/html, AFAIK, even browsers capable of serving pages using the XML parser will not do so unless the MIME type is set as application/xhtml+xml. If this is indeed the case, setting the DOCTYPE to use XHTML isn’t sufficient to make the browser use XML parsing.

    You might consider making a separate section for XHTML 1.1, which should always be served with application/xhtml+xml, IIRC.

  • Philippe

    Kevin Yank:
    If you had done a minimum of checking, you’d see that Anne’s site sends the content encoding in the http header (text/html;charset=utf-8).

    PS – Anne is a ‘he’ not a ‘she’.

  • http://dyersweb.com/ dyer85

    Look at what Kevin wrote more closely:

    …when opened without a web server…

    The content will not be rendered as UTF-8 without a web server, unless the browser’s default is set as such. When a server isn’t available to send the headers, the browser can only check the <meta> tag specifying Content-Type.

    Curtis

  • http://autisticcuckoo.net/ AutisticCuckoo

    I’ve never liked the term “pretend XHTML”, as it implies we are dressing up HTML as XHTML.

    That’s exactly what you’re doing if you serve it as text/html. No matter how much you wish it to be otherwise, it is HTML if you serve it as such. Invalid, ugly HTML, but still HTML.

    If you want it to be XHTML and make use of anything XHTML offers beyond what HTML can do, then you must serve it as an application of XML.

  • bart (Cityvox)

    we had some problems with
    when using script from one of our clients.
    it is safer to use a charset over here as well

  • arts-multimedia

    I’m sometimes worried about these purists who want everybody to do exactly what the dogma says or else…
    In real life, we all use several methods to come to a good result. Being restricted to one method leads to anaemia!

    No scripting or l. Yet, there are tons of wonderful sites out there and I don’t care if they are written in ugly this or that as long as screen readers can handle them, all is well.

  • j9t
  • patspam

    Since you’re talking about best practice here, I’d rather see the js script tag at the bottom of the body. That way, if people are following unobtrusive js best practice they’ll get a better YSlow score.

    -Patrick (patspam.com)