By Kevin Yank

A Minimal HTML Document

I am often surprised by just how many professionally designed sites are delivered in the form of incomplete HTML documents. To be fair, however, the amount of code required for even an empty HTML document has grown significantly over the years.

Once upon a time, an HTML document only had to contain a <!DOCTYPE> declaration and a <title> tag. From the HTML 3.2 recommendation:

In practice, the HTML, HEAD and BODY start and end tags can be omitted from the markup […]

Every HTML 3.2 document must also include the descriptive title element. A minimal HTML 3.2 document thus looks like:


  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
  <TITLE>A study of population dynamics</TITLE>

At the time HTML 3.2 was the recommended spec, very few web designers bothered with a <!DOCTYPE>, or with valid code at all, so in practice an HTML document could be any text file containing any combination of text and HTML tags.

These days, the needs of accessibility, search engine optimization, document consistency for JavaScript manipulation, and support for international characters all combine to require more of our HTML. The minimal HTML document has gotten a lot bigger.

Here’s the very minimum that an HTML 4 document should contain, assuming it has CSS and JavaScript linked to it:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
  <head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8">
    <title>title</title>
    <link rel="stylesheet" type="text/css" href="style.css">
    <script type="text/javascript" src="script.js"></script>
  </head>
  <body>
		
  </body>
</html>

If you want to be able to process your document as XML, then this minimal XHTML 1 document should be your starting point instead:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8"/>
    <title>title</title>
    <link rel="stylesheet" type="text/css" href="style.css"/>
    <script type="text/javascript" src="script.js"></script>
  </head>
  <body>
		
  </body>
</html>

Read on for a description of each line of these minimal documents.

The Breakdown

Every (X)HTML document should start with a <!DOCTYPE> declaration that tells the browser what version of (X)HTML the document is written in. In practical terms, this tells browsers like Internet Explorer and Firefox to use their most standards-compliant (and therefore cross-browser-compatible) rendering mode. The exact form of the <!DOCTYPE> declaration depends on whether your document is HTML 4 (fine for most purposes) or XHTML 1 (enables the page to be processed as XML):

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

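One way to see the effect of the <!DOCTYPE> on the rendering mode is to ask the browser directly. The sketch below assumes a browser that exposes the document.compatMode property. The page title, the wording of the message, and the id "mode" are made up for illustration. The property reads "CSS1Compat" in standards mode and "BackCompat" in quirks mode:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
  <head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8">
    <title>Rendering mode check</title>
  </head>
  <body>
    <!-- Fallback text, shown only if the script below does not run -->
    <p id="mode">Rendering mode: unknown</p>
    <script type="text/javascript">
      // document.compatMode is "CSS1Compat" in standards mode
      // and "BackCompat" in quirks mode.
      var label = document.compatMode === "CSS1Compat" ? "standards" : "quirks";
      // Replace the paragraph's text with the detected mode.
      document.getElementById("mode").firstChild.nodeValue =
          "Rendering mode: " + label;
    </script>
  </body>
</html>
```

Removing the first two lines and reloading the page should switch the reported mode, which is a quick way to confirm that the declaration is doing its job.
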
Next, we mark the start of the document with the opening <html> tag. This tag should specify the primary language for the document’s content, with the lang attribute:

<html lang="en">
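
The word "primary" matters here: the lang attribute can also be set on elements inside the page, where it overrides the document-wide value for that element's content. A minimal sketch, with a made-up title and sentence, of an English page that contains one French word:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
  <head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8">
    <title>Language example</title>
  </head>
  <body>
    <!-- The paragraph inherits lang="en" from <html>;
         the <span> overrides it for the French word only -->
    <p>The French greeting is <span lang="fr">bonjour</span>.</p>
  </body>
</html>
```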

In an XHTML document, you should also specify the document’s default XML namespace (using the xmlns attribute) and re-specify the language using XML’s standard xml:lang attribute:

<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">

Next comes the <head> tag, which starts the document header:

  <head>

The first thing in the header should be a <meta> tag that specifies the character encoding of the page. Usually, the web server that sends the page to the browser declares the character encoding, but many servers are not configured to send this information. Specifying it here also ensures the document is displayed correctly when it is loaded directly from disk, without consulting a server:

    <meta http-equiv="content-type" content="text/html; charset=utf-8">

In an XHTML document, <meta> tags should end with a slash to indicate they are empty:

    <meta http-equiv="content-type" content="text/html; charset=utf-8"/>

With the encoding established, we can safely write the first piece of actual content in the page—the page title:

    <title>title</title>

If you want to link a CSS file to the page to control its appearance (which you usually will), a <link> tag at this point will do the trick:

    <link rel="stylesheet" type="text/css" href="style.css">

Again, the XHTML version of this tag needs a trailing slash to indicate it is empty:

    <link rel="stylesheet" type="text/css" href="style.css"/>

If you want to link a JavaScript script to the page, and the script is designed to be invoked from the header, insert a <script> tag at this point. Whether the document is HTML or XHTML, you should include a full </script> closing tag. A self-closing <script/> is valid XML, but browsers that parse the page as HTML will not treat the element as closed:

    <script type="text/javascript" src="script.js"></script>
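
As for what "designed to be invoked from the header" can mean in practice: code in the header runs before the <body> has been parsed, so it usually just registers work to be done later. The sketch below uses an inline script as a stand-in for script.js; the class name "js-enabled" and the page text are hypothetical:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
  <head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8">
    <title>Header script sketch</title>
    <script type="text/javascript">
      // This code runs while the header is being parsed, before the
      // <body> element exists, so it only registers a handler here.
      window.onload = function () {
        // By the time the load event fires, document.body is available.
        document.body.className = "js-enabled"; // hypothetical class name
      };
    </script>
  </head>
  <body>
    <p>Page content goes here.</p>
  </body>
</html>
```

A script that touches document.body immediately, without waiting for the load event, would fail when run from the header and belongs later in the page instead.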

That just about does it. You can end the header, then start the body of the page with a <body> tag. The content of the page is up to you, but since we’re talking about a minimal document, there need not be any body content at all:

</head>
<body>
		
</body>
</html>

So, how does your most recent work hold up? Have you included all the elements discussed above? Commonly omitted items like the lang attribute and the content-type <meta> tag may seem unnecessary, but they add the final layer of polish that ensures your site holds up with the best on the Web.

I’d love to hear your thoughts on the basic HTML elements discussed above, or any other elements you absolutely always include in your pages. Leave a comment and let me know!
