I am often surprised by just how many professionally-designed sites are delivered in the form of incomplete HTML documents. To be fair, however, the amount of code required for even an empty HTML document has grown significantly over the years.
Once upon a time, an HTML document only had to contain a <!DOCTYPE> declaration and a <title> tag. From the HTML 3.2 recommendation:
In practice, the HTML, HEAD and BODY start and end tags can be omitted from the markup […]
Every HTML 3.2 document must also include the descriptive title element. A minimal HTML 3.2 document thus looks like:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<TITLE>A study of population dynamics</TITLE>
At the time HTML 3.2 was the recommended spec, very few web designers bothered with a <!DOCTYPE>, or with valid code at all, so in practice an HTML document could be any text file containing any combination of text and HTML tags.
These days, the needs of accessibility, search engine optimization, document consistency for JavaScript manipulation, and support for international characters all combine to require more of our HTML. The minimal HTML document has gotten a lot bigger.
Here’s the very minimum that an HTML 4 document should contain, assuming it has CSS and JavaScript linked to it:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>title</title>
<link rel="stylesheet" type="text/css" href="style.css">
<script type="text/javascript" src="script.js"></script>
</head>
<body>
</body>
</html>
If you want to be able to process your document as XML, then this minimal XHTML 1 document should be your starting point instead:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8"/>
<title>title</title>
<link rel="stylesheet" type="text/css" href="style.css"/>
<script type="text/javascript" src="script.js"></script>
</head>
<body>
</body>
</html>
Read on below for a description of each line of these minimal documents.
The Breakdown
Every (X)HTML document should start with a <!DOCTYPE> declaration that tells the browser what version of (X)HTML the document is written in. In practical terms, this tells browsers like Internet Explorer and Firefox to use their most standards-compliant (and therefore cross-browser-compatible) rendering mode. The exact form of the <!DOCTYPE> declaration depends on whether your document is HTML 4 (fine for most purposes) or XHTML 1 (enables the page to be processed as XML):
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
Next, we mark the start of the document with the opening <html> tag. This tag should specify the primary language for the document’s content, with the lang attribute:
<html lang="en">
In an XHTML document, you should also specify the document’s default XML namespace (using the xmlns attribute) and re-specify the language using XML’s standard xml:lang attribute:
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
Next comes the <head> tag, which starts the document header:
<head>
The first thing in the header should be a <meta> tag that specifies the character encoding of the page. Usually, the character encoding is declared by the web server that sends the page to the browser, but many servers are not configured to send this information, and specifying it here ensures the document is displayed correctly even when it is loaded directly from disk, without consulting a server:
<meta http-equiv="content-type" content="text/html; charset=utf-8">
In an XHTML document, <meta> tags should end with a slash to indicate they are empty:
<meta http-equiv="content-type" content="text/html; charset=utf-8"/>
With the encoding established, we can safely write the first piece of actual content in the page—the page title:
<title>title</title>
If you want to link a CSS file to the page to control its appearance (which you usually will), a <link> tag at this point will do the trick:
<link rel="stylesheet" type="text/css" href="style.css">
Again, the XHTML version of this tag needs a trailing slash to indicate it is empty:
<link rel="stylesheet" type="text/css" href="style.css"/>
If you want to link a JavaScript script to the page, and the script is designed to be invoked from the header, insert a <script> tag at this point. Whether the document is HTML or XHTML, you should include a full </script> closing tag for backwards compatibility:
<script type="text/javascript" src="script.js"></script>
That just about does it. You can end the header, then start the body of the page with a <body> tag. The content of the page is up to you, but since we’re talking about a minimal document, there need not be any body content at all:
</head>
<body>
</body>
</html>
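To check that the <!DOCTYPE> really has switched the browser into its standards-compliant rendering mode, a small test page along these lines can help. This is only a sketch: the title, the paragraph text, and the "mode" id are placeholders invented for illustration. Note that document.compatMode cannot tell full standards mode apart from "almost standards" mode, since both report "CSS1Compat".

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Rendering mode check</title>
</head>
<body>
<p id="mode">Checking rendering mode...</p>
<script type="text/javascript">
// document.compatMode is "CSS1Compat" in standards (and almost standards)
// mode, and "BackCompat" in quirks mode.
var mode = document.compatMode === "CSS1Compat" ? "standards" : "quirks";
// Replace the placeholder text in the paragraph with the result.
document.getElementById("mode").firstChild.nodeValue =
  "This page is rendered in " + mode + " mode.";
</script>
</body>
</html>
```

Removing the <!DOCTYPE> from this page and reloading it should flip the reported mode to quirks.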
So, how does your most recent work hold up? Have you included all the elements discussed above? Common omissions like the lang attribute and the content-type <meta> tag may seem unnecessary, but they add the final layer of polish that ensures your site holds up with the best on the Web.
I’d love to hear your thoughts on the basic HTML elements discussed above, or any other elements you absolutely always include in your pages. Leave a comment and let me know!







The Content-Type HTTP equivalent in your XHTML example is wrong. It states that the content type is text/html, which doesn’t allow the document to be parsed as XML. And if it’s to be parsed as XML you don’t need the lang attribute; xml:lang will suffice.
Instead, for real XHTML you should have an XML declaration at the top of the document, specifying the XML version (1.0) and the character encoding.
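As a rough sketch of what this comment describes, the article's minimal XHTML document might look like the following with an XML declaration added and the text/html <meta> tag and lang attribute dropped. This assumes the server is configured to send the page as application/xhtml+xml, which is not something the markup itself can do.

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>title</title>
</head>
<body>
</body>
</html>
```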
Browsing through long-forgotten documents on an old laptop the other day, I found some of my old HTML stuff. They did contain a doctype declaration … for HTML 2.0! :)
September 23rd, 2008 at 3:52 pm
Thanks for the quick feedback, Tommy!
The intent was to build a document that was parsable as XML by a non-browser system expecting generic XML input, but which would be treated as HTML by current browsers. Such a scenario seems to be the most practical application of XHTML at the moment.
Thus the “text/html” meta tag (which has no special meaning to a generic XML parser), and the two lang attributes.
As you know, an XML declaration fouls up the box model in Internet Explorer 6. To an increasing number of sites this is no longer a concern, but for now the defaults of XML 1.0 encoded as UTF-8 work fine, so there is no pressing need to include the declaration.
I wish I still had documents that old. I’d love to do an article on my greatest past sins against web standards. :)
September 23rd, 2008 at 4:00 pm
No version of IE supports XHTML anyway, and the text says ‘If you want to be able to process your document as XML’.
I know all about the widespread propensity for using pretend-XHTML, but there’s no reason for not explaining how it should be done, right? :)
September 23rd, 2008 at 6:08 pm
Except that isn’t XHTML.
September 23rd, 2008 at 7:31 pm
AutisticCuckoo,
Right, and there’s nothing about the text/html <meta> tag that prevents you from processing your document as XML. Browsers won’t process it as XML by default, but that isn’t a typical use case in practice. You could, however, make an XMLHttpRequest call from a script elsewhere on your site to load the page, parse it as XML, and extract some information from the resulting XML DOM.
I’ve never liked the term “pretend XHTML”, as it implies we are dressing up HTML as XHTML. Rather, what we have here is an XHTML document that advertises itself to browsers as HTML. As I see it, the term “pretend HTML” would be more accurate. Or maybe “XHTML in HTML’s clothing”. :)
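The XMLHttpRequest approach mentioned above could be sketched roughly as follows. The URL "page.html" and the alert() call are placeholders for illustration, and the sketch assumes a browser that supports overrideMimeType(), which older versions of Internet Explorer do not.

```html
<script type="text/javascript">
// "page.html" is a hypothetical URL; substitute a well-formed XHTML page
// served from the same site.
var request = new XMLHttpRequest();
request.open("GET", "page.html", true);
// The page is served as text/html, so ask the browser to parse the
// response as XML regardless. This must be called before send().
request.overrideMimeType("text/xml");
request.onreadystatechange = function () {
  // Wait until the response has fully arrived and was successful.
  if (request.readyState !== 4 || request.status !== 200) {
    return;
  }
  // responseXML may be null if the page is not well-formed XML.
  var doc = request.responseXML;
  if (!doc) {
    return;
  }
  // Extract a piece of information from the XML DOM: here, the page title.
  var titles = doc.getElementsByTagName("title");
  if (titles.length > 0 && titles[0].firstChild) {
    alert("Title of page.html: " + titles[0].firstChild.nodeValue);
  }
};
request.send(null);
</script>
```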
As for explaining how it should be done, if my intent were to assist the reader in achieving the most standards compliant markup possible, I would certainly have made room to mention application/xhtml+xml. My intent, however, was to make a considered, practical recommendation, which is only to parse XHTML as XML when making active use of its XML features.
Ryan,
It is. Under Section 5.1 of the XHTML 1.0 Recommendation, a valid XHTML document may advertise itself as text/html for compatibility reasons. This will prevent all major XHTML-capable browsers from recognizing it as XHTML, but if you’re going to be a stickler for the spec, it is still an XHTML document, and it will validate as such.
In summary, my minimal XHTML document…
XMLHttpRequest)
September 23rd, 2008 at 11:07 pm
One small gripe. A minimal html document doesn’t have css and javascript in it. Those tags aren’t required.
It would more likely have an h1 and a p before that, but then it’s still not a totally minimal document anymore.
September 24th, 2008 at 3:38 am
It depends what you want to do with a minimal html document. If you use it as an include with php, you can leave out everything since all the required stuff is already in the page that calls the minimal page.
As such, a page called example.html can have content like this:
No service before 11 am.
Nothing else. Then you call it in your page that has all the required elements:
and it will insert the line No service before 11 am.
Another function for a minimal html document can be an index file in a folder of a cms where you do not want hackers to be able to read the folder content. In that situation, index.html doesn’t have to contain anything, it is just a blank page without any text or code.
I agree, this is not exactly within the scope of this article, but I just wanted to show that there are instances in which all the required code is not needed.
September 24th, 2008 at 4:09 am
Although it obviously isn’t needed for a minimal document and browsers default to CSS anyway, to dot the “i’s” and “t’s” I include <meta http-equiv="Content-Style-Type" content="text/css"> in my document template (in the excellent Quanta Plus editor on Linux).
September 24th, 2008 at 4:43 am
Would it be trollish to point out Anne’s Weblog? View source if you don’t know what I mean.
September 24th, 2008 at 5:00 am
@arts-multimedia
Neither of those are html documents.
One is an include file that ends up as part of your html document, and the other is an empty file that happens to have the extension .html
Boy do I feel pedantic today. Sorry guys.
September 24th, 2008 at 5:08 am
Anne’s code leaves out the <html>, <head>, and <body> tags, which is perfectly legal to do in an HTML 4 document, much as it was for HTML 3.2 (as mentioned at the top of my article). This “ultra-minimalism” does sacrifice a couple of benefits that I would not be comfortable with on most sites, however.
The most important element she has left out is the lang attribute on the <html> tag, which identifies the primary language in use in the page. Search engines and other spidering systems make good use of this information.
The fact that she has no content-type <meta> tag will cause the files that make up her site to be interpreted as Latin-1 encoded text, instead of UTF-8, when opened without a web server. But if that is not an issue for her, then she can do without it.
September 24th, 2008 at 7:55 am
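For illustration only (this is not the actual markup of Anne's site), an HTML 4.01 document with the optional <html>, <head>, and <body> tags omitted might look something like this. The paragraph text is a placeholder; it is there because the Strict DTD requires the body to contain at least one block-level element or script.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<title>title</title>
<p>Hello, world.</p>
```

With no <html> start tag in the markup, there is nowhere to put a document-wide lang attribute, which is the trade-off described above.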
Hi Kevin, nice article.
I just had one concern. You specifically state in the section, The Breakdown:
Although it may be correct XHTML 1.0 to serve as text/html, AFAIK, even browsers capable of serving pages using the XML parser will not do so unless the MIME type is set as application/xhtml+xml. If this is indeed the case, setting the DOCTYPE to use XHTML isn’t sufficient to make the browser use XML parsing.
You might consider making a separate section for XHTML 1.1, which should always be served with application/xhtml+xml, IIRC.
September 24th, 2008 at 9:20 am
Kevin Yank:
If you had done a minimum of checking, you’d see that Anne’s site sends the content encoding in the http header (text/html;charset=utf-8).
PS – Anne is a ‘he’ not a ’she’.
September 24th, 2008 at 1:06 pm
Look at what Kevin wrote more closely:
The content will not be rendered as UTF-8 without a web server, unless the browser’s default is set as such. When a server isn’t available to send the headers, the browser can only check the <meta> tag specifying Content-Type.
Curtis
September 24th, 2008 at 1:16 pm
That’s exactly what you’re doing if you serve it as text/html. No matter how much you wish it to be otherwise, it is HTML if you serve it as such. Invalid, ugly HTML, but still HTML.
If you want it to be XHTML and make use of anything XHTML offers beyond what HTML can do, then you must serve it as an application of XML.
September 24th, 2008 at 4:03 pm
we had some problems with
when using script from one of our clients.
it is safer to use a charset over here as well
September 24th, 2008 at 5:27 pm
I’m sometimes worried about these purists who want everybody to do exactly what the dogma says or else…
In real life, we all use several methods to come to a good result. Being restricted to one method leads to anaemia!
No scripting or l. Yet, there are tons of wonderful sites out there and I don’t care if they are written in ugly this or that as long as screen readers can handle them, all is well.
September 24th, 2008 at 11:04 pm
Just one word (link): “The World’s Best HTML Template.”
September 24th, 2008 at 11:21 pm
Since you’re talking about best practice here, I’d rather see the js script tag at the bottom of the body. That way, if people are following unobtrusive js best practice they’ll get a better YSlow score.
-Patrick (patspam.com)
September 25th, 2008 at 1:33 pm
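Applied to the article's minimal HTML 4 document, the suggestion in this last comment might look like the sketch below. The file names style.css and script.js come from the article; the paragraph is placeholder content, and this arrangement assumes the script does not need to run before the page content has been parsed.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>title</title>
<link rel="stylesheet" type="text/css" href="style.css">
</head>
<body>
<p>Page content goes here.</p>
<script type="text/javascript" src="script.js"></script>
</body>
</html>
```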