Frequently Asked Questions About XHTML vs HTML
Table of contents
1. What is XHTML?
2. How is XHTML different from HTML?
3. How is XHTML 1.1 different from XHTML 1.0?
4. What about XHTML 2.0?
5. Should I use XHTML or HTML?
6. Should I use XHTML 1.0 or XHTML 1.1?
7. Why do so many books and sites recommend XHTML over HTML?
8. Is XHTML supported by all browsers?
9. Is XHTML more strict than HTML?
10. Is XHTML more semantic than HTML?
11. Can you use CSS with both XHTML and HTML?
12. What does the XML declaration do? Should I use it?
13. How is the DOCTYPE declaration used?
14. Do I need the xmlns attribute in my <html> tag?
15. How does this MIME type thingy work?
16. Can I set the MIME type with a <meta/> tag?
17. How do I serve XHTML with the proper MIME type?
18. Is serving XHTML as text/html harmful?
19. What’s the benefit of serving XHTML as application/xhtml+xml?
20. Should I use content negotiation when serving XHTML?
21. Which character encoding should I use for XHTML?
22. Can I use XHTML files on my hard drive?
23. Can I use XHTML with Internet Explorer?
24. Will IE7 or IE8 support XHTML?
What is XHTML?
XHTML 1.0 is a ‘reformulation of HTML 4 as an XML 1.0 application’, according to the XHTML 1.0 specification.
In other words, it is an XML-based markup language that has the same set of element types and attributes as HTML 4.
How is XHTML different from HTML?
XHTML is fundamentally different from HTML, despite looking very similar.
- XHTML is XML, which means that the syntax rules are slightly different.
- There are things you can do in XHTML which you cannot do in HTML.
- There are things you can do in HTML which you cannot do in XHTML.
- There are differences concerning CSS.
- There are differences concerning client-side scripting (e.g., JavaScript).
Differences in Syntax Rules
- XHTML is case-sensitive; HTML is not. All element and attribute names must be lowercase in XHTML.
- XHTML, being XML, must be well-formed. Every element must have an end tag, or use the self-closing tag syntax. HTML allows some end tags and even some start tags to be omitted.
- If an XML parser encounters a well-formedness error, it must abort. An SGML or HTML parser is expected to try to salvage what it can and keep going.
- All attributes must have a value in XHTML. HTML allows some attributes (e.g., selected) to be minimised.
- All attribute values must be surrounded by double or single quotes. HTML allows quotes to be omitted if the value contains only alphanumeric characters (and some others).
- The comment syntax is more limited in XHTML, but that’s rarely an issue for most designers/developers.
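The difference in error handling can be sketched with Python's standard library. This is only an illustration, not how any browser works: `xml.etree.ElementTree` stands in for an XML parser, and `html.parser` is a lenient tokenizer rather than a full HTML parser. The helper names `parses_as_xml` and `html_start_tags` are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# <br> has no end tag: acceptable in HTML, but not well-formed XML.
MARKUP = "<p>One<br>Two</p>"


def parses_as_xml(markup):
    """Return True if the markup is well-formed XML, False if the parser aborts."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Records every start tag the lenient HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_start_tags(markup):
    """Feed markup to the HTML tokenizer and return the start tags it saw."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    print(parses_as_xml(MARKUP))                # the XML parser gives up
    print(parses_as_xml("<p>One<br/>Two</p>"))  # self-closing syntax is fine
    print(html_start_tags(MARKUP))              # the HTML tokenizer keeps going
```

The XML parser rejects the whole fragment because of one missing end tag, while the HTML tokenizer reports both elements and carries on.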
Things You Can Do in XHTML But Not In HTML
- Use CDATA sections (<![CDATA[ … ]]>). That’s useful if you have content with lots of literal characters that otherwise need to be escaped.
- Use PIs (processing instructions), e.g., to link to a style sheet:
<?xml-stylesheet type="text/css" href="style.css" media="screen"?>
- Include elements from other XML namespaces (see below).
- Use the &apos; character entity.
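As a rough illustration of the first and last points, the sketch below feeds a CDATA section and an `&apos;` reference to Python's `xml.etree.ElementTree`, used here as a stand-in for any XML parser. The sample strings and the helper name `element_text` are invented for the example.

```python
import xml.etree.ElementTree as ET

# Literal < and & inside a CDATA section need no escaping.
CDATA_SNIPPET = "<code><![CDATA[if (a < b && b > c) { swap(); }]]></code>"

# &apos; is one of XML's predefined entities.
APOS_SNIPPET = "<p>It&apos;s XML</p>"


def element_text(markup):
    """Parse a small XML fragment and return the root element's text."""
    return ET.fromstring(markup).text


if __name__ == "__main__":
    print(element_text(CDATA_SNIPPET))
    print(element_text(APOS_SNIPPET))
```

In both cases the parser hands back the literal characters, with no manual escaping needed inside the CDATA section.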
Things You Can Do in HTML But Cannot Do in XHTML
- ‘Hide’ the contents of style or script elements with SGML comments (<!--…-->).
- Create parts of the page dynamically with JavaScript while the document is still loading (e.g., using document.write()).
- Use named character entities (e.g., &nbsp;) other than the five predefined ones: &lt;, &gt;, &amp;, &quot; and &apos;.
- Use the .innerHTML property with JavaScript (technically this is non-standard even in HTML).
Differences Concerning CSS
- Element type selectors in CSS are case sensitive for XHTML, but not for HTML.
- In HTML, the properties background-color, background-image and overflow on the BODY element will be applied to the root element (HTML) unless specified for that element also. That is not the case for XHTML.
In HTML some start tags are optional, but the element node exists in the document object model even if the tags don’t occur in the markup. If we want to style header cells in the table body, we might use a CSS rule like this one:
tbody th {text-align:left}
In HTML, this will work even if we omit the <tbody> and </tbody> tags in our markup, because the TBODY element will be created anyway. That will not work in XHTML; unless we have explicit <tbody> and </tbody> tags, the selector will not match.
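The XML half of this behaviour can be sketched in Python. The example assumes `xml.etree.ElementTree` as a generic XML parser and uses a hypothetical helper, `header_cells_in_tbody`, that roughly mimics what the `tbody th` selector would match. It cannot show the HTML half, since the standard library has no parser that creates implied elements.

```python
import xml.etree.ElementTree as ET

# The same table, without and with explicit tbody tags.
WITHOUT_TBODY = "<table><tr><th>Name</th><td>Widget</td></tr></table>"
WITH_TBODY = "<table><tbody><tr><th>Name</th><td>Widget</td></tr></tbody></table>"


def header_cells_in_tbody(markup):
    """Return the text of every th that is a descendant of a tbody element."""
    root = ET.fromstring(markup)
    return [th.text for tbody in root.iter("tbody") for th in tbody.iter("th")]


if __name__ == "__main__":
    print(header_cells_in_tbody(WITHOUT_TBODY))  # nothing matches
    print(header_cells_in_tbody(WITH_TBODY))     # the header cell matches
```

An XML parser builds exactly the tree that the markup spells out, so without explicit tags there is no tbody element for the selector to match.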
Differences Concerning JavaScript
- document.write() cannot be used with XHTML (see Why document.write() doesn’t work in XML).
- DOM methods like createElement() must be replaced by their namespace-aware counterparts (createElementNS() etc.).
- The non-standard .innerHTML property should not be used for XHTML documents.
- The same issues with implicit elements that occur for CSS also apply for JavaScript.
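The difference between createElement() and createElementNS() can be illustrated outside the browser with Python's `xml.dom.minidom`, which implements the same DOM method names. This is a sketch under that assumption and says nothing about how a particular browser behaves. The function name `compare_creation_methods` is invented for the example.

```python
from xml.dom.minidom import getDOMImplementation

XHTML_NS = "http://www.w3.org/1999/xhtml"


def compare_creation_methods():
    """Create a p element both ways and report each element's namespace URI."""
    doc = getDOMImplementation().createDocument(XHTML_NS, "html", None)
    plain = doc.createElement("p")                   # no namespace attached
    namespaced = doc.createElementNS(XHTML_NS, "p")  # bound to the XHTML namespace
    return plain.namespaceURI, namespaced.namespaceURI


if __name__ == "__main__":
    print(compare_creation_methods())
```

Only the element created with createElementNS() carries the XHTML namespace, and that namespace is what identifies an element as XHTML in an XML document.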
How is XHTML 1.1 different from XHTML 1.0?
XHTML 1.1 is a reformulation of XHTML 1.0 Strict using Modularization of XHTML, which simply means that the definitions of the various element types have been separated into a number of modules.
XHTML 1.1 removes the lang attribute (in favour of xml:lang) and also the name attribute for <a> and <map> tags (in favour of id). It also adds a number of elements for Ruby annotations.
What about XHTML 2.0?
What about it? It shows no signs of becoming even a candidate recommendation any time soon. We don’t know what it will contain, but it seems as if it is not going to be backwards compatible with XHTML 1.0.
Should I use XHTML or HTML?
That depends on who you ask. There are a number of technical issues with this question, which preclude a simple and short answer. In reality, the latest W3C recommendation with widespread support is HTML 4.01. Unless you actually need any of the features that XHTML offers over HTML, there is no technical reason to use XHTML.
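In order to actually benefit from using XHTML, you really need to understand the fundamental differences between XHTML and HTML, and you need to serve your documents as XML (see below). A site served that way will only be available to a small minority of the surfing population, however, since Internet Explorer does not support XHTML.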
Some web designers and developers prefer XHTML’s syntax rules over HTML’s. By following certain guidelines, you can use this syntax without technically using XHTML at all (see below). There are a number of potential pitfalls with this approach, but it is a possible way forward for those who absolutely want to type <br /> instead of <br>.
For ‘future-proofing’ your documents, using a Strict doctype is more important than whether you use XHTML or HTML.
Should I use XHTML 1.0 or XHTML 1.1?
Unless you need to use Ruby annotations, and your target audience can be expected to have the required plug-ins for that, you should not use XHTML 1.1.
In particular, if you are serving your XHTML markup as text/html (see below), you must not use XHTML 1.1. Since it removes the lang attribute, it is not backwards compatible with HTML and must not be served as such.
Why do so many books and sites recommend XHTML over HTML?
When the XHTML 1.0 specification was released, many designers and developers were quite excited about it. It was XML, which was all the rage back then, yet could be used as if it were HTML, and it ‘worked in all browsers’. People saw countless possibilities with the extensibility mechanism, and when W3C stated that there would be no more versions of HTML, XHTML was seen as the future-proof alternative.
Eventually some less palatable aspects of using XHTML were uncovered and the extensibility myth was debunked, but this didn’t receive quite the same amount of publicity. Many authors thus still advocate XHTML over HTML out of ignorance or because of personal preference.
Is XHTML supported by all browsers?
No. Only a few mainstream browsers support XHTML, like Opera, Firefox and Safari.
Most importantly, Internet Explorer does not support XHTML at all.
If you follow certain guidelines you can serve XHTML documents as text/html (see below). That means the document will be seen as HTML, which all browsers can handle. This only works because virtually all browsers have a parser bug that ignores the slash in self-closing tags.
Is XHTML more strict than HTML?
No. The syntax rules of XML (and thus XHTML) are simpler and more consistent, but both XHTML and HTML can be parsed unambiguously as long as the markup is valid.
Is XHTML more semantic than HTML?
No. XHTML 1.0 is just a reformulation of HTML 4.01. It contains the same elements and attributes and comes in the same three flavours (DTDs). There is no difference in semantics.
Can you use CSS with both XHTML and HTML?
Yes. You sometimes see preposterous claims that CSS can only be used with XHTML, but that is just disinformation. The first CSS specification came out in 1996, four years before XHTML.
What does the XML declaration do? Should I use it?
The XML declaration (sometimes incorrectly called the XML prologue) looks something like this:
<?xml version="1.0" encoding="utf-8"?>
It tells an XML parser that the document is an application of XML 1.0 and which character encoding it uses. If the encoding is anything other than UTF-8 or UTF-16 you must use the XML declaration, unless the web server sends encoding information in its HTTP headers. Even if it does, you should use the XML declaration, so that the right encoding is specified even if the document is saved to disk and opened locally.
This applies when XHTML documents are served as such. When served as text/html, the XML declaration should be ignored, but some old HTML-only browsers can choke on it. In particular, Internet Explorer 6 will render the document in quirks mode if there is an XML declaration before the DOCTYPE declaration. In these cases, you should omit the XML declaration, since the document is not treated as XML anyway.
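To see why the declaration matters for encodings other than UTF-8 or UTF-16, here is a small Python sketch. It assumes `xml.etree.ElementTree` as the XML parser and no HTTP header supplying the encoding. The sample document and the helper name `parse_text` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A document body encoded as ISO-8859-1; the e-acute is the single byte 0xE9.
LATIN1_BODY = "<p>caf\u00e9</p>".encode("iso-8859-1")

# The same bytes, preceded by an XML declaration that names the encoding.
WITH_DECLARATION = b'<?xml version="1.0" encoding="iso-8859-1"?>' + LATIN1_BODY


def parse_text(data):
    """Return the root element's text, or None if the parser rejects the bytes."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None


if __name__ == "__main__":
    print(parse_text(WITH_DECLARATION))  # decoded correctly
    print(parse_text(LATIN1_BODY))       # rejected: not valid UTF-8
```

Without the declaration the parser falls back to UTF-8, finds an invalid byte sequence and aborts.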
How is the DOCTYPE declaration used?
One may be led to believe that the DOCTYPE declaration at the top of the document is what tells the user agent that it is an XHTML document. However, this is not the case. The original purpose of the DOCTYPE declaration only had to do with markup validation. A validator needs to know against which document type definition (DTD) to check for compliance. Browsers don’t use validating parsers, because there is no need, so they used to ignore the DOCTYPE.
When IE5/Mac was launched, it had a novelty feature: doctype switching. Its support for web standards was a major improvement compared to older versions, and compared to its contemporary cousin on the Windows platform. In order to provide good standards support and still avoid breaking the millions of web sites that were written to accommodate IE’s incorrect CSS rendering, the DOCTYPE declaration was used to make an educated guess as to whether the document was ‘modern’ or ‘old-school’. This feature was then included in IE6/Win, and can now be found in most modern browsers.
So the DOCTYPE declaration serves two purposes: it tells a validator against which DTD the document claims conformance, and it is used by browsers to determine the rendering mode to use. It has absolutely nothing to do with the XHTML vs HTML issue, however. Browsers that support XHTML use the ‘strict standards’ rendering mode for XHTML documents, provided that they are served as such.
Do I need the xmlns attribute in my <html> tag?
Yes. That is what tells user agents that the document is, in fact, XHTML, rather than any other application of XML. If the xmlns attribute is missing, or doesn’t contain the right value, the markup will not be recognised as XHTML. The attribute is invalid in HTML, and will thus be ignored if the document is served as text/html. The correct value to use is
xmlns="http://www.w3.org/1999/xhtml"
Namespaces in XML allow us to use the same element type name for different elements. For instance, a fictive WidgetML markup language can use an element type called label. By declaring a separate namespace for WidgetML, we can use those label elements in an XHTML page, even if that contains a form with label elements, and the browser will have no problem keeping them apart.
An XML namespace is bound to a URI. The XHTML namespace mentioned above is one example. If we want to include WidgetML elements throughout our XHTML document, we can use a prefix and bind the WidgetML namespace to that:
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:w="http://example.com/ns/widgetml"
xml:lang="en">
This binds the prefix ‘w’ to the WidgetML namespace. To separate XHTML’s label elements from WidgetML’s, we use the prefix in our tags: <w:label>Blue Widget</w:label>.
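A namespace-aware parser keeps the two label element types apart by their namespace URI rather than by their prefix. The sketch below shows this with Python's `xml.etree.ElementTree`, reusing the fictive WidgetML namespace from above. The document body and the helper name `labels_by_namespace` are invented for the example.

```python
import xml.etree.ElementTree as ET

DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:w="http://example.com/ns/widgetml"
      xml:lang="en">
  <body>
    <label>Name</label>
    <w:label>Blue Widget</w:label>
  </body>
</html>"""


def labels_by_namespace(markup):
    """Return (qualified tag, text) for every element whose local name is label."""
    root = ET.fromstring(markup)
    return [(el.tag, el.text) for el in root.iter() if el.tag.endswith("}label")]


if __name__ == "__main__":
    for tag, text in labels_by_namespace(DOC):
        print(tag, text)
```

ElementTree writes each tag as {namespace URI}local-name, so the XHTML label and the WidgetML label come out as two distinct element types.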
How does this MIME type thingy work?
When a resource is requested via the HTTP protocol, the web server sends an HTTP response consisting of one or more headers, a blank line (CR+LF) and the document body. For a web page, the body is the HTML or XHTML document, i.e., the markup we write.
The HTTP headers provide meta-information about the document. One of the most important headers is Content-Type, which informs the user agent what type of content the response body contains. It may also convey information about which character encoding it uses. For HTML, such a header can look like this:
Content-Type: text/html; charset=iso-8859-1
The text/html part consists of a MIME media type name (text) and subtype name (html). The charset part is an optional parameter.
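The structure of such a header value can be taken apart with Python's standard `email.message` module, which understands MIME headers. This is a minimal sketch; the function name `parse_content_type` is an assumption made for the example.

```python
from email.message import Message


def parse_content_type(value):
    """Split a Content-Type header value into media type, subtype and charset."""
    msg = Message()
    msg["Content-Type"] = value
    return (
        msg.get_content_maintype(),  # e.g. "text"
        msg.get_content_subtype(),   # e.g. "html"
        msg.get_content_charset(),   # None when no charset parameter is present
    )


if __name__ == "__main__":
    print(parse_content_type("text/html; charset=iso-8859-1"))
    print(parse_content_type("application/xhtml+xml"))
```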
According to RFC 2854 this MIME type identifies the content as HTML, which means that user agents must parse and interpret the contents as HTML. Even if it’s actually a Microsoft Word document, a GIF image … or an XHTML document. In other words, if the MIME type is text/html, the document is HTML (not XHTML).
This means that no XML-only features can be used. It also means that HTML-only features can be used, but doing so defies the purpose of using XHTML markup in the first place.
There are three MIME types that we can use for XHTML documents, which will make compliant user agents recognise the document as XML:
- application/xhtml+xml (recommended)
- application/xml
- text/xml (not recommended)
The recommended MIME type for XHTML is application/xhtml+xml, which is defined in RFC 3236. Note, however, that this is not supported by any version of Internet Explorer at this time (June 2006).
Although it is possible to use text/xml (defined in RFC 3023), it is not recommended due to the odd way the default character encoding is specified. With this MIME type, the encoding must be sent in the HTTP header; it cannot be overridden by an XML declaration. It also defaults to the not-very-useful encoding US-ASCII.
Can I set the MIME type with a <meta/> tag?
No. A user agent needs to know the content type before it starts parsing the response body. When it encounters an element like this, it’s already too late:
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/>
The MIME type must be sent as a Content-Type HTTP header. The character encoding should be specified in the XML declaration (see above).
How do I serve XHTML with the proper MIME type?
You need to instruct your web server to send the proper HTTP header. The exact technique depends on which HTTP server you use, but for Apache, you can use the AddType directive:
AddType application/xhtml+xml .xhtml .xht
This makes Apache send an application/xhtml+xml MIME type for files ending with .xhtml or .xht. The directive can be put in the global configuration file (/etc/httpd.conf on most *nix systems) or in a local .htaccess file in a directory.
Sometimes we don’t have access to the configuration files, but we need not lose hope quite yet. If we can use a server-side scripting language, we can send the HTTP header ourselves. For instance, using PHP:
header('Content-Type: application/xhtml+xml; charset=utf-8');
(Note that this header must be sent before a single byte of document content is written to the response stream.)
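The same idea works in other server-side environments. As one possible sketch, here is a minimal Python WSGI application that sets the header itself. The page content and the name `xhtml_app` are placeholders for this example, and a real deployment would depend on your hosting setup.

```python
# A tiny XHTML page; the markup is a placeholder for this example.
XHTML_PAGE = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">\n'
    "<head><title>Hello</title></head>\n"
    "<body><p>Hello, XHTML.</p></body>\n"
    "</html>\n"
)


def xhtml_app(environ, start_response):
    """WSGI application that serves the page with an XHTML MIME type."""
    body = XHTML_PAGE.encode("utf-8")
    headers = [
        ("Content-Type", "application/xhtml+xml; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    # The headers go out before any byte of the document body.
    start_response("200 OK", headers)
    return [body]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Port 8000 is an arbitrary choice for local testing.
    with make_server("localhost", 8000, xhtml_app) as server:
        server.serve_forever()
```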
Is serving XHTML as text/html harmful?
In 2002 Ian Hickson published an article labelled Sending XHTML as text/html Considered Harmful. It has been criticised by many XHTML proponents, but it should be required reading for anyone who is going to use XHTML markup.
Serving XHTML documents as text/html is not necessarily harmful, if you know what you are doing and are aware of the fundamental differences between XHTML and HTML. Relying on HTML-only techniques, however, is ‘harmful’, because that means that a purported XHTML document will not work as XHTML.
Thus, if you are going to serve XHTML documents as text/html, you must make sure that they also work as intended when served as application/xhtml+xml.
You must also make sure to follow all the guidelines in Appendix C of the XHTML 1.0 specification. Although this appendix isn’t normative, it offers guidelines for maintaining the compatibility that is required for serving XHTML markup to user agents that only support HTML. Including a blank character before the /> in a self-closing tag, for instance, is necessary to avoid confusing some old browsers. For XHTML served as XML, no such space is necessary since all XML parsers understand self-closing tags.
What’s the benefit of serving XHTML as application/xhtml+xml?
That the document is recognised as XHTML by user agents.
Presumably, you are using XHTML for a reason. Unless the document is recognised as XHTML, you cannot use any of the features XHTML offers over HTML.
Should I use content negotiation when serving XHTML?
Content negotiation means examining the Accept HTTP header sent by user agents and serving different content types to different user agents. For instance, Opera, Firefox and Safari state that they support application/xhtml+xml, so they would receive XHTML markup with that MIME type. Meanwhile, browsers like Internet Explorer would receive HTML markup served as text/html.
There is currently no point in doing this, other than to impress other computer geeks with your knowledge. If the content can be transformed into HTML it doesn’t require any XML features. You might as well use HTML 4.01 or serve XHTML as text/html to all user agents.
It is especially useless to do content type negotiation, i.e., sending the same content to everyone, but sending different Content-Type headers to different browsers.
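Purely to illustrate the mechanism (not as a recommendation, given the caveats above), the Accept check could be sketched in Python like this. The function names are invented for the example and the q-value handling is deliberately simplified: wildcards such as */* are not treated as support for XHTML.

```python
def accepts_xhtml(accept_header):
    """Return True if the Accept header explicitly lists application/xhtml+xml
    with a quality value greater than zero."""
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        quality = 1.0  # default when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        return quality > 0
    return False


def choose_content_type(accept_header):
    """Pick the MIME type to send based on the user agent's Accept header."""
    if accepts_xhtml(accept_header):
        return "application/xhtml+xml"
    return "text/html"


if __name__ == "__main__":
    print(choose_content_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
    print(choose_content_type("image/gif, image/jpeg, */*"))
```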
Which character encoding should I use for XHTML?
XML parsers are only required to support UTF-8 and UTF-16. If you use anything other than that, there is no guarantee that the parser can interpret the document correctly. In reality, browsers generally seem to support the same range of encodings as for HTML, but if you want to be on the safe side, stick to UTF-8 or UTF-16.
Can I use XHTML files on my hard drive?
When a web page is opened from the local hard drive, there is no HTTP server involved to send the proper HTTP headers. The file extension is then often used to make an educated guess about the content type. Opera and Firefox will assume an XML content type for files ending with .xhtml, .xht or .xml.
This will not work for Internet Explorer, of course, since it doesn’t support XHTML.
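The extension-based guess can be pictured as a simple lookup. The Python sketch below is a simplification built on the three extensions mentioned above. It is not the actual logic of Opera or Firefox, and the function name `treated_as_xml` is hypothetical.

```python
import os.path

# Extensions for which an XML content type is assumed, per the text above.
XML_EXTENSIONS = {".xhtml", ".xht", ".xml"}


def treated_as_xml(filename):
    """Guess from the file extension whether a local file would be parsed as XML."""
    extension = os.path.splitext(filename)[1].lower()
    return extension in XML_EXTENSIONS


if __name__ == "__main__":
    print(treated_as_xml("index.xhtml"))
    print(treated_as_xml("index.html"))
```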
Can I use XHTML with Internet Explorer?
No. Not really.
IE does not support the application/xhtml+xml MIME type, and will prompt the user to download the page if it’s served as such. You can make IE recognise this MIME type through a registry hack, but it will still treat it as HTML.
If you need the XML features of XHTML, you can serve the document as application/xml. That is supported by IE, but XHTML’s namespace is not, which means IE will see the document as generic XML. There will be no default style sheet, so you have to specify explicit rules for every element type (including display:block for all block-level elements).
You can, of course, serve XHTML markup as text/html, but as has been mentioned above that means the document will be seen as HTML with syntax errors.