The character encoding that you specify for a web page must match the encoding you used when saving your file. If you save your file as ISO-8859-1 and declare the encoding as UTF-8 (or vice versa) there'll be problems if you use characters outside the ASCII range.
ISO-8859-1 is both a character repertoire ('character set') and an encoding. It's a straightforward single-byte encoding: each character maps to exactly one byte, so it has 256 positions (0x00-0xFF). Quite a few of those are reserved for control characters (C0 in 0x00-0x1F and C1 in 0x80-0x9F). That leaves about 190 printable characters, which is enough for simple texts in most Western European languages. Unfortunately, ISO-8859-1 doesn't include some very useful and common characters, like proper (curly) quotation marks and dashes. It also doesn't contain the Euro currency character (€). (ISO-8859-15 was designed as an update to ISO-8859-1, and contains the Euro sign.)
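If you want to check this for yourself, here is a minimal Python sketch (Python is my choice for illustration, not something the encodings require). The helper name can_encode is made up for this example; the codec names are the ones Python's standard library uses.

```python
# Positions 0x00-0x1F (C0) and 0x80-0x9F (C1) are control characters.
control_positions = len(range(0x00, 0x20)) + len(range(0x80, 0xA0))


def can_encode(char: str, encoding: str) -> bool:
    """Hypothetical helper: True if `char` has a position in `encoding`."""
    try:
        char.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


euro_in_latin1 = can_encode("\u20ac", "iso-8859-1")    # Euro sign
euro_in_latin9 = can_encode("\u20ac", "iso-8859-15")   # Euro sign
quote_in_latin1 = can_encode("\u201c", "iso-8859-1")   # left curly double quote
pound_in_latin1 = can_encode("\u00a3", "iso-8859-1")   # pound sign

print(control_positions)                 # 64
print(euro_in_latin1, euro_in_latin9)    # False True
print(quote_in_latin1, pound_in_latin1)  # False True
```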
UTF-8 is an encoding for the Unicode character repertoire. It uses between one and four bytes to encode each character (the original definition allowed sequences of up to six bytes, but RFC 3629 restricts it to four) and can thus represent any Unicode character. The first 128 characters (0x00-0x7F) are encoded identically to ASCII and ISO-8859-1.
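A quick way to see the variable length is to encode a few sample characters and count the bytes. This is only a sketch in Python; the sample characters are my own picks.

```python
samples = {
    "letter A": "\u0041",     # ASCII range
    "pound": "\u00a3",        # in ISO-8859-1, but outside ASCII
    "euro": "\u20ac",         # outside ISO-8859-1
    "emoji": "\U0001f600",    # outside the Basic Multilingual Plane
}

# Number of bytes each sample needs in UTF-8.
utf8_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}

# Pure ASCII text produces the same bytes in both encodings.
ascii_text = "Hello, world"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("iso-8859-1")

print(utf8_lengths)  # {'letter A': 1, 'pound': 2, 'euro': 3, 'emoji': 4}
print(same_bytes)    # True
```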
The character repertoire used in HTML is ISO/IEC 10646, which is virtually the same as Unicode. Both UTF-8 and ISO-8859-1 (and many others) can be used as the encoding, but ISO-8859-1 is much more limited since it can only represent the first 256 characters (of which only about 190 are printable).
If you want to include a character that cannot be represented in your chosen encoding, you can use character entities (e.g., &pound;) or numeric character references (&#163; or &#xA3;).
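If you generate pages from a script, you don't have to look the numbers up by hand. A rough Python sketch, assuming you want the references for the pound sign (U+00A3):

```python
import html

pound = "\u00a3"

# The "xmlcharrefreplace" error handler turns anything the target
# encoding cannot hold into a decimal numeric character reference.
decimal_ref = pound.encode("ascii", "xmlcharrefreplace").decode("ascii")

# The hexadecimal form can be built from the code point.
hex_ref = f"&#x{ord(pound):X};"

# All three spellings decode back to the same character.
decoded = [html.unescape(ref) for ref in ("&pound;", decimal_ref, hex_ref)]

print(decimal_ref, hex_ref)                 # &#163; &#xA3;
print(all(ch == pound for ch in decoded))   # True
```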
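The pound sign (£) is a good example of a character that is encoded differently in the two encodings: it is the single byte 0xA3 in ISO-8859-1, but the two bytes 0xC2 0xA3 in UTF-8. If you include a literal £ sign, and your declared encoding doesn't match the encoding in which you saved your file, the pound sign will not display correctly.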
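The mismatch is easy to reproduce. In this Python sketch, decoding with the "wrong" codec stands in for a browser trusting a wrong declaration; real browsers may differ in the details of error handling.

```python
pound = "\u00a3"

latin1_bytes = pound.encode("iso-8859-1")
utf8_bytes = pound.encode("utf-8")

# Saved as UTF-8 but declared as ISO-8859-1:
# each of the two bytes is shown as its own character.
misread_as_latin1 = utf8_bytes.decode("iso-8859-1")

# Saved as ISO-8859-1 but declared as UTF-8:
# a lone 0xA3 is not valid UTF-8, so it becomes U+FFFD (replacement character).
misread_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

print(latin1_bytes.hex(), utf8_bytes.hex())          # a3 c2a3
print([hex(ord(c)) for c in misread_as_latin1])      # ['0xc2', '0xa3']
print([hex(ord(c)) for c in misread_as_utf8])        # ['0xfffd']
```

In other words, the first mistake shows an extra "Â" in front of the pound sign, and the second shows a replacement character (often drawn as a question mark in a diamond).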
If you want to use UTF-8, you must save your file with an encoding of UTF-8 and declare the encoding to be UTF-8. The encoding can be declared using the charset attribute in the Content-Type HTTP header, e.g.:
Code:
Content-Type: text/html; charset=utf-8
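Both halves (saving and declaring) have to agree. One way to keep them in sync in a script is to use a single constant for both. This is just a sketch: the file name index.html and the page content are made up, and how you actually send the header depends on your server.

```python
import tempfile
from pathlib import Path

ENCODING = "utf-8"                    # single source of truth
page = "<p>Price: \u00a3100</p>"      # contains a pound sign

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "index.html"   # hypothetical file name
    path.write_text(page, encoding=ENCODING)   # save as UTF-8
    saved_bytes = path.read_bytes()

# Header value that declares the same encoding the file was saved in.
content_type = f"text/html; charset={ENCODING}"

print(content_type)                          # text/html; charset=utf-8
print(saved_bytes.decode(ENCODING) == page)  # True
```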
If the encoding is not specified in the HTTP header, you can specify it using a META element:
HTML Code:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Such a META element will be ignored if the encoding is sent in the real HTTP headers, but it can still be useful when the document is saved to disk and viewed locally.
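The precedence can be sketched in a few lines of Python. Note that this is a simplification: declared_encoding and the regular expression are my own illustration, and real browsers use a more elaborate algorithm to scan for the META element.

```python
import re
from email.message import Message
from typing import Optional

# Simplified pattern for a charset inside a META element (an assumption,
# not the algorithm browsers actually implement).
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_-]+)', re.IGNORECASE
)


def declared_encoding(content_type: Optional[str], body: bytes) -> Optional[str]:
    """Hypothetical helper: the HTTP header wins, META is the fallback."""
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()
        if charset:
            return charset
    match = META_CHARSET.search(body[:1024])
    return match.group(1).decode("ascii").lower() if match else None


body = b'<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'

print(declared_encoding("text/html; charset=utf-8", body))  # utf-8
print(declared_encoding("text/html", body))                 # iso-8859-1
print(declared_encoding(None, body))                        # iso-8859-1
```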
For (real) XHTML, the encoding should be specified in the XML declaration (and omitted from the HTTP header):
Code:
<?xml version="1.0" encoding="utf-8"?>
This will only be applied if the document is served with an XML MIME type (preferably application/xhtml+xml). In that case, any META equivalent will be ignored.
XML parsers are only required to support UTF-8 and UTF-16. XML parsers used in web browsers are likely to support the same range of encodings as the accompanying HTML parsers, but if you want to be on the safe side you should only use UTF-8 or UTF-16 for XML (including XHTML).
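You can watch an XML parser pick up the encoding from the declaration with Python's bundled ElementTree (which sits on top of Expat). Treat this as a sketch of one parser's behaviour, not a statement about what every parser supports.

```python
import xml.etree.ElementTree as ET

pound = "\u00a3"

# Encoding taken from the XML declaration (0xA3 is the pound sign in ISO-8859-1).
latin1_doc = b'<?xml version="1.0" encoding="iso-8859-1"?><p>\xa3</p>'

# No declaration at all: the parser assumes UTF-8.
utf8_doc = f"<p>{pound}</p>".encode("utf-8")

# UTF-16 with a byte order mark, the other encoding every parser must support.
utf16_doc = f'<?xml version="1.0" encoding="utf-16"?><p>{pound}</p>'.encode("utf-16")

texts = [ET.fromstring(doc).text for doc in (latin1_doc, utf8_doc, utf16_doc)]

print(all(text == pound for text in texts))  # True
```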