Content-type: iso-8859-1 or utf-8?

Hi,

I’ve always used:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />

on my web pages. I have done so for years and never questioned it. However, since my text editor updated itself, it complains about the encoding whenever I use the above meta tag. I have noticed that a lot of people now use utf-8 instead, and if I use that, my text editor no longer generates a warning.

Is there one in particular I should be using? I’m guessing UTF-8 is a much more up-to-date character set. Should I be using that instead? Are there any advantages/disadvantages to using one or the other?

Thanks.

You might want to read the SitePoint article The Definitive Guide to Web Character Encoding to get some basic information about character encodings.

The important thing to understand is that you can’t just set whatever you want in the <meta> tag; you must declare the same encoding as you used when saving your source files!

If your new editor saves as UTF-8, then you should declare UTF-8, but only then.
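To make the mismatch concrete, here is a minimal Python sketch (the sample word is just an illustration) of what happens when a file saved as UTF-8 is read as if it were ISO 8859-1:

```python
# A word with one non-ASCII character.
text = "caf\u00e9"  # "café"

# The bytes an editor writes when it saves the file as UTF-8.
saved_as_utf8 = text.encode("utf-8")

# What a reader gets if the page is declared as ISO 8859-1 instead:
# each of the two bytes of "é" is shown as a separate character.
misread = saved_as_utf8.decode("iso-8859-1")

# Declaring the encoding that was actually used round-trips cleanly.
correct = saved_as_utf8.decode("utf-8")

print(misread)  # cafÃ©
print(correct)  # café
```

The bytes in the file never change; only the label tells the reader how to interpret them, which is why the label has to match.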

Also, remember that any Content-Type header sent by your web server will take precedence over a <meta> element, but the two should match.
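As a rough sketch of that precedence rule (the function names `charset_from_header` and `effective_charset` are made up for illustration, and real browsers apply more steps than this), the logic looks something like:

```python
from email.message import Message


def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns None
    # when the header carries no charset parameter.
    return msg.get_content_charset()


def effective_charset(http_content_type, meta_charset):
    """Hypothetical helper: the HTTP header wins over the <meta> value."""
    if http_content_type:
        from_header = charset_from_header(http_content_type)
        if from_header:
            return from_header
    return meta_charset.lower() if meta_charset else None


# The server says UTF-8, the page says ISO 8859-1: the header wins.
print(effective_charset("text/html; charset=UTF-8", "iso-8859-1"))  # utf-8

# The server sends no charset, so the <meta> value is used.
print(effective_charset("text/html", "ISO-8859-1"))  # iso-8859-1
```

This is a simplification meant only to show why a mismatched server header can silently override a correct <meta> element.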

UTF-8 has many advantages over ISO 8859-1, especially that it can natively represent every Unicode character. You don’t have to mess around with entity references like &ndash; any more, and you can easily mix the Latin alphabet with texts in Hebrew, Arabic, Devanāgarī, Japanese, Korean, Chinese, etc.

Yes, there are advantages and disadvantages, depending on whether you use server-side technologies, etc. Usually, though, UTF-8 with no BOM is the preferred encoding for most English-language websites.

That’s great, thanks. It seems my text editor was forcing me to match the meta tag and the file so that’s good. Thanks for the link, that was a good read. It’s prompted three questions though:

  1. I’m confused, though: it says that with UTF-8 you don’t need to bother with entities, but in my experience you can get away with that a lot of the time with ISO 8859-1 too. Or is that just my modern browser compensating for it?

  2. If you use UTF-8 what do you need entities for? Just " and chevrons?

  3. The article says, “The next problem is something called a byte order mark, or BOM. This is a sequence of two (UTF-16) or three (UTF-8) octets that tells a computer whether the most or least significant octet comes first. Some browsers don’t understand the BOM, and will output it as text. Other editors won’t allow us to omit the BOM.” Which browsers don’t understand the BOM?

Thanks again.

ISO 8859-1 doesn’t contain dashes (em dash, en dash), typographically correct apostrophes and quotation marks, horizontal ellipses, etc. If you want to use those, you have to use an entity reference or an NCR. UTF-8 can encode these characters, and hundreds of thousands more, natively.
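You can check this yourself. The following Python sketch (the helper `can_encode` is a made-up name) tries to encode an en dash in both encodings. It also tries Windows-1252, which I believe is what browsers actually use when a page is labelled iso-8859-1, and which would explain why you often "get away with it":

```python
en_dash = "\u2013"  # EN DASH


def can_encode(ch, encoding):
    """Hypothetical helper: True if `ch` has a representation in `encoding`."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(can_encode(en_dash, "iso-8859-1"))  # False
print(can_encode(en_dash, "utf-8"))       # True
print(en_dash.encode("utf-8"))            # b'\xe2\x80\x93'

# Windows-1252 does have an en dash, at byte 0x96.
print(en_dash.encode("cp1252"))           # b'\x96'

# In true ISO 8859-1 the only option is a numeric character reference.
print(en_dash.encode("iso-8859-1", errors="xmlcharrefreplace"))  # b'&#8211;'
```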

You only need entities when you have to escape certain characters that have special meaning in HTML, such as the ‘<’ and ‘&’ characters (and, sometimes, quotation marks inside attribute values).
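If you generate pages from a script, the standard library can do that escaping for you. A small Python sketch, with an invented sample string, showing that only the markup-significant characters are touched and the en dash passes through as-is:

```python
import html

raw = 'Tom & Jerry <"fish \u2013 chips">'

# quote=True also escapes quotation marks, which matters
# inside attribute values.
escaped = html.escape(raw, quote=True)

print(escaped)  # Tom &amp; Jerry &lt;&quot;fish – chips&quot;&gt;
```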

Even if I use UTF-8, I’d still use entity references or NCRs for non-breaking spaces and soft hyphens, though. The former are otherwise indistinguishable from regular spaces and the latter would be invisible. :slight_smile:

None worth worrying about. There might be some old dinosaurs, that’s all. However, the BOM is completely unnecessary for UTF-8, and it’s best practice to omit it.
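If your editor insists on writing a BOM, you can detect and strip it afterwards. A minimal Python sketch (the function name `strip_utf8_bom` and the sample bytes are only for illustration):

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Hypothetical helper: drop a leading UTF-8 BOM if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


with_bom = codecs.BOM_UTF8 + b"<p>hi</p>"

print(codecs.BOM_UTF8)           # b'\xef\xbb\xbf'
print(strip_utf8_bom(with_bom))  # b'<p>hi</p>'

# Decoding with "utf-8-sig" skips the BOM, plain "utf-8" keeps it
# as an invisible U+FEFF character at the start of the text.
print(repr(with_bom.decode("utf-8")[0]))  # '\ufeff'
print(with_bom.decode("utf-8-sig"))       # <p>hi</p>
```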

Well thanks a million for that. I’m the wiser now!

I’m really anal with dashes and quotations so this is all good stuff.

Off Topic:

Tommy, good to see you. Mrs Max keeps asking how you’re doing.

Off Topic:

Tell her I’m fine. Bored, but fine. :slight_smile:

Off Topic:

I told her. She says come to NC for fine cookin’, fancy conversation, and kitchen whiskey, and you won’t be bored any longer. :smiley:

Off Topic:

If you can also offer a snow-free environment, I’ll book the tickets today! :smiley:

Off Topic:

Come down to Australia and sweat!

Off Topic:

You need to cool down? I’ve got a few million tons of snow that you can have at a bargain price …

Off Topic:

Pack your bags, Tommy. It’s 42 °F (5.5 °C) here, the sun is shining, and people are surfing in the ocean not 10 minutes from where I sit. None of those darn ice-locked fjords, just warm sun, cool water, and no snow.