Are you… saving the file as UTF-8, or are you saving your ISO-8859-1 file with just the character meta changed?
Does your UTF-8 in the character meta match the mime-type being served?
They all need to match. Just changing the META to read UTF-8 doesn’t mean the file is saved encoded AS UTF-8.
So, you need these three to match:
The META saying what encoding is used.
The MIME type (charset) the server sends.
The encoding the file is actually saved in.
Sounds like you’ve got one, maybe two of those, and not all three.
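If it helps, here’s a quick way to check all three from a shell. This is a rough sketch: the URL and filename are placeholders, not the actual site, so substitute your own.

```shell
# 1. What charset (if any) the server's Content-Type header claims
curl -sI http://example.com/page.html | grep -i '^content-type'
# 2. What charset the META in the file claims
grep -iEo 'charset="?[^">]*' page.html
# 3. What the file is actually saved as (file guesses from the bytes)
file -bi page.html
```

If the three answers disagree, that disagreement is the bug.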
Of course, as a forum you also have to consider how the posts were accepted in the first place: the data stored in the SQL database may not be encoded as UTF-8, which kills any chance of your existing posts ever being served properly as UTF-8 without adding more PHP to translate the old ones on the fly. This is why changing character encodings on an existing website is almost always a disaster. Is that forum script set up to send utf-8?
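For what it’s worth, the usual brute-force fix for the database side is dump, re-encode, reload. This is a very rough sketch, not a recipe: `forum_db` is an assumed database name, and you should back up and test on a copy first.

```shell
# 1. Dump using the connection charset the data was actually stored under
mysqldump --default-character-set=latin1 forum_db > dump.sql
# 2. Re-encode the dump file itself from Latin-1 to UTF-8
iconv -f ISO-8859-1 -t UTF-8 dump.sql > dump.utf8.sql
# 3. Reload with a utf-8 connection (after fixing the CHARSET declarations
#    on the tables in the dump) -- left commented out deliberately:
# mysql --default-character-set=utf8 forum_db < dump.utf8.sql
```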
Since your server response header does not specify a character encoding, the meta statement rules. Forcing the browser to utf-8 renders the page correctly, which shows the file really is utf-8 encoded, so changing the meta statement should fix things up.
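For completeness: if you can edit the server config, you can also make the response header declare the charset explicitly instead of leaning on the meta. This assumes Apache with .htaccess overrides enabled (other servers have their own equivalents):

```apache
# Assumed Apache .htaccess snippet: appends "; charset=utf-8" to the
# Content-Type header for text responses
AddDefaultCharset utf-8
```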
Notice ultranerds said changing that meta was what broke it, hence the page being back at the unbroken version, and hence his problem NOT lying with the meta itself, but with changing the meta and nothing else.
Jason, when I looked at the linked page, the meta statement set the character encoding to iso-8859-1, and the rendering indicated that multi-byte characters were being rendered as multiple single-byte characters. Forcing FF to use utf-8 as the encoding made the characters render correctly. From that I deduce the actual encoding is, indeed, utf-8 and, since the server does not set an encoding, that leaves the meta element to do so.
Did you test as I did, or take the OP’s word for what was done?
Funny, it was the other way around when I tested… we’re probably looking at shifting code as he tries to figure it out.
Making that change using Opera’s editor just made the page worse – Opera still reporting ISO-8859-1 even with the META – but that’s consistent with the behavior of just trying to use the meta to change that in the first place.
Though looking deeper it has all sorts of code errors that could be putting the rendering all over the place across browsers. (originally I just looked at it in Opera). LINK inside BODY, MULTIPLE HEAD and BODY elements…
AHA, that’s why Opera’s ignoring it… all content after the second HEAD goes back to the default: ISO-8859-1… you say HEAD twice and BODY twice, don’t expect things to be applied properly.
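A quick way to spot that sort of duplication from a shell, for anyone else hitting this (`page.html` here is an assumed filename, not the OP’s real template):

```shell
# Count opening HEAD and BODY tags; anything above 1 means malformed markup.
grep -io '<head[ >]' page.html | wc -l
grep -io '<body[ >]' page.html | wc -l
```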
Thanks for the replies everyone. I see what you mean, there are 2 BODY and 2 HEAD tags. Lemme try and fix those up, and see if that helps (I expect it will, as you said - it’s resetting the encoding for the page once it reaches the 2nd head)
I’ve fixed up that part of it (removed the 2nd instances of <head> and <body>), but still no joy. I also tried removing the extra stuff (scripts, link, etc.) after the closing </head> tag, but that didn’t help either.
I’ve also changed the meta charset now to utf-8, so you can see the issue I’m having.
Server says UTF-8, browser is set to the unicode (utf-8) setting in Firefox. Since I’m seeing the ? I’d either say the document wasn’t originally saved as UTF-8 (though Gary says he sees otherwise) or that somewhere the document gets converted to latin-1 and then back.
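That “converted to latin-1 and back” failure mode is easy to demonstrate from a shell, for what it’s worth:

```shell
# Mis-reading UTF-8 bytes as Latin-1 turns each multi-byte character into
# two garbage characters: a UTF-8 "é" (bytes C3 A9) comes out as "Ã©".
printf '\xc3\xa9' | iconv -f ISO-8859-1 -t UTF-8
```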
OMG, just tried that and it seems to have worked (gotta go through several hundred templates though to change them into UTF8 format, so may take a while - unless there is a SSH command I can run to do this quicker? ;))
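Since you asked about doing it over SSH: a loop over iconv is the usual approach. This is only a sketch (the `templates/` path and `.html` extension are assumptions, so adjust to your layout), and back the files up first, because re-converting anything already saved as UTF-8 will mangle it.

```shell
# Convert each template from ISO-8859-1 to UTF-8, and only overwrite the
# original if the conversion succeeded.
for f in templates/*.html; do
  iconv -f ISO-8859-1 -t UTF-8 "$f" > "$f.utf8" && mv "$f.utf8" "$f"
done
```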
More than once I’ve seen people on these forums with documents which were saved as “ANSI” (I’m not even sure what that is, I thought it was a standards body)… it seems copies of Notepad and other text editors (esp. outside the US?) default to that.
Wonder if it would be a good idea to have a charset/MIME type sticky thread somewhere in the forums we could point people to? (with a link to that W3C page that explains the BOM pretty well)
ANSI is the organisation that published the ASCII standard, which only defines 128 symbols; hence why larger character sets, and eventually Unicode, were needed. Confusingly, “ANSI” in Windows text editors usually means Windows-1252, a 256-character superset of ISO 8859-1, so the term is ambiguous.
This is why I always use named entities for anything outside ASCII … I have no idea what encoding my text editor uses, so marking characters up as, e.g., &eacute; solves the problem of what encoding to set. As a bonus, named entities are often easier to remember than the Alt-#### codes needed to produce them.