  1. #1
    SitePoint Addict
    Join Date
    Sep 2008
    Location
    Rudgwick, UK
    Posts
    378
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    UTF8, but when set it doesn't work??

    Hi,

    Got a bit of a weird one, which I can't quite pin down.

    Forum IvoireLink.net: Main Index

    It's currently set as this, which kinda works:

    Code:
    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />
    ..but when I change it to UTF-8 (as it's a French site), it buggers up ALL the other foreign characters, and fixes the "Voici la catÃ©gorie de sports" part, so it shows up correctly as "Voici la catégorie de sports"

    Code:
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
    Can anyone spot my bodge-up? I'm guessing it's something stupid, but I've never ever seen a case where some characters work but not all of them, and vice versa when changing it to UTF-8 :/

    TIA!

    Andy

  2. #2
    Non-Member bronze trophy
    Join Date
    Nov 2009
    Location
    Keene, NH
    Posts
    3,760
    Mentioned
    23 Post(s)
    Tagged
    0 Thread(s)
    Are you... saving the file as UTF-8, or are you saving your ISO-8859-1 file with just the character meta changed?

    Does your UTF-8 in the character meta match the mime-type being served?

    They all need to match. Just changing the META to read UTF-8 doesn't mean the file is saved encoded AS UTF-8.

    So, you need these three to match:
    File Format/Encoding
    Meta saying what encoding is used
    MIME type on the server (the charset in the Content-Type header).

    Sounds like you've got one, maybe two of those and not all three.

    Of course, as a forum you also have to consider how the posts were accepted: the data stored in the SQL database may not be encoded as UTF-8, completely boning any chance of your existing posts ever being served as UTF-8 properly without adding more PHP to translate the old ones on the fly. This is why changing character encodings on an existing website is almost always a disaster. Is that forum script set up to send UTF-8?
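
    A minimal Python sketch of what a mismatch between the saved encoding and the declared encoding looks like. The sample string is the one from the first post; everything else is illustrative and not taken from the site's code.

```python
# Illustration only: the same text, saved one way and labelled another.
text = "Voici la catégorie de sports"

utf8_bytes = text.encode("utf-8")          # what a file saved as UTF-8 contains
latin1_bytes = text.encode("iso-8859-1")   # what a file saved as ISO-8859-1 contains

# UTF-8 file declared as ISO-8859-1: the two-byte "é" shows up as two characters.
print(utf8_bytes.decode("iso-8859-1"))     # Voici la catÃ©gorie de sports

# ISO-8859-1 file declared as UTF-8: the lone 0xE9 byte is not valid UTF-8.
print(latin1_bytes.decode("utf-8", errors="replace"))  # Voici la cat�gorie de sports
```

    A page assembled from files in both encodings will always have one part broken, whichever charset is declared, which would explain "some characters work, but not all of them".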

  3. #3
    SitePoint Addict
    Join Date
    Apr 2011
    Posts
    265
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Hi,
    If the pages of your site are dynamically generated by a PHP script, you should have this code in your PHP file:
    PHP Code:
    header('Content-type: text/html; charset=utf-8'); 
    Also, you should save the file encoded in UTF-8, with the editor you use.

  4. #4
    Resident curmudgeon bronze trophy gary.turner's Avatar
    Join Date
    Jan 2009
    Location
    Dallas
    Posts
    990
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Viewing your source, I find this:
    Code:
    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />
    Since your server response header does not specify character encoding, the meta statement rules. Forcing the browser to utf-8 shows you are using utf-8 encoding, so changing the meta statement should fix things up.
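
    A rough Python sketch of that precedence: the charset in the HTTP Content-Type header wins, and the meta element only matters when the header names none. This is a simplification of what browsers do (byte-order-mark detection is left out), the function names are made up for illustration, and the "windows-1252" fallback is an assumption, since the real default varies by browser and locale.

```python
import re
from email.message import Message

def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

def charset_from_meta(html_head):
    """Very rough scan for charset=... inside a meta tag; not a real HTML parser."""
    match = re.search(r'<meta[^>]+charset=["\']?([\w-]+)', html_head, re.IGNORECASE)
    return match.group(1).lower() if match else None

def effective_charset(content_type, html_head, default="windows-1252"):
    # Header first, then meta, then the (assumed) browser default.
    return charset_from_header(content_type) or charset_from_meta(html_head) or default

meta = '<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />'
print(effective_charset("text/html", meta))                 # iso-8859-1
print(effective_charset("text/html; charset=UTF-8", meta))  # utf-8
```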

    cheers,

    gary
    Anyone can build a usable website. It takes a graphic
    designer to make it slow, confusing, and painful to use.

    Simple minded html & css demos and tutorials

  5. #5
    Non-Member bronze trophy
    Quote Originally Posted by gary.turner View Post
    so changing the meta statement should fix things up.
    Notice ultranerds said changing that meta was what's messing it up -- hence it being back at the unbroken version -- and hence his problem NOT lying with the meta, but lying in changing the meta and nothing else.

  6. #6
    Resident curmudgeon bronze trophy gary.turner's Avatar
    Jason, when I looked at the linked page, the meta statement set character encoding to iso-8859-1, and character rendering indicated that multi-byte characters were rendered as multiple single-byte characters. Forcing FF to use utf-8 as the encoding made the character renderings correct. From that I deduce the actual encoding is, indeed, utf-8 and, since the server does not set encoding, that leaves the meta element to do so.

    Did you test as I did, or take the OP's word for what was done?

    with
    Code:
    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />
    Sports - Voici la catÃ©gorie de sports
    Forcing utf-8 decoding:
    Sports - Voici la catégorie de sports
    Server response header:
    Code:
    Date: Mon, 25 Jul 2011 00:28:09 GMT
    Server: Apache/2.2.19 (Unix) mod_ssl/2.2.19 OpenSSL/0.9.8e-fips-rhel5 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635
    Expires: Tue, 25 Jan 2000 12:00:00 GMT
    Cache-Control: no-store
    Pragma: no-cache
    Last-Modified: Mon, 25 Jul 2011 00:28:10 GMT
    Keep-Alive: timeout=5, max=100
    Connection: Keep-Alive
    Transfer-Encoding: chunked
    Content-Type: text/html
    
    200 OK
    cheers,

    gary
    Last edited by gary.turner; Jul 24, 2011 at 16:49. Reason: added source, rendering and response header

  7. #7
    Non-Member bronze trophy
    Funny, it was the other way around when I tested... we're probably looking at shifting code as he tries to figure it out.

    Making that change using Opera's editor just made the page worse -- Opera still reporting ISO-8859-1 even with the META -- but that's consistent with the behavior of just trying to use the meta to change that in the first place.

    Though looking deeper it has all sorts of code errors that could be putting the rendering all over the place across browsers. (originally I just looked at it in Opera). LINK inside BODY, MULTIPLE HEAD and BODY elements...

    AHA, that's why Opera's ignoring it.... all content after the second HEAD goes back to the default; ISO-8859-1... you say HEAD twice and BODY twice, don't expect things to be applied properly.

  8. #8
    Resident curmudgeon bronze trophy gary.turner's Avatar
    Yeah, I saw the syntax errors, but figured 1) fix the first things first, and 2) you'd be refactoring the code anyway.

    cheers,

    gary

  9. #9
    SitePoint Addict
    Quote Originally Posted by deathshadow60 View Post
    Funny, it was the other way around when I tested... we're probably looking at shifting code as he tries to figure it out.

    Making that change using Opera's editor just made the page worse -- Opera still reporting ISO-8859-1 even with the META -- but that's consistent with the behavior of just trying to use the meta to change that in the first place.

    Though looking deeper it has all sorts of code errors that could be putting the rendering all over the place across browsers. (originally I just looked at it in Opera). LINK inside BODY, MULTIPLE HEAD and BODY elements...

    AHA, that's why Opera's ignoring it.... all content after the second HEAD goes back to the default; ISO-8859-1... you say HEAD twice and BODY twice, don't expect things to be applied properly.
    Hi,

    Thanks for the replies, everyone. I see what you mean: there are 2 BODY and 2 HEAD tags. Lemme try and fix those up, and see if that helps (I expect it will; as you said, it's resetting the encoding for the page once it reaches the 2nd head).

    Will keep you posted

    Cheers

    Andy

  10. #10
    SitePoint Addict
    Quote Originally Posted by deathshadow60 View Post
    AHA, that's why Opera's ignoring it.... all content after the second HEAD goes back to the default; ISO-8859-1... you say HEAD twice and BODY twice, don't expect things to be applied properly.
    I've fixed up that part of it (I've removed the 2nd instances of <head> and <body>), but still no joy. I also tried removing the extra stuff (scripts, link, etc.) after the closing </head> tag, but that didn't help either.

    I've also changed the meta to utf-8 now, so you can see the issue I'm having:

    Code:
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
    Any more suggestions?

    TIA!

    Andy

  11. #11
    SitePoint Wizard Stomme poes's Avatar
    Join Date
    Aug 2007
    Location
    Netherlands
    Posts
    10,283
    Mentioned
    51 Post(s)
    Tagged
    2 Thread(s)
    Right now I'm seeing
    Actualit�s

    Server says UTF-8, browser is set to the Unicode (UTF-8) setting in Firefox. Since I'm seeing the ?, I'd say either the document wasn't originally saved as UTF-8 (though Gary says he sees otherwise) or somewhere the document gets converted to Latin-1 and then back.

    I usually got this type of error (?'s instead of é stuff) when I wrote and saved UTF-8 documents and someone decided to host them on a Latin-1 server :S My only solution to that kind of lack of control was to take every character outside US-ASCII and manually write out character entities. :( Since you have control of the server you shouldn't have to do that.
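
    The two failure modes seen in this thread can be told apart in a few lines of Python. This is only a sketch using the word from the post above: the "Ã©" kind of damage can be undone, while the "�" kind cannot, because the original byte has already been thrown away.

```python
original = "Actualités"

# Case 1: UTF-8 bytes misread as ISO-8859-1. Ugly, but reversible.
mojibake = original.encode("utf-8").decode("iso-8859-1")
repaired = mojibake.encode("iso-8859-1").decode("utf-8")

# Case 2: ISO-8859-1 bytes misread as UTF-8. The bad byte becomes U+FFFD for good.
replaced = original.encode("iso-8859-1").decode("utf-8", errors="replace")

print(mojibake)   # ActualitÃ©s
print(repaired)   # Actualités
print(replaced)   # Actualit�s
```

    Seeing the replacement character therefore points at bytes that were not UTF-8 to begin with, i.e. a file saved in a single-byte encoding.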

  12. #12
    Non-Member bronze trophy
    Cute -- your forums part is working properly, everything outside the forums is not. It was the other way around with the meta the other way...

    This is most likely because the stuff you have around the forums isn't saved as UTF-8 encoding. Load the files in, and make sure they're being saved with the right encoding.

  13. #13
    SitePoint Addict
    Hi,

    Thanks for the reply. Man this is doing my nut in

    The pages are served up using a "template" system, and in that template I see the part:

    Code:
                  <select id="slct" name="what">
                    <option value="news" <%if what eq "news"%>selected="yes"<%endif%>>Actualité</option>
                    <option value="events" <%if what eq "events"%>selected="yes"<%endif%>>Evénéments</option>
                    <option value="yellow" <%if what eq "yellow"%>selected="yes"<%endif%>>Pages Jaunes</option>
                    <option value="classifieds" <%if what eq "classifieds"%>selected="yes"<%endif%>>Pétites Annonces</option>
                  </select>
    (so the encoding is fine there)

    That template is set as ANSI. Same goes for the forum homepage templates.

    Quote Originally Posted by deathshadow60
    Cute -- your forums part is working properly, everything outside the forums is not. It was the other way around with the meta the other way...

    This is most likely because the stuff you have around the forums isn't saved as UTF-8 encoding. Load the files in, and make sure they're being saved with the right encoding.
    Yeah, that's cos the META encoding was changed to UTF-8 (which is why I was confused to hell as to why, when I changed from "normal" encoding into UTF-8, it then reversed the encoding).

    ARGH!!!! Thanks anyway guys, I'll keep digging and see if I can come up with anything

    Cheers

  14. #14
    Resident curmudgeon bronze trophy gary.turner's Avatar
    Right now I'm seeing
    Actualit�s
    The OP needs to open his html and template files and "save as" in utf-8.

    cheers,

    gary

  15. #15
    SitePoint Addict
    Quote Originally Posted by gary.turner View Post
    The OP needs to open his html and template files and "save as" in utf-8.

    cheers,

    gary
    OMG, just tried that and it seems to have worked (gotta go through several hundred templates though to change them into UTF-8 format, so may take a while, unless there is an SSH command I can run to do this quicker?)

    Thanks!

    Andy
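
    On the bulk conversion question, one possible approach is a short Python script run on the server. This is a sketch under assumptions, not a tested recipe for this site: it assumes the "ANSI" templates are Windows-1252, that they match a `*.html` pattern, and the function name `convert_to_utf8` is made up here. Files that already decode as UTF-8 (including plain ASCII) are skipped so the script can be re-run safely.

```python
from pathlib import Path

def convert_to_utf8(root, pattern="*.html", source_encoding="cp1252"):
    """Re-save files under root as UTF-8; return the list of files rewritten."""
    converted = []
    for path in Path(root).rglob(pattern):
        raw = path.read_bytes()
        try:
            raw.decode("utf-8")   # already valid UTF-8 or plain ASCII: leave alone
            continue
        except UnicodeDecodeError:
            pass
        # Assumed source encoding; raises if a byte is undefined in cp1252.
        text = raw.decode(source_encoding)
        path.write_bytes(text.encode("utf-8"))
        converted.append(path)
    return converted
```

    Back up the template directory before running anything like this. The `iconv` command-line tool does the same job per file (for example `iconv -f WINDOWS-1252 -t UTF-8`), but it has no built-in check for files that are already UTF-8.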

  16. #16
    Resident curmudgeon bronze trophy gary.turner's Avatar
    The double header/body issue implies to me you are including those forum files. The forum templates are already in utf-8, are they not? If so, you needn't bother with them.

    cheers,

    gary

  17. #17
    SitePoint Addict
    Quote Originally Posted by gary.turner View Post
    The double header/body issue implies to me you are including those forum files. The forum templates are already in utf-8, are they not? If so, you needn't bother with them.

    cheers,

    gary
    Hi,

    Nope, all of them were in ANSI, not UTF8 format. I've done this now with them all, and it works like a charm - thanks

    Andy

  18. #18
    SitePoint Wizard Stomme poes's Avatar
    More than once I've seen people on these forums with documents which were saved as "ANSI" (I'm not even sure what that is, I thought it was a standards body)... it seems copies of Notepad and other text editors (esp outside the US?) are defaulted to that.

    Wonder if it would be a good idea to have a charset/MIME type sticky thread somewhere in the forums we could point people to? (with a link to that W3C page that explains the BOM pretty well)

  19. #19
    Robert Wellock silver trophybronze trophy xhtmlcoder's Avatar
    Join Date
    Apr 2002
    Location
    A Maze of Twisty Little Passages
    Posts
    6,316
    Mentioned
    60 Post(s)
    Tagged
    0 Thread(s)
    ANSI is the organisation which set the ASCII standard, i.e. 128 different symbols that a computer can use (the 8-bit extensions of it give 256), etc. Hence why Unicode was needed. "ANSI" can also mean Windows-1252, a superset of ISO 8859-1's printable characters, so the term is ambiguous.
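
    A small Python sketch of where Windows-1252 and ISO 8859-1 actually differ. Python's codec names `cp1252` and `iso-8859-1` are used; the sample bytes are chosen only for illustration.

```python
# Bytes 0x80-0x9F are printable in Windows-1252 but control codes in ISO 8859-1.
sample = b"\x80 \x93curlies\x94"
print(sample.decode("cp1252"))            # € “curlies”
print(repr(sample.decode("iso-8859-1")))  # '\x80 \x93curlies\x94'

# From 0xA0 upwards (where é, ë and friends live) the two are identical.
shared = bytes(range(0xA0, 0x100))
print(shared.decode("cp1252") == shared.decode("iso-8859-1"))  # True
```

    So French text saved as "ANSI" in Notepad is byte-for-byte the same as ISO 8859-1 unless it contains things like the euro sign or curly quotes.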

  20. #20
    Mouse catcher silver trophy Stevie D's Avatar
    Join Date
    Mar 2006
    Location
    Yorkshire, UK
    Posts
    5,892
    Mentioned
    123 Post(s)
    Tagged
    1 Thread(s)
    This is why I always use named entities for anything outside ASCII ... I have no idea what encoding my text editor uses, so marking characters up as, eg, &eacute; solves the problem of what encoding to set. As a bonus, named entities are often easier to remember than the Alt-#### codes needed to produce them.

  21. #21
    SitePoint Wizard Stomme poes's Avatar
    I use numerical (not named) entities... so even if my text somehow ends up in someone's XHTML, I'm still cool : )

  22. #22
    Mouse catcher silver trophy Stevie D's Avatar
    Quote Originally Posted by Stomme poes View Post
    I use numerical (not named) entities... so even if my text somehow ends up in someone's XHTML, I'm still cool : )
    I wouldn't, because (a) I can't remember them, (b) if it stuffs up someone else's XHTML because they've nicked my content and are too dumb to reprocess it, it serves them right!

  23. #23
    Non-Member bronze trophy
    Much less numbered entities are reliant on your character encoding, named entities are not.

    That's the real POINT of the named ones, so for English language use the character encoding doesn't matter. NOT that for English you NEED any of that extra garbage.

    Maybe I've been working on computers for too long, but I really don't understand the need for anything more than ascii7 on English language sites. Shtup styled quotes right up their serifs so far as I'm concerned.

  24. #24
    SitePoint Wizard Stomme poes's Avatar
    but I really don't understand the need for anything more than ascii7 on English language sites.
    Of course, most of my sites aren't English...
    we have lots of é and ë etc...
    but even if I didn't, I have — which I cannot type out, nor can I type “curlies” nor even the € sign (even though I have a key for it, it's never done anything).

    I wrote those from memory, but once in a while I have to look one or two up (no big deal and doesn't slow me down any). They'll work in any of the charsets people tend to throw my stuff on, and they'll work if someone ever turns them into real XHTML or XML (the named ones are not legal there, excepting the 5 predefined ones: &amp;, &lt;, &gt;, &quot; and &apos;).

    Sometimes I need to use these characters in Javascript or CSS (content). There I have to use escapes anyway (\u#### in Javascript, \#### in CSS).
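
    For anyone who would rather generate the entities than memorise them, here is a Python sketch of both flavours discussed above. The `xmlcharrefreplace` error handler and `html.entities.codepoint2name` are standard library; the helper `to_named` and the sample text are made up for illustration.

```python
import html
from html.entities import codepoint2name

text = "Actualité, 5 €"

# Numeric character references: built into Python's codec error handlers.
numeric = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

def to_named(s):
    """Swap non-ASCII characters for HTML named entities where one exists."""
    return "".join(
        f"&{codepoint2name[ord(ch)]};" if ord(ch) > 127 and ord(ch) in codepoint2name else ch
        for ch in s
    )

named = to_named(text)
print(numeric)  # Actualit&#233;, 5 &#8364;
print(named)    # Actualit&eacute;, 5 &euro;
```

    Both outputs are pure ASCII, so they survive being saved or served in any of the encodings mentioned in this thread, and `html.unescape` turns either one back into the original text.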

  25. #25
    Non-Member bronze trophy
    Quote Originally Posted by Stomme poes View Post
    Of course, most of my sites aren't English...
    Which is why you're the target audience for UTF-8.

