  1. #1
    SitePoint Member
    Join Date
    Jul 2005
    Posts
    3

    Character set problem with MS Word HTML document

    I'm having a problem with the character encoding of HTML files produced using Microsoft Word. When I view the local files in a web browser they display fine. However, when I post a file and view it on the web, both Firefox and Internet Explorer use the wrong character encoding. The page loads with the encoding set to Unicode (UTF-8), which makes it display incorrectly. If you manually change the encoding to Western, it displays fine.

    See, for example:
    http://www.cjc.ca/uroproject/guide/B...atement-en.htm

    What I can't understand is that the character set is clearly specified in the head of the HTML as follows:
    <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">

    Why are the browsers ignoring this? I've tried changing this line to other encodings such as charset=iso-8859-1, but the browser still displays the page as Unicode when it loads.

    Please help! I have no idea why this is happening or what can be done about it.

    Thanks,
    Avi

  2. #2
    Chessplayer (kleineme)
    Join Date
    Apr 2004
    Location
    Germany
    Posts
    608
    Hi,

    the HTTP header of your page looks like this:

    Code:
    HTTP/1.0 200 OK
    Connection: close
    Content-Length: 28830
    Content-Type: text/html; charset=UTF-8
    Date: Thu, 28 Jul 2005 15:27:35 GMT
    Server: Apache/2.0.50 (Fedora)
    Last-Modified: Tue, 26 Jul 2005 14:34:32 GMT
    ETag: "154677-709e-6e2c6e00"
    Accept-Ranges: bytes
    Keep-Alive: timeout=15, max=100

    So most probably your Apache server sends its own charset in the Content-Type header, and the browser gives that header priority over the meta tag. If you can't change the server configuration, maybe the following W3C document shows you a way out: FAQ: Setting 'charset' information in .htaccess
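
    To see why the meta tag loses, here is a minimal sketch in Python (a language picked for illustration, not one used in this thread). It assumes the usual rule that a charset in the HTTP Content-Type header takes precedence over the meta tag. The helper names header_charset and effective_charset are made up for this example; the two Content-Type values are the ones quoted above.

```python
from email.message import Message


def header_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased, or None if absent


def effective_charset(http_content_type, meta_content_type):
    """Assumed rule: the HTTP header wins, the meta tag is only a fallback."""
    return header_charset(http_content_type) or header_charset(meta_content_type)


# Values taken from this thread.
http_value = "text/html; charset=UTF-8"
meta_value = "text/html; charset=windows-1252"

chosen = effective_charset(http_value, meta_value)

# Word saves a curly apostrophe in windows-1252 as the single byte 0x92.
word_bytes = "I\u2019m".encode("cp1252")
as_author_intended = word_bytes.decode("cp1252")
as_browser_showed = word_bytes.decode(chosen, errors="replace")
```

    With the values from this thread, chosen comes out as "utf-8". The byte 0x92 is not valid UTF-8 on its own, so it is decoded as a replacement character instead of the apostrophe.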
    Last edited by kleineme; Jul 29, 2005 at 02:01.
    Never ascribe to malice,
    that which can be explained by incompetence.
    Your code should not look unmaintainable, just be that way.

  3. #3
    bronze trophy
    Join Date
    Dec 2004
    Location
    Sweden
    Posts
    2,670
    Another solution would be to just change the encoding of the document to UTF-8.
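
    If re-saving from Word is not practical, one alternative is to convert the saved file afterwards. A rough sketch in Python, assuming the file really is windows-1252 and contains a meta tag like the one in post #1; to_utf8 is a hypothetical helper, not part of any tool mentioned here.

```python
import re


def to_utf8(raw, source_encoding="cp1252"):
    """Decode bytes saved by Word and re-encode them as UTF-8.

    Also rewrites the first charset=... declaration so the meta tag
    agrees with the new encoding.
    """
    text = raw.decode(source_encoding)
    text = re.sub(r"charset=[\w-]+", "charset=utf-8", text,
                  count=1, flags=re.IGNORECASE)
    return text.encode("utf-8")


# A tiny stand-in for a Word-generated page.
sample = (
    '<meta http-equiv="Content-Type" '
    'content="text/html; charset=windows-1252">'
    "<p>I\u2019m</p>"
).encode("cp1252")

converted = to_utf8(sample)
```

    For a real file you would read the bytes from disk, pass them through to_utf8, and write the result back. Keeping a copy of the original is a good idea in case the source encoding was guessed wrong.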
    Simon Pieters

  4. #4
    SitePoint Author, silver trophy, bronze trophy
    Join Date
    Nov 2004
    Location
    Ankh-Morpork
    Posts
    12,158
    I don't know if you can make MS Word generate UTF-8. If not, you must make the web server send the character encoding that Word produces (Windows-1252, also known as cp1252). You can do this either by editing the .htaccess or httpd.conf files (for Apache) or by using a server-side scripting language like PHP to send the header for you.
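
    The scripting route could look roughly like the sketch below. It uses a Python WSGI application rather than PHP, purely as an illustration; PAGE and app are made-up names, and the page content is a stand-in for a Word-generated file.

```python
# Stand-in for the bytes of a Word-generated page saved as windows-1252.
PAGE = "<p>I\u2019m a Word document</p>".encode("cp1252")


def app(environ, start_response):
    """Serve PAGE and declare its real encoding in the HTTP header."""
    headers = [
        ("Content-Type", "text/html; charset=windows-1252"),
        ("Content-Length", str(len(PAGE))),
    ]
    start_response("200 OK", headers)
    return [PAGE]
```

    To try it locally, the standard library's wsgiref.simple_server.make_server("", 8000, app).serve_forever() will serve it on port 8000. Because the header now names windows-1252, the browser no longer has to fall back on a wrong server default.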
    Birnam wood is come to Dunsinane

  5. #5
    SitePoint Member
    Join Date
    Jul 2005
    Posts
    3

    Thanks to kleineme for that info about the server. It clears up the mystery of why the page displays well locally but not online. Where does one view that info (the HTTP header that you pasted)? It doesn't show up when you just view source from a browser. I spoke to the server guy and he was surprised that the server character set would override the HTML document and he said there was nothing he could do about it (grrrr).

    Thanks to zcorpan for the idea of saving it as UTF-8 in Word. I tried that and it works for the document I linked to. (If you click the link above you'll see that it now displays properly.) Unfortunately, some of my documents have their formatting screwed up when I save them as Unicode in Word (alas, nothing is ever simple).

    ~Avi
    Last edited by ndbt; Jul 28, 2005 at 13:09.

  6. #6
    Chessplayer (kleineme)
    Join Date
    Apr 2004
    Location
    Germany
    Posts
    608
    Hi,

    Oops, I forgot to add the URL to the above-mentioned W3C document, sorry. I've added it now.

    I retrieved the HTTP header of your document with cURL via PHP, but there are other ways to get at it. Here's another link to the W3C, this time even including the URL:

    FAQ: Checking HTTP Headers
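
    As one of those other ways, here is a small sketch using Python's standard library instead of cURL. fetch_headers and served_charset are hypothetical helper names, and any URL you pass in is up to you; none of this comes from the W3C FAQ.

```python
import urllib.request


def fetch_headers(url):
    """Send a HEAD request and return the response headers."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers


def served_charset(headers):
    """Return the charset named in Content-Type (lowercased), or None."""
    return headers.get_content_charset()
```

    Calling fetch_headers on a page's address and printing the Content-Type entry shows what the server declares, which is the information a browser's "view source" does not reveal. served_charset then pulls out just the charset, or None when the server sends none.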

