  1. #1
    SitePoint Enthusiast
    Join Date
    Feb 2006
    Posts
    87

    inverted question marks when pasting apostrophes

    When I copy and paste from Word into Internet Explorer, all apostrophes (') in the paragraph get changed into inverted question marks (¿). What HTML tag should I add to prevent this from happening? I already added
    Code:
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" >
    but nothing has changed. What should I add to the head so that I get the normal character?
    Please note that the content is saved in the database as a CLOB type.
    Last edited by dausboy; Oct 17, 2006 at 08:30.

  2. #2
    Caveat surfer Buddy Bradley
    Join Date
    May 2003
    Location
    Cambridge, UK
    Posts
    2,366
    It's because Word uses the typographically correct (curly) apostrophe character, which isn't coming through in your page's encoding; only the plain single and double quotes are safe everywhere. You could do a find-and-replace to swap the curly quote characters for either the plain form (') or the Unicode character reference (&#8217;).
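As a rough sketch of the find-and-replace idea, assuming the pasted text reaches the application as a Unicode string before it is stored (Python is used only for illustration; the helper name `straighten_quotes` and the mapping are made up for this example, not taken from the thread):

```python
# Hypothetical helper: map Word's "smart" punctuation to plain ASCII forms.
SMART_TO_PLAIN = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark (the apostrophe Word inserts)
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def straighten_quotes(text: str) -> str:
    """Return text with curly quotes replaced by their ASCII equivalents."""
    return text.translate(SMART_TO_PLAIN)


print(straighten_quotes("It\u2019s a \u201ctest\u201d"))  # It's a "test"
```

Doing this once on input would avoid repeated clean-up passes over the database, at the cost of losing the typographic quotes.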

  3. #3
    SitePoint Enthusiast
    Join Date
    Feb 2006
    Posts
    87
    But this is a web application, so how can I tell all my users what to do? What do you think is the best solution? Keep in mind that users copy and paste from Word into the application. I'm sure they won't be using anything else, and it's quite annoying to make DB changes every time to find and replace the wrong characters.

  4. #4
    SitePoint Author

    Join Date
    Nov 2004
    Location
    Ankh-Morpork
    Posts
    12,158
    It's an encoding problem. Word uses Windows-1252, where the typographically correct apostrophe is included. But you declare the encoding as ISO 8859-1, and the code point used for the apostrophe lies in the range reserved for C1 control characters in the ISO encoding.
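The clash described above can be seen by decoding the single byte that Windows-1252 uses for the curly apostrophe under each encoding. A small Python sketch, for illustration only (the thread does not say what the application is written in):

```python
import unicodedata

raw = b"\x92"  # the byte Windows-1252 uses for the curly apostrophe

as_cp1252 = raw.decode("cp1252")      # U+2019
as_latin1 = raw.decode("iso-8859-1")  # U+0092, inside the C1 control range

print(unicodedata.name(as_cp1252))      # RIGHT SINGLE QUOTATION MARK
print(unicodedata.category(as_latin1))  # Cc (control character)
```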

    See the HTML FAQ for more information, including possible workarounds.
    Birnam wood is come to Dunsinane

  5. #5
    SitePoint Enthusiast
    Join Date
    Feb 2006
    Posts
    87
    So which encoding should I use to resolve this problem?

  6. #6
    SitePoint Author

    Join Date
    Nov 2004
    Location
    Ankh-Morpork
    Posts
    12,158
    The important thing is that the encoding you (or, rather, your web server) declare in the Content-Type HTTP header is the same one as you used when saving your source file.

    It looks like you're saving your file as Windows-1252, which is not something I'd recommend on a public web site since it's Windows specific.

    Either save the file as ISO 8859-1 or change the encoding declaration on the server to Windows-1252. Note, however, that this apostrophe is not available in the ISO encoding, so if you choose that way, you need to use an entity (&rsquo;) or a reference (&#8217;).
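If you stay with ISO 8859-1, one way to get the numeric references without hand-editing is to let the encoder substitute them for anything the encoding lacks. A minimal Python sketch, assuming the text is available as a Unicode string before it is written out (the function name `to_latin1_html` is hypothetical):

```python
def to_latin1_html(text: str) -> bytes:
    """Encode text as ISO 8859-1, replacing characters the encoding
    lacks with numeric character references such as &#8217;."""
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


print(to_latin1_html("It\u2019s caf\u00e9 time"))  # b'It&#8217;s caf\xe9 time'
```

Note that this only handles characters missing from the encoding; it does not escape `&` or `<`, so it would need to run after any normal HTML escaping.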

    Probably the best solution would be to use UTF-8. That means saving the file as UTF-8 and making the server send UTF-8 as the encoding declaration. UTF-8 can represent any character in ISO 10646, which is the character repertoire used by HTML. (It's virtually the same thing as Unicode.)
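To illustrate why UTF-8 sidesteps the problem: the curly apostrophe encodes without trouble in UTF-8, whereas ISO 8859-1 has no byte for it. A short Python sketch, for illustration only:

```python
apostrophe = "\u2019"  # the curly apostrophe Word inserts

# UTF-8 represents it as a three-byte sequence.
utf8_bytes = apostrophe.encode("utf-8")
print(utf8_bytes)  # b'\xe2\x80\x99'

# ISO 8859-1 has no code point for it, so a strict encode fails.
try:
    apostrophe.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print(latin1_ok)  # False
```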

  7. #7
    SitePoint Enthusiast
    Join Date
    Feb 2006
    Posts
    87
    I've added <meta http-equiv="Content-Type" content="text/html; charset=utf-8" > but I'm still seeing the inverted question marks. My Internet Explorer uses Western European encoding. If I'm copying from a normal Word document, should I save that document as a UTF-8 web page and then copy and paste it into my application? What else do you think is missing?

  8. #8
    SitePoint Author

    Join Date
    Nov 2004
    Location
    Ankh-Morpork
    Posts
    12,158
    The META element will be ignored if your web server is sending encoding information in the Content-Type header. You must make sure that your web server is sending the correct encoding, or that it doesn't send any encoding information at all (in which case your META element may be applied).
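In practice, "the server sending the correct encoding" means a `charset` parameter on the real HTTP `Content-Type` header that matches the bytes actually sent. A minimal sketch as a Python WSGI application; this is purely an assumption for illustration, since the thread doesn't say which server stack is in use, and `app` and `PAGE` are hypothetical names:

```python
PAGE = "<!DOCTYPE html><title>Demo</title><p>It\u2019s working</p>"


def app(environ, start_response):
    """Serve PAGE as UTF-8 and declare that in the HTTP header."""
    body = PAGE.encode("utf-8")  # the actual bytes are UTF-8 ...
    headers = [
        # ... and the declared encoding matches them.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

Any WSGI server can host `app`, for example the standard library's `wsgiref.simple_server.make_server("localhost", 8000, app)`. On other stacks the equivalent is whatever setting controls the `Content-Type` response header.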

    Also, you cannot just change the encoding declaration without changing the actual encoding. The easiest way for you is probably to declare the encoding as Windows-1252, but as I said before, that encoding doesn't really belong on a public web site.
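The difference between changing the declaration and changing the actual encoding shows up if Windows-1252 bytes are merely relabelled versus properly converted. A Python sketch, for illustration only (the variable names are made up):

```python
cp1252_bytes = "It\u2019s".encode("cp1252")  # b'It\x92s', as Word would save it

# Changing only the label: the same bytes are not valid UTF-8.
try:
    cp1252_bytes.decode("utf-8")
    relabel_ok = True
except UnicodeDecodeError:
    relabel_ok = False

# Changing the actual encoding: decode with the real one, re-encode as UTF-8.
utf8_bytes = cp1252_bytes.decode("cp1252").encode("utf-8")

print(relabel_ok)  # False
print(utf8_bytes)  # b'It\xe2\x80\x99s'
```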

    If you copy from Word, you may run into problems depending on which editor you use for your HTML document. You may have to set it to Windows-1252 first, then copy from Word, then save as UTF-8. I don't know if you can make Word use UTF-8; I haven't looked, and I don't have Word available at the moment.

