  1. #1
    SitePoint Enthusiast
    Join Date
    Apr 2009
    Posts
    46
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Long Hyphens and Quotes

    1) I'd like to have some long hyphens on my website, the same type (or similar) that you get in Microsoft Word when you type two hyphens together between two words and then hit Enter or Space. The single long hyphen that is formed is easily copied and pasted into Notepad, but it does not show correctly on my website when UTF-8 encoding is selected.

    It does show when I select Western (ISO) or Western (Windows) encoding. By the way, I have
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
    at the top of my page in every case.

    Should I just try writing charset=Western (ISO) or something like that? I'd prefer to keep UTF-8; I've heard that it's better if people from non-English-speaking countries visit my website.

    2) The same thing happens with quotes (e.g. ") in my HTML. They also appear as boxes in UTF-8, but they appear correctly in Western (ISO) or Western (Windows). What should I do?

  2. #2
    Dan Grossman
    Join Date
    Aug 2000
    Location
    Philadelphia, PA
    Posts
    20,580
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Use the HTML entities &mdash; and &quot; instead.
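
    As a quick check of what those two entities stand for, here is a small Python sketch. Python is only an illustrative choice (the thread itself is about HTML), and the variable names are made up for the example. It uses the standard library's `html.unescape`:

```python
import html

# Named entities are written in plain ASCII, so they survive any page charset
# and are resolved to Unicode characters by the parser.
em_dash = html.unescape("&mdash;")  # U+2014 EM DASH
quote = html.unescape("&quot;")     # U+0022 QUOTATION MARK

print(hex(ord(em_dash)), hex(ord(quote)))
```

    Because the entity text is ASCII, it reads the same whether the file is saved as UTF-8, Latin-1, or Windows-1252.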

  3. #3
    SitePoint Addict bamaboy
    Join Date
    Apr 2009
    Location
    Internet
    Posts
    224
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    As Dan said, use entities instead of typing the dashes directly in the code.

    Good luck.

  4. #4
    logic_earth
    Join Date
    Oct 2005
    Location
    CA
    Posts
    9,013
    Mentioned
    8 Post(s)
    Tagged
    0 Thread(s)
    The problem is you are copying the em dash from an encoding that is not UTF-8. The em dash from Word is stored as a different byte sequence than the UTF-8 em dash.

    For reference, the UTF-8 em dash: —
    http://en.wikipedia.org/wiki/Dash

    * When the page is served as UTF-8 and the characters are actually saved as UTF-8, there is no need to use entities.
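
    To make the mismatch concrete, here is a minimal Python sketch. Python and the variable names are illustrative assumptions, not something from this thread. It also assumes the dash pasted from Word ended up saved as Windows-1252, which is plausible for Notepad but not confirmed here:

```python
EM_DASH = "\u2014"  # U+2014 EM DASH

# The same character is stored as different bytes in each encoding.
cp1252_bytes = EM_DASH.encode("cp1252")
utf8_bytes = EM_DASH.encode("utf-8")

# A browser told "charset=utf-8" tries to read the Windows-1252 byte as UTF-8.
# A lone 0x97 byte is not valid UTF-8, so it is replaced with U+FFFD,
# which many fonts draw as a box or a question mark.
misread = cp1252_bytes.decode("utf-8", errors="replace")

print(cp1252_bytes, utf8_bytes, repr(misread))
```

    The character is the same in both cases; only the bytes on disk differ, which is why the page looks right when the browser is switched to Western (Windows).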
    Last edited by logic_earth; Apr 26, 2009 at 05:03.


  5. #5
    SitePoint Wizard Stomme poes
    Join Date
    Aug 2007
    Location
    Netherlands
    Posts
    10,283
    Mentioned
    51 Post(s)
    Tagged
    2 Thread(s)
    The problem is you are copying the em dash from an encoding that is not UTF-8.
    Indeed. If you were typing into a text editor and saving the raw file as UTF-8, the quotes would just be quotes: " " ". Unless I have nested quotes, I do not use the character entities for them, even though my colleague is charset-lazy and stuffs my UTF-8 pages into Latin-1 pages and a Latin-1 database. Lots of characters get lost in there if I don't manually write them out as entities, but " isn't one of them: it is the same character in UTF-8 and Latin-1. (If the pages will ever be converted to XML, use the numeric entities rather than the named ones, because only four of the named entities from HTML also work in XML.) Windows-1252 is just weird, though.
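
    The numeric-entity approach can be sketched in Python as follows. `to_numeric_entities` is a hypothetical helper name, and Python is only an illustrative choice; the sketch relies on the standard `xmlcharrefreplace` error handler:

```python
def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference."""
    # Characters that cannot be encoded as ASCII become &#NNNN; references.
    # ASCII characters, including the plain " quote, are left untouched.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


sample = "wait \u2014 \u201cquoted\u201d"
converted = to_numeric_entities(sample)
print(converted)  # wait &#8212; &#8220;quoted&#8221;
```

    The result is pure ASCII, so it should survive being stuffed into a Latin-1 page or database, and numeric references are also valid in XML.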

    So instead of crawling through all your text to find all the goofy characters, write new documents in something other than Word if possible, such as a plain text editor with the charset set properly (though I wonder whether you can change it in Word in the first place).
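
    For files that already exist, one alternative to hunting down each character is to re-save the whole file as UTF-8. A hedged Python sketch follows; `reencode_to_utf8` is a hypothetical helper, and the `cp1252` default is an assumption about how the file was originally saved:

```python
from pathlib import Path


def reencode_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Read src as source_encoding and write the same text to dst as UTF-8."""
    # Decode with the encoding the file was really saved in (assumed here),
    # then encode the resulting text as UTF-8.
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))
```

    If the guess about the source encoding is wrong, the output will contain the wrong characters, so it is worth checking a converted page in the browser before replacing the original.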

