  1. #1
    SitePoint Enthusiast
    Join Date
    May 2006
    Posts
    48

    Correct Encoding for HTML Emails

    Hello,

    Sorry if I have posted this in the wrong forum but I could not find a forum for email design.

    I have been attempting to understand the difference between (decimal)
    Numeric Character Reference (NCR) and Entity Character Reference.

    I would like to send my subscribers an HTML email but I am confused as to which encoding to use...

    If we take the example of the " symbol.

    In NCR it is:
    PHP Code:
    &#34;
    Interesting that I had a problem trying to display the NCR code without the forum system changing it to a quote! The PHP code tag seems to let it through untouched.

    In Entity Character Reference it is:
    &quot;
    What should I use for my HTML emails? I have been told to encode characters such as quotes and other symbols such as the copyright symbol.

    I have located this great web page that lists the combinations for most characters.

    http://www.i18nguy.com/markup/ncrs.html

  2. #2
    SitePoint Author

    Join Date
    Nov 2004
    Location
    Ankh-Morpork
    Posts
    12,159
    The only characters you need to escape in HTML are '<' and '&'. Quotation marks need to be escaped only inside attribute values enclosed in the same kind of quotation marks.
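
As a rough illustration of the escaping rule above, here is a minimal Python sketch. Python is an assumption (the thread does not name a language), and the helper names `escape_text` and `escape_attr` are hypothetical; they cover only the two cases just described.

```python
def escape_text(text: str) -> str:
    """Escape text placed between tags: only '&' and '<' are replaced."""
    # Replace '&' first so the '&' in "&lt;" is not escaped a second time.
    return text.replace("&", "&amp;").replace("<", "&lt;")


def escape_attr(value: str) -> str:
    """Escape a value that will sit inside a double-quoted attribute."""
    # Same as text content, plus the quote character that delimits the value.
    return escape_text(value).replace('"', "&quot;")


if __name__ == "__main__":
    sample = 'Fish & chips < "dinner"'
    print(escape_text(sample))  # quotes are left alone in text content
    print(escape_attr(sample))  # quotes are escaped inside an attribute
```

Python's standard library also provides `html.escape`, which additionally escapes `>` and, by default, both quote characters. That is more than strictly required, but harmless.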

    Depending on the character encoding you use in your HTML document or email, you may need to use NCRs or entity references for characters that do not exist in the repertoire/encoding.

    NCRs should be 'safer' than entity references, because not all user agents support all entity references. Whether the corresponding character can be displayed or not depends on whether there is a glyph for it in the font that is used (and that's something you can never have full control over).

    So, if you use an encoding like UTF-8, ISO 8859-1 or Windows-1252, you can enter the © character as is. If you want to escape it, use an NCR: preferably decimal (&#169;) rather than hexadecimal (&#xa9;).

    The only advantage of using an entity reference (&copy;) is that it is more readable for humans.
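
To make the three notations concrete, the following Python sketch builds each form for a character and checks that they all decode back to the same thing. The function names are hypothetical; `html.unescape` and `html.entities.codepoint2name` are part of Python's standard library.

```python
from html import unescape
from html.entities import codepoint2name


def decimal_ncr(ch: str) -> str:
    """Decimal numeric character reference for one character."""
    return f"&#{ord(ch)};"


def hex_ncr(ch: str) -> str:
    """Hexadecimal numeric character reference for one character."""
    return f"&#x{ord(ch):x};"


def entity_ref(ch: str) -> str:
    """Named entity reference; raises KeyError if HTML 4 defines no name."""
    return f"&{codepoint2name[ord(ch)]};"


if __name__ == "__main__":
    for form in (decimal_ncr("©"), hex_ncr("©"), entity_ref("©")):
        # Every form should decode back to the copyright sign.
        print(form, "->", unescape(form))
```

Note that `entity_ref` can fail for characters that have no named entity, whereas the two NCR helpers work for any character.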
    Birnam wood is come to Dunsinane

  3. #3
    SitePoint Enthusiast
    Join Date
    May 2006
    Posts
    48
    Thank you for replying and providing some great information!

    In my HTML I use the following

    <html><head>
    <meta content="text/html;charset=iso-8859-1" http-equiv="Content-Type">


    In the message header I can see the following

    Content-Type: text/html; charset = "iso-8859-1"
    Content-Transfer-Encoding: 7bit

    So I guess iso-8859-1 is the encoding I am using, which means I need to find out which characters it supports.

    I like the idea of using NCRs, since you mentioned they are more widely supported than entity references.

    I hope I understood your information correctly.

    Cheers

    marc

  4. #4
    SitePoint Author

    Join Date
    Nov 2004
    Location
    Ankh-Morpork
    Posts
    12,159
    If you use ISO 8859-1, then you can use a literal © character, since it can be encoded in that encoding. You cannot use the Euro character (€), though, nor can you use typographically correct quotation marks, dashes, ellipses, etc. For those you need to resort to NCRs.
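
One way to apply this automatically, sketched in Python under the assumption that the email body is available as a Unicode string: encode it to ISO 8859-1 and let the standard `xmlcharrefreplace` error handler substitute a decimal NCR for anything the encoding cannot represent. The function name `to_latin1_html` is hypothetical.

```python
def to_latin1_html(text: str) -> bytes:
    """Encode text as ISO 8859-1, using decimal NCRs for unsupported characters."""
    # Characters in the ISO 8859-1 repertoire (such as the copyright sign)
    # become single bytes; everything else becomes "&#NNNN;".
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


if __name__ == "__main__":
    sample = "© 2006 … 10 € “quoted”"
    print(to_latin1_html(sample))
```

This handles only characters missing from the encoding; markup-significant characters such as `<` and `&` still need to be escaped separately before this step.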

