  1. #1
    SitePoint Zealot stuffedbuggy's Avatar
    Join Date
    Sep 2008
    Posts
    187
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    What is the best char-set to use ?

    I was wondering what's the most popular char-set to use in a web page?

  2. #2
    SitePoint Guru
    Join Date
    Feb 2008
    Posts
    655
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    It is probably utf-8, perhaps followed by iso-8859-1.

  3. #3
    SitePoint Author
    Join Date
    Nov 2004
    Location
    Ankh-Morpork
    Posts
    12,158
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    That would depend entirely on which language the page is written in. For Western pages the most common is probably ISO 8859-1 or Windows-1252.

    I always recommend UTF-8 whenever possible, since it can encode any character in the ISO/IEC 10646 repertoire used by HTML. That way you don't have to use entity references or numeric character references at all.

    That requires an editor that lets you input all characters literally, though. It also requires that all components in your publishing chain support UTF-8.
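
To illustrate the point about character references, here is a minimal Python sketch (Python is only a stand-in here, and the sample string is made up for the example). It shows UTF-8 round-tripping text that ISO 8859-1 cannot represent without falling back to numeric character references.

```python
# Hypothetical sample text: Western, Greek, and CJK characters plus the euro sign.
text = "Caf\u00e9, \u03a9mega, \u65e5\u672c\u8a9e, \u20ac"

# UTF-8 can represent every character directly, so the round trip is lossless.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

# ISO 8859-1 has no code points for Greek, CJK, or the euro sign,
# so a strict encode raises UnicodeEncodeError.
try:
    text.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# With ISO 8859-1 the unsupported characters have to become numeric
# character references such as &#8364; for the euro sign.
latin1_with_ncrs = text.encode("iso-8859-1", errors="xmlcharrefreplace")

print(round_trip == text)
print(latin1_ok)
print(latin1_with_ncrs)
```

The last line prints a byte string in which only the é survives as a single byte; everything else has been turned into references.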
    Birnam wood is come to Dunsinane

  4. #4
    SitePoint Wizard bronze trophy Black Max's Avatar
    Join Date
    Apr 2007
    Posts
    4,029
    Mentioned
    12 Post(s)
    Tagged
    0 Thread(s)
    Originally posted by AutisticCuckoo:
    It also requires that all components in your publishing chain support UTF-8.
    Tommy, this one I didn't know. What happens if one element in the chain doesn't support UTF-8? I assume the site's textual content will display at least somewhat...?

  5. #5
    SitePoint Author
    Join Date
    Nov 2004
    Location
    Ankh-Morpork
    Posts
    12,158
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    If you're using the built-in string functions in PHP, such as strlen(), you'll get the wrong result for strings that contain non-ASCII characters, because those functions count bytes rather than characters. These problems are minor and can be circumvented if you're aware of them. See Troels' recent article Character Encoding: Issues with Cultural Integration, for instance.
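
A rough sketch of the byte-versus-character mismatch, written in Python rather than PHP purely for illustration. The sample word is an assumption; in PHP itself, strlen() corresponds to the byte count below and mb_strlen() to the character count.

```python
# Hypothetical sample word containing one non-ASCII character (i with diaeresis).
word = "na\u00efve"

# Number of characters (code points) in the string.
char_count = len(word)

# Number of bytes once the string is encoded as UTF-8; the non-ASCII
# character takes two bytes, so this is one more than char_count.
byte_count = len(word.encode("utf-8"))

# Cutting at a byte offset can split a multi-byte character in half.
# Decoding the fragment leniently yields U+FFFD, the replacement character.
truncated = word.encode("utf-8")[:3].decode("utf-8", errors="replace")

print(char_count, byte_count)
print(truncated)
```

The same effect applies to any byte-oriented operation, such as substring or padding functions, not only to length checks.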

    Java/J2EE/JSP applications may cause more problems, but they can usually be solved. Tomcat, for instance, needs some hacking before it properly supports UTF-8 parameters in HTTP GET requests, and you'll need to tell it to use UTF-8 for output as well.
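
The GET parameter problem can be sketched as follows. This is Python's urllib standing in for a server that decodes the query string with the wrong charset; it is not Tomcat's own code or configuration, and the parameter value is made up.

```python
from urllib.parse import quote, unquote

# Hypothetical parameter value as typed by the user.
original = "caf\u00e9"

# A browser on a UTF-8 page percent-encodes the UTF-8 bytes of the value.
encoded = quote(original, encoding="utf-8")

# A server that decodes the bytes as UTF-8 recovers the original value.
decoded_right = unquote(encoded, encoding="utf-8")

# A server that assumes ISO 8859-1 turns each UTF-8 byte into its own
# character, which produces the familiar two-character garbage.
decoded_wrong = unquote(encoded, encoding="iso-8859-1")

print(encoded)
print(decoded_right)
print(decoded_wrong)
```

So the text does arrive and display, but every non-ASCII character shows up as two or more wrong characters until both ends agree on the encoding.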
    Birnam wood is come to Dunsinane

