  1. #1
    SitePoint Mentor silver trophybronze trophy

    Join Date
    Feb 2008
    Location
    Preston, Lancashire
    Posts
    1,376
    Mentioned
    71 Post(s)
    Tagged
    1 Thread(s)

    Writing in foreign!

    Dear all,

    Since this thread is about content writing, I was going to post it in the content writing section, but I thought it would be more appropriate here. I have a problem. Recently I have been developing many sites in more than one language. I noticed, when I was amending a Russian site, that all the Russian characters were written in some special coding.

    Anyway, I personally use a standard text editor to develop my sites, so I can see the code. Is there a way to write these special characters easily, and is there a proper name for them (e.g. Unicode)?

    Many people ask me for help on Russian, Greek and Chinese language websites.

    Any comments would be most appreciated.


    Kind Regards,
    Sega

  2. #2
    SitePoint Author silver trophybronze trophy

    Join Date
    Nov 2004
    Location
    Ankh-Morpork
    Posts
    12,159
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    There are three ways to represent characters in an HTML document:
    1. literal characters
    2. character entity references
    3. numeric character references (NCRs)


    Literal characters are what this post is written with, as is most content on the web. You can use them for Russian text, too, like спасибо, provided you're using an appropriate character encoding.

    Character entity references are available for certain accented Latin characters (e.g., &auml; for 'ä'), special symbols (e.g., &hellip; for '…') and Greek characters (e.g., &Sigma; for 'Σ'), but not for Cyrillic letters.

    NCRs are numeric references to the code positions in the ISO/IEC 10646 character repertoire (more or less the same as Unicode). These are normally written in decimal notation (e.g., &#176; for '°'), but can also be written in hexadecimal (&#xB0;).

    You can use NCRs to write Cyrillic characters on a page encoded with ISO 8859-1, or if you don't have a Russian keyboard. The word да is then written as &#1076;&#1072;.
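
    As a rough illustration of the NCR approach, here is a minimal Python sketch that replaces every non-ASCII character with a decimal or hexadecimal reference. The function name `to_ncr` is made up for this example; `html.unescape` is the standard-library call that turns references back into characters.

```python
import html


def to_ncr(text: str, hexadecimal: bool = False) -> str:
    """Replace each non-ASCII character with a numeric character reference.

    to_ncr is a hypothetical helper name used only for this illustration.
    """
    parts = []
    for ch in text:
        code = ord(ch)  # the character's Unicode / ISO 10646 code position
        if code < 128:
            parts.append(ch)  # plain ASCII can stay as a literal character
        elif hexadecimal:
            parts.append(f"&#x{code:X};")
        else:
            parts.append(f"&#{code};")
    return "".join(parts)


print(to_ncr("да"))                    # &#1076;&#1072;
print(to_ncr("°", hexadecimal=True))   # &#xB0;
print(html.unescape("&#1076;&#1072; &auml; &Sigma;"))  # да ä Σ
```

    Python can also do the decimal conversion directly with `"да".encode("ascii", "xmlcharrefreplace")`, which returns the same references as bytes.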
    Birnam wood is come to Dunsinane

  3. #3
    secure webapps for all Aleksejs's Avatar
    Join Date
    Apr 2008
    Location
    Riga, Latvia
    Posts
    755
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I don't know about Chinese or Japanese, but all European language characters are covered by the UTF-8 encoding.

  4. #4
    SitePoint Author silver trophybronze trophy

    Join Date
    Nov 2004
    Location
    Ankh-Morpork
    Posts
    12,159
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    UTF-8 can represent the entire ISO/IEC 10646 repertoire, including Latin, Cyrillic, Hebrew, Arabic, Chinese, Japanese, Korean, Thai, Indian scripts, and many more.
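
    To make that concrete, here is a small Python sketch that encodes one sample character from several scripts as UTF-8 and reports how many bytes each one takes. The sample characters and the helper name `utf8_length` are arbitrary choices for the illustration.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes the text occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


# One arbitrary sample character per script.
samples = {
    "Latin": "a",
    "Cyrillic": "д",
    "Hebrew": "ש",
    "Thai": "ก",
    "Chinese": "中",
}

for script, ch in samples.items():
    encoded = ch.encode("utf-8")
    print(f"{script:9} {ch}  U+{ord(ch):04X}  bytes={encoded.hex()}  length={len(encoded)}")
```

    ASCII letters take one byte, Cyrillic and Hebrew letters two, and the Thai and Chinese samples three, all within the same encoding.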

    The only problem is that not all text editors and WYSIWYG editors support UTF-8, and that even if they do, it's not always obvious how to actually type characters that aren't on one's keyboard.

  5. #5
    secure webapps for all Aleksejs's Avatar
    Join Date
    Apr 2008
    Location
    Riga, Latvia
    Posts
    755
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Regarding the problem of typing in a foreign language: there is a whole set of tools mostly unknown to English speakers and writers:
    transliteration tool for Russian
    transliteration tool for Latvian
    and many more

    P.S. I have always had to develop everything so that it supports both Latvian and Russian (and occasionally German), so my advice is definitely to use UTF-8. It is not at all difficult, and nowadays this encoding is supported pretty much everywhere. Just make sure that:
    1) Your HTTP server does not change the encoding.
    2) Your DB uses this encoding.
    3) Your text editor does not save UTF-8 with a BOM (because PHP, for instance, sends the BOM as output and you get a "headers already sent" warning), or at least be aware of this.
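
    For the BOM point in particular, a quick check can be scripted. The sketch below, in Python, detects the three-byte UTF-8 BOM at the start of a file and removes it. The helper names and the `index.php` file are invented for the example, and it assumes the file is small enough to read into memory.

```python
import codecs
import tempfile
from pathlib import Path


def has_utf8_bom(data: bytes) -> bool:
    """True if the bytes start with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def remove_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM, if there was one."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


def strip_bom_in_file(path: Path) -> bool:
    """Rewrite the file without its BOM; return True if a BOM was removed."""
    data = path.read_bytes()
    if not has_utf8_bom(data):
        return False
    path.write_bytes(remove_utf8_bom(data))
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        page = Path(tmp) / "index.php"  # hypothetical file saved "with BOM"
        page.write_bytes(codecs.BOM_UTF8 + "<?php echo 'да'; ?>".encode("utf-8"))
        print(strip_bom_in_file(page))          # True
        print(has_utf8_bom(page.read_bytes()))  # False
```

    When reading such files in Python, opening them with `encoding="utf-8-sig"` skips the BOM without modifying the file.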

  6. #6
    SitePoint Mentor silver trophybronze trophy

    Join Date
    Feb 2008
    Location
    Preston, Lancashire
    Posts
    1,376
    Mentioned
    71 Post(s)
    Tagged
    1 Thread(s)
    Wow... I got a great response. Thanks.

    AutisticCuckoo, thanks for your super answer. I will most certainly look more into it.

  7. #7
    SitePoint Addict
    Join Date
    May 2008
    Posts
    228
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Chinese sites usually use GBK or UTF-8; another one is Big5 (not sure).

  8. #8
    SitePoint Member cnxtrans's Avatar
    Join Date
    Apr 2008
    Location
    Hong Kong
    Posts
    5
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    For simplified Chinese, they usually use GB2312, and for traditional Chinese, BIG-5.

    UTF-8 will work for all languages.
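
    A small Python sketch of the difference: it tries to encode the same text in GB2312, Big5 and UTF-8 and reports the byte count for each, or `None` when the encoding cannot represent the text. The helper name `encoded_sizes` and the sample word 中文 are just choices for this illustration.

```python
def encoded_sizes(text: str, encodings=("gb2312", "big5", "utf-8")) -> dict:
    """Map each encoding name to the encoded byte length, or None on failure."""
    sizes = {}
    for name in encodings:
        try:
            sizes[name] = len(text.encode(name))
        except UnicodeEncodeError:
            # The text contains a character this encoding has no code for.
            sizes[name] = None
    return sizes


print(encoded_sizes("中文"))  # {'gb2312': 4, 'big5': 4, 'utf-8': 6}

# Converting legacy bytes to UTF-8: decode with the old encoding, re-encode.
legacy = "中文".encode("gb2312")
as_utf8 = legacy.decode("gb2312").encode("utf-8")
print(as_utf8.decode("utf-8"))  # 中文
```

    Text that mixes simplified and traditional forms, or Chinese with Russian or Greek, is where the legacy encodings tend to return `None` while UTF-8 still works.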

