Writing in foreign!
Since this thread is about content writing, I was going to place it in the content writing section, but I thought it would be more appropriate here. I have a problem. Recently I have been developing many sites in more than one language. I noticed when I was amending a Russian site that all the Russian characters were written in some special coding.
Anyway, I personally use a standard text editor to develop my sites, so I can see the code. Is there a way to write these special characters easily, and is there a proper name for them (e.g. Unicode)?
Many people ask me for help on Russian, Greek and Chinese language websites.
Any comments would be most appreciated.
There are three ways to represent characters in an HTML document:
- literal characters
- character entity references
- numeric character references (NCRs)
Literal characters are what this post is written with, as is most content on the web. You can use them for Russian text, too, like спасибо, provided you're using an appropriate character encoding.
Character entity references are available for certain accented Latin characters (e.g., &auml; for 'ä'), special symbols (e.g., &hellip; for '…') and Greek characters (e.g., &Sigma; for 'Σ'), but HTML 4 defines none for Cyrillic letters.
NCRs are numeric references to the code positions in the ISO/IEC 10646 character repertoire (more or less the same as Unicode). These are normally written in decimal notation (e.g., &#176; for '°'), but can also be written in hexadecimal (&#xb0;).
You can use NCRs to write Cyrillic characters on a page encoded with ISO 8859-1, or if you don't have a Russian keyboard. The word да is then written as &#1076;&#1072;.
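If you'd rather not look up the code positions by hand, a few lines of script can produce the NCRs for you. This is only a rough sketch in Python (my choice, nothing in this thread depends on it), and to_ncr is a made-up helper name, not part of any library:

```python
import html

def to_ncr(text, hexadecimal=False):
    """Replace every non-ASCII character with a numeric character reference."""
    parts = []
    for ch in text:
        code = ord(ch)  # code position of the character
        if code < 128:
            parts.append(ch)  # plain ASCII stays as a literal character
        elif hexadecimal:
            parts.append("&#x%x;" % code)
        else:
            parts.append("&#%d;" % code)
    return "".join(parts)

print(to_ncr("да"))                    # &#1076;&#1072;
print(to_ncr("°", hexadecimal=True))   # &#xb0;

# html.unescape turns the references back into literal characters.
print(html.unescape(to_ncr("спасибо")) == "спасибо")  # True
```

Python can also do the decimal form on its own with "да".encode("ascii", "xmlcharrefreplace"), which gives the same references as bytes.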
I don't know about Chinese or Japanese, but all European language characters are covered by the UTF-8 encoding.
UTF-8 can represent the entire ISO/IEC 10646 repertoire, including Latin, Cyrillic, Hebrew, Arabic, Chinese, Japanese, Korean, Thai, Indian scripts, and many more.
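To see this for yourself, here is a small sketch (Python again, with sample characters I picked myself) that encodes one character from several scripts as UTF-8 and shows that ISO 8859-1 cannot hold the Cyrillic one:

```python
# One sample character per script; the selection is arbitrary.
samples = {
    "Latin": "ä",
    "Cyrillic": "д",
    "Greek": "Σ",
    "Chinese": "中",
}

def utf8_lengths(chars):
    """Return how many bytes each character takes when encoded as UTF-8."""
    return {name: len(ch.encode("utf-8")) for name, ch in chars.items()}

def fits_in(text, encoding):
    """True if the text can be stored as literal characters in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

print(utf8_lengths(samples))       # {'Latin': 2, 'Cyrillic': 2, 'Greek': 2, 'Chinese': 3}
print(fits_in("д", "iso-8859-1"))  # False
print(fits_in("д", "utf-8"))       # True
```

So one UTF-8 page can mix all of these as literal characters, while an ISO 8859-1 page has to fall back on NCRs for anything outside its repertoire.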
The only problem is that not all text editors and WYSIWYG editors support UTF-8, and that even if they do, it's not always obvious how to actually type characters that aren't on one's keyboard.
Regarding the problem of typing in a foreign language: there is a whole set of tools mostly unknown to English speakers/writers :D
transliteration tool for Russian
transliteration tool for Latvian
and many more :)
P.S. I have always had to develop everything so that it supports both Latvian and Russian (and occasionally German), so my advice definitely is to use UTF-8. It is not at all difficult, and nowadays this encoding is supported pretty much everywhere. Just make sure that:
1) Your HTTP server does not change the encoding.
2) Your DB uses this encoding.
3) Your text editor does not save UTF-8 with a BOM (because PHP, for instance, does not like it and you get a "headers already sent" warning), or at least be aware of this.
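For point 3, if you're not sure what your editor does, you can check the saved bytes. A rough sketch in Python, where strip_utf8_bom is just a name I made up for illustration:

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """True if the bytes start with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM, if there is one."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data

# Simulate a file that an editor saved "with BOM".
saved = "<?php echo 'да';".encode("utf-8-sig")
print(has_utf8_bom(saved))                   # True
print(strip_utf8_bom(saved).decode("utf-8")) # <?php echo 'да';
```

To check a real file, read it in binary mode, e.g. open(path, "rb").read(), and pass the result to has_utf8_bom.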
Wow... I got a great response. Thanks.
AutisticCuckoo, thanks for your super answer. I will most certainly look more into it.
Chinese sites always use GBK or UTF-8. Another one is BIG-5? (not sure)
For simplified Chinese, they usually use GB2312, and for traditional Chinese, BIG-5.
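If you want to compare these encodings, here is a quick sketch in Python, which ships codecs named gb2312, gbk and big5. The sample text and the roundtrips helper are my own, purely for illustration:

```python
def roundtrips(text, encoding):
    """True if the text survives encoding to bytes and decoding back."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        return False  # the encoding has no code for some character

sample = "中文"  # written the same way in simplified and traditional Chinese

for name in ("gb2312", "gbk", "big5", "utf-8"):
    data = sample.encode(name)
    print(name, len(data), "bytes", roundtrips(sample, name))

# The same string gives different bytes in each encoding, which is why the
# page has to declare the encoding it was actually saved in.
print(sample.encode("gb2312") != sample.encode("big5"))  # True
```

The legacy encodings use two bytes per Chinese character here and UTF-8 uses three, but only UTF-8 also covers Russian, Greek and the rest on the same page.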
UTF-8 will work for all languages.