One of the tougher issues on the modern web is that of internationalization, often shortened to i18n (‘i’, then 18 more letters, then ‘n’). The world-wide web truly lives up to its name, and even if your site has a local audience you may still find yourself dealing with foreign characters, be they names with German umlauts or quotations in a foreign language.

The ultimate solution to the foreign character problem is Unicode, a truly enormous standard which attempts to document and provide encoding for virtually every character in every language known to man, with space left over for future language developments. A great starting point for understanding Unicode is Tim Bray’s essay, On Unicode. If that leaves you thirsty for more, Joel Spolsky’s The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) makes for great reading as well.
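To make the distinction between characters and their encodings concrete, here is a minimal Python sketch. The sample word is my own choice for illustration and does not come from either essay; it simply shows that each character has one Unicode code point, while the bytes used to store it depend on the encoding chosen.

```python
# Illustrative sketch: the sample word is an assumption, picked because it
# contains two non-ASCII German letters.
sample = "Grüße"

# Each character maps to exactly one Unicode code point.
code_points = [f"U+{ord(ch):04X}" for ch in sample]

# The same text becomes different byte sequences under different encodings.
utf8_bytes = sample.encode("utf-8")      # ü and ß take two bytes each
latin1_bytes = sample.encode("latin-1")  # every character takes one byte

if __name__ == "__main__":
    print(code_points)
    print(len(sample), len(utf8_bytes), len(latin1_bytes))
```

The character count stays the same whichever encoding is used; only the byte count changes, which is why treating bytes as characters tends to go wrong.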

Now that you understand the basis of the i18n problem, how do you go about solving it in your own work with the web? Thankfully, the W3C have two useful documents on the subject as part of their Authoring Techniques for XHTML and HTML Internationalization series: Characters and Encodings 1.0 and Specifying the language of content. Don’t be put off by the long-winded titles or the extensive preambles; the meat of these documents is a set of easy-to-follow guidelines. Jukka Korpela’s tutorial on character code issues is another excellent resource on the subject.

As a final note, if you’re looking to use Unicode with PHP, you may find Keith Devens’ notes on the subject extremely useful. And if you want to test your web applications for character encoding awareness, try pasting in the test from Sam Ruby’s Survival guide to i18n.
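The same kind of check can be approximated in code. The sketch below is a rough illustration in Python, not anything taken from the guides linked above: the function names are hypothetical, and the sample string is just an accent-heavy word chosen for the demonstration. It tests whether text survives a round trip through a given encoding, and simulates the classic bug where UTF-8 bytes are misread as Latin-1.

```python
def round_trips(text: str, encoding: str) -> bool:
    """Return True if text survives an encode/decode cycle in `encoding`.

    Hypothetical helper for illustration only.
    """
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        # The encoding cannot represent at least one character.
        return False


def misread_as_latin1(text: str) -> str:
    """Simulate a common bug: UTF-8 bytes decoded as if they were Latin-1."""
    return text.encode("utf-8").decode("latin-1")


# Assumed sample text: an accent-heavy word used purely as test input.
sample = "Iñtërnâtiônàlizætiøn"

if __name__ == "__main__":
    print(round_trips(sample, "utf-8"))   # UTF-8 can represent every character
    print(round_trips(sample, "ascii"))   # ASCII cannot represent the accents
    print(misread_as_latin1(sample))      # garbled output, as a broken app would show
```

If text pasted into your application comes back looking like the output of the last line, some layer is decoding bytes with the wrong character set.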

  • Sathyaish Chakravarthy

    To this list of useful links on the Unicode character set, I’d like to add a chapter from Steven Roman’s book Win32 API Programming with Visual Basic. It is the best ever resource I have discovered on the subject. The chapter is titled “Strings” and sits in the MSDN April 2001 library as well.

  • Octal

    As a web developer in South Wales I am often tasked with producing bi-lingual websites. Internationalisation is extremely important, thanks Simon for highlighting it.

  • bwarrene

    This is a terribly important topic for global developers – and one not covered enough! Thanks a bunch for the great links and bringing this to our attention.

  • avine

    A couple more sites that are useful for Unicode and i18n:

    The Unicode Consortium

    Sun Globalization Resources

    I18n Guy’s I18n and L10n portal

  • Mattias

    Great that this topic is covered. One thing that I find missing is reference data, i.e. files with complete alphabets and so on. In Unicode, of course.
    Unicode string generating software is not abundant either, commercial or not.
    However, this might just be me being blind…

  • pfitz

    Thanks for the useful resources. I have to create an English/Chinese site shortly and they will come in handy. At this stage I am still trying to work out which Chinese character set to use :S And a Unicode generator of some sort for Chinese symbols would be nice.. I’ll keep looking.
