PHP i18n to improve?

By Harry Fuecks

i18n in PHP is largely a mess today, to the point where Joel Spolsky singled it out last year:

When I discovered that the popular web development tool PHP has almost complete ignorance of character encoding issues, blithely using 8 bits for characters, making it darn near impossible to develop good international web applications, I thought, enough is enough.

Note this response from Scott Reynen, looking at solutions implemented in PHP.

Anyway, it looks like there’s a chance things are about to get better. Check out this interesting message on Zend…

It looks very much like the developer in question, who uses the handle L0t3k (and seems to be having trouble finding a permanent home on the web), was serious: work seems to be in progress. If anyone can help him out with some web space, the place to send mail seems to be cshmoove hotmail dot com.

Also interesting is the other project he mentioned:

Current PHP supports i18n feature via mbstring. However, ZendEngine and many PHP functions do not support i18n feature natively. This project aims at provide i18n feature natively. The outcomes are supposed to be merged to PHP project later.

More CVS to be browsed here.

It would be interesting to hear what stage the php-i18n project is at, if any of its developers happen to see this.

  • ryansking

    From the link:
    > Date: Sat, 22 Mar 2003 09:47:10 -0500

    not exactly breaking news, if you ask me.

  • Aska

    I’ve always wondered why i18n in PHP gets so little attention… This topic is of particular interest if PHP is to gain worldwide acceptance. Thank you Harry for mentioning efforts like the ones above.

    On a side note, to get around the lack of Unicode support in PHP, I’ve been using a method similar to what Scott Reynen proposed, albeit done in JavaScript (eeek!).

  • HarryF

    not exactly breaking news, if you ask me.

    True, but this seemed to have vanished since that post. Looking at the CVS, though (and posts on the php.i18n mailing list), work is still in progress and we may see it in PECL soon.

    albeit done in JavaScript

    Sly move! Keeps the PHP fast I guess. Do you strip out anything that makes it past Javascript (e.g. browsers where it’s disabled)?

  • Aska

    Do you strip out anything that makes it past Javascript (e.g. browsers where it’s disabled)?

    Actually, no *sheepish smile*. Which is why the PHP solution is more elegant, I suppose. Perhaps I was overly optimistic in assuming that all visitors to that particular site would have JavaScript turned on :)

  • Tim Strehle

    Adding Unicode support to your PHP application is just the first step to internationalization/localization, but a rather easy one.

    We’ve switched our (large) PHP application to UTF-8 completely, and are quite happy so far. The mb_* functions work fine.

    The only trouble we’ve had is with regular expressions: there are multibyte ereg_* functions (mb_ereg_* and friends), but no multibyte preg_* functions. And (IIRC) case insensitivity only works for ASCII characters.

    A great localization resource exists for this: want to know the Thai date formats? It’s all there!
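    A minimal sketch of the difference the mb_* functions make. It assumes the script is saved as UTF-8 and the mbstring extension is enabled, and the sample word is only an illustration. strlen() and substr() work on bytes. mb_strlen() and mb_substr(), given an explicit 'UTF-8' encoding, work on characters.

    ```php
    <?php
    // Assumes this file is saved as UTF-8 and mbstring is enabled.
    $word = "Größe"; // 5 characters; "ö" and "ß" take 2 bytes each in UTF-8

    $bytes = strlen($word);                      // counts bytes: 7
    $chars = mb_strlen($word, 'UTF-8');          // counts characters: 5

    // Byte-based substr() can cut "ö" in half and leave invalid UTF-8.
    $brokenCut = substr($word, 0, 3);            // "Gr" plus half of "ö"
    $brokenOk  = mb_check_encoding($brokenCut, 'UTF-8'); // false

    // mb_substr() cuts on character boundaries instead.
    $cleanCut = mb_substr($word, 0, 3, 'UTF-8'); // "Grö"

    echo "$bytes bytes, $chars characters, first three: $cleanCut\n";
    ```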

  • CT

    I wonder if this is not dealt with in PHP (as it is in other languages/systems) because it is usually pushed out to the template layer. Is this another case where PHP can deal with a problem differently because its problem space is limited to web apps?

    Selecting a different set of template files based on language selection or settings is pretty trivial, and many PHP sites do this. If you have separated your presentation from your logic, it’s really up to the browser to deal with the character encoding of the data in the templates. As there are actually page design issues in displaying different languages, it may be better not to have the programmer deal with it.
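    The template-selection approach CT describes could look roughly like this. pick_template_dir() and the templates/<lang>/ directory layout are hypothetical. The input is the browser's Accept-Language header, which PHP exposes as $_SERVER['HTTP_ACCEPT_LANGUAGE']. Region subtags such as "-US" are simply dropped, which is a simplification.

    ```php
    <?php
    // Hypothetical helper: pick a template directory (templates/<lang>/) from an
    // Accept-Language header such as "fr-CH, de;q=0.9, en;q=0.8".
    function pick_template_dir(string $acceptLanguage, array $supported, string $default): string
    {
        $candidates = [];
        foreach (explode(',', $acceptLanguage) as $position => $part) {
            $pieces = explode(';', trim($part));
            // Keep only the primary subtag: "en-US" -> "en" (a simplification).
            $lang = strtolower(explode('-', trim($pieces[0]))[0]);
            $q = 1.0; // quality defaults to 1 when no q= parameter is given
            foreach (array_slice($pieces, 1) as $param) {
                $param = trim($param);
                if (strncmp($param, 'q=', 2) === 0) {
                    $q = (float) substr($param, 2);
                }
            }
            if ($lang !== '' && $q > 0) {
                $candidates[] = [$q, $position, $lang];
            }
        }
        // Highest quality first; equal qualities keep their header order.
        usort($candidates, function (array $a, array $b): int {
            return [$b[0], $a[1]] <=> [$a[0], $b[1]];
        });
        foreach ($candidates as $candidate) {
            if (in_array($candidate[2], $supported, true)) {
                return "templates/{$candidate[2]}/";
            }
        }
        return "templates/$default/";
    }

    // In a real page the header would come from $_SERVER['HTTP_ACCEPT_LANGUAGE'].
    $dir = pick_template_dir('fr-CH, de;q=0.9, en;q=0.8', ['en', 'de'], 'en');
    echo $dir, "\n"; // templates/de/
    ```

    Only the template directory changes between languages, so the application logic stays the same for every visitor.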

  • l0t3k

    Wow Harry ! Thanks for the writeup.

    i’ve been trying to get some notice (and, more importantly, more developer support) since i started the project over 2 years ago. i’m working hard on getting the extension ready for PECL. i’ve already run the idea by Wez, so it's just a matter of setting up CVS.

    for those not following the project, support is also provided for the following, in addition to unicode string manipulation:

    International calendar support (Gregorian, Buddhist, Japanese, Hebrew, Islamic). In the sample code for a month/year calendar, only 1 method (GregorianCalendar::getMonthGrid) is needed to get all the necessary data; all other lines are for getting parameters or generating output.

    Locale Support (over 200 included for standard parsing/formatting wherever the extension compiles)

    Locale Sensitive Decimal/Date/Time/Currency formatting and parsing

    TimeZone support (current and historical) based on the Olson database (over 400 supported)

    ResourceBundle support. This promises a more flexible alternative to gettext.

    Unicode character database queries (essentially duplicating )

    i’m currently writing/porting test scripts to shake out remaining bugs and ensure API coverage. one big issue is that my linux box is currently borked (physically), so i haven't been able to verify that the extension builds on *nix.

    as always, i’d appreciate any help i can get.


  • Amit

    I have been playing with PHP and Unicode (UTF-8 encoded) strings for the past year and a half. For basic stuff like output, database queries, searches etc., it seems to work fine without any problems, even with plain ol’ PHP string functions. So in that respect, I guess PHP is quite adequate for now (at least for my needs).

    However, for the future, I do hope i18n support is added soon.

  • shadowcaster

    I’m an advanced PHP programmer and I have NO IDEA what i18n is.
    Why am I not surprised? :(

  • HLindset

    shadowcaster: i18n means internationalization
    "nternationalizatio" is 18 characters, thus i18n :) It's just the common way to make that word easier/faster to write.

  • CT

    Actually, the 18 in I18N is for the 18 letters between the I and the N (there are 20 letters in internationalization). Likewise, localization is shortened to L10N. The latest buzzcronym is G11N, for globalization, which combines I18N and L10N.

    Maybe JK28N2 would be clearer than G11N though.
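    The rule CT describes (first letter, number of letters in between, last letter) can be checked with a small sketch. numeronym() is a hypothetical helper that assumes plain ASCII words.

    ```php
    <?php
    // Hypothetical helper: first letter + count of letters in between + last letter.
    function numeronym(string $word): string
    {
        $len = strlen($word);     // ASCII words, so bytes == letters
        if ($len <= 3) {
            return $word;         // too short to be worth abbreviating
        }
        return $word[0] . ($len - 2) . $word[$len - 1];
    }

    foreach (['internationalization', 'localization', 'globalization'] as $w) {
        echo $w, ' => ', numeronym($w), "\n";
    }
    // internationalization => i18n
    // localization => l10n
    // globalization => g11n
    ```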

  • wei

    There is a non-native attempt at providing some I18N to PHP, also using data from ICU. It provides number, currency, date, and message formatting tools. The classes are independent of the PRADO project, but some components are provided to make them easy to use within PRADO.

    See example of usage

  • jplush76

    I believe the preg functions already support multibyte UTF-8 strings if you append the u modifier to the pattern (e.g. /pattern/u).
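    A minimal sketch of what the u modifier changes. It assumes a UTF-8 encoded script, and the sample strings are illustrative. With u, PCRE treats both the pattern and the subject as UTF-8, so "." matches a whole character. Case-insensitive matching should then also cover non-ASCII letters, which bears on both of Tim's regex points.

    ```php
    <?php
    // Assumes this file is saved as UTF-8.
    $text = "Größe"; // 5 characters, 7 bytes

    // Without /u, "." matches single bytes: 7 matches.
    $byteMatches = preg_match_all('/./', $text);

    // With /u, "." matches whole UTF-8 characters: 5 matches.
    $charMatches = preg_match_all('/./u', $text);

    // Case-insensitive matching of a non-ASCII letter relies on /u.
    $caseless = preg_match('/ö/iu', 'Ö'); // 1

    // With /u, a malformed UTF-8 subject makes preg_match() return false.
    $invalid = preg_match('/./u', "\xC3");

    echo "$byteMatches byte matches, $charMatches character matches\n";
    ```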


