PHP is arguably the simplest language for server-side programming (as compared with
ASP.NET, C++, Java, JSP, Python, and others). Yet, like many older software tools, it provides only token support for Unicode: it accepts any bytes in a string, it has conversion functions for UTF-8 that work only with one-byte (ISO-8859-1) characters, it offers a limited set of mb_ string functions, and so on.
But if the programmer wishes to support input or output in human language, and naturally wants to manipulate strings using the PHP string functions, the result is failure: the standard functions operate on bytes, so they miscount or split multibyte characters. One would think that, since the de facto Web standard encoding is UTF-8, PHP would be extended to support UTF-8 strings natively. I can’t think of a technical reason why such support cannot be added to Zend and PHP. And I say that knowing that there is no upper bound on the length in bytes of a single user-perceived character (a code point outside the BMP takes four bytes in UTF-8, and a grapheme cluster may combine many code points).
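To make the failure concrete, here is a minimal sketch of what I mean. It assumes PHP 7 or later (for the "\u{...}" escape) and relies only on the built-in PCRE functions with the /u modifier. The helper names utf8_code_points, utf8_graphemes and utf8_reverse are my own hypothetical names, not part of PHP.

```php
<?php
// Hypothetical helpers for illustration; these names are not part of PHP.

// Split a UTF-8 string into code points using PCRE's /u mode.
function utf8_code_points(string $s): array
{
    $parts = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($parts === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $parts;
}

// Split a UTF-8 string into grapheme clusters (user-perceived characters).
// \X is PCRE's "extended grapheme cluster" escape.
function utf8_graphemes(string $s): array
{
    if (preg_match_all('/\X/u', $s, $m) === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $m[0];
}

// Reverse a string one grapheme cluster at a time, so multibyte
// sequences and combining marks stay intact.
function utf8_reverse(string $s): string
{
    return implode('', array_reverse(utf8_graphemes($s)));
}

$word     = "na\u{00EF}ve"; // "naive" with i-diaeresis (U+00EF)
$combined = "e\u{0301}";    // "e" followed by a combining acute accent

echo strlen($word), "\n";                      // byte count: 6
echo count(utf8_code_points($word)), "\n";     // code points: 5
echo strlen($combined), "\n";                  // byte count: 3
echo count(utf8_code_points($combined)), "\n"; // code points: 2
echo count(utf8_graphemes($combined)), "\n";   // grapheme clusters: 1

// strrev() reverses bytes, so the result is no longer valid UTF-8.
var_dump(preg_match('//u', strrev($word)) === false);
echo utf8_reverse($word), "\n";
```

So strlen reports bytes, and strrev corrupts the text, while the same operations done per code point or per grapheme cluster behave as a user would expect. This is the behavior I would like the ordinary string functions to have.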
My primary question is: why has PHP not been extended? Why is there not a PHP directive to switch all the string functions (and any directly supporting code and libraries) so they work with UTF-8? And a follow-up question: is there any known workaround (such as an additional library that is easy to add to Apache+PHP)?
The days when programmers dealt only with text in their own native language are over. It’s time to have some substantial eggs with our morning toast.