Why does PHP still not handle Unicode?

#1

PHP is arguably the simplest language for server-side programming (as compared with ASP.NET, C++, Java, JSP, Python, and others). Yet, like many older software tools, it provides only token support for Unicode (it accepts any bytes in a string, it has conversion functions between UTF-8 and the one-byte ISO-8859-1 encoding, it offers a limited set of mb_ string functions, etc.).

But if the programmer wishes to support input or output in human language, and naturally wants to manipulate strings using the PHP string functions, the result is failure. One would think that, since the de facto Web standard encoding is UTF-8, PHP would be extended to support UTF-8 strings natively. I can’t think of a technical reason why such support cannot be added to Zend and PHP. That holds even though there is no upper bound on the length in bytes of a single user-perceived character (which may lie outside the BMP or may be a grapheme cluster built from many code points).
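To make the failure mode concrete, here is a minimal sketch of how the byte-oriented string functions behave on UTF-8 input, next to their mbstring counterparts. It assumes the mbstring extension is loaded; the sample word and variable names are only illustrative.

```php
<?php
// "naïve" spelled with explicit bytes so the example does not depend on
// how this source file is saved: "ï" is the two bytes C3 AF in UTF-8.
$word = "na\xC3\xAFve";

$byteCount = strlen($word);              // counts bytes: 6
$charCount = mb_strlen($word, 'UTF-8');  // counts code points: 5

// Cutting by bytes can split a multi-byte character in half.
$byteCut = substr($word, 0, 3);             // "na" plus a dangling lead byte
$charCut = mb_substr($word, 0, 3, 'UTF-8'); // "naï"

// mb_check_encoding() reports whether a string is valid UTF-8.
$byteCutValid = mb_check_encoding($byteCut, 'UTF-8'); // false
$charCutValid = mb_check_encoding($charCut, 'UTF-8'); // true

printf("bytes=%d chars=%d\n", $byteCount, $charCount);
var_dump($byteCutValid, $charCutValid);
```

Note that mb_strlen() counts code points, not grapheme clusters. For text with combining marks, the grapheme_* functions in the intl extension may be closer to what is wanted, assuming that extension is available.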

My primary question is: why has PHP not been extended? Why is there not a PHP directive to switch all the string functions (and any directly supporting code and libraries) so they work with UTF-8? And a follow-up question: is there any known workaround (such as an additional library that is easy to add to Apache+PHP)?

The days when programmers dealt only with text in their own native language are over. It’s time to have some substantial eggs with our morning toast.

#2

https://www.php.net/manual/en/book.mbstring.php

#3

Thank you. I should have tried out the mb_ functions before assuming they wouldn’t work with UTF-8 internal strings. Adding and removing the BOM is certainly not much of a problem.
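For anyone wondering what the BOM handling might look like, here is a rough sketch. The UTF-8 BOM is the three bytes EF BB BF; the constant and the two function names are made up for this example and are not part of PHP or mbstring.

```php
<?php
// The UTF-8 byte order mark: three fixed bytes at the start of the text.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: drop a leading BOM if one is present.
function strip_utf8_bom(string $text): string
{
    return strncmp($text, UTF8_BOM, 3) === 0 ? substr($text, 3) : $text;
}

// Hypothetical helper: prepend a BOM unless the text already starts with one.
function add_utf8_bom(string $text): string
{
    return strncmp($text, UTF8_BOM, 3) === 0 ? $text : UTF8_BOM . $text;
}

$clean = strip_utf8_bom(UTF8_BOM . "abc"); // "abc"
$marked = add_utf8_bom("abc");             // BOM followed by "abc"
```

Plain byte functions are fine here because the BOM is a fixed byte sequence, so no character-aware handling is needed for this step.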

#4

Ever wonder what happened to PHP 6? The powers that be spent years trying to basically redesign PHP from the ground up so that it could handle Unicode cleanly. They finally gave up and scrapped the entire version.

I might add that I find it strange that you think C++ somehow provides built-in support for UTF-8 strings.

#5

I never said that. I said that C++ was a server-side programming language. I’m curious if you disagree?

#6

I wrote a small PHP program to read in my favorite UTF-8 test file (no BOM) and to do simple string manipulation using the mb_ string functions. To my pleasant surprise, it worked perfectly. I will try to delete this posting in 24 hours, since I’m obviously wrong!
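A test along those lines might look roughly like the sketch below. It is not the actual program from this post: the temporary file and the sample text are stand-ins, and it assumes the mbstring extension is loaded.

```php
<?php
// Treat strings as UTF-8 in every mb_ call that follows.
mb_internal_encoding('UTF-8');

// Stand-in for a real test file: write a short UTF-8 sample (no BOM).
$path = tempnam(sys_get_temp_dir(), 'utf8');
file_put_contents($path, "h\u{00E9}llo w\u{00F6}rld"); // "héllo wörld"

// Read it back as raw bytes and confirm it is valid UTF-8.
$text = file_get_contents($path);
unlink($path);
$valid = mb_check_encoding($text, 'UTF-8');

// Character-aware manipulation with the mb_ functions.
$upper = mb_strtoupper($text);            // "HÉLLO WÖRLD"
$pos = mb_strpos($text, "w\u{00F6}rld");  // character offset: 6
$bytePos = strpos($text, "w\u{00F6}rld"); // byte offset: 7
$first = mb_substr($text, 0, 5);          // "héllo"

echo $upper, "\n", $first, "\n", $pos, " vs ", $bytePos, "\n";
```

The two offsets differ because "é" occupies two bytes, which is the kind of mismatch that shows up when byte functions and mb_ functions are mixed on the same string.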

#7

Wrong? No. Misinformed? Yes. People who come from another language often assume PHP is whatever most of the chatter says it is. It actually isn’t. Most of the chatter is just exaggerated or misinformed comments. I don’t think you should delete your comments or thread. It serves a purpose for others who may have the same concern or opinion about PHP.