I am having some strange issues with the PHP functions htmlentities and htmlspecialchars. When my input has some weird characters in it, these functions will return a completely empty string. The odd part is it works fine if I hard code the inserted string, but it won’t work if I insert the string from the MySQL database. Here is an example:
The third argument to htmlspecialchars/htmlentities is the character encoding. Depending on your version of PHP, this might be defaulting to ISO-8859-1. You probably want UTF-8.
If the same value were being retrieved from the database as in the first example then the two would be exactly equivalent and the second version would also work. Therefore either the value isn’t being stored in the database properly or the character encoding within the database is messing things up.
I’m with @felgall. @peppy It might be useful to download yourself a hex editor and have a look at the actual binary data coming out of the database both from the PHP functions mysql_*() and also the MySQL shell commands too perhaps.
Thanks for the responses. The value stored in the database is the same string as the first example:
$title = "Test-1234P(¿µ). Test #64777"
I am allowing members to bulk import items. These bulk imports (CSV/TXT) get stored into a validated “final draft” TXT file, which then get inserted into the database. It appears that something in this method is messing it up since I can add an item one-at-a-time using that same title and it appears fine.
So all that’s left to explain the problem is the database encoding. Only if that is different from that of the web page would the web page treat the text any differently when reading it from the database than if it were hard coded in the page.
What do both your test cases output if you just echo the $title field instead of calling the escape functions? If they then both do the same thing then you know that the functions are treating it differently.