Htmlentities / htmlspecialchars deleting some inputs entirely

Greetings,

I am having some strange issues with the PHP functions htmlentities and htmlspecialchars. When my input has some weird characters in it, these functions will return a completely empty string. The odd part is it works fine if I hard code the inserted string, but it won’t work if I insert the string from the MySQL database. Here is an example:

This one works:

$title = "Test-1234P(¿µ).  Test #64777";
echo "String: ".htmlentities(trim($title), ENT_QUOTES)."<br />";
echo "String: ".htmlspecialchars(trim($title), ENT_QUOTES);

This one does not work (title is the same as before, but it’s now being called from the database):

$sql = "SELECT title FROM items WHERE id = '1' LIMIT 1";
$result = mysql_query($sql) or die(mysql_error());
$row = mysql_fetch_array($result);
$title = $row['title'];

echo "String: ".htmlentities(trim($title), ENT_QUOTES)."<br />";
echo "String: ".htmlspecialchars(trim($title), ENT_QUOTES);

Please let me know why this is happening and what I can do to get it to work from now on when calling the database.

Thanks
Kind regards

The third argument to htmlspecialchars/htmlentities is the character encoding. Depending on your version of PHP, this might be defaulting to ISO-8859-1. You probably want UTF-8.

htmlspecialchars(trim($title), ENT_QUOTES, ‘UTF-8’)

Thanks for the response. Unfortunately, I tried this today but it still does not work and an empty string is still returned.

What value is actually stored in the database?

If the same value were being retrieved from the database as in the first example then the two would be exactly equivalent and the second version would also work. Therefore either the value isn’t being stored in the database properly or the character encoding within the database is messing things up.

I’m with @felgall. @peppy It might be useful to download yourself a hex editor and have a look at the actual binary data coming out of the database both from the PHP functions mysql_*() and also the MySQL shell commands too perhaps.

Thanks for the responses. The value stored in the database is the same string as the first example:

$title = "Test-1234P(¿µ).  Test #64777"

I am allowing members to bulk import items. These bulk imports (CSV/TXT) get stored into a validated “final draft” TXT file, which then get inserted into the database. It appears that something in this method is messing it up since I can add an item one-at-a-time using that same title and it appears fine.

So all that’s left to explain the problem is the database encoding. Only if that is different from that of the web page would the web page treat the text any differently when reading it from the database than if it were hard coded in the page.

It looks like the MySQL table itself has a Collation of utf8_general_ci , and the title field also has a Collation of utf8_general_ci

On the top of all pages on my website, I have header tags:

header("Content-Type: text/html; charset=utf-8");
header("Vary: Accept-Encoding");

There appears to be no difference in the loading if I remove these tags. I’m not sure what to do here.

What do both your test cases output if you just echo the $title field instead of calling the escape functions? If they then both do the same thing then you know that the functions are treating it differently.