Even O’Reilly’s Learning PHP & MySQL seems to use htmlentities($user_input) everywhere.
But its only purpose there is to protect against user data that contains malicious HTML or JavaScript, so I think htmlspecialchars() is the better choice: it is faster because it deals with only 4 characters (5 if single quotes are replaced as well).
Moreover, htmlspecialchars() works well with UTF-8 even without passing UTF-8 as the 3rd argument, whereas htmlentities() will mess up a UTF-8 string unless the 3rd argument is set to UTF-8.
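To make the charset point concrete, here is a minimal sketch. It assumes the string really is UTF-8 and passes 'ISO-8859-1' explicitly to reproduce the old default (since PHP 5.4 both functions default to UTF-8, so a plain call no longer shows the problem). The variable names are only for illustration.

```php
<?php
// "café <b>" with é written as its two UTF-8 bytes (0xC3 0xA9).
$utf8 = "caf\xC3\xA9 <b>";

// Treating the UTF-8 bytes as ISO-8859-1 turns each byte into its own entity.
$wrong = htmlentities($utf8, ENT_QUOTES, 'ISO-8859-1');

// With the matching charset, é becomes a single entity.
$right = htmlentities($utf8, ENT_QUOTES, 'UTF-8');

// htmlspecialchars() only touches ASCII characters, so the é bytes pass through intact.
$special = htmlspecialchars($utf8, ENT_QUOTES, 'ISO-8859-1');

echo $wrong, "\n";   // caf&Atilde;&copy; &lt;b&gt;
echo $right, "\n";   // caf&eacute; &lt;b&gt;
echo $special, "\n"; // café &lt;b&gt; (the é is still raw UTF-8)
```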
htmlspecialchars() seems to be about 2.5 times faster:
$orig = '<div style="background:#ffc">Hello World</div>';
$converted_htmlspecialchars = htmlspecialchars($orig);
$converted_htmlentities = htmlentities($orig);
if ($converted_htmlspecialchars != $converted_htmlentities) echo "special and ent not equal\n";
else echo "They are equal!\n";
$iRepeatNTimes = 100000;
$startTime = microtime(true);
for ($i = 0; $i < $iRepeatNTimes; $i++) {
    $s = htmlspecialchars($orig);
}
echo "It took " . (microtime(true) - $startTime) . " to finish\n";
$startTime = microtime(true);
for ($i = 0; $i < $iRepeatNTimes; $i++) {
    $s = htmlentities($orig);
}
echo "It took " . (microtime(true) - $startTime) . " to finish\n";
# and I repeat the 2 loops above just to see how they vary from the initial values
Result:
They are equal!
It took 0.18208599090576 to finish
It took 0.4557158946991 to finish
It took 0.16565799713135 to finish
It took 0.40935683250427 to finish
Also, what is htmlentities() good for, then? Is it merely to make sure non-ASCII characters display correctly when no encoding is given by the HTTP header or by the http-equiv meta tag in <head> </head>, or when there is a UTF-8 and ISO-8859-1 mismatch, and so forth? When we actually send the correct header and the content is in the corresponding encoding, is there really any use for htmlentities()?
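One way to look at that question is to check what each function leaves behind. The sketch below assumes UTF-8 input, and isAscii() is a hypothetical helper written only for this illustration. It shows that htmlentities() output survives as plain ASCII for characters that have a named entity, while htmlspecialchars() output still depends on the page's declared encoding.

```php
<?php
// Hypothetical helper: true when the string contains only 7-bit ASCII bytes.
function isAscii(string $s): bool
{
    return preg_match('/^[\x00-\x7F]*$/', $s) === 1;
}

// "é © <" written as UTF-8 bytes.
$text = "\xC3\xA9 \xC2\xA9 <";

$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');
$special  = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

echo $entities, "\n";          // &eacute; &copy; &lt;
var_dump(isAscii($entities));  // bool(true)
var_dump(isAscii($special));   // bool(false): é and © are still raw UTF-8 bytes
```

Note that this only holds for characters with a named entity. Something like Chinese text is left as raw bytes by htmlentities() too, so a correct encoding declaration is still needed in general.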
As said in another thread, if you are coding purely for efficiency, you are doing it wrong.
Use whatever tool is most appropriate for the job. If you need htmlentities, use it, and specify the UTF-8 argument if you need to. If you only need htmlspecialchars, then use that.
Well, the most common and practical needs in my day-to-day use of PHP are:
To prevent malicious user data from doing Cross Site Scripting (XSS)
To actually print out HTML code on a webpage like here: <div>Hello World</div>
And htmlspecialchars() fully performs that function already.
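A minimal sketch of both uses above, assuming the page is served as UTF-8. The helper name e() is hypothetical and not something from this thread.

```php
<?php
// Hypothetical helper: escape a value for output in HTML text or attributes.
function e(string $value): string
{
    // ENT_QUOTES also converts single quotes; the charset is stated explicitly.
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// 1. User data that tries to inject a script is shown as text, not executed.
$userInput = '<script>alert("xss")</script>';
echo '<p>', e($userInput), "</p>\n";
// <p>&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</p>

// 2. Printing HTML source on a page so the reader sees the tags.
echo '<pre>', e('<div>Hello World</div>'), "</pre>\n";
// <pre>&lt;div&gt;Hello World&lt;/div&gt;</pre>
```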
So if you are so confident in saying people are wrong, Stormrider, how about you state just one common and practical case where we actually need htmlentities(), when the header already specifies the correct encoding and the content is in that encoding?
As I said, use the right tool for the job. I didn’t say anyone was wrong, and if that’s all you need from the function, then go ahead and use that one.
All I am saying is that ‘it is marginally faster’ isn’t a very good reason. ‘It does what I want’ is a good reason.
As I said, state one good reason to use htmlentities(). For what job is htmlentities() the right tool, when the HTTP header gives the right encoding, such as “utf-8”, and the content is in that encoding?