Don't use htmlentities(), use htmlspecialchars() instead: faster and UTF-8 compatible

It seems that even in O’Reilly’s book Learning PHP & MySQL, htmlentities($user_input) is used everywhere.

But its only purpose there is to protect against user data that contains malicious HTML or JavaScript code, so I think htmlspecialchars() is the faster choice, because it deals with only 4 characters: &, ", < and > (5 if the single quote is replaced as well).
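To make the "4 characters, or 5 with the single quote" point concrete, here is a small sketch. The input string and variable names are made up for illustration. The flags are passed explicitly because the default changed over time: before PHP 8.1 the default was ENT_COMPAT, and from 8.1 on it includes ENT_QUOTES.

```php
<?php
// Illustrative input only; not taken from the thread.
$input = "<a href=\"x\">Tom & Jerry's</a>";

// ENT_COMPAT: escapes & " < > and leaves the single quote alone.
$compat = htmlspecialchars($input, ENT_COMPAT, 'UTF-8');

// ENT_QUOTES: additionally escapes the single quote as &#039;.
$quotes = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $compat, "\n";
echo $quotes, "\n";
```

The only difference between the two results is the apostrophe, which matters when the value ends up inside a single-quoted HTML attribute.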

Moreover, htmlspecialchars() works well with UTF-8 even without passing 'UTF-8' as the 3rd argument, because the characters it replaces are all plain ASCII. htmlentities() will mess up a UTF-8 string unless the 3rd argument is set to 'UTF-8'.
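A minimal sketch of the UTF-8 behaviour described above. The charset is passed explicitly in every call so the result does not depend on the PHP version: the old ISO-8859-1 default applies to versions before PHP 5.4, and later versions default to UTF-8. The sample word and variable names are only for illustration.

```php
<?php
// "café" written as explicit UTF-8 bytes, so the example does not
// depend on the encoding this file is saved in.
$utf8 = "caf\xC3\xA9";

// Only touches & " < > ', so the multi-byte character passes through.
$special = htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');

// Told the input is UTF-8: é becomes a single named entity.
$entities = htmlentities($utf8, ENT_QUOTES, 'UTF-8');

// Told the input is ISO-8859-1: each of the two bytes is converted
// separately, which corrupts the character.
$mangled = htmlentities($utf8, ENT_QUOTES, 'ISO-8859-1');

echo $entities, "\n"; // caf&eacute;
echo $mangled, "\n";  // caf&Atilde;&copy;
```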

Agreed

No arguments here either, although I would like to see some benchmarking of the speed difference, if available.

So… it deals with fewer characters but provides the same protection?

I think so. That’s how the Ruby on Rails framework does it too. It has an h() or html_escape() function that escapes these 4 characters.

http://www.ruby-doc.org/core/
html_escape(s)
escape ’&’, ’"’, ’<’ and ’>’ for use in HTML.

htmlspecialchars() seems to be about 2.5 times faster:

$orig = '<div style="background:#ffc">Hello World</div>';

// For this pure-ASCII input both functions should give the same output.
$converted_htmlspecialchars = htmlspecialchars($orig);
$converted_htmlentities = htmlentities($orig);

if ($converted_htmlspecialchars != $converted_htmlentities) {
    echo "special and ent not equal\n";
} else {
    echo "They are equal!\n";
}

$iRepeatNTimes = 100000;

// Time htmlspecialchars()
$startTime = microtime(true);
for ($i = 0; $i < $iRepeatNTimes; $i++) {
    $s = htmlspecialchars($orig);
}
echo "It took " . (microtime(true) - $startTime) . " to finish\n";

// Time htmlentities()
$startTime = microtime(true);
for ($i = 0; $i < $iRepeatNTimes; $i++) {
    $s = htmlentities($orig);
}
echo "It took " . (microtime(true) - $startTime) . " to finish\n";

# and I repeated the 2 loops above just to see how they vary from the initial values

Result:

They are equal!
It took 0.18208599090576 to finish
It took 0.4557158946991 to finish
It took 0.16565799713135 to finish
It took 0.40935683250427 to finish

Also, what is htmlentities() good for, then? Is it merely to make sure non-ASCII characters display correctly when no encoding is given by the HTTP header or by the http-equiv meta tag in <head> </head>, or when there is a UTF-8 and ISO-8859-1 mismatch, and so forth? When we actually output the correct header and the content is in the corresponding encoding, is there really any use for htmlentities()?
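One way to look at the question above is to check what each function leaves behind. The sketch below assumes the only goal is output that survives a wrong or missing charset declaration; the sample text and the $isAscii helper are made up for illustration. Note that htmlentities() only converts characters that have a named HTML entity, so text in scripts without named entities (CJK, for example) would still contain non-ASCII bytes.

```php
<?php
// "café ©" as explicit UTF-8 bytes.
$text = "caf\xC3\xA9 \xC2\xA9";

$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

// Hypothetical helper: true when the string holds only 7-bit ASCII bytes.
$isAscii = function (string $s): bool {
    return preg_match('/^[\x00-\x7F]*$/', $s) === 1;
};

var_dump($isAscii($special));  // non-ASCII bytes are still present
var_dump($isAscii($entities)); // everything here had a named entity
echo $entities, "\n";
```

So the htmlentities() output renders the same whatever charset the browser assumes, while the htmlspecialchars() output relies on the declared encoding being right.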

As said in another thread, if you are coding purely for efficiency, you are doing it wrong.

Use whatever tool is most appropriate for the job. If you need htmlentities, use it, and specify the UTF-8 argument if you need to. If you only need htmlspecialchars, then use that.

Don’t use speed to decide between the 2.

Well, the most common and practical need in my day to day use of PHP is:

  1. To prevent malicious user data from doing Cross Site Scripting (XSS)
  2. To actually print out HTML code on a webpage like here: <div>Hello World</div>

And htmlspecialchars() fully performs that function already.
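A minimal sketch of both uses, assuming the page is served as UTF-8. The h() helper is hypothetical, named after the Rails function mentioned earlier in the thread, and it only covers values placed in HTML text or quoted attributes, not values placed inside JavaScript or URLs.

```php
<?php
// Hypothetical helper modelled on the Rails h() mentioned in the thread.
function h(string $s): string
{
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

// 1. Untrusted input is shown as text instead of running as script.
$userInput = '<script>alert("xss")</script>';
echo '<p>', h($userInput), "</p>\n";

// 2. Markup is displayed literally on the page.
echo '<pre>', h('<div>Hello World</div>'), "</pre>\n";
```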

So if you are confident enough to say people are wrong, Stormrider, how about you state just 1 common and practical case where we actually need htmlentities(), when the header already specifies the correct encoding type and the content is in that encoding?

As I said, use the right tool for the job. I didn’t say anyone was wrong, and if that’s all you need from the function, then go ahead and use that one.

All I am saying is that ‘it is marginally faster’ isn’t a very good reason. ‘It does what I want’ is a good reason.

As I said, state one good reason to use htmlentities(). For what job is htmlentities() the right tool, when the HTTP header gives the right encoding type, such as “utf-8”, and the content is in that encoding?