Don't use htmlentities() -- use htmlspecialchars() instead - faster and UTF-8 compat

winterheat · October 1, 2008, 4:32pm

It seems that even in O’Reilly’s book Learning PHP & MySQL, htmlentities($user_input) is used everywhere.

But it is only there to protect against user data that contains malicious HTML or JavaScript code, so I think htmlspecialchars() is the faster choice, because it deals with only 4 characters (5 if the single quote is replaced as well).
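To make the "4 characters (5 with the single quote)" point concrete, here is a minimal sketch. The sample string is made up for illustration, and the flags are passed explicitly because the default flags are not the same across PHP versions (as far as I know, PHP 8.1 switched the default to include ENT_QUOTES).

```php
<?php
// Made-up input containing all five characters htmlspecialchars() can touch.
$input = "<a href=\"x\" title='y'>Tom & Jerry</a>";

// ENT_COMPAT: escapes & < > and the double quote (4 characters).
$four = htmlspecialchars($input, ENT_COMPAT, 'UTF-8');

// ENT_QUOTES: additionally escapes the single quote (5 characters).
$five = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $four, "\n";
echo $five, "\n";
```

With ENT_COMPAT the single quotes around y stay as they are; with ENT_QUOTES they become &#039;. If the escaped value ends up inside a single-quoted attribute, ENT_QUOTES is the one that matters.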

Moreover, htmlspecialchars() works well with UTF-8 even without passing UTF-8 as the 3rd argument. htmlentities() will mess up a UTF-8 string unless the 3rd argument is set to UTF-8.
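A small sketch of that difference. PHP 5.4 and later default the 3rd argument to UTF-8, so to reproduce the older behaviour the charset is passed explicitly as ISO-8859-1 here. The test string is written with byte escapes so the result does not depend on how the file itself is saved.

```php
<?php
// "café" as UTF-8 bytes; the é is the two-byte sequence C3 A9.
$utf8 = "caf\xC3\xA9";

// Treating the UTF-8 bytes as ISO-8859-1 converts each byte on its own.
$mangled = htmlentities($utf8, ENT_QUOTES, 'ISO-8859-1');

// Declaring the real encoding gives the intended entity.
$correct = htmlentities($utf8, ENT_QUOTES, 'UTF-8');

// htmlspecialchars() only looks for ASCII characters, so the multibyte
// sequence passes through untouched even with the wrong charset.
$untouched = htmlspecialchars($utf8, ENT_QUOTES, 'ISO-8859-1');

echo $mangled, "\n";   // caf&Atilde;&copy;
echo $correct, "\n";   // caf&eacute;
echo $untouched === $utf8 ? "unchanged\n" : "changed\n";
```

The mangled version shows up in the browser as "cafÃ©", which is the classic sign of UTF-8 bytes being read as ISO-8859-1.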

system · October 1, 2008, 6:32pm

Agreed

SpikeZ · October 1, 2008, 6:59pm

No arguments here either, although I would like to see some benchmarking of the speed difference if available.

AfroNinja · October 1, 2008, 7:26pm

so… it deals with fewer characters but provides the same protection?

winterheat · October 1, 2008, 8:09pm

I think so. That’s how the Ruby on Rails framework does it too. It has an h() or html_escape() function that escapes these 4 characters.

http://www.ruby-doc.org/core/
html_escape(s)
escape '&', '"', '<' and '>' for use in HTML.

winterheat · October 2, 2008, 7:18am

htmlspecialchars() seems to be about 2.5 times faster:

$orig = '<div style="background:#ffc">Hello World</div>';

$converted_htmlspecialchars = htmlspecialchars($orig);
$converted_htmlentities = htmlentities($orig);

// For this ASCII-only input both functions should give the same output.
if ($converted_htmlspecialchars != $converted_htmlentities) {
    echo "special and ent not equal\n";
} else {
    echo "They are equal!\n";
}

$iRepeatNTimes = 100000;

// Time htmlspecialchars().
$startTime = microtime(true);
for ($i = 0; $i < $iRepeatNTimes; $i++) {
    $s = htmlspecialchars($orig);
}
echo "It took " . (microtime(true) - $startTime) . " to finish\n";

// Time htmlentities().
$startTime = microtime(true);
for ($i = 0; $i < $iRepeatNTimes; $i++) {
    $s = htmlentities($orig);
}
echo "It took " . (microtime(true) - $startTime) . " to finish\n";

# and I repeat the 2 loops above just to see how they vary from the initial values

Result:

They are equal!
It took 0.18208599090576 to finish
It took 0.4557158946991 to finish
It took 0.16565799713135 to finish
It took 0.40935683250427 to finish

winterheat · October 2, 2008, 7:28am

Also, what is htmlentities() good for, then? Is it merely to make sure that characters which have entities display correctly when non-ASCII content is served without an encoding in the HTTP header or in the http-equiv tag in <head> </head>, for example when there is a UTF-8 and ISO-8859-1 mismatch? When we actually output the correct header and the content is in the corresponding encoding, is there really any use for htmlentities()?
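One way to see the case described here: htmlentities() turns characters that have a named entity into plain ASCII, so the output no longer depends on the declared encoding. A minimal sketch follows; the sample string and the helper name isAscii are made up for illustration. Note that characters without a named entity (most non-Latin scripts, for instance) are left as raw bytes, so this is not a general "make it ASCII" tool.

```php
<?php
// "café © 2008" written as UTF-8 bytes.
$utf8 = "caf\xC3\xA9 \xC2\xA9 2008";

$entities = htmlentities($utf8, ENT_QUOTES, 'UTF-8');
$special  = htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');

// Hypothetical helper: true when the string contains only 7-bit ASCII bytes.
function isAscii(string $s): bool
{
    return preg_match('/^[\x00-\x7F]*$/', $s) === 1;
}

echo $entities, "\n";              // caf&eacute; &copy; 2008
var_dump(isAscii($entities));      // bool(true)
var_dump(isAscii($special));       // bool(false)
```

So the htmlentities() output renders the same whether the page is read as UTF-8 or ISO-8859-1, while the htmlspecialchars() output still needs the right encoding to be declared.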

Stormrider · October 2, 2008, 8:24am

As said in another thread, if you are coding purely for efficiency, you are doing it wrong.

Use whatever tool is most appropriate for the job. If you need htmlentities, use it, and specify the UTF-8 argument if you need to. If you only need htmlspecialchars, then use that.

Don’t use speed to decide between the 2.

winterheat · October 2, 2008, 5:44pm

Well, the most common and practical need in my day to day use of PHP is:

To prevent malicious user data from doing Cross Site Scripting (XSS)
To actually print out HTML code on a webpage like here: <div>Hello World</div>

And htmlspecialchars() fully performs that function already.
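Both needs can be covered by one small wrapper. This is only a sketch: the function name h() is borrowed from the Rails helper mentioned earlier in the thread and is not a PHP built-in, and the sample inputs are made up. It assumes the page is served as UTF-8 and the value is placed in HTML text or a quoted attribute.

```php
<?php
// Hypothetical helper, named after the Rails h() mentioned in this thread.
function h(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// 1. User data that tries to inject a script.
$comment = '<script>alert("xss")</script>';

// 2. HTML source we want to show literally on the page.
$snippet = '<div>Hello World</div>';

echo '<p>', h($comment), "</p>\n";
echo '<pre>', h($snippet), "</pre>\n";
```

The browser shows both strings as text instead of interpreting them as markup. This does not cover other contexts such as URLs or inline JavaScript, which need their own escaping.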

So if you are confident enough to say people are wrong, Stormrider, how about you state just one common, practical case where we actually need htmlentities(), when the header already specifies the correct encoding and the content is in that encoding?

Stormrider · October 5, 2008, 9:46am

As I said, use the right tool for the job. I didn’t say anyone was wrong, and if that’s all you need from the function, then go ahead and use that one.

All I am saying is that ‘it is marginally faster’ isn’t a very good reason. ‘It does what I want’ is a good reason.

winterheat · October 5, 2008, 10:10am

As I said, state one good reason to use htmlentities(). For what job is htmlentities() the right tool, when the HTTP header gives the right encoding, such as “utf-8”, and the content is in that encoding?
