htmlspecialchars() vs. htmlentities()

Which do you use and which is better for preventing XSS attacks?

htmlspecialchars() is sufficient to avoid having your content confused with HTML tags when displaying it within HTML.
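For anyone who wants to see it in action, here is a minimal sketch. The variable names and the sample string are made up for illustration; ENT_QUOTES and the explicit 'UTF-8' charset are choices I am assuming, not something stated above.

```php
<?php
// Hypothetical user-supplied value; the name $comment is illustrative only.
$comment = '<script>alert("xss")</script>';

// ENT_QUOTES converts both double and single quotes; the charset is explicit.
$safe = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

echo $safe, PHP_EOL;
// prints: &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
```

The browser then shows the text of the tag instead of treating it as markup.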

Neither has anything to do with preventing XSS attacks; validating input works best for that.

An absolutely great article was written about this a few weeks back.

There’s more to HTML escaping than &, <, >, and "

Completely wrong. Escaping output has EVERYTHING to do with preventing XSS attacks. You never, ever trust that data has been completely sanitized and is safe for raw output - you always escape it.

There is a good article here as well. Lots of good information on Chris’s site.

Thanks for these great links guys. I’ll throw another one into the fray that helped answer my initial question:

http://stackoverflow.com/questions/46483/htmlentities-vs-htmlspecialchars

Essentially it says if your document is already encoded for UTF-8 (which mine always are), it’s best to use htmlspecialchars().
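A quick sketch of the difference, assuming PHP 7 or later (for the "\u{E9}" escape) and a page served as UTF-8. The sample string is invented for the example.

```php
<?php
// "\u{E9}" is the UTF-8 character é; written as an escape so the file
// encoding does not matter for this demonstration.
$text = "caf\u{E9} <b>";

// htmlspecialchars() touches only the characters that are special in HTML.
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// htmlentities() also converts every character that has a named entity.
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $special, PHP_EOL;  // é is left alone, < and > become &lt; and &gt;
echo $entities, PHP_EOL; // é becomes &eacute; as well
```

On a UTF-8 page both render the same, so the extra entities from htmlentities() only make the markup longer.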

Of course it’s important to filter input, but these functions provide an extra layer of security when the data is displayed on the page. And as one of the articles suggested, this is critical to preventing XSS attacks.

Escaping it for HTML before inserting it in a database converts it to garbage.

Escaping it for SQL before writing it to a web page also converts it to garbage.
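To illustrate both cases, here is a rough sketch. The sample name is made up, and addslashes() is used only as a stand-in for real SQL escaping (an assumption for the demo, not a recommendation), since a proper database escape needs a live connection.

```php
<?php
// Hypothetical value as the user typed it.
$name = "O'Reilly & Sons";

// Case 1: escaped for HTML too early, e.g. before being stored.
$stored = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');
// The output layer escapes again, so the entities themselves get escaped.
$shown = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');

// Case 2: SQL-style escaping (stand-in: addslashes) written into a page.
$slashed = addslashes($name);

echo $stored, PHP_EOL;  // O&#039;Reilly &amp; Sons
echo $shown, PHP_EOL;   // O&amp;#039;Reilly &amp;amp; Sons
echo $slashed, PHP_EOL; // O\'Reilly & Sons
```

In the second line the visitor would literally see "&#039;" and "&amp;" on the page, and in the third a stray backslash.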

You never escape data for security reasons - you do it because the data is ALLOWED to contain characters that need to be escaped for the particular medium (e.g. quotes within database content or less-than signs within data to be output in a web page). If those characters are not ALLOWED TO BE THERE then you should strip them out during the sanitisation/validation steps.
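A small sketch of that split, under my own assumptions: validUsername() and escapeHtml() are hypothetical helpers, and the username rule (letters, digits, underscore, up to 32 characters) is invented for the example.

```php
<?php
// Hypothetical input rule: this field is NOT allowed to contain special
// characters, so anything else is rejected at the validation step.
function validUsername(string $value): bool
{
    return preg_match('/\A[A-Za-z0-9_]{1,32}\z/', $value) === 1;
}

// Hypothetical output helper for fields that ARE allowed to contain
// characters such as < > & and quotes.
function escapeHtml(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

var_dump(validUsername('alice_01'));  // bool(true)
var_dump(validUsername('<script>'));  // bool(false)

// A free-text field where these characters are legitimate content.
$comment = 'if a < b && b > c then "ok"';
echo escapeHtml($comment), PHP_EOL;
// prints: if a &lt; b &amp;&amp; b &gt; c then &quot;ok&quot;
```

Running escapeHtml() on an already validated username changes nothing, so applying it there as well costs little.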

Anyone who always escapes data for HTML/SQL/auntymay/cheesesandwiches etc is generating JUNK.

Escaping is an output function. Security is something you do within your input function.

If you handle security properly then you can trust that data that has been completely sanitized is safe for raw output unless it is allowed to contain characters that need to be escaped. If you sanitize properly and the field is not allowed to contain those characters then escaping does absolutely nothing whatsoever.

You should never rely on escaping in place of performing proper sanitisation/validation.

Where did the OP say that he wanted to use one of the two functions before inserting into a database? He didn’t. The functions mentioned are obviously for outputting data - why would you assume they were being used for other purposes?

You’ve told me all I ever need to know about your advice regarding security right there. There are enough people out there writing horribly insecure websites. Please do not help create more by offering bad “security” advice on a subject you are not well versed in.

Once again, you can NEVER assume that data is safe to OUTPUT in its raw form. I don’t care what your input validation consists of - the second you make that assumption you are already lost.

A valid point, but no one was arguing that you should.