Dilemma regarding htmlentities() - when it must be used and when it can be omitted

I understand what htmlentities() does and that it can increase security, but I don’t want to use it indiscriminately. My dilemma is as follows -

I have a project that accepts user input via a form. The form data is analysed and searched for occurrences of particular strings. Whilst the data displays OK in a browser, the actual text is interspersed with the entity codes that htmlentities() adds, which makes analysing it difficult.

My questions are -

  1. If submitted text is not converted using htmlentities() is it only a threat if you click on it?
  2. Does htmlentities() have any benefit in preventing SQL injection? I.e. can I save the text unconverted and just convert it before I display it?
  3. If input is not converted using htmlentities() is there any threat if it is never saved to a database and never displayed in a browser?

Sorry if this sounds a bit dumb, but I like to understand as fully as possible.

Actually, I have just realised I can use html_entity_decode() during analysis to convert back temporarily, but I’d still be interested in hearing comments / feedback on the 3 questions above.
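
For illustration, here is a minimal sketch of that round trip. The sample string is made up, and the flags are passed explicitly because the defaults differ between PHP versions. It only shows that a search fails on the encoded text and works again once the text is decoded.

```php
<?php
// Hypothetical sample input; any UTF-8 string would behave the same way.
$raw = 'Tom & Jerry say "café"';

// Use the same flags for encoding and decoding so the round trip is symmetric.
$flags = ENT_QUOTES | ENT_HTML401;

$encoded = htmlentities($raw, $flags, 'UTF-8');
$decoded = html_entity_decode($encoded, $flags, 'UTF-8');

// Searching the encoded text misses the word, because "é" became "&eacute;".
$foundInEncoded = strpos($encoded, 'café') !== false;

// Searching the decoded text finds it again.
$foundInDecoded = strpos($decoded, 'café') !== false;

var_dump($foundInEncoded, $foundInDecoded);
```

If the text is stored unconverted in the first place, the decode step is not needed at all.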

I will try to answer…

Encoding or decoding HTML entities is not what prevents SQL injection. For that you have prepared statements. So you can (and should) save the submitted text into the database without any changes.
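
A rough sketch of what that could look like with PDO. The in-memory SQLite database and the `comments` table are assumptions made only so the example is self-contained; the thread does not say which database is used.

```php
<?php
// Assumption: an in-memory SQLite database via PDO, purely for illustration.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT NOT NULL)');

// Hostile-looking input, stored exactly as submitted (no htmlentities()).
$input = "Robert'); DROP TABLE comments;-- <b>hi</b>";

// The value is bound as a parameter, so it is never parsed as SQL.
$insert = $pdo->prepare('INSERT INTO comments (body) VALUES (:body)');
$insert->execute([':body' => $input]);

// Reading it back returns the original text, byte for byte.
$select = $pdo->prepare('SELECT body FROM comments WHERE id = :id');
$select->execute([':id' => 1]);
$stored = $select->fetchColumn();
```

Because the stored text is unchanged, it can be analysed and searched directly.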

You need to HTML-encode your text before you output it to the browser to prevent XSS (cross-site scripting) attacks.
Let’s assume a user puts the following in an input field:

"><script>alert("XSS Attack")

And you save it to your database.

If you now load this text from the database and insert it into the DOM of your website without encoding it, the browser will execute the script. Of course, a real script could do anything, not just show an alert :slight_smile:
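
As a sketch of encoding on output, assuming a small helper (the name `escapeHtml` is made up here) around PHP's built-in htmlspecialchars():

```php
<?php
// The payload from the example above, as it might come back from the database.
$stored = '"><script>alert("XSS Attack")';

// Hypothetical helper name; it only wraps the built-in htmlspecialchars().
function escapeHtml(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Encode at the moment of output, not when saving.
$safe = escapeHtml($stored);
echo '<p>' . $safe . '</p>', PHP_EOL;
```

The browser then shows the text literally instead of treating it as markup.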

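One thing to watch out for: it depends on how the text gets back into the page. If you print it as plain text or into the value attribute of an input field in server-generated HTML (for example for editing), you need to encode it, including the quotes; the browser decodes the entities again, so the user sees the original text. Only if you set the field's value through JavaScript (the value property) should you not encode it, because then the text is never parsed as HTML.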
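
One thing worth checking for the input-field case: when the field is rendered as HTML on the server, the value attribute is parsed like any other markup, and the browser decodes the entities before showing the text in the field. A minimal sketch, assuming a hypothetical `renderInput()` helper:

```php
<?php
// The same payload as above, loaded for editing.
$stored = '"><script>alert("XSS Attack")';

// Hypothetical helper that renders a text input with a pre-filled value.
function renderInput(string $name, string $value): string
{
    // ENT_QUOTES matters here: an unescaped quote would close the attribute.
    $flags = ENT_QUOTES | ENT_SUBSTITUTE;

    return sprintf(
        '<input type="text" name="%s" value="%s">',
        htmlspecialchars($name, $flags, 'UTF-8'),
        htmlspecialchars($value, $flags, 'UTF-8')
    );
}

$html = renderInput('comment', $stored);
echo $html, PHP_EOL;
```

Without the encoding, the leading `">` of the payload would end the attribute and the tag, which is exactly what that payload is built to do.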


Understood and clarified - thanks.

You mean htmlentities(), yes?

You mean display it in a browser, yes?

Thanks for your effort and time - much appreciated.

There are several ways to do this encoding. You can also encode it in your JavaScript code if the text is loaded with Ajax, for example. That is why I just used the word "encode", but yes, you can also use htmlentities().

Same here. There are several ways to display text in your browser. Depending on how it is done, the text needs different content to get a script executed (I just gave one possible example).


Excellent! Thanks so much, that cleared up a lot of misunderstanding and I learned a lot as well - cheers.

Can I please ask an off-topic question, seeing your title is Mentor :grinning:. I asked a question about developing an HTML5 pattern using regex but I am only getting replies about the PHP part - is there a better place or a better way to ask, please?

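Sorry, I can’t help much here. I am not a specialist on regex. But regex is regex: the core syntax is largely the same in HTML, JavaScript and PHP, although the HTML pattern attribute uses the JavaScript flavour and has to match the whole value. So I think there is some kind of misunderstanding here.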


But do you understand what its primary purpose is?

Sometimes we need to show HTML in a web page, such as the following.

<b>some text</b>

If we put that HTML in a web page without using HTML entities, the browser will treat it like all the other HTML on the page and render it as bold text instead of showing the tags. So we can use HTML entities, as in the following, which the browser converts back to the HTML we want shown.

&lt;b&gt;some text&lt;/b&gt;
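
In PHP that conversion does not have to be done by hand. A small sketch using the built-in htmlspecialchars() (the variable names are only for illustration):

```php
<?php
// The markup we want the visitor to see literally.
$html = '<b>some text</b>';

// Convert the characters that are special in HTML into entities.
$shown = htmlspecialchars($html, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

echo $shown, PHP_EOL; // prints: &lt;b&gt;some text&lt;/b&gt;
```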

Using HTML entities for security is a secondary purpose. I suggest not relying on it for security.

Perhaps you need to use an HTML parser to audit for dangerous elements, especially links and scripts. However, the Cross Site Scripting Prevention page in the OWASP Cheat Sheet Series seems to have good advice.

Thank you for expanding. Most helpful. Cheers.