As I learned from the PHP course here, anything that comes from the database and was entered by users should be filtered by htmlspecialchars before viewing on the page.
My question is, what about if I want to do some stylistic things in the blog post like <i> <b> <br> etc? When viewing the page the tags will be shown as plain text because they were converted to html entities!
What is the approach for blog posts that contain HTML tags, without risking my site’s security by skipping htmlspecialchars?
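For reference, the behaviour described in the question can be reproduced with a minimal sketch like the one below. The sample string and variable names are made up for illustration.

```php
<?php
// Hypothetical user input, as it might be stored in the database.
$post = '<b>bold</b> <script>alert("xss")</script>';

// Escape every HTML special character so the browser shows the text literally.
$escaped = htmlspecialchars($post, ENT_QUOTES, 'UTF-8');

echo $escaped, PHP_EOL;
// prints: &lt;b&gt;bold&lt;/b&gt; &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
```

The script can no longer run, but the harmless <b> tag is escaped as well, which is exactly the problem being asked about.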
There are different approaches to this issue. One is to use a markup language that is parsed for display, for example BBCode, which is often used in forums, or the markup language on these forums. This is a good solution if the input comes from untrusted sources such as arbitrary web site users.
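As a rough illustration of the markup-language idea, here is a minimal sketch of a BBCode-style renderer. The function name render_bbcode and the choice of supporting only [b] and [i] are assumptions for the example, not a complete BBCode implementation. The important part is the order: escape everything first, then turn only the known markers into tags.

```php
<?php
// Hypothetical helper: renders a tiny BBCode subset ([b], [i]) as HTML.
function render_bbcode(string $text): string
{
    // Step 1: neutralise all real HTML in the input.
    $html = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Step 2: convert each supported marker pair into its tag.
    // One pass per tag, so nested markers such as [b][i]x[/i][/b] also work.
    foreach (['b', 'i'] as $tag) {
        $pattern = '~\[' . $tag . '\](.*?)\[/' . $tag . '\]~s';
        $html = preg_replace($pattern, '<' . $tag . '>$1</' . $tag . '>', $html);
    }

    // Step 3: turn line breaks into <br /> so paragraphs survive.
    return nl2br($html);
}

echo render_bbcode('[b]Hi[/b] <script>'), PHP_EOL;
// prints: <b>Hi</b> &lt;script&gt;
```

Because the only tags that reach the page are the ones the renderer writes itself, users never get to supply raw HTML or attributes.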
If the input is coming only from the web site owner - like blog posts entered in an admin panel - then you can use a rich text HTML editor, save the whole HTML code into the database, and output it directly without htmlspecialchars(). Very often the HTML editors (like TinyMCE) provide detailed settings to restrict certain HTML tags, attributes, etc., so you can tweak what is allowed in the HTML. These editor restrictions are only safe if you know the person will not intentionally try to post anything unsafe, because they are enforced in the browser: a hacker can bypass the editor and send any code straight to the server to be saved in the database.
You can also mix the approaches: for example, output HTML from the database without htmlspecialchars(), but sanitize the content on input by allowing only certain HTML tags and stripping everything else.
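One built-in way to "allow only certain tags" is PHP's strip_tags() with its allowed-tags argument. The sketch below is only an illustration with a made-up input string, and it also shows why strip_tags() alone is not enough: attributes on allowed tags are left untouched, and the text inside removed tags stays in the output.

```php
<?php
// Hypothetical input mixing harmless tags with dangerous ones.
$input = '<b onclick="alert(1)">Bold</b> <script>alert(2)</script> <i>ok</i>';

// Keep only <b>, <i> and <br>; every other tag is removed.
$stripped = strip_tags($input, '<b><i><br>');

echo $stripped, PHP_EOL;
// prints: <b onclick="alert(1)">Bold</b> alert(2) <i>ok</i>
```

The <script> tag is gone, but the onclick attribute survives, so this on its own does not prevent XSS.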
Thank you very much for the detailed answer. I really needed this. However, I would like to know which methods can be used to strip some HTML tags, as you mentioned at the end of your comment. Is it just substr and similar functions, or are there more advanced ones? Thank you again.
There’s also a ready made solution - HTML Purifier - I haven’t used it but it looks pretty comprehensive and just designed for this purpose.
I will look into DOMDocument even though it looks complicated, but maybe only for larger projects.
I loved the HTML Purifier solution and I might use it later.
Thank you, I really appreciate your effort and the valuable information.
I understand now. So the other way of inserting malicious JS is by putting it in event attributes. Is that it? If I manage to write a function that removes only <script> tags and any on-event attributes, will I be safe from XSS attacks? Or are there other ways? I don’t want to use HTML Purifier if these are the only ways. I do care about performance.
Thank you and sorry for my endless questions.
I have searched and found a great website that explains XSS, https://excess-xss.com/, and it describes the approach you told me about: whitelisting, which is safer than blacklisting.
Thank you sir. Then I think my solution of removing the scripts and onevents won’t work and I should use DOMDocument or HTML Purifier as @Lemon_Juice has said.
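To make the whitelisting idea concrete, here is a rough sketch of how it might look with DOMDocument. The function names, the list of allowed tags and the http/https-only rule for links are all assumptions for the example. It is not a replacement for a tested library like HTML Purifier.

```php
<?php
// Hypothetical whitelist: tag name => attribute names allowed on that tag.
const ALLOWED_HTML = ['b' => [], 'i' => [], 'br' => [], 'p' => [], 'a' => ['href']];

// Hypothetical helper: removes everything below $node that is not whitelisted.
function clean_node(DOMNode $node): void
{
    // Copy the child list first, because the tree changes while we edit it.
    foreach (iterator_to_array($node->childNodes, false) as $child) {
        if ($child instanceof DOMText) {
            continue; // plain text is escaped again by saveHTML()
        }
        if (!($child instanceof DOMElement) || !isset(ALLOWED_HTML[$child->nodeName])) {
            $node->removeChild($child); // drops the tag together with its content
            continue;
        }
        foreach (iterator_to_array($child->attributes, false) as $attr) {
            if (!in_array($attr->nodeName, ALLOWED_HTML[$child->nodeName], true)) {
                $child->removeAttribute($attr->nodeName);
            }
        }
        // Links may only point to http(s) URLs, which rules out javascript: URLs.
        if ($child->hasAttribute('href')
            && !preg_match('~^https?://~i', $child->getAttribute('href'))) {
            $child->removeAttribute('href');
        }
        clean_node($child);
    }
}

// Hypothetical entry point: returns $dirty with only whitelisted markup left.
function sanitize_html(string $dirty): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '</head><body><div>' . $dirty . '</div></body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $doc->getElementsByTagName('div')->item(0); // our own wrapper element
    if ($root === null) {
        return '';
    }
    clean_node($root);

    $clean = '';
    foreach ($root->childNodes as $child) {
        $clean .= $doc->saveHTML($child);
    }
    return $clean;
}

echo sanitize_html('<b onclick="alert(1)">Hi</b><script>alert(1)</script>'), PHP_EOL;
// prints: <b>Hi</b>
```

Anything not explicitly listed is dropped, so a new attack that relies on an unusual tag or attribute fails by default. A blacklist would have to know about it in advance.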
This is a very informative discussion. Thank you both.
I will do. Thank you very much. I will try out HTML Purifier and use it along with an HTML WYSIWYG editor for posts and comments in any future project, and htmlspecialchars for the things that don’t need tags.