Using htmlspecialchars with blog post, can't use html tags like <br> <i> etc!

Hello,

As I learned from the PHP course here, anything that comes from the database and was entered by users should be filtered by htmlspecialchars before viewing on the page.

My question is: what if I want to do some stylistic things in the blog post, like <i> <b> <br> etc.? When viewing the page, the tags will be shown as plain text because they were converted to HTML entities!

How can I allow blog posts that contain HTML tags without risking my site’s security by skipping htmlspecialchars?

I hope my question was clear. Thank you :)

There are different approaches to this issue. One is to use a markup language that is parsed for display. Examples are BBCode, often used in forums, or the markup language used on these forums. This is a good solution if the input is coming from untrusted sources like arbitrary web site users.
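
To make the markup-language idea concrete, here is a minimal sketch of a BBCode-like renderer. The function name `render_markup` and the tiny tag set are just assumptions for illustration, not a complete BBCode implementation. The key point is the order: escape everything first, then convert only the markers you trust.

```php
<?php
// Hypothetical helper: render a tiny BBCode-like subset as safe HTML.
function render_markup(string $text): string
{
    // Escape everything first, so any raw HTML the user typed becomes inert text.
    $html = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Then turn the few markers we trust into real HTML tags.
    $html = preg_replace('~\[b\](.*?)\[/b\]~s', '<b>$1</b>', $html);
    $html = preg_replace('~\[i\](.*?)\[/i\]~s', '<i>$1</i>', $html);

    // Insert <br> before each line break.
    return nl2br($html, false);
}

echo render_markup('[b]Hello[/b] <script>alert(1)</script>');
```

Here the `[b]` pair becomes a real `<b>` tag, while the `<script>` tag is output as escaped text and never runs.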

If the input is coming only from the web site owner - like blog posts entered in an admin panel - then you can use a rich text HTML editor, save the whole HTML code into the database and output it directly without htmlspecialchars(). Very often the HTML editors (like TinyMCE) provide detailed settings to restrict certain HTML tags, attributes, etc., so you can tweak what is allowed in the HTML. This kind of restriction only works if you know the person will not intentionally try to post anything unsafe: the editor runs in the browser, so a hacker can bypass its restrictions and save any code into the database.

You can also mix the approaches, for example, output html from the database without htmlspecialchars() but on input sanitize the content, for example allow only certain html tags and strip everything else…

Thank you very much for the detailed answer. I really needed this. However, I would like to know which methods can be used to strip some HTML tags, as you said at the end of your comment. Is it just substr and similar functions, or are there more advanced functions? Thank you again :)

Hi malozaibi, welcome to the forum.

I don’t know if I’d call it advanced, but I think you’re looking for:

http://php.net/manual/en/function.strip-tags.php
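
A quick sketch of how strip_tags can be called with a list of allowed tags (the sample input is made up for illustration):

```php
<?php
$input = 'Hello <b>bold</b> <script>alert(1)</script><br>bye';

// The second argument lists the tags to keep; every other tag is removed.
// Note that only the tags are removed, the text between them stays.
$clean = strip_tags($input, '<b><i><br>');

echo $clean;
```

The `<script>` tags are gone, but the text that was inside them is left in the output as plain text.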

That’s what I needed. For a beginner like me, it does count as an advanced function.
Thank you very much :)

Yes, strip_tags is the simplest method; however, it won’t strip unwanted attributes from the tags you allow, so some malicious JavaScript can still get through. A more fine-tuned way would be to use the standard DOMDocument class, which can read HTML and then modify/sanitize it.
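
For reference, a rough sketch of what the DOMDocument route could look like. The function name `sanitize_html` and the default tag list are assumptions for illustration. It removes disallowed elements together with their content, drops every attribute from the allowed ones, and leaves out character-encoding handling, so treat it as a starting point rather than a finished sanitizer.

```php
<?php
// Hypothetical helper: keep only whitelisted tags and drop every attribute.
// Assumption: plain ASCII input; character-encoding handling is left out.
function sanitize_html(string $dirty, array $allowedTags = ['b', 'i', 'br', 'p']): string
{
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    $doc = new DOMDocument();
    // Wrap the input so the fragment can be found again after parsing.
    $doc->loadHTML('<div>' . $dirty . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $doc->getElementsByTagName('div')->item(0);
    if ($root === null) {
        return '';
    }

    $xpath = new DOMXPath($doc);
    // Copy the node list to an array first, because the tree is modified while looping.
    foreach (iterator_to_array($xpath->query('.//*', $root), false) as $node) {
        if (!in_array(strtolower($node->nodeName), $allowedTags, true)) {
            // Not on the whitelist: remove the element together with its content.
            if ($node->parentNode !== null) {
                $node->parentNode->removeChild($node);
            }
            continue;
        }
        // Allowed tag: strip all of its attributes (onclick, style, href, ...).
        foreach (iterator_to_array($node->attributes, false) as $attr) {
            $node->removeAttributeNode($attr);
        }
    }

    // Serialize only the children of the wrapper <div>.
    $clean = '';
    foreach ($root->childNodes as $child) {
        $clean .= $doc->saveHTML($child);
    }
    return $clean;
}

echo sanitize_html('<b onclick="evil()">Hi</b><script>alert(1)</script>');
```

With this input the result should be just `<b>Hi</b>`: the onclick attribute and the whole script element are dropped.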

There’s also a ready-made solution - HTML Purifier - I haven’t used it, but it looks pretty comprehensive and is designed specifically for this purpose.

Wow, you are my hero. I didn’t know that malicious JavaScript can be passed in attributes! I searched but didn’t find an example (I would appreciate one).
I will look into DOMDocument even if it looks complicated, but maybe in larger projects.
I loved the HTML Purifier solution and I might use it later.
Thank you, I really appreciate your effort and valuable information :)

Yes, attributes like onclick, onmouseover, etc. can contain any JavaScript code and strip_tags will not remove them. DOMDocument can be a bit difficult if you are just beginning. I think HTML Purifier might be easier, because you only need to follow the simple example from their web site, and they say the default configuration is safe to use. But remember to run it once on input when saving, not every time when displaying, otherwise you may have performance problems.
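
Here is a small example of the attribute problem (the payload is made up for illustration):

```php
<?php
$input = '<b onmouseover="alert(document.cookie)">hover me</b>';

// <b> is on the allowed list, so strip_tags keeps the whole tag,
// including every attribute it carries.
$clean = strip_tags($input, '<b><i><br>');

var_dump($clean === $input);
```

The comparison is true: the string comes back unchanged, so the onmouseover handler would still fire if this were output without escaping.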

I understand now. So the other way of inserting malicious JS is by putting it in event attributes. Is that it? If I manage to write a function that removes only the script tag and any on-event attributes, will I be safe from XSS attacks? Or are there other ways? I don’t want to use HTML Purifier if these are the only ways. I do care about performance :)
Thank you, and sorry for my endless questions.

More than can be imagined.

IMHO it is better and easier to “whitelist” and only allow what you want than it is to “blacklist” and try to remove everything (that you think of) that you don’t want.
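
To illustrate why blacklisting is fragile, here is a sketch of a naive filter (the function `naive_blacklist` is hypothetical) that deletes script tags, and a payload that rebuilds the tag once the filter has run. Other vectors, such as `javascript:` URLs in href attributes, would need their own rules on top.

```php
<?php
// Hypothetical blacklist filter: delete every literal <script> and </script>.
function naive_blacklist(string $html): string
{
    return str_ireplace(['<script>', '</script>'], '', $html);
}

// Removing the inner tags glues the outer fragments back into a working tag.
$payload = '<scr<script>ipt>alert(1)</scr</script>ipt>';

echo naive_blacklist($payload);
```

The "filtered" output is a complete `<script>alert(1)</script>` again. A whitelist avoids this kind of cat-and-mouse game because anything not explicitly allowed is rejected.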

I have searched and found a great website that explains XSS, https://excess-xss.com/, and it describes the approach you told me about: whitelisting is safer than blacklisting.
Thank you, sir. So I think my solution of removing scripts and on-event attributes won’t work, and I should use DOMDocument or HTML Purifier as @Lemon_Juice has said.

This is a very informative discussion. Thank you both :)

Don’t worry too much about performance! Whatever method you use when saving user input will not impact performance much. Just choose what is better and easier for you at this stage.

I will do that. Thank you very much. I will try out HTML Purifier and use it along with an HTML WYSIWYG editor for posts and comments in future projects, and htmlspecialchars for the things that don’t need tags. :)