Best practices for storing HTML in the database from arbitrary user input

Hi all, I’m trying to gather information on best practices for allowing HTML from arbitrary user input to go into the database, bearing in mind XSS attacks and any other problems that might come up.

Regarding XSS attacks, a simple way I can think of is to build a list of allowed tags that excludes the ‘script’ tag and use the following to sanitize the content:

// Passing the allowed tags as an array requires PHP 7.4 or later.
strip_tags($html, ['a', 'div', 'span', 'etc...']);

But that doesn’t cover ‘onclick’ attribute exploits, since strip_tags leaves the attributes of allowed tags untouched… You could perhaps, somewhat hesitantly, remove those from the content using a regular expression? I cannot think of a better way.

And there will be other exploits and pitfalls and better ways of doing things that I’m missing right now and that’s why I’m asking everyone here.

I very much appreciate your input.

This can help you understand the basic concept. I also suggest you explore the DOMDocument class so you can easily understand how this works.
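To make the DOMDocument suggestion concrete, here is a minimal sketch of an allow-list filter. The function name `sanitizeHtml`, the tag and attribute lists, and the accepted URL prefixes are all assumptions chosen for illustration, not something from this thread. It drops disallowed tags with strip_tags, then removes every attribute that is not explicitly allowed (which takes care of ‘onclick’ and friends) and every href that does not start with an accepted prefix. Treat it as a way to see the idea, not as a replacement for a maintained filter.

```php
<?php
// Hypothetical helper: allow-list sanitiser built on DOMDocument (needs ext-dom, PHP 7.4+).
function sanitizeHtml(string $html): string
{
    $allowedTags  = ['a', 'div', 'span', 'p'];   // assumed allow-list of tags
    $allowedAttrs = ['a' => ['href']];           // assumed allow-list of attributes per tag

    // Step 1: remove every tag that is not allowed (the text inside is kept).
    $html = strip_tags($html, $allowedTags);

    // Step 2: parse what is left. The meta tag tells libxml the input is UTF-8,
    // and the wrapper div gives us a single known root to read back from.
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    $doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div>' . $html . '</div>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $wrapper = $doc->getElementsByTagName('div')->item(0);
    if ($wrapper === null) {
        return '';
    }

    // Step 3: strip attributes that are not allowed, e.g. onclick, style, onerror.
    foreach ($wrapper->getElementsByTagName('*') as $el) {
        $allowed = $allowedAttrs[$el->nodeName] ?? [];
        // Copy the attribute list first, because we modify it while looping.
        foreach (iterator_to_array($el->attributes, false) as $attr) {
            $name = strtolower($attr->nodeName);
            if (!in_array($name, $allowed, true)) {
                $el->removeAttribute($attr->nodeName);
                continue;
            }
            // Only keep links with an accepted prefix; this rejects javascript: URLs.
            if ($name === 'href'
                && !preg_match('~^(?:https?://|mailto:|/|#)~i', trim($attr->nodeValue))) {
                $el->removeAttribute($attr->nodeName);
            }
        }
    }

    // Step 4: serialise the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$dirty = '<a href="javascript:alert(1)" onclick="steal()">x</a><script>alert(2)</script>';
echo sanitizeHtml($dirty), "\n";
```

Working on the parsed tree instead of using a regular expression means oddly quoted or oddly spaced attributes are handled by the parser rather than by a pattern that has to anticipate them.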

You can also use HTML Purifier to filter the HTML.
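For reference, a typical HTML Purifier call looks roughly like the sketch below. It assumes the library is installed (for example via Composer as ezyang/htmlpurifier) and that the autoloader is already loaded by your application. The wrapper name `purifyUserHtml` and the allowed-element list are hypothetical choices for illustration; adjust them to whatever your users actually need.

```php
<?php
// Hypothetical wrapper around HTML Purifier.
// Assumes the library's classes are autoloadable, e.g. after
// require __DIR__ . '/vendor/autoload.php'; in your bootstrap.
function purifyUserHtml(string $dirty): string
{
    $config = HTMLPurifier_Config::createDefault();

    // Assumed allow-list: elements, with permitted attributes in brackets.
    $config->set('HTML.Allowed', 'a[href],div,span,p,strong,em');

    $purifier = new HTMLPurifier($config);

    // Anything not covered by the allow-list is removed from the returned markup.
    return $purifier->purify($dirty);
}
```

You would call `purifyUserHtml($_POST['body'])` (the field name is just an example) before saving or before rendering, depending on whether you prefer to store the raw or the cleaned version.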


Kind of guessing that the best practice for safely storing arbitrary html input is to not store arbitrary html input.

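If you have a choice, then investigate some of the markdown processors out there. Users can still format their input and make pretty pages, while at the same time not having to deal with the complexities of html. And you have far less to worry about regarding attacks and whatnot, provided the processor is set up to escape or strip any raw html in the input, since many of them pass it through by default.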


Do you know of any good ones out there? Preferably LGPL-licensed.
Many thanks

I’ve never used a markdown parser, but I always start my search with well-known vendors. I’ve used a lot of PHP League packages, and it seems they have a markdown parser available.

