HTML_Purifier used to be all the rage a while ago; it came up every time someone mentioned filtering tags. Does anyone know if it is still as potent and well thought of as it used to be?
That’s interesting. What’s the best way to block this type of attack?
It seems especially problematic in public spaces where you need to let users insert HTML. I don’t fully understand how this works, but it looks like the attack can be fully disguised. Blocking special characters would work, but there are many times when special characters can’t be blocked.
Is it possible to run the message through successive decoding operations, checking for malicious code after each conversion?
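It is possible. Here is a minimal sketch, assuming only URL encoding and HTML entities are in play: decode one layer at a time and look for an angle bracket before each pass. `containsEncodedTag()` and its `$maxPasses` limit are hypothetical names of my own, not part of any library.

```php
<?php
// Hypothetical helper (not from any library): decode the input one layer at a
// time, checking for < or > before each pass, until it stops changing.
function containsEncodedTag(string $input, int $maxPasses = 5): bool
{
    $current = $input;
    for ($pass = 0; $pass < $maxPasses; $pass++) {
        // Any bracket at this stage counts as an attempted tag.
        if (strpbrk($current, '<>') !== false) {
            return true;
        }
        // One layer of URL decoding, then one layer of HTML entity decoding.
        $decoded = html_entity_decode(rawurldecode($current), ENT_QUOTES | ENT_HTML5, 'UTF-8');
        if ($decoded === $current) {
            return false; // nothing left to decode and no bracket found
        }
        $current = $decoded;
    }
    // Still changing after $maxPasses decoding passes: treat as suspicious.
    return true;
}

var_dump(containsEncodedTag('Tom & Jerry'));       // bool(false)
var_dump(containsEncodedTag('%3Cscript%3E'));      // bool(true)
var_dump(containsEncodedTag('&amp;lt;b&amp;gt;')); // bool(true), found after two passes
```

This only detects smuggled brackets. It says nothing about whether HTML you *want* to allow is safe.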
@rashidr, rereading the title and the question you originally posed, I would take the view that if you state ‘no HTML’ clearly enough on your interface, then you are justified in aborting the operation if you find even a single opening or closing tag.
That’s very easy to do. For real humans who make a mistake, if you want to be kind to them, you can also detect a < or > on the client side and pop up a JS alert to warn them.
Bots, of course, will never trigger this JS alert, so your server-side fallback in all cases must remain:
// PHP: $input holds the submitted text; abort on any < or >
if (strpbrk($input, '<>') !== false) die('HTML is not allowed here.');
Then just escape the data properly for wherever it goes: for the database as you store it, and for HTML when you display it.
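For the HTML side, a minimal sketch is to escape at output time with `htmlspecialchars()`. The helper name `e()` is my own and hypothetical. For the database side, parameterized queries (for example PDO prepared statements) are generally safer than escaping by hand; they are not shown here.

```php
<?php
// Minimal sketch: escape user text at the moment it is written into HTML.
// e() is a hypothetical helper name, not a PHP built-in.
function e(string $text): string
{
    // ENT_QUOTES escapes both " and ', so the result is also safe inside
    // quoted attribute values; ENT_SUBSTITUTE replaces invalid UTF-8.
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$comment = '<b id="x">hi</b>';
echo '<p>' . e($comment) . '</p>' . PHP_EOL;
// prints <p>&lt;b id=&quot;x&quot;&gt;hi&lt;/b&gt;</p>
```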
Letting users insert HTML safely in public spaces is the particular task HTML_Purifier was designed for, although, as I said, I am unsure what has happened to it since, or indeed whether any of the PHP5 Filter classes now deal with this issue.
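Assuming "PHP5 Filter" refers to the built-in filter extension (`filter_var()`), a quick sketch suggests why it does not cover this case: its sanitizing filters encode markup rather than whitelisting a safe subset, so harmless tags are neutralized along with dangerous ones.

```php
<?php
// Assumes "PHP5 Filter" means the built-in filter extension.
// FILTER_SANITIZE_SPECIAL_CHARS HTML-encodes ' " < > & and control characters.
$userHtml = '<b>bold</b> <script>alert(1)</script>';

$encoded = filter_var($userHtml, FILTER_SANITIZE_SPECIAL_CHARS);

// Every tag, including the harmless <b>, is now inert text.
echo $encoded, PHP_EOL;
```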
Many attack vectors use attributes within the tags (e.g. <b id="attack in here">) rather than the tags themselves, so this is a difficult thing to pull off on your own.
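A small illustration of the attribute problem using PHP’s own `strip_tags()`: whitelisting `<b>` keeps that tag verbatim, attributes included, which the PHP manual itself warns about. The input string is just an example.

```php
<?php
// Whitelisting tags alone is not enough: strip_tags() keeps allowed tags
// exactly as written, including any event-handler attributes.
$input = '<b onmouseover="alert(1)">hover me</b><script>alert(2)</script>';

$filtered = strip_tags($input, '<b>');

// The <script> tags are gone, but the onmouseover attribute survives.
echo $filtered, PHP_EOL;
```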