How to disable HTML in a TextBox/Textarea

rashidr · May 16, 2011, 11:48pm

Some spammers are sending me links on a daily basis through my website's contact email form, even though I have also added a reCAPTCHA security check.

I don’t want to block IPs, so I have decided to disable HTML in the input fields of my contact form, but I have no idea how to do this.

Can anyone help me write this code or point me to some helpful links? I have searched Google but have not found any good results.

ralphm · May 17, 2011, 12:11am

You could set up a series of regular expressions that check for link code and abort the form submission, e.g. a whole string of checks like

if (stristr($msg, "http") !== false) die("Links are not allowed.");

Do you know how to use preg_match?
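Building on that suggestion, here is a minimal sketch that uses preg_match to reject a message that looks like it contains a link. The helper name `containsLink` and the exact pattern are assumptions for illustration; spammers can obfuscate links, so treat this as a first filter rather than a complete defense.

```php
<?php
// Hypothetical helper: returns true if $msg looks like it contains a link.
// The pattern (an assumption) matches an http/https scheme, a "www." prefix,
// or an opening HTML anchor tag, case-insensitively.
function containsLink(string $msg): bool
{
    return preg_match('~(https?://|www\.|<a\s)~i', $msg) === 1;
}

$msg = 'Cheap pills at HTTP://example.com';
if (containsLink($msg)) {
    // Abort the form submission instead of sending the email.
    echo "Message rejected: links are not allowed.\n";
}
```

The `i` modifier matters here: a plain case-sensitive search for "http" is trivially bypassed with "HTTP".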

Cups · May 17, 2011, 12:11am

One way is to make it clear on your interface that you do not accept HTML tags, and to be explicit about what you do allow in.

Then, filter out anything which does not match what you expect.

As an extreme example, filter out anything that is not a space, letter, number or dash:


// Remove everything except digits, lowercase letters, spaces and dashes
$input = '0123?> Abc -_#';
$output = preg_replace('#[^0-9a-z -]#', '', strtolower($input));
echo $output;
// outputs: 0123 abc -

Or look at using PHP's Filter extension (PHP: Filter - Manual).
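As a sketch of the Filter extension in a contact form, the email field can be validated with filter_var and FILTER_VALIDATE_EMAIL, which returns the address on success and false otherwise. The variable names and sample values below are assumptions; in a real form they would come from $_POST.

```php
<?php
// Hypothetical contact-form input.
$email = 'visitor@example.com';
$bogus = 'not-an-email <script>';

// filter_var returns the filtered value on success, or false on failure.
$validEmail = filter_var($email, FILTER_VALIDATE_EMAIL);
$badEmail   = filter_var($bogus, FILTER_VALIDATE_EMAIL);

var_dump($validEmail); // string(19) "visitor@example.com"
var_dump($badEmail);   // bool(false)
```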

Even after filtering, still go on and escape the input according to where it is going next, such as your database:

mysql_real_escape_string

Or, preferably, use PDO or mysqli prepared statements.
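A minimal PDO prepared-statement sketch, assuming an in-memory SQLite database (it needs the pdo_sqlite driver) and a hypothetical `messages` table; with MySQL only the DSN and credentials would change. The placeholders keep the user's text out of the SQL string, so no manual escaping is needed for the query itself.

```php
<?php
// In-memory SQLite database for illustration (assumes pdo_sqlite is enabled).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table for contact-form messages.
$pdo->exec('CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT)');

// The user's text is bound to a placeholder, never concatenated into SQL.
$body = "Robert'); DROP TABLE messages;--";
$stmt = $pdo->prepare('INSERT INTO messages (body) VALUES (:body)');
$stmt->execute([':body' => $body]);

// Read it back with another prepared statement.
$select = $pdo->prepare('SELECT body FROM messages WHERE id = ?');
$select->execute([1]);
$stored = $select->fetchColumn();

echo $stored, "\n"; // the text exactly as submitted; the table is intact
```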

Finally, when you get the data back out of your database, escape it for the next environment it is going to, e.g. a web page:

htmlentities and that family of escape mechanisms.
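For a web page, here is a sketch using htmlspecialchars (a member of the htmlentities family) to escape a stored message just before it is output. ENT_QUOTES also converts single quotes; the UTF-8 charset is an assumption about how the page is encoded.

```php
<?php
// A message as it might come back out of the database.
$stored = '<a href="http://spam.example">Buy now</a> & it\'s cheap';

// Escape for an HTML context at output time, not at input time.
$safe = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');

// The browser now displays the markup as text instead of rendering a link.
echo $safe, "\n";
```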

Filter Input, Escape Output (FIEO) - sleep a’nights.

rashidr · May 17, 2011, 12:19am

Thanks… This is helpful

eruna · May 17, 2011, 6:01pm

strip_tags removes HTML.

if (strip_tags($msg) !== $msg) {
    // message contains HTML
}

Cups · May 17, 2011, 7:37pm

It won’t deal with this kind of stuff, though.

HTML Purifier used to be all the rage a while ago, every time someone mentioned filtering tags. Does anyone know if it is still as potent and well thought of as it used to be?

eruna · May 18, 2011, 2:18pm

That’s interesting. What’s the best way to block this type of attack?

It seems especially problematic in public spaces where you need to enable users to insert html code. I don’t fully understand how this works, but it looks like the attack can be fully disguised. Blocking special characters would work, but there are many times when special characters can’t be blocked.

Is it possible to run the message through decoding operations, checking for malicious code after each conversion?
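One way to sketch that idea: decode repeatedly (HTML entities, then URL encoding) until the text stops changing, and check for a tag-like sequence after each pass. The helper name `hasHiddenTag`, the choice of decoders and the pass limit are assumptions, and this only detects hidden tags; it does not make HTML safe to display.

```php
<?php
// Hypothetical helper: decode the message repeatedly and report whether a
// tag-like "<letter", "</" or "<!" sequence appears at any stage.
function hasHiddenTag(string $msg, int $maxPasses = 5): bool
{
    for ($i = 0; $i < $maxPasses; $i++) {
        if (preg_match('~<[a-z/!]~i', $msg) === 1) {
            return true;
        }
        // Undo one layer of HTML-entity encoding, then one of URL encoding.
        $decoded = rawurldecode(html_entity_decode($msg, ENT_QUOTES | ENT_HTML5, 'UTF-8'));
        if ($decoded === $msg) {
            return false; // nothing left to decode and no tag found
        }
        $msg = $decoded;
    }
    // Still changing after $maxPasses layers: treat as suspicious.
    return true;
}
```

A plain "2 < 3" passes, while "&lt;script&gt;" or "%3Cb%3E" is caught once decoded.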

E

Cups · May 18, 2011, 6:29pm

@rashidr, rereading the title and question you originally posed, I would take the view that if you stipulate ‘no HTML’ clearly enough on your interface, then you are justified in aborting the operation if you find even a single opening or closing tag.

That’s very easy to do. If you would like to be kind to real humans who make a mistake, you can also detect a < or > in the browser and pop up a JS alert to warn them.

The bots will not, of course, run into this JS alert, hence your fallback position on the server in all cases must remain:

// Abort if the message contains a < or > anywhere
if (strpbrk($msg, '<>') !== false) die('HTML is not allowed.');

Then just escape the data properly as you store it.

Cups · May 18, 2011, 6:33pm

eruna wrote: “It seems especially problematic in public spaces where you need to enable users to insert html code. I don’t fully understand how this works, but it looks like the attack can be fully disguised. Blocking special characters would work, but there are many times when special characters can’t be blocked.”

That was the particular task HTML Purifier was designed for, although, as I said, I am unsure what has happened to it, or indeed whether anything in PHP 5's Filter extension now deals with this issue.

Many attack vectors use attributes within the tags (e.g. <b id="attack in here">) rather than the tags themselves, so this is a difficult thing to pull off on your own.
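To illustrate the attribute problem with strip_tags: when tags are allowed via its second parameter, strip_tags keeps those tags whole, including any event-handler attributes. The sample string is made up for illustration.

```php
<?php
// Allow <b> but strip every other tag.
$input  = '<b onmouseover="alert(1)">hello</b><script>alert(2)</script>';
$output = strip_tags($input, '<b>');

// The <script> tags are removed (their text content stays), but the allowed
// <b> tag keeps its onmouseover attribute untouched.
echo $output, "\n";
```

This is why a whitelist of tags alone is not enough when users may submit HTML.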

eruna · May 18, 2011, 7:10pm

Cups, thank you! I just looked up HTML Purifier and it’s awesome. I’m definitely going to use it.