Sanitising free-form text

I’m trying to find a way to sanitise a message in a contact form which may contain just about any printable character. I have come close with a couple of statements, but I’m not quite there.

filter_var($_POST['message'], FILTER_SANITIZE_STRING)
preg_replace('/[^A-Za-z0-9 \'-]+/', '', $_POST['name'])

There doesn’t seem to be a sanitize filter for it, and I can’t find a way to specify any punctuation or whitespace character in a regex.

Can anyone help relieve the head banging?

Some punctuation characters have a special meaning in a regex, so you have to escape characters like . (dot) with backslashes. Have a look at the documentation. Whitespace has its own escape sequences, such as \s.

http://www.phpliveregex.com/ (at the bottom)

But you did not clearly specify what you want to escape and why (in terms of context).
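
As a rough illustration of the point about punctuation and whitespace in a regex, here is a minimal sketch that uses Unicode property classes instead of listing every character. The helper name keep_text() is hypothetical, and whether this particular whitelist is the right one depends on the context question raised above.

```php
<?php
// Hypothetical helper: keep letters, digits, punctuation, symbols and
// whitespace; drop everything else (control characters, for instance).
function keep_text(string $text): string
{
    // \p{L} letters, \p{N} numbers, \p{P} punctuation, \p{S} symbols,
    // \s whitespace (spaces, tabs, newlines). The /u modifier treats the
    // pattern and subject as UTF-8; preg_replace() returns null if the
    // subject is not valid UTF-8, hence the fallback to an empty string.
    return preg_replace('/[^\p{L}\p{N}\p{P}\p{S}\s]+/u', '', $text) ?? '';
}
```

Because the character class is negated, nothing inside it needs escaping one character at a time; newlines and tabs survive because \s covers them.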

You have to define “to sanitize” in the first place. Without a context this term makes no sense at all.

Given you want it to be safely put in an HTML form’s input field, why not simply run it through htmlspecialchars?
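
A minimal sketch of that suggestion, assuming the text is UTF-8 and is being written into an HTML page. The helper name e() and the sample string are made up for illustration.

```php
<?php
// Hypothetical helper: escape text at output time for an HTML context.
function e(string $text): string
{
    // ENT_QUOTES also escapes single quotes; ENT_SUBSTITUTE replaces
    // invalid byte sequences instead of returning an empty string.
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$message = 'Tom & Jerry <b>offer</b>';
echo '<textarea name="message">' . e($message) . '</textarea>';
```

The stored message stays untouched; only the copy written into the page is escaped.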

@chorn many thanks for the link to the cheatsheet.

It’s not necessarily about escaping. If I have to list (and potentially escape) every possible character I could be here till Christmas. What I was thinking was to remove anything that is not a printable character.

I was thinking of sanitise in the same sense as FILTER_SANITIZE_STRING.

Why do I want to put it INTO a form’s input field? I’ve just got it OUT of a textarea!

@colshrapnel had a valid question - you need to know what is the purpose of your sanitization and how you are going to use these sanitized strings.

FILTER_SANITIZE_STRING is basically strip_tags() - in my opinion this doesn’t make sense in most cases because why would you want to strip letters inside < and > characters? When you escape data properly for display then these tags are harmless. Also, stripping or encoding ASCII values less than 32 is not really helpful because certain characters like newlines, tabs, etc. are often important for text layout even though they are not printable. I never use filter_var() because I find it useless for most common and reasonable use cases.

That’s why you need to first define what sanitization is supposed to do in this case and proceed from there.

For plain text fields, some sanitization routines that might make sense are stripping leading and trailing whitespace, replacing multiple spaces with a single space, removing NUL characters (\0) or perhaps other special characters you find causing trouble, and limiting the maximum length.

Personally, for text fields in most cases I only do trim() and sometimes mb_substr() to limit the length.
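
The routines listed above could be sketched roughly as follows. The function name tidy_text() and the 2000-character default are assumptions for illustration, not something from this thread, and mb_substr() needs the mbstring extension.

```php
<?php
// Hypothetical helper; the name and the default length are assumptions.
function tidy_text(string $text, int $maxLength = 2000): string
{
    $text = str_replace("\0", '', $text);         // remove NUL bytes
    $text = preg_replace('/ {2,}/', ' ', $text);  // collapse runs of spaces
    $text = trim($text);                          // strip leading/trailing whitespace
    return mb_substr($text, 0, $maxLength, 'UTF-8'); // limit length in characters
}
```

Newlines and tabs inside the message are left alone, so the text layout is preserved.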

Thanks @Lemon_Juice. I guess I was simply thinking that sanitising would remove all non-printing characters.

To make it strip non-printing characters, you have to add the corresponding flag to the call, like

$message = filter_var($_POST['message'], FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_LOW);
                                                                 ^^^ here
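
Note that FILTER_SANITIZE_STRING is deprecated as of PHP 8.1. A sketch of the same flag used with FILTER_UNSAFE_RAW instead, assuming tag stripping is not wanted; the sample string is made up.

```php
<?php
// FILTER_FLAG_STRIP_LOW removes every character with a value below 32,
// which includes newlines and tabs as well as other control characters.
$raw   = "Hello\x07 world\nbye";
$clean = filter_var($raw, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_LOW);
```

Because line breaks are stripped too, this suits single-line fields better than a multi-line message.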

Thanks, colonel
