Cleaning post and get variables

Hi,
I am trying to sanitize or clean all post and get variables on every page of my site, would putting this at the top of every page do the trick?



filter_var_array($_POST, FILTER_SANITIZE_STRING);filter_var_array($_GET, FILTER_SANITIZE_STRING);
filter_var_array($_POST, FILTER_SANITIZE_STRING);filter_var_array($_POST, FILTER_SANITIZE_STRING);


Or what else would I need to do?

You should validate each variable separately to make sure that it contains a meaningful value for what the specific field is allowed to contain.

Just sanitizing the inputs will prevent injection attacks but will still allow meaningless junk to be entered.

Plus, escaping a value for HTML is different than escaping a value for SQL, which is different than escaping a value for JSON, which is different than escaping a value for an email header, which is different than escaping a value for… and so on.

The kind of solution you’re proposing resembles PHP’s old magic quotes. It’s not the best approach. The best approach is to make sure your template code HTML-escapes any value it uses, and that your database code SQL-escapes any value it uses, and so on.

I do this later on in the code, but I just want something at the top to sanitize and prevent injections. Is the way I presented in the OP a good way to do that or is there a better way to sanitize $_GET and $_POST variables by putting one or two lines at the top?

Think of it like bringing in your shopping bags. You might want to wash your clothes in the machine, your new flatware in the dishwasher, and your veggies in the sink. You can’t just throw everything through a sanitizer because they get cleaned in different ways.

Input for an email address gets validated differently than a phone number, etc. If you’re expecting HTML input, your code will strip out the tags. That would be bad on a forum like this for example.

Validating makes sanitizing unnecessary. You only need to sanitize data when retrieving it from storage (in case the storage got compromised) - user input never needs sanitizing because it should always be validated BEFORE you move it out of tainted fields such as $_GET and $_POST. Never create your own fields that get loaded with tainted data (that hadn’t yet been validated) because then you can lose track of which fields are tainted and which are not.

+1

We might be thinking of different meanings of “sanitize.” For example, “O’Neil” might be a validated name, but that doesn’t make it safe for SQL. “<code>” might be valid content (like this post), but that doesn’t make it safe for HTML templates. Even valid content still needs to be escaped.

Sanitizing is the stripping out of characters that are not valid for that particular type of data. The difference from validating is that validating reports an error if there are invalid characters instead of stripping them out. Escaping is converting characters that have special meanings when code and data are jumbled together (as happens when you write data into HTML but not with SQL where the code goes in the prepare statement and the data goes in the bind statement - and hence escaping is not needed).

So “O’Neil” doesn’t need to be escaped since there is nothing in it that will cause problems when it is included in HTML and in SQL where it would cause a problem it never needs to be included in the same statement as code.

“<code>” does not need to be escaped when written to the database either - but it does need to be escaped if you are writing it into a web page as HTML has no way to keep the data and code separate.

Input needs to be sanitized if it comes from an internal source (eg read from the database) and validated if it comes from the user.

Output needs to be escaped if the destination jumbles code and data together and the data can validly contain chaqracters that could be misinterpreted as code. It does not need to be escaped if it is not allowed to contain characters that can be confused with code but it is still useful to do so (defence in depth).

There is no point in escaping the data where it is kept separate from the code eg using separate prepare and bind statements for database calls so that no matter what the data contains it cannot be misinterpreted as code as it is in the data statement (bind) not the code statement (prepare).

Very true. I think an important takeaway from this is that even a validated field still needs to be escaped in accordance with whatever its destination might be.

Technically true. But probably a good idea to perform the escaping nonetheless.

I don’t want anything except for characters and numbers, basically those two lines at the top are to strip out quotation marks and any other possible injection. I do a preg_match later, but I want a catch-all if in case I forget something. I’m more concerned with people putting SQL injection queries into my URL and then me querying with $_GET.

But escaping is done at the point where the data is about to be jumbled with code in order to ensure it isn’t confused with code. If the data isn’t going to be jumbled with code then escaping it just breaks the data.

Escaping data used in a mysqli_query call where the data and code are jumbled together even where valid data can’t contain characters that need to be escaped is worthwhile.

Escaping data to be used in a mysqli_bind call is pointless and could even break the data. With this call each data field is passed to the call as a separate parameter and whatever the parameter contains is the value of the data - escaping it can insert unwanted characters into the data. There is no appropriate escape function or process to use for data being passed to a mysqli_bind call as there there are no characters that need to be or should be escaped.

As I said before - ALL $_POST and $_GET fields should be VALIDATED before you move the values out of those tainted arrays into other variables - otherwise your entire code is suspect because you can’t tell what variables are tainted and which are not.

Yeah. …? No argument there.

I think such a strong emphasis on validation might be a mistake in this context. For the purposes of getting meaningful data, validation is certainly a must. But for the purposes of security and avoiding injections – which was supposed to be the topic of this thread – validation doesn’t cut it. Even valid data still needs to be escaped as appropriate for its destination.

The question the OP was originally asking is whether it’s good to escape/sanitize everything at the beginning of every script. I think the consensus of everyone in this thread is that this kind of global escape/sanitize is not a good idea. Each value needs to be escaped (or binded) in accordance with whatever its destination might be.

Yeah, to me this emphasis is also too strong and I don’t really agree that all $_POST and $_GET fields should be validated. There are many cases where I don’t validate those values just because there is no point in doing so, in other words it doesn’t bring any practical benefits and so writing code to validate is a waste of time. For example, a page requires a numerical id in $_GET like this:

http://example.com/article?id=123

I could validate the id but I don’t really need to since I can simply cast it to integer and be done with it. Casting to integer in this case is sanitizing - it makes the variable safe because my code expects an integer. If someone sends ?id=some_garbage, then there is no harm at all, the value gets cast to 0 and later my code can handle 0 with no problem because it is a valid variable type (although it might not be an existing id, which is a completely different matter).

And I could come up with many other cases like this where validating GET and POST simply doesn’t bring any practical benefit. In fact, in practice I find that the variables that need to be validated are in the minority but this will depend on the application, of course. In certain cases even sanitizing is not necessary - for example, when a GET variable is used as a search criterion in the WHERE clause in SQL - in this case no harm can be done no matter what data is sent so I can use the GET variable directly as is (provided any proper escaping or binding is done).

So to sum up: sanitize yes, but each variable individually depending on its meaning, never in bulk as suggested in the first post here. Even if sanitizing in bulk seems like a nice shortcut it may backfire as the application grows and then changing it will require serious refactoring of code.

You are getting the purpose of validation mixed up with security. Validation is not for security - it is to provide error messages when the user inputs the wrong thing into a form.

Is the user inputting the 123 somewhere in the first place? If so then you ought to be validating that input and giving them a message if what they entered into the form isn’t a valid value.

If it isn’t user input then there is no need to validate it (since validation is only needed for user input).

Where the value is being passed from somewhere else that is not supposed to be entered by the user then the current module should be sanitizing it - as you are doing by casting it.

Validate all user input (so as to tell them if what they have entered is invalid). This is not for the purpose of security but has security as a side effect.
Sanitize all other input. This is for security - in case the data has been tampered with.
Do both of these before moving the values to local variables - that way you know the local variables are all untainted.
escape any variables that are being output jumbled with code using the appropriate escape function for the type of code you are mixing it with.
Don’t escape variables that are not being jumbled with code.

So to get back to the OPs question - if the values have been entered into a form by the user and passed from there to the script then each field needs to be validated according to what it is allowed to contain. If it was hard coded in the prior page then it needs to be sanitized according to what each individual field is allowed to contain.

There is no generic routine for sanitizing since ALL characters are valid somewhere. So the generic routine you could include looks like this:



You can’t use FILTER_SANITIZE_STRING because one of the inputs might be allowed to contain something which that sanitization would remove.

Felgall, you are very right in what you are saying but the OP wasn’t asking about user input but about all POST and GET variables. If you are distinguishing user input as in your previous post then all is correct. But previously you stated that “ALL $_POST and $_GET fields should be VALIDATED” and that is what I objected to. Yes, user input should be validated but this thread is about all POST and GET variables, not only user input.