Best "generic" way to sanitize $_GET?

I am working on securing any possible holes in a pretty basic script I have written - it’s a resource browser that accepts a URL variable via $_GET in order to display correct content. This value is not recorded in a database or used for any other purpose. I assume I should be sanitizing $_GET - if so, what is the best “generic” way to scrape off bad guy inputs before using the variable?

It depends on what you are going to use the $_GET value for. Just think about these:

  1. Can anything potentially harmful happen if the value is in an invalid format, contains certain characters, or is numerically too big or too small? Either check for an incorrect format and display an error message, or sanitize the value by downgrading it to something harmless. Use preg_match(), is_numeric(), <, >, !=, checkdate(), ctype_digit(), in_array(), trim(), str_replace(), preg_replace(), strtr(), (int), (float), etc. Be aware that you may receive raw binary garbage you did not expect, and make sure that no garbage can bring your application down.

  2. Can anything potentially harmful happen if the value is too long, too short or empty? Either check the length or truncate the value to something sane. Use strlen(), substr(), mb_substr(), mb_strcut(), preg_match(), etc.

  3. When you output the value anywhere outside the PHP script, remember to escape it as necessary. For example, use htmlspecialchars() when sending it to the browser as (X)HTML, mysqli_escape_string() or an equivalent when injecting strings into SQL queries, (int) or (double) when injecting numbers (or use prepared statements instead), json_encode() when sending data to JavaScript, urlencode() for URL parameters, rawurlencode() for filenames in URLs, etc.
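As a rough illustration of points 1 and 2, here is a minimal sketch (PHP 8) that validates a single GET value instead of trying to clean it. The function names validSlug() and validId(), the allowed character set, the 64-character limit and the 1..10000 range are all hypothetical; replace them with whatever your script actually expects.

```php
<?php
// Sketch of checks 1 and 2 for a single GET value. The function names,
// the allowed character set, the 64-character limit and the 1..10000 range
// are hypothetical; adjust them to what your script actually expects.

// Accepts a non-empty string of at most $maxLen letters, digits, '_' or '-'.
// Returns the value unchanged, or null if it fails any check.
function validSlug(mixed $value, int $maxLen = 64): ?string
{
    // A GET value can be an array (e.g. ?page[]=x), so check the type first.
    if (!is_string($value)) {
        return null;
    }
    $len = strlen($value);
    if ($len === 0 || $len > $maxLen) {
        return null;
    }
    // Anchored whitelist; the D modifier stops $ from matching before a
    // trailing newline, so "abc\n" is rejected too.
    return preg_match('/^[A-Za-z0-9_-]+$/D', $value) === 1 ? $value : null;
}

// Accepts a string of 1 to 9 decimal digits whose value lies in [$min, $max].
// Returns it as an int, or null if it fails any check.
function validId(mixed $value, int $min = 1, int $max = 10000): ?int
{
    // ctype_digit() accepts only 0-9: no sign, spaces, dots or exponents.
    // The 9-digit cap keeps the (int) cast below well inside the int range.
    if (!is_string($value) || strlen($value) > 9 || !ctype_digit($value)) {
        return null;
    }
    $n = (int) $value;
    return ($n >= $min && $n <= $max) ? $n : null;
}
```

In the script itself these would be called as validSlug($_GET['page'] ?? null) or validId($_GET['id'] ?? null), with an error message shown whenever they return null. Rejecting bad input outright is usually simpler to reason about than trying to repair it.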

Some people like the filter functions (filter_input(), filter_var()) for dealing with input data. Personally, I don’t use them because they basically replicate what can be done with more common PHP functions, so I can’t be bothered.
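For comparison, a range-checked integer can also be validated with the filter extension (enabled by default in most PHP builds). This is only a sketch; the function name idViaFilter() and the 1..10000 range are hypothetical. Note that FILTER_VALIDATE_INT is not an exact drop-in for ctype_digit(): it accepts a leading minus sign, for example.

```php
<?php
// Sketch: a range-checked integer written with the filter extension
// instead of ctype_digit() and comparisons. The function name and the
// 1..10000 range are hypothetical.

function idViaFilter(mixed $value): ?int
{
    $n = filter_var($value, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 1, 'max_range' => 10000],
        // Return null instead of false when validation fails.
        'flags'   => FILTER_NULL_ON_FAILURE,
    ]);
    // Defensive: anything other than an int is treated as a failure.
    return is_int($n) ? $n : null;
}
```

In a live request the same options array can be passed to filter_input(INPUT_GET, 'id', FILTER_VALIDATE_INT, ...), which reads the parameter from the request directly.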

But whatever method you choose, it’s very specific to the intended usage, so it’s hard to give a more specific answer to this question.

Presumably a generic GET field can contain anything at all. That being the case, there is nothing to sanitize it for.

It would be necessary to escape that field appropriately when outputting it somewhere, so that any conflicting characters aren’t confused with the surrounding code. How to escape it depends on where you are outputting it to: for example, if you are outputting to HTML, use htmlspecialchars() when writing the field into the web page.
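A minimal sketch of output escaping, assuming UTF-8 pages. The helper names forHtml() and searchLink(), the page search.php and the parameter q are hypothetical. It also shows layered escaping: a value placed in a URL inside an HTML attribute is URL-encoded first and then HTML-escaped.

```php
<?php
// Sketch: escaping a raw GET value for HTML output. The function names,
// 'search.php' and the 'q' parameter are hypothetical.

// Escape text for HTML element content or a quoted attribute value.
function forHtml(string $s): string
{
    // ENT_QUOTES escapes both ' and "; ENT_SUBSTITUTE replaces invalid
    // UTF-8 sequences instead of returning an empty string.
    return htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Build a link that carries the value in a URL and also shows it as text.
// URL-encode the parameter, then HTML-escape the whole attribute;
// the visible text is HTML-escaped on its own.
function searchLink(string $q): string
{
    $href = 'search.php?q=' . urlencode($q);
    return '<a href="' . forHtml($href) . '">' . forHtml($q) . '</a>';
}
```

Because a GET value may arrive as an array, a caller would typically check is_string($_GET['q'] ?? '') before passing it to searchLink().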

Thinking about it, I suppose this particular $_GET variable doesn’t need any sanitizing. The logic is:
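
```php
$param = $_GET['param'] ?? '';   // '' instead of a warning when 'param' is absent

if ($param === 'suchnsuch') {
    // execute a certain predefined query
}
// ... further elseif branches for other predefined values
else {
    // do nothing
}
```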

It would be better to use $_GET['param'] in the if statement directly and not create $param at all, since then you don’t end up with a variable that hasn’t been sanitized but is named as if it has been.
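One way to follow that advice, sketched with a PHP 8 match expression: the raw value is only ever compared against fixed strings, so it never reaches the query and no unsanitized copy is kept. 'suchnsuch' comes from the snippet above; the function name runFor() and the returned strings are hypothetical placeholders for the real query code.

```php
<?php
// Sketch: act on $_GET['param'] only through an exact comparison, so the
// raw value never reaches the query. 'suchnsuch' is from the snippet above;
// the function name and the returned strings are hypothetical placeholders.

function runFor(array $query): string
{
    // match compares with ===, so only the exact string 'suchnsuch' hits the
    // first arm; a missing key (null) or an array value falls to default.
    return match ($query['param'] ?? null) {
        'suchnsuch' => 'ran predefined query', // execute a certain predefined query
        default     => 'nothing done',         // do nothing
    };
}
```

In the script this would be called as runFor($_GET); each further predefined value becomes another arm of the match.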