$_GET security

Hi,

I have an url like the following : site.com/person/23098

I get the person_id with $_GET and use is_numeric, strip_tags and htmlspecialchars. Is it enough? If not, what should I do to sanitize every possible security threat related to the url, including XSS?

if(is_numeric($_GET['person_id'])) {
    $id = strip_tags(htmlspecialchars($_GET['person_id']));
}
echo $id;

Have you tested this using example strings?

Not tested, but I have a feeling that after htmlspecialchars changes <tag> to &lt;tag&rt; strip_tags won’t see it as a tag.

More to the point, are you planning on using the value or displaying it?

I imagine you are using person_id in a query and not doing a “Welcome person_id”

I think the first Is_numeric() test is adequate, if there were any other characters the test would fail except maybe if there is a decimal point. Latter can be tested with is_integer().

Maybe also test for a user name with is_alpha()

1 Like

for the ultraparanoid preg_match(~[1]+$~) - keep in mind that is_int will say an integer string is NOT an int, because it’s not of type int.

also [FPHP]ctype_digit[/FPHP] will work.


  1. 0-9 ↩︎

1 Like

For integers, none of those filters or checks are necessary and they really don’t add any value. Just typecast it.

$id = (integer)$_GET['person_id'];

And of course use prepared statements for anything being persisted to a database and escape everything being sent back to the browser.

Except they do.

If i send you the string “135imacheater”, did i send you a valid ID? Typecasting will say absolutely i did, i sent the ID 135. Checking the string will say NO.

1 Like

That’s assuming he has a further check going on. Of course you can also do this:

$id = (integer)$_GET['person_id'];
if ($id != $_GET['person_id']) { // An attack

Sort of depends on if you are actively looking for malformed data or just want to protect the site.

Does this cause any security problems? I can’t think of any.

You’re right. There wouldn’t be any injection or xss problems. So this would probably be better classified as a usability problem. If the user enters a bad value, they should be informed so they can fix their mistake. The system shouldn’t try to continue on with a clearly bad value.

Thanks for the replies, I’ll edit this post with a possible solution.

And add SEO issues due to duplicate content.

Personally I’d prefer to just let the script continue without changing the value. That way when it comes to doing the SELECT FROM WHERE id = ‘1234abcc’ it fails and the site provides the same response regardless of whether it’s a valid integer or not.

I like this approach. A friend mentioned that premature testing is detrimental to performance. Always assume input is correct.

…but never trust it. :wink:

2 Likes

That’s the trade off - performance or security. The first rule of security is to never trust inputs. Even if users are not trying to break your code you need to assume that they have no knowledge of what the valid values for inputs are and may enter an invalid value at any time expecting it to be valid.

To ensure the security of your system, always assume input is invalid until you have confirmed that they are not.

Approximately 2/3 of any properly written system will be validation (only less if some of the validation processing is done via built in functions and you don’t count the code in those functions).

1 Like

What SEO issues and what duplicate content? You mean the same page being at http://example.com/?person_id=123 and http://example.com/?person_id=123abc and so on? If those semi-invalid urls are not present anywhere on the site then there’s no problem. This kind of duplicate content “issue” exists with almost every site - you can append any number of arbitrary parameters to most urls and they will point to the same page. There’s no substantial difference than http://example.com/?person_id=123&random=239dh87432.

If person_id comes from user input then proper validation would be a good approach. If it’s just part of urls of links on the site then I’d say simply sanitizing by loose casting to int would be perfectly fine.

I like this approach, too, because it’s simple and in my opinion it’s no big deal whether we treat links with ‘1234abcc’ as valid or invalid.

And often quite innocently eg.

I have a database that stores age as an Int
My form has a text input with a label “Age:”
(IMHO this would be the wrong input choice, but for example purposes)

Will most users enter the expected “20”
How many will enter
“twenty”, “20 1/2” or “20 and a half” eg.?

In these cases a “the correct format is ##” type of message would be a help for them.

Thought it might be worth saying that is is important to check the regex, php, your database, and the http data posted to the server are all using the same charset. Otherwise your analysis’s of the strings might not be accurate.

1 Like

While it’s easy to prevent invalid urls appearing on the site, it’s difficult off the site. For example on forums when someone links to: a url in brackets ( http://example.com/?person_id=123) depending on how smart the forum software is, it may or may not include the bracket (or other punctuation mark) in the linked RL.

1 Like

… in which case it’s a good thing to accept the numerical ID with some garbage appended because there’s more likelihood people will reach the desired page :smiley: You can’t prevent generation of arbitrary urls off the site, anyway. And in the rare cases it gets out of hand there’s the canonical url meta tag for search engines that is designed to prevent such problems.

1 Like

That’s a good point. You’re right, that is better :slight_smile: