I get the person_id with $_GET and use is_numeric, strip_tags and htmlspecialchars. Is that enough? If not, what should I do to sanitize against every possible security threat related to the URL, including XSS?
I think the first is_numeric() test is adequate: if there were any other characters the test would fail, except perhaps for a decimal point or exponent notation. Those cases can be caught with ctype_digit(), since $_GET values are always strings and is_integer() would always return false for them.
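A quick sketch of the difference between those checks, assuming the ID arrives as a string (as every $_GET value does):

```php
<?php
// $_GET values always arrive as strings, so is_integer() on them
// is always false; string-level checks are what matters here.
var_dump(is_numeric('135'));     // bool(true)
var_dump(is_numeric('13.5'));    // bool(true)  - decimal point slips through
var_dump(is_numeric('1e3'));     // bool(true)  - exponent notation too
var_dump(is_numeric('135abc'));  // bool(false)

// ctype_digit() accepts decimal digits only, so it rejects both:
var_dump(ctype_digit('135'));    // bool(true)
var_dump(ctype_digit('13.5'));   // bool(false)
var_dump(ctype_digit('1e3'));    // bool(false)
```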
If I send you the string “135imacheater”, did I send you a valid ID? Typecasting will say absolutely I did, I sent the ID 135. Checking the string will say no.
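The two behaviours side by side, as a minimal sketch; filter_var() with FILTER_VALIDATE_INT is one way to do the strict string check:

```php
<?php
$input = '135imacheater';

// Typecasting silently salvages a number from the garbage:
$id = (int) $input;
echo $id, "\n"; // 135

// Checking the string rejects it outright:
var_dump(filter_var($input, FILTER_VALIDATE_INT)); // bool(false)
var_dump(filter_var('135', FILTER_VALIDATE_INT));  // int(135)
```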
You’re right. There wouldn’t be any injection or XSS problems. So this would probably be better classified as a usability problem. If the user enters a bad value, they should be informed so they can fix their mistake. The system shouldn’t try to continue with a clearly bad value.
Personally I’d prefer to just let the script continue without changing the value. That way, when it comes to doing the SELECT … FROM … WHERE id = ‘1234abcc’, it fails and the site provides the same response regardless of whether it’s a valid integer or not.
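Either way, the raw value should only ever reach the database through a placeholder. A sketch of the “let the query fail” approach using PDO, with an in-memory SQLite table standing in for the real database (the table and data are made up for the example):

```php
<?php
// In-memory SQLite so the sketch runs anywhere; table and data
// are made up for the example.
$pdo = new PDO('sqlite::memory:');
$pdo->exec('CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)');
$pdo->exec("INSERT INTO people VALUES (1234, 'Alice')");

// The placeholder keeps '1234abcc' from becoming SQL injection;
// the query simply matches no rows.
$stmt = $pdo->prepare('SELECT name FROM people WHERE id = ?');
$stmt->execute(['1234abcc']);
var_dump($stmt->fetch(PDO::FETCH_ASSOC)); // bool(false) - no match

$stmt->execute(['1234']);
var_dump($stmt->fetch(PDO::FETCH_ASSOC)); // ['name' => 'Alice']
```

The site can then serve the same “not found” page whether the ID was garbage or simply unknown.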
That’s the trade-off: performance versus security. The first rule of security is to never trust inputs. Even if users are not trying to break your code, you need to assume they have no knowledge of what the valid values for inputs are and may enter an invalid value at any time, expecting it to be valid.
To ensure the security of your system, always assume inputs are invalid until you have confirmed that they are not.
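As a sketch of that deny-by-default stance (the function name and the minimum bound of 1 are my own choices for the example):

```php
<?php
// Deny by default: the ID counts as invalid until it has positively
// passed validation; everything else maps to null.
function validId(array $get): ?int
{
    $id = filter_var($get['person_id'] ?? '', FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 1],
    ]);
    return $id === false ? null : $id;
}

var_dump(validId(['person_id' => '123']));    // int(123)
var_dump(validId(['person_id' => '123abc'])); // NULL
var_dump(validId([]));                        // NULL
```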
Approximately two thirds of any properly written system will be validation (less only if some of the validation processing is done via built-in functions and you don’t count the code inside those functions).
What SEO issues and what duplicate content? You mean the same page being at http://example.com/?person_id=123 and http://example.com/?person_id=123abc and so on? If those semi-invalid URLs are not present anywhere on the site then there’s no problem. This kind of duplicate content “issue” exists on almost every site: you can append any number of arbitrary parameters to most URLs and they will still point to the same page. There’s no substantial difference from http://example.com/?person_id=123&random=239dh87432.
If person_id comes from user input then proper validation would be a good approach. If it’s just part of the URLs of links on the site then I’d say simply sanitizing by loosely casting to int would be perfectly fine.
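A sketch of the loose-cast approach: whatever arrives, the query only ever sees an integer.

```php
<?php
// Loose-cast sanitizing: the raw string is coerced to an integer
// before it goes anywhere near a query.
$raw = '1234abcc';          // e.g. $_GET['person_id']
$person_id = (int) $raw;
echo $person_id, "\n";      // 1234

// Garbage with no leading digits becomes 0, which matches no row
// (assuming IDs start at 1):
var_dump((int) 'abc');      // int(0)
var_dump((int) '');         // int(0)
```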
I like this approach too, because it’s simple and in my opinion it’s no big deal whether we treat links with “1234abcc” as valid or invalid.
I have a database that stores age as an int.
My form has a text input with the label “Age:”
(IMHO this would be the wrong input choice, but it serves for example purposes.)
Most users will enter the expected “20”, but how many will enter
“twenty”, “20 1/2” or “20 and a half”?
In these cases a “the correct format is ##” type of message would be a help for them.
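A sketch of validating such an “Age” field with a helpful message (the function name, bounds and wording are made up for the example):

```php
<?php
// Validate the hypothetical "Age" text input: accept only a whole
// number in a plausible range, otherwise return a format hint.
function validateAge(string $raw): array
{
    $age = filter_var(trim($raw), FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 0, 'max_range' => 130],
    ]);
    if ($age === false) {
        return [null, 'Please enter your age as a whole number, e.g. 20.'];
    }
    return [$age, null];
}

[$age, $error] = validateAge('20 and a half');
echo $error, "\n"; // Please enter your age as a whole number, e.g. 20.

[$age, $error] = validateAge('20');
echo $age, "\n";   // 20
```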
Thought it might be worth saying that it is important to check that the regex, PHP, your database, and the HTTP data posted to the server are all using the same charset. Otherwise your analyses of the strings might not be accurate.
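For example, with the mbstring extension you can verify the incoming bytes really are the charset you expect before analysing them (a sketch, assuming UTF-8 end to end):

```php
<?php
// Verify the incoming bytes are well-formed UTF-8 before running
// string checks on them. "\xC3\x28" is a deliberately malformed
// UTF-8 sequence (lead byte with no continuation byte).
var_dump(mb_check_encoding('person_id=123', 'UTF-8')); // bool(true)
var_dump(mb_check_encoding("\xC3\x28", 'UTF-8'));      // bool(false)
```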
While it’s easy to prevent invalid URLs appearing on the site, it’s difficult off the site. For example, on forums when someone links to a URL in brackets ( http://example.com/?person_id=123), depending on how smart the forum software is, it may or may not include the bracket (or other punctuation mark) in the linked URL.
… in which case it’s a good thing to accept the numerical ID with some garbage appended, because there’s more likelihood people will reach the desired page. You can’t prevent generation of arbitrary URLs off the site anyway. And in the rare cases where it gets out of hand, there’s the canonical URL meta tag for search engines, which is designed to prevent exactly such problems.
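A sketch of emitting that canonical tag, reusing the example.com URL from above:

```php
<?php
// Emit a canonical link so search engines fold /?person_id=123abc
// and similar variants into one URL (domain from the example above).
$raw = '123abc';                                   // e.g. $_GET['person_id']
$canonical = 'http://example.com/?person_id=' . (int) $raw;

echo '<link rel="canonical" href="'
   . htmlspecialchars($canonical, ENT_QUOTES, 'UTF-8')
   . '">';
// <link rel="canonical" href="http://example.com/?person_id=123">
```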