You’re right. There wouldn’t be any injection or XSS problems, so this would probably be better classified as a usability problem. If the user enters a bad value, they should be told so they can fix their mistake. The system shouldn’t carry on with a clearly bad value.
Personally I’d prefer to just let the script continue without changing the value. That way, when it comes to running the SELECT FROM WHERE id = ‘1234abcc’ query, it fails and the site gives the same response whether or not the value is a valid integer.
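A minimal sketch of "pass the value through unchanged and let the lookup fail", using SQLite purely as a stand-in database. The table name `people`, its columns, and the helper `find_person` are hypothetical. The raw value is bound as a parameter rather than pasted into the SQL string. In SQLite, text compared with an INTEGER column is converted only if it is a well-formed number, so '1234' matches and '1234abcc' does not. Some databases (MySQL, for example) instead convert '1234abcc' to 1234 when comparing it with an integer column, so whether this query "fails" depends on the database you use.

```python
import sqlite3

# Hypothetical schema for illustration: a "people" table with an integer person_id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (person_id INTEGER, name TEXT)")
conn.execute("INSERT INTO people VALUES (1234, 'Alice')")


def find_person(raw_id):
    """Look up a person using the unmodified user value as a bound parameter."""
    row = conn.execute(
        "SELECT name FROM people WHERE person_id = ?", (raw_id,)
    ).fetchone()
    return row[0] if row else None


# '1234' is a well-formed integer, so SQLite converts it and the row matches.
print(find_person("1234"))
# '1234abcc' is not a well-formed number, stays TEXT, and matches nothing.
print(find_person("1234abcc"))
```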
That’s the trade-off: performance or security. The first rule of security is never to trust input. Even if users are not trying to break your code, you need to assume that they have no idea what the valid values for an input are and may enter an invalid value at any time, expecting it to be valid.
To keep your system secure, always assume input is invalid until you have confirmed that it is valid.
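A rough sketch of the strict "invalid until proven valid" approach, which also gives the user a message they can act on. The function name `parse_person_id` and the rule that an ID is a positive run of ASCII digits are assumptions made for illustration. The sketch does not describe any particular site's rules.

```python
import re

# Hypothetical rule: a person ID is one or more ASCII digits.
# [0-9] is used instead of \d because, for str patterns, Python's \d
# also matches non-ASCII digits such as Arabic-Indic numerals.
_PERSON_ID = re.compile(r"[0-9]+")


def parse_person_id(raw):
    """Return (person_id, None) if raw is valid, otherwise (None, error message)."""
    if raw is None:
        return None, "person_id is required."
    if not _PERSON_ID.fullmatch(raw):
        return None, f"'{raw}' is not a valid person ID; please use digits only."
    value = int(raw)
    if value == 0:
        return None, "person_id must be a positive number."
    return value, None
```

A caller would show the error message and stop, instead of carrying on with the bad value. For example, `parse_person_id("1234abcc")` yields `None` plus a message, and `parse_person_id("1234")` yields `(1234, None)`.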
Approximately 2/3 of any properly written system will be validation (less only if some of the validation is done by built-in functions and you don’t count the code inside those functions).
If person_id comes from user input, then proper validation would be a good approach. If it’s just part of the URLs of links on the site, I’d say simply sanitizing it by loosely casting to int would be perfectly fine.
I like this approach, too, because it’s simple and in my opinion it’s no big deal whether we treat links with ‘1234abcc’ as valid or invalid.
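The loose cast discussed above is PHP’s `(int)` cast, which keeps the leading digits and drops the rest, so ‘1234abcc’ becomes 1234. Below is a hypothetical Python helper that approximates that behaviour, for illustration only. It handles optional leading whitespace, an optional sign and leading digits. It deliberately ignores other PHP edge cases such as exponent notation, so it is not a faithful port.

```python
import re

# Hypothetical approximation of PHP's loose (int) cast: optional leading
# whitespace, an optional sign, then as many ASCII digits as possible.
# Anything after the digits is ignored; no leading digits at all gives 0.
_LEADING_INT = re.compile(r"[ \t\n\r\v\f]*([+-]?[0-9]+)")


def loose_int(raw: str) -> int:
    match = _LEADING_INT.match(raw)
    return int(match.group(1)) if match else 0
```

With this, ‘1234abcc’ and ‘1234)’ both become 1234, and pure garbage becomes 0. Because 0 presumably never matches a real record, the lookup simply finds nothing. That is the "no big deal either way" behaviour described above.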
Thought it might be worth saying that it is important to check that the regex, PHP, your database, and the HTTP data posted to the server are all using the same charset. Otherwise your analysis of the strings might not be accurate.
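A small sketch of how a charset mismatch skews string checks. It assumes the browser sent UTF-8 while one layer wrongly decodes the bytes as ISO-8859-1 (Latin-1). The validation rule `NAME` (up to four lowercase letters, including é) is hypothetical and exists only to show a check that passes or fails depending on the charset.

```python
import re

# The same user input, as the UTF-8 bytes a browser might submit.
submitted = "café".encode("utf-8")

# Decoded with the charset it was actually sent in.
correct = submitted.decode("utf-8")
# Decoded by a layer that wrongly assumes ISO-8859-1 (Latin-1):
# the two bytes of "é" turn into two separate characters.
mismatched = submitted.decode("latin-1")

# Hypothetical validation rule: 1 to 4 lowercase letters, including é.
NAME = re.compile(r"[a-zé]{1,4}")

print(correct, len(correct), bool(NAME.fullmatch(correct)))
print(mismatched, len(mismatched), bool(NAME.fullmatch(mismatched)))
```

The correctly decoded value is 4 characters long and passes. The mismatched one is 5 characters long ("cafÃ©") and fails, even though the user typed exactly the same thing.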
While it’s easy to prevent invalid URLs appearing on the site, it’s difficult to do so off the site. For example, on forums, when someone puts a URL in brackets (http://example.com/?person_id=123), the forum software may or may not include the closing bracket (or another punctuation mark) in the linked URL, depending on how smart it is.
… in which case it’s a good thing to accept the numerical ID with some garbage appended, because people are more likely to reach the desired page. You can’t prevent the generation of arbitrary URLs off the site anyway. And in the rare cases where it gets out of hand, there’s the canonical URL link tag (<link rel="canonical">) for search engines, which is designed to prevent such problems.
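A minimal sketch that combines both ideas for a mangled forum link such as http://example.com/?person_id=123). It recovers the leading digits of person_id and builds the `<link rel="canonical">` element that would go in the page head, so every variant of the URL points search engines at one address. The base URL `CANONICAL_BASE` and the function name `canonical_link` are hypothetical. The sketch only builds the tag and does not render a page.

```python
import html
import re
from urllib.parse import parse_qs, urlsplit

# Hypothetical base URL for the canonical form of a person page.
CANONICAL_BASE = "http://example.com/"


def canonical_link(requested_url):
    """Return a canonical <link> tag for the requested URL, or None if no ID is usable."""
    query = parse_qs(urlsplit(requested_url).query)
    raw = query.get("person_id", [""])[0]
    # Accept the numeric ID even with trailing garbage such as ")" or ".".
    match = re.match(r"[0-9]+", raw)
    if not match:
        return None  # nothing usable; the page would show "not found" instead
    person_id = int(match.group())
    href = f"{CANONICAL_BASE}?person_id={person_id}"
    return f'<link rel="canonical" href="{html.escape(href)}">'
```

Both http://example.com/?person_id=123) and http://example.com/?person_id=123. map to the same canonical tag pointing at http://example.com/?person_id=123.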