Validation of user submitted (tainted) data

@cpradio

specify string, etc.

See, this “etc.” is a problem. You still unable to give a complete definition. What your fellow developer have to do with this “etc.”? How to implement it in the code? What does “specify string” mean? That’s what “vague” and “inconsistent” mean. That’s why you helps noone telling them “to sanitize” something.

The method described in my answer is considered the only proper way in the entire world, but, as it turns out, totally unknown on Sitepoint.

@spaceshiptrooper you’re a funny guy :slight_smile:
You are sending a non-empty array to PHP and then wondering why PHP is telling you it’s non-empty.

You really need more education in programming. An empty result is still a valid result. So, if “If the data doesn’t exist in the database” it’s ok. It’s not an error. Your code won’t return an error. It will return no rows.

Please verify your statement before making them. I’ve no time explaining basic things you can see yourself.

@Lemon_Juice look, you’re confusing two matters: a validation, which is essentially optional measure (and being indeed “a matter of preference”) with a proper SQL formatting, which is strictly obligatory and should be never avoided.

That’s why I hate then someone its trying to introduce “validation” in the topic of the proper SQL formatting.

That’s why I told to @felgall that a case with 1 OR 1 = 1 is a bad example for validation. One shouldn’t “always validate” because it’s indeed a matter of preference. But for SQL formatting it’s different. And one should always use placeholders to substitute a variable in the query, no matter if it’s been validated or not.

Imagine there will be not a numeric ID but a post to this very discussion board in question. What your “validation” would do? Nothing. While prepared statement will do all right, as it’s a more general rule.

So, tell a noob to use prepared statements properly and you’ll make this SQL always safe. Tell them to use validation and you tell him actually nothing. Ok, you may give them a solution for one particular case but leave him helpless for the billions other cases.

But that’s a nonsense! For a database all characters are valid! Your database layer should be able to run a query with any character. While character removal will just spoil the data.

$_POST etc are tainted in that there is no way to tell in a properly coded program whether the value is valid or not

For the SQL interaction you don’t need to. All you need is to format your SQL properly, by means of preparing a query with placeholders and then executing it with actual variables. You don’t need no validation code for this, so there is nothing to be omitted.

Name:5623009-%$2$ -

you really think people have names like that?

age: ght

is that older or younger than od3 years?

All characters might be valid in a database but the individual fields will have their own validation requirements.

All fields should be validated to make sure their value is meaningful before any processing is done with them long before you output the result of that processing to the database.

For example if you need their year of birth to store in the database and the person has ght as their age then what will you put in the database as their year of birth?

Writing to a database is an output function. Validation/sanitising are mandatory input functions if you don’t want security holes in your code.

So YES you do need to or you are missing half your code and will soon be missing all your data

@felgall you are confusing a database with yourself. For a database Name:5623009-%$2$ - is all right. If you don’t like it personally, it’s a matter of your personal preference, but by no means it’s a rule for everyone.

You are confusing a business logic validation with a database interaction.

You have to understand that validation is optional. Worse yet, some data you just can’t validate at all. For your post here on this board it’s all right to allow any value, be it 1 OR 1=1 or 5623009-%$2$ or “Robert’;DROP TABLE users – '” or whatsoever. There could be billions validation options, you cannot define them all.

But for the SQL interaction the situation is different. For the regular literals you don’t have to use any validation - you have to use prepared statements with placeholders instead. Unlike validation that is vague or even sometimes inapplicable at all, using prepared statements is a straightforward and unambiguous simple rule that is indeed have to be used unconditionally.

I’ve been working with PHP and MySQL for a while and I am still always learning things (hopefully faster than I forget them :older_man:)

For whatever reason, I had thought that if a field was an Integer type, and an attempt to enter a non-integer happened, the query would fail.

Not so, it was pointed out to me that MySQL would “coerce” a string into the field! eg. empty string → 0

A lot can be done to help things along. eg.
instead of using a text input which would take a string, a drop-down select can sometimes be used.
intstead of a checkbox that might remain unchecked, a radio so one has to be checked.
Even toss in a bit of JavaScript to bring attention to problems before the form gets submitted.

Good Enough? Maybe for the honest majority, But as devs we are well aware that such measures can easily be got around if one wanted to.
Maybe for some things it isn’t as important.
Does it really matter if I enter “Pudding Lane” as my address into a “required” field because I don’t want to give it?
Does it matter if I change my purchase total from $99.99 to -$500.00 " lol cash please

So a lot often needs to be dealt with server-side. Maybe I don’t care so much that someone might enter a false address. But I might care very much that it not exceed 100 characters in length.

I imagine that for the most efficient control, regex should be used. eg.
instead of is_numeric($blah)
preg_match("/^[\d]{2,4}$/", $blah)

But that would likely be over-kill for something used to only SELECT a pages content. Page doesn’t exist? give them a default page. problem solved.

If I had mistakenly made an “age” field VARCHAR() expecting users to enter digits, I would have a problem if they entered “Twenty seven” instead of “27” Their fault? Nope, mine.

Credit Card numbers? No thanks, I don’t want to get any where near them, thank you very much.

No I am not - you did in the code you posted.

All references to $_POST fields ARE business logic validation. Those tainted fields should never get as far as any processing that the code does.

Database Interaction is either an output function (updating the database) in which case it comes well AFTER all the validation has been done and tainted fields disposed of (or at least ignored) OR it is an input function (reading from the database) where the values it returns to the code are tainted and really ought to be sanitized before use (just in case the database was tampered with (although this is one place where you can often get away with simply assuming that the database is secure for simple applications.

Anyway you are the one who jumbled fields requiring business logic validation ($_POST fields) .with your database interaction. I never get the two confused the way you have in the code you posted.

Will you now accept that you shouldn’t jumble the input to your business logic validation into your output database interactions or are you deliberately trying to convince that others should do that so that their code can be more easily hacked in possibly millions of different ways.

Stop mixing up the two and then accusing others of mixing them up when they tell you not to mix them.

@felgall ok, it seems you listen only to yourself, so, a last one.

By saying “you must validate” you’re helping no one. It’s just a general purpose rant that is impossible to define in practical terms.

For the case in the initial question it’s ok to bypass a validation - nothing bad will ever happen. Although you may add a validation, it is not essential. That is why I said that this case is rather a bad example for validation (but good example for the proper SQL formatting). Honestly, there is nothing to argue with, but you may, if you wish.

I’m not confusing these two matters - I was only pointing out that strict validation is not always necessary and often simple sanitization like casting to int is enough. Of course this step is not necessary for pure SQL safety if prepared statements are used and it’s separate (should be) from the action of making SQL secure. However, my comment was about the application as a whole - input data often must be validated/sanitized because the application may malfunction with corrupt data - even if all cases of SQL injection are prevented with prepared statements.

@colshrapnel, I see your point and I agree with it, no need to defend it so fiercly!

Untrue.

By saying that you are helping to make it easier for web sites to be hacked. Obviously you are a hacker trying to make it easier for yourself.

Validation/sanitizing ALL tainted data is the first thing taught in any basic security course.

Another thing all decent security classes teach is defense in depth - ie. validate/sanitize even when you don’t need to so as to prevent the possibility of amendments to the code creating security holes.

The only reason I am continuing to post on this since I know you wont listen is to try to help those reading the garbage you are writing to understand that by following your advice their site will be a lot less secure than it could be and make it easier for you to trash their site.

3 Likes

Thank you, I was going to reply to his response, but then I just remember the saying “Don’t feed the troll.” And also, this is what he said on his stackoverflow account.

“Never argue with stupid people, they will drag you down to their level and then beat you with experience.”
― Mark Twain

Anywho that was off-topic. I’m not going to reply to a troll who is pushing amature and nooby codes for people. This is why most legacy codes still exist today. He pushes for stuff like if($_POST), but not rowCount() when handling errors is probably the best thing you can do for visitors.

2 Likes

If I were looking at examples of coding by say 25 different developers for a job, any that hadn’t done any validation would go straight in the bin, likewise any that didn’t use prepared statements would also go into the bin.

@colshrapnel say you had a form and one of the fields was either a very long text field or a textarea field, would you validated it or let spammers post spam and whatever other crap they wanted in that field?

1 Like

@SpacePhoenix you split the initial topic to create a more generalized “Should we validate in general?”, and if you want my answer to this latter one - it’s positive. I hope you didn’t expect any other answer from me.

But for the particular question from the initial topic this argument is irrelevant. You know nothing of the particular case in the OP and, therefore, unable to offer any particular validation (beside type casting, which will be superfluous and useless for security). 1 OR 1=1 is a case for the proper SQL formatting but not validation. Yes, you may add a validation here. It will add no security and thus 1 OR 1=1 is a bad case for the validation example. That is all I said there and again - I see nothing to argue with.

Besides, I am rather against generalized sermons like “You have to always validate!!111” as I find them rather useless. You either have to write a multi-volume codex, listing all the validation cases including spam protection, type casting, user validation, the life, the universe and the everything. Or just refrain from commenting. To me, a comment have to be practical, usable and relevant.

Update. I think I managed to realize the source of confusion. In the mind of average PHP user, protection from SQL injections is strongly connected to the user input validation, which is, frankly, a bullshit. 1 OR 1=1 is an SQL injection issue and it has nothing to do with user submitted (tainted) data. Yes, it sounds a blaspheme, but if you manage to break through an old superstition you’ll see it’s true.

Using prepared statements will 99.9 take care of [quote=“colshrapnel, post:34, topic:214325”]
1 OR 1=1 is a case for the proper SQL formatting
[/quote]

So in my opinion you argument is mute. If it’s harmlessly in the database it doesn’t cause any harm. Developers can’t control what a user inputs, they can only control what is inputed (without changing or modifying the inputed data — validating) or securing the output to the screen so it reacts harmlessly to the website.

GIGO - Garbage In, Garbage Out

Until the code gets changed to add some extra processing prior to the database call where that code potentially crashes the server or exposes data that a hacker can use to do who knows what.

There is a rule in security called “Defense in Depth”. If adding the extra security measure does no harm then add it anyway. It may save you later when the code is modified.

In particular all tainted data eg $_POST should be untainted before it leaves the input processing (which means validating or sanitizing it and then moving it to a different field name.

99.9% is an awful result from a security viewpoint it means there is one value in a thousand that will harm your system. Most hackers can try 1000 alternatives in a few minutes.

1 Like

There is a saying, “too many cooks spoil the broth”. PHP once tried to “sanitize all tainted data before it leaves the input processing” using magic quotes. Thank god, they got to their senses and quit that nonsense.

I agree - what characters are allowed as valid data is field dependent and so there is no one size fits all sanitize. Even the 11 filters available at http://php.net/manual/en/filter.filters.sanitize.php only cover a small fraction of the possibilities that you should be applying.

Applying a wrong or incomplete sanitize filter is useless, Fortunately all the millions of others can be built using a regular expression.

1 Like

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.