Should I take data out of $POST?

When processing form data to send an e-mail, is there any advantage to taking the data out of the post array and putting it into some other array? I have seen scripts that do this, but I don’t know why.

At the moment, my script cleans and tidies the data and keeps it in the post array for validation and the subsequent e-mail function, and the only point at which it’s put in another array is if it needs to be displayed in the form when an error is found. I just wrote it that way based on a script in a PHP cookbook.

I ignored the saying about not fixing what ain’t broke and experimented with creating a new array for it, but the data disappeared at some point after the validation stage, so it didn’t make it as far as the e-mail function.

So I’m just wondering whether I should pursue that way of doing things and fix my error, or not worry about it and keep it as is, i.e. apparently working fine.

Most developers, myself included, consider superglobals like GET and POST ‘read-only’. As we all know, you should validate and filter all data supplied by our users. When you filter the incoming data, you nearly always modify it in some way to make it safe - and we cannot do this with a ‘read-only’ source.

That said, even if it is ‘read-only’ (I… 99.999999% of the time use it that way. Very rare circumstance that i write to get or post), you dont necessarily need to pull it out of said array to make it safe - only if you need to reuse that safe variable. For example.


//Assume: $_POST['address'] = "12SQUIRLYDO\\OF";
$sql->query("INSERT INTO addresses SET address = '".$sql->real_escape_string($_POST['address'])."'");

(Yes it’s simplified, but it’s an example)

If I dont need the escaped $_POST[‘address’] again, there’s no real point in me assigning it’s value to another variable.

Well, as per my understanding, it’s always good to store the original posted data for different purpose. But at the same time you must have a safe data to process further.

Suppose $_POST(or $_GET) has some data, and you want to save that in DB. So in that case, you can use some array recursive function to escape the special characters. Now after filtering the global data, you can safely use that with your script. But again you’ve to validate the original data.

A little bit more descriptive, $_POST[‘pass’] (the password that may contain any character), you want to check either it’s blank or not, but you are using MD5 method and resulting $_POST[‘pass’] will have 32 characters long string after MD5 it. So actually you’ll need to validate the $_POST data and save $customPost data in DB.

$_POST[‘pass’] = ‘’; // posted blank by user/browser
$customePost[‘pass’] = ‘xxxxxxx’; // where “xxxxxxx” is MD5 value of BLANK string.

Now you can easily validate the $_POST[‘pass’] and after successful validation you can save $customePost[‘pass’] in your DB or you can use with your script safely.

There are lots of other scenario, just depending how conscious you are.

$_POST/$_GET is not read only.AnthonySterling is only considering that read only else you can re-assign any global variable in PHP.


if(isset($_POST['pass']) && !isempty($_POST['pass'])) {
$sql->query("INSERT INTO users SET pass = '".md5($_POST['pass'])."'");
}

probably a bad example choice, considering MD5’ing a string will render it safe anyway…

The idea (to me, at least), is simple. “Do I need to use this edited form more than once?” has the same answer as “Should I save this to another variable?”

Yes, i know. I should have said “even if you consider it read-only”.

Other than the read-only thing (I must admit I feel the same), another reason I never write to POST or GET directly is to keep myself from mixing up filtered data and user input. I usually put my filtered data in a different array so whenever I need to access the data I know it has been validated and filtered and is safe to use.

If you sanitize or validate the data in place then how would you tell somewhere else in the code whether it has or hasn’t been checked?

You should always move the data into a different field when validating or sanitizing so that you can easily tell that the data is safe.

Well, the most plain-English answer that I’m closest to understanding is the last one, so I’ll focus on that for now. Felgall; if the code is written and used in a certain order, why would I need to take it out of the post array? (is it an array, by the way? I’m not too confident with the jargon) I mean, it comes in, gets tidied up (htmlspecialchars and that kind of thing), then validated with some regular expressions, then put in an e-mail. The only stuff that can come in is what I expect (the script checks for input names that are not my own). Is there something that I’m missing here? (Probably is).

That’s the point at which it should be moving the values to separate fields since the $_POST array contains values that have NOT been checked yet by the PHP script (and you can’t rely on any JavaScript as that is purely there for your visitor’s benefit and a hacker will have it turned off so as to bypass any checks done there).

You should always assume that the $_POST values can contain anything at all. You move those values to your own fields or your own array when you actually check to make sure they contain something they are allowed to contain using PHP to do the check where your visitor can’t turn the check off.

Also you have your processing listed in the wrong order. htmlspecialchars is an output function to use immediately before writing a field into a web page - it has no place before or even in the validation process as if the data is going to be used anywhere other than outputting to a web page you are taking potentially valid input and converting it to garbage.

I’m not sure why you’ve got onto Javascript. Regarding htmlspecialchars; that’s how the original script was written and I’ve kept it on. I didn’t know that I was doing anything wrong there.

But I still don’t understand the difference between validating data in post and validating the same data in a different array of my own creation. The tidying and validation processes would be the same; the incoming data would be the same. How does transferring it wholesale to a new array make it any safer? Isn’t that like moving a bomb from one room to another? It doesn’t alter how you defuse it.

Not at all. With your bomb example it is like having two rooms - the first room contains packages that might or might not be bombs. As you check the packages and determine either that they are not bombs or defuse them if they are, you move them to the second room. You now know that everything in the second room is safe whereas if you left them in the first room you don’t know that everything is safe because you don’t know what has and hasn’t been checked.

You don’t transfer wholesale to the new array, you transfer the validated value to the new array as you validate the individual field. That way all subsequent references to that second array you know that the content has been validated just as with the bombs you’d know that because it is in the second room that it has been defused.

Certainly moving all the bombs from one room to another before you defuse them would not give you any benefit. You only get the benefit when you defuse each one as you move it.

Makes the assumption you’re going to use the value again.
If you are, then yes, move it out.
If you arnt, then all you’d do by moving it out is add to the symbol table and store data that you’re not going to use. Wasteful.

OK; I think that I can see how this would be useful if you don’t yet know what inputs there are and whether they are safe; or you may not know for sure whether the data are safe when they are used at some point later on, or by another script, or a database. But is there any advantage to this way of working if the script is simple and linear in how it works, with step conditional on the success of the last, and only valid inputs can be submitted?

To stick with the previous analogy, my script is checking every possible “suspect package” and it can only proceed to do anything with the packages if they are all deemed as safe as is reasonably possible. One single ticking bomb will cause the script to die, as would any extra package that tries to sneak through by being labelled “harmless” or something.

Edit: I should add, for those of you who referred to databases, that I don’t use a database at all.

I think this is a great question, and you have some great answers. Here is my take.

One reason you might want to retain the moniker say, $_POST[‘telephone’] is that if it is reused much further down the script, when you come back to look at it a few weeks later you may be asking yourself “just where the hell did that variable come from?”.

At least $_POST[‘telephone’] is very explicit.

I have noticed that if I refer to an incoming var just once or twice I will tend to leave it as $_POST[‘telephone’] and take the pain of writing that out. It is far more complicated to have to type that out than say, $telephone and that in itself can lead to errors - so should be taken into consideration.

When I notice a 3rd time I am writing it out, I tend to go back and assign it to a more readable variable name eg $telephone. That is very easy to do with a modern IDE.

This has the effect of making code easier to read, but has to be balanced with the “where did that variable come from?” issue mentioned above.

Another decider could be to do with the number of physical lines of code that separate the uses of the variable. If you can see its’ use and reuse inside a single scrolls’ worth of code - then maybe your eye can quickly spot


$telephone = $_POST['telephone'];

If I feel the need to check that the incoming var’s contents can be checked against my expectations (sticking with a telephone number, that is say, numbers, dashes and spaces only) and that fails then I advocate you “fail early” at the top of your script and either dump that data, or abort the operation depending on issues such as how critical is this piece of information, or to what degree did my user a) subvert my js checks on the client, and b) ignore my GUI hints about what constitutes an acceptable phone number.

Once deep into your script, the rule is to Escape Output - protect the next environment which is about to accept this variable - as you don’t use a db then you could be outputting to a webpage as part of a html stream - so use htmlentities or one of that family of escape mechanisms.

I have also seen code like this:


$filteredTelephone;
or
$cleanTelephone;

Which I find off-putting to read, but at least you know what it means.

It is a reasonable assumption that almost all field that you go to the trouble of validating are going to be used again for some purpose other than validating it and so will need to be copied to safe fields so you have their values saved for use after you finish validation. Any presumed wastefulness in creating extra fields in this instance is trivial compared to the huge improvement in security that storing the validated values gives. Any validation process that doesn’t save the validated values separately is almost guaranteed to have security holes that can be removed by saving the validated values into separate known to be safe fields. You can check for security holes without moving the safe values but it means a thorough examination of the entire script every time you change anything instead of being able to see immediately from the field names. Why add days of extra checking every time you change the script when creating a few extra fields guarantees the security. This is the reason why it was decided to do away with the register globals option, so that you can make these time savings in checking security.

The exception would be a field such as a CAPTCHA where failing validation for that field means that you want to exit the entire page and either go to an error page or back to the original form. In that instance there is no value to copy to a safe field after the validation as either the value is the expected one which you don’t need to refer to again or the script doesn’t finish the validation at all.

I think that the whole ‘copy’ thing comes down to your meaning of “validation” – if all you are doing is checking that it has a valid value (validation) then why do you need to make the extra copy?

But if you are forced reformatting it, stripping illegal values from it while still moving forward on using it, then copying to a new variable makes sense.

I’ve actually been sending $_POST to queries directly more and more every since I switched to doing everything with PDO and prepared queries… since pdo::execute handles your sanitization for you.

I’ve also stopped using $_GET altogether, since all my code now parses $_SERVER[‘REQUEST_URI’] directly instead, manually exploding my values.

@cups – good point about the longer names; it can be somewhat disconcerting, but at least it’s clear what’s going on. Personally I hate needlessly cryptic variable and field names… But I’m really big on saying what things ARE. This isn’t ROM basic where variables can only be one letter long, and we aren’t coding over 150 baud dialup… JHVH forbid we use more than three characters on a variable name.