Code layout and variable questions

Thanks again Paul. Reading now. :slight_smile:

Reading this thread over a few times, I’m actually thinking about reworking a lot of my code.

I might completely remove all entries of the following at the top of all of my pages:


if (isset($_SESSION['username']) && (isset($_SESSION['userid'])) {
     $username = $_SESSION['username'];
     $user_id = $_SESSION['user_id'];
}

and just use $_SESSION[‘username/user_id’] whenever I need to input or output the username/user_id.

I figure, why bother setting them as variables unless I need them to be, ie to prepare them for database insertion, for example:


$username = mysqli_real_escape_string($link, $_SESSION['username']);
$result = mysqli_query("INSERT INTO database (one, two, three) VALUES ('1', '2', '3') WHERE username = '$username'");

But for any stuff that is just output on my page (welcome messages, menu links, etc) I will just do this:

echo 'Welcome' . htmlspecialchars($_SESSION['username']) . '.';
// a user navigation link
<?php echo '<a href="blah.php?user=' . htmlspecialchars($_SESSION['username']) . '">' ?>Blah</a>

Sorry for rambling on and on, but this thread got me thinking about how I was writing my code in certain ways… and it seems a bit pointless/redundant to do it the way I am currently doing it. Especially if I am going to be re-setting $username throughout my code with different escape functions anyway, why bother setting it at a point where I’m not even using it yet. Unless there is any reason I shouldn’t use the session variables straight up like that?

The main reason is in terms of encapsulation.

It’s best if you don’t have global variables scattered all throughout your code. Not only is it difficult to keep track of what you’ve used and where, it also becomes a real pain in the butt if you have to change one of those global variable names.

See for example this article about PHP globals in functions

To help resolve those issues, assign the global variable to a local variable so that there are less instances of global variables scattered about the place.

Another reason is something called code smells (wikipedia), which are indications of deeper problems. There’s quite a comprehensive list of them over at the [url=“http://www.codinghorror.com/blog/2006/05/code-smells.html”]code smells (Coding Horror) page.

In this case, superglobal variables should not be trusted because you cannot guarantee what they contain. Yes it’s possible to sanitize and validate them and rewrite the clean version back in to the superglobal, but then when viewing the code later, on you won’t know if the superglobal is clean or not.

Due to the dangerous potential of superglobals, any time you see them in your code you should ensure that they are cleaned and assigned to local variables, and those problems go away. If they appear without being appropriately handled, that should fire off some warning bells to ensure that they are.

Appropriate handling of them with PHP 5.2 or better


$username = filter_input(INPUT_GET, 'username', FILTER_SANITIZE_STRING);

or:


$username = filter_var($_SESSION['username'], FILTER_SANITIZE_STRING);

And before PHP 5.2


$username = '';
if (isset($_SESSION['username'])) {
    $username = $_SESSION['username'];
}

And if your server still has magic quotes enabled, you should follow the advice from this Disabling Magic Quotes page.

Understood. Aborting that change and making sure everything that I do have is handled as stated. Thank you again!

Final two questions, I swear! I want to understand, not just ‘do’ it.

1 - FILTER_SANITIZE_STRING is a combination of strip_tags and htmlspecialchars?

2 - Since I am going to be setting my session variables at the top of my page like so:


if (isset($_SESSION['username'])) {
     $username = filter_var($_SESSION['username'], FILTER_SANITIZE_STRING);
}

I can use that directly in my navigation links (in the same page) as just plain echo $username, no need to escape, because it is already escaped every time the page is loaded?

Unless someone posts the form that is on my page, then $username = mysqli_real_escape_string($link, $username); will be set for the database query, then when the page reloads, my filter_var is applied again… right ?

I am assuming that as long as the form query isn’t sent that $username will always be safe for html output in the page whenever the page loads, since it is set at the top. And it will only change to be safe for database insertion when it is called to be (eg, upon someone pressing the form submit button).

I think I understand. Gulp. I hope so! I will be fine after these two questions I promise!

Here are the types of filters. The [url=“http://www.php.net/manual/en/filter.filters.sanitize.php”]sanitize filters is where you find FILTER_SANITIZE_STRING, where it says:

Strip tags, optionally strip or encode special characters.

The optional flags on the page can be specified in the third parameter as an associated array. You can see examples of it on the filter_var and [url=“http://nz.php.net/manual/en/function.filter-input.php”]filter_input

It might be escaped based on when you set the value earlier, but it’s a bad practice to escape anything before you actually need to. Why? Because of the danger of double-escaping or triple escaping. Have you ever seen “O\\\'Brian” as a database value?

You should not escape when values come in to your code.
Sanitize them, validate them even, but don’t escape them. Why? What happens when you want to combine an escaped string with an unescaped one? That just leads to problems.

Also, escaping is different depending on where the value will go. Is it going to be an SQL statement, an HTML piece of text, some XML, a URL, or destined for email?

Values throughout your code should be unescaped up to and until they leave your code, as output to their intended destination. Only at the last possible moment should they then be escaped, as is appropriate for where they’re going to go.

Ok, hopefully I can clear this up today. I thought I was escaping everything the right way, so let’s make sure I learn how before I continue.

First let me post some example code and then I will ask questions based on that. I had pretty much every page on my site done with this type of escape routine:


<?php
session_start();
if (!isset($_SESSION['username'])) {
		header('Location: /website/login.php');
}
if (isset($_SESSION['username']) && isset($_SESSION['user_id'])) {
	$username =htmlspecialchars($_SESSION['username']);
	$user_id = htmlspecialchars($_SESSION['user_id']);
}
require_once $_SERVER['DOCUMENT_ROOT'] . '/includes/database_connection.php';
require_once $_SERVER['DOCUMENT_ROOT'] . '/includes/functions.php';
if(isset($_POST['item_submit'])) {
	$username = mysqli_real_escape_string($link, $_SESSION['username']);
	$user_id = mysqli_real_escape_string($link, $_SESSION['user_id']);
	$item_name = mysqli_real_escape_string($link, $_POST['item_name']);
	$sql = mysqli_query($link, "INSERT INTO items (user_id, item_name) VALUES ('$user_id', '$item_name')");
		if (!$sql) {
			$error = 'DATABASE ERROR.';
			include 'error.php';
			exit();
		}
		else
		header('Location: /website/items.php?user=' . htmlspecialchars($username) . '');
}
?>
<html>
	<ul>
		<li><?php echo '<a href="navigation_link_one.php?user=' . htmlspecialchars($username) . '">' ?>navigation_link_one</a></li>
		<li><?php echo '<a href="navigation_link_two.php?user=' . htmlspecialchars($username) . '">' ?>navigation_link_two</a></li>
	</ul>
			<form action="add_item.php" method="post">
			
				<label for="itemname">Item Name:</Label>
				<input type="textbox" name="item_name"/>
				
				<input type="submit" name="item_submit" value="Add Item"/>
			</form>
</html>

There are probably some code mistakes as I typed this up purely for examples sake, based on what I have been doing throughout my website. I removed a lot of tags that don’t need to be seen etc.

Hopefully this will teach me a great lesson in how to set up my variables/escaping properly.

Now let me break break it down into sections.

I’m assuming, that since the $username variable is automatically htmlspecialchar’d on every page due to my navigation code, that when the user submits a form and $username gets mysqli_special_escape_string put on it - THAT is ‘double escaping’?

Not to mention that since $username is automatically done on every page load in the navigation, it gets done a THIRD time when the user submits an item and they are redirected:


header('Location: /website/items.php?user=' . htmlspecialchars($username) . '');

Is that assumption correct?

All the above said, I should (?) be using filter_var on the following instead anyway. But that would still leave me with some ‘double escaping’ in the rest of the code, if my above assumptions are correct. I am also thinking that filter_var is for sanitizing, not escaping for direct output, and I could be wrong there too.


if (isset($_SESSION['username']) && isset($_SESSION['user_id'])) {
	$username = filter_var($_SESSION['username'], FILTER_SANITIZE_STRING);
	$user_id = filter_var($_SESSION['user_id'], FILTER_SANITIZE_STRING);
}

Sorry if that is all annoying to read, I just crawled out of bed after a long night of no sleep (baby).

Ugh, I’m not even using $username in that sql query, but my questions can be applied to $user_id still, and the fact that I am escaping $username multiple times elsewhere anyway.

No sleep no brain.

Here is what I came up with and I think it is right except for one double escape still…


<?php
session_start();
if (!isset($_SESSION['username'])) {
		header('Location: /website/login.php');
}
if (isset($_SESSION['username']) && isset($_SESSION['userid'])) {
	$username = filter_var($_SESSION['username'], FILTER_SANITIZE_STRING);
	$userid = filter_var($_SESSION['userid'], FILTER_SANITIZE_STRING);
}
require_once $_SERVER['DOCUMENT_ROOT'] . '/includes/database_connection.php';
require_once $_SERVER['DOCUMENT_ROOT'] . '/includes/functions.php';
if(isset($_POST['item_submit'])) {
	$user_id = mysqli_real_escape_string($link, $_SESSION['user_id']);
	$item_name = mysqli_real_escape_string($link, $_POST['item_name']);
	$sql = mysqli_query($link, "INSERT INTO items (user_id, item_name) VALUES ('$user_id', '$item_name')");
		if (!$sql) {
			$error = 'DATABASE ERROR.';
			include 'error.php';
			exit();
		}
		else
		header('Location: /website/items.php?user=' . htmlspecialchars($username) . '');
}
?>
<html>
	<ul>
		<li><?php echo '<a href="navigation_link_one.php?user=' . htmlspecialchars($username) . '">' ?>navigation_link_one</a></li>
		<li><?php echo '<a href="navigation_link_two.php?user=' . htmlspecialchars($username) . '">' ?>navigation_link_two</a></li>
	</ul>
			<form action="add_item.php" method="post">
			
				<label for="itemname">Item Name:</Label>
				<input type="textbox" name="item_name"/>
				
				<input type="submit" name="item_submit" value="Add Item"/>
			</form>
</html>

But I feel like the fact that my navigation escaping $username first, and then my redirect (upon form submission) escaping it again, is double escaping?

Should have just posted this, much simpler… hehe

Is the page index.php which is the start of your website, also in this directory?

$_SERVER[‘DOCUMENT_ROOT’]

doc root is /x/

website files are in /x/website/

includes are in /x/includes/

need to sort out protecting my includes someday, am hoping to sort out this escaping issue (assuming there is one) first

note: when site is tested live doc root is /, main website files are in /, and includes are in /includes. hence the need to protect them… but that is a job for later.

Not as such. htmlspecialchars converts a small subset of characters in to their html entity character form:

[list][]& => &
[
]" => "
[]< => <
[
]> => &rt;[/list]

Double escaping commonly occurs in three different places.

First, when the PHP script retrieves values from $_POST and $_GET. If you have magic quotes enabled on your server, O’Brian will be turned in to O\'Brian

It was soon found though that magic quotes weren’t effective enough, so then came mysql_escape_string, which didn’t do a proper job, so we now use mysql_real_escape_string

The second time when people mistakenly escape values is when assign those $_POST and $_GET values to variables. Some people mistakenly perform mysql_real_escape_string at that point, which turns O\'Brian in to O\\\'Brian

Third is when sending the SQL query, where they may yet again escape the values.

How do you deal with magic quotes? If they are enabled, turn them off at the server. If you can’t turn them off at the server, you can filter them at run-time, but that’s at a performance cost.
See this PHP.net article on disabling magic quotes

The way that I like to see it is that when the variables are brought in to your code, you should ensure that they end up being naked and unprotected while running around in your code. Why? Because often one value may end up used for three different types of output (database, url, html), and it’s not possible to initially escape them so that they’re suitably for all types. You could create several copies of the variable, one for each intended usage of it, but that get messy very quickly. So, have the variables unescaped throughout your code and escape them when they’re being outputted. That way, any output that is not escaped will instantaneously be a code-smell that you can easily perceive.

With the header location example above, instead of htmlspecialchars, I think that urlencode() is the more proper one to use instead.

Using urlencode also helps to remind you about the intended usage of the code. If you had a variable called $location, it would be entirely possible to mistake the intended usage of that variable.


$location = '/website/items.php?user=' . htmlspecialchars($username);

When you use urlencode, there’s never a miscommunication.


$location = '/website/items.php?user=' . urlencode($username);

You will even see the urlencode documentation page using both urlencode and htmlentities at the same time, because they each serve different purposes.

If need be you can var_dump variables while testing to ensure that they are being handled properly. The purpose of escaping a url string is so that it can be safely submitted to load a new page. I don’t think that it ends up being received by the page as escaped code. However, if you have magic quotes enabled then that might escape things for you instead, which you need to deal with so that the variable can then be handled more correctly with techniques that do a better job.

You’re not wrong there.

Although, if user_id is supposed to be a number, or even an integer, you may want to use a different filter for that one, such as FILTER_SANITIZE_NUMBER_INT from the list of [url=“http://www.php.net/manual/en/filter.filters.sanitize.php”]sanitize filters

Basically it seems to come down to two basic fundamentals:

[list][]Sanity check all inputs, especially if they’re potentially unknown (therefore untrusted)
[
]Escape all outputs
[/list]

It just seems to be common sense.

Putting it in to practice does have differences depending on where it’s coming from, or where it’s going to, but that’s why there are good standards that can be followed to help reduce the problems that can occur.

Paul. There are not enough thanks in the world.

I don’t have magic quotes enabled thankfully, so I don’t have to deal with that.

And you know, I never actually thought about htmlspecialchars being unsuitable for my header redirect… it’s not html output, duh!! Thanks for the heads up there.

Here is the final code I have come up with, I think I finally have it right and can stop bugging you. (Oh please let it be right! :D)

<?php
session_start();
if (!isset($_SESSION['username'])) {
        header('Location: /website/login.php');
}

if (isset($_SESSION['username']) && isset($_SESSION['userid'])) {
    $username = filter_var($_SESSION['username'], FILTER_SANITIZE_STRING);
	$userid = filter_var($_SESSION['userid'], FILTER_SANITIZE_NUMBER_INT);
}

require_once $_SERVER['DOCUMENT_ROOT'] . '/includes/database_connection.php';
require_once $_SERVER['DOCUMENT_ROOT'] . '/includes/functions.php';

if (isset($_POST['item_submit'])) {
	$user_id = mysqli_real_escape_string($link, $_SESSION['user_id']);
	$item_name = mysqli_real_escape_string($link, $_POST['item_name']);
		$sql = mysqli_query($link, "INSERT INTO items (user_id, item_name) VALUES ('$user_id', '$item_name')");
			if (!$sql) {
				$error = 'DATABASE ERROR.';
				include 'error.php';
				exit();
			}
			else
			header('Location: /website/items.php?user=' . urlencode($username) . '');
}
?>

<html>
    <ul>
        <li><?php echo '<a href="navigation_link_one.php?user=' . htmlspecialchars($username) . '">' ?>navigation_link_one</a></li>
        <li><?php echo '<a href="navigation_link_two.php?user=' . htmlspecialchars($username) . '">' ?>navigation_link_two</a></li>
    </ul>
            <form action="add_item.php" method="post">
                <label for="itemname">Item Name:</Label>
                <input type="textbox" name="item_name"/>
                <input type="submit" name="item_submit" value="Add Item"/>
            </form>
</html>

There are several minor things, so what say I do a small code review right here for you.

If Statements


if ([color="blue"]!isset($_SESSION['username'][/color])) {
        header('Location: /website/login.php');
}

if ([color="blue"]isset($_SESSION['username'])[/color] && isset($_SESSION['userid'])) {
    $username = filter_var($_SESSION['username'], FILTER_SANITIZE_STRING);
    $userid = filter_var($_SESSION['userid'], FILTER_SANITIZE_NUMBER_INT);
}

The first significant thing with these two here is that you have the same term that is inverted in one if statement, and not inverted in the other.
These two if statement can be combined together, so that you don’t have to think so hard to realise that they are actually related.

if (!isset($_SESSION['username'])) {
        header('Location: /website/login.php');
} [color="green"]else if ([/color]isset($_SESSION['userid'])) {
    $username = [COLOR="Blue"]filter_var($_SESSION['username'], FILTER_SANITIZE_STRING)[/COLOR];
    $userid = filter_var($_SESSION['userid'], FILTER_SANITIZE_NUMBER_INT);
}

Even better though is to realise that filter_var gives you an empty value if the variable doesn’t exist, so you could do this instead:

$username = filter_var($_SESSION['username'], FILTER_SANITIZE_STRING);
if (empty($username)) {
    header('Location: /website/login.php');
}
$userid = filter_var($_SESSION['userid'], FILTER_SANITIZE_NUMBER_INT);

Database Variables

Next up is the start of the database condition:

if (isset($_POST['item_submit'])) { 
    [COLOR="Blue"]$user_id[/COLOR] = mysqli_real_escape_string($link, $_SESSION['user_id']); 
    $item_name = mysqli_real_escape_string($link, $_POST['item_name']); 

The first problem here is that $user_id already exists, having retrieved the value via filter_var. Now it’s replaced with a completely different value.

Move those variable assignments up top (and remove the mysqli_real_escape_string for now) and that solves several other problems too.


$item_submit = isset($_POST['item_submit']);
$userid = filter_var($_SESSION['userid'], FILTER_SANITIZE_NUMBER_INT);
$item_name = filter_input(INPUT_POST, 'item_name', FILTER_SANITISE_STRING);
...
if ($item_submit) { 

What are those other problems if they had been left where they were?

One of those other problem is that superglobals such as $_SESSION and $_POST are used within that database section. Those should have been handled earlier on, so that you don’t have superglobals scattered all throughout your code.

Getting that done before performing the database work is one of those invisible demarcation barriers. It’s a “separation of duties” that helps to simplify your code, and make it easier to modify and understand later on.

SQL Statement

Next we get to the $sql statement.

A fundamental flaw in how the mysqli_real_escape_string was used before is that whenever you assign the escaped value to variables, those variables have a flow-on effect long after the database part of the code has been done.

While in this particular case it doesn’t have a direct impact, if you had also escaped $username to send to your database, you would then find when outputting it in the HTML section that the $username variable is already escaped.

The solution to this is to ensure that escaped variables do not persist beyond the point where they are used.

One common way to do this is to encapsulate things within a function, so that the escaped values do not persist once the function is complete.
Another way that you could immediately use here, is to escape only when creating the sql string.

But first, let’s look at the SQL string.


$sql = mysqli_query($link, "INSERT INTO items (user_id, item_name) VALUES ('$user_id', '$item_name')");
    if (!$sql) {

Instead of $sql being the SQL string, it is instead the result of the database call. Most commonly the $result variable is used instead to store the result of the database call, so let’s separate out the SQL statement from the result.


$sql = "INSERT INTO items (user_id, item_name) VALUES ('$user_id', '$item_name')";
$result = mysqli_query($link, $sql);
    if (!$result) {

Now we just need to ensure that the variables being entered in to the SQL statement are properly escaped. Commonly sprintf() is used to help reduce the complexities of that.


$sql = sprintf('INSERT INTO items (user_id, item_name) VALUES (%d, "%s")',
    intval($userid),
    mysql_real_escape_string($item_name)
);

Notice how $userid is converted to an integer? In the SQL statement, %d is always a number (a decimal value) so you don’t need to look any further to know that the code is correct. Further on, we use intval to ensure that whatever $userid happens to be, that it can now be nothing but an integer number. Even though we sanitised earlier on, other things can happen to that variable throughout the code, so it’s a good practice to always escape (or cast) in some way the SQL variables. It’s a final safety net.

It’s either that or use prepared statements, which many people don’t want to deal with.

The End Result

So what are we left with after all that?

There are some other minor issues remaining such as variable name consistency, but I don’t want to go tweaking things too much until you can see a clearly defined reason to do so.

For the sake of succinctness I have left off the HTML code below the PHP script as there’s no change to that part. Even though the script still looks mostly similar to what you had, hopefully you can see the benefits now of the improvements that have been made.


<?php
session_start();
$username = filter_var($_SESSION['username'], FILTER_SANITIZE_STRING);
if (empty($username)) {
    header('Location: /website/login.php');
}

$item_submit = isset($_POST['item_submit']);
$userid = filter_var($_SESSION['userid'], FILTER_SANITIZE_NUMBER_INT);
$item_name = filter_input(INPUT_POST, 'item_name', FILTER_SANITISE_STRING);

require_once $_SERVER['DOCUMENT_ROOT'] . '/includes/database_connection.php';
require_once $_SERVER['DOCUMENT_ROOT'] . '/includes/functions.php';

if ($item_submit) {
    $sql = sprintf('INSERT INTO items (user_id, item_name) VALUES (%d, "%s")',
        intval($userid),
        mysql_real_escape_string($item_name)
    );
    $result = mysqli_query($link, $sql);
        if (!$result) {
            $error = 'DATABASE ERROR.';
            include 'error.php';
            exit();
        } else {
            header('Location: /website/items.php?user=' . urlencode($username));
        }
    }
}
?>

Or, if you take the function path:


<?php
session_start();

function insertItem($link, $userid, $item_name)
{
    $userid = intval($userid);
    $item_name = mysql_real_escape_string($item_name);

    $sql = 'INSERT INTO items (user_id, item_name) VALUES (' . $userid . ', "' . $item_name . '")';
    return mysqli_query($link, $sql);
}

$username = filter_var($_SESSION['username'], FILTER_SANITIZE_STRING);
if (empty($username)) {
    header('Location: /website/login.php');
}

$item_submit = isset($_POST['item_submit']);
$userid = filter_var($_SESSION['userid'], FILTER_SANITIZE_NUMBER_INT);
$item_name = filter_input(INPUT_POST, 'item_name', FILTER_SANITISE_STRING);

require_once $_SERVER['DOCUMENT_ROOT'] . '/includes/database_connection.php';
require_once $_SERVER['DOCUMENT_ROOT'] . '/includes/functions.php';

if ($item_submit) {
    if (insertItem($link, $userid, $item_name)) {
        header('Location: /website/items.php?user=' . urlencode($username));
    } else {
        $error = 'DATABASE ERROR.';
        include 'error.php';
        exit();
    }
}
?>

Do you guys use the Hungarian notation for variables?

I do use a simple form of it - prefixing variable names with 3 letters depending on its type (str, int, dec, arr, obj etc). I know it’s not a popular practice and others on here may disagree, but I find it quite beneficial and have found no downsides to it so far.

I think I actually get the hang of it now, and I re-did my code in the style of your example. Not just copy paste, but did it piece by piece while reading your post, which I have probably read near fifty times already last night and today… no exaggeration.

I am very eager to learn not only the RIGHT way to do things, but to know why.

I do have two more things I am wondering about. And these will be the last things for now. I feel privileged enough to have the amazingly detailed help you have given me, but anything further and I feel like I should start paying you. :blush:

Firstly, I am wondering (and you have probably said so but it flew over my head) what the point is of the first set of filtering, when those variables aren’t in use at that time and you end up having to escape them differently in the query anyway?

My guess: The initial filtering is simply an extra line of defense. And the filters that are applied stick with those variables, and mysqli_real_escape_string is an ADDITIONAL measure of safety on TOP of that filtering. Would this be right?

Secondly, I have never used sprintf before so that query format was/is rather foreign to me, but I think (due to your explanations) that I understand it.

If I do understand it, this:

(%d, "%s")

Just means that the variables that are going to be in those positions, are going to be of the designated %type. And then AFTER that, you state what those variables are and escape them, in the same order as the %values are listed.

I am very used to just setting my variables with escapes and then inserting them directly, like in my final example before your epic post. I am guessing from what you have told me, that is when the variables ‘persist’ and is not a good thing.

I thank you deeply, your explanations are worded absolutely brilliantly, clearly, and with great detail. Even if I don’t entirely get all of it, your wording suits the way that my brain works.

You should write a book or do a video series - I would purchase in an instant. :wink:

Let’s start from mysqli_real_escape_string. The purpose of that function is to protect the database from many things that can cause it trouble. The mysqli_real_escape_string documentation page says that it includes strange characters such as null or Ctrl-Z, but also normal string-based characters such as newlines, quotes, double quotes, and slashes.

Those string-based characters are not filtered or escaped in any way, so it’s important to use mysqli_real_escape_string to protect the database.

Let us now go back to the top and look at the initial filtering. That is the place where all of the inputs come in, so that they can then be used appropriately by the rest of the code. Some of them are used by the database, and others aren’t, so we have three different possible scenarios that can take place here.

  1. Process all of the inputs
  2. Process some of the inputs
  3. Process none of the inputs

Clearly the last option is not suitable, because then the non-database parts of your code will still be at risk.

So let’s look at processing some of the inputs. If we do that, with some of the inputs being processed and others being left alone, we end up with something like this:


$username = filter_var($_SESSION['username'], FILTER_SANITIZE_STRING);
...
$item_submit = isset($_POST['item_submit']);
$userid = $_SESSION['userid'];
$item_name = $_POST['item_name'];

The first thing that stands out to someone newly looking at this code should be the question: why is username being filtered but the others aren’t? Perhaps with enough investigation they can find out that the other variables aren’t being used where they would cause a problem, but that can take quite some time.

What happens now if someone decides that they want to display the item that was added, and they update the HTML part of the code? If you’re lucky they will use htmlspecialchars and you might see tags on the screen if they’re included in the string. If you’re not lucky and they forget to use htmlspecialchars, Boom! You are now open to XSS scripting attacks. One of the beneficial things that FILTER_SANITIZE_STRING performs is to strip tags from the string.

So, in order to protect from such an easily made mistake (I’ve done it several times myself) that’s why we go with option 1. Sanity check all of your inputs, as a mandatory protection measure for all of your code.

It’s even more flexible than that. If you give it a number and a string in the wrong order, it will place them in the right order for you. You can see it being used quite liberally on the mysql_query page, with full details about it at the [url=“http://php.net/manual/en/function.sprintf.php”]sprintf page.

One of the nice features is that you can duplicate values in the string too. For example:


$format = 'The %2$s contains %1$d monkeys.
           That\\'s a nice %2$s full of %1$d monkeys.';
printf($format, $num, $location);

It’s not a good thing when coding in a procedural manner. When the $item_name variable is escaped in the database part, what happens if you later on want to show it in the HTML section? If you don’t escape it for the HTML page you will have problems. If you try and HTML escape the database escaped value, you will have problems. If you retrieve it freshly again you will either duplicate the sanitizing, or forget to do so, and have more problems. All because the variable was escaped for the database, which then remains escaped long after the database part is complete.

So, either escape it inside of a function so that it only remains escaped inside of it, or use a different variable to escape it (messy), or escape it inline with the string.

Instead of using sprintf it’s possible to just escape it inline with the text, but then you have readability problems. It can then be difficult to find and fix any SQL syntax problems, if you have variables messing up your ability to interpret things.

About the cleanest way that I have found to escape variables inline, is something like this:


$sql = 'INSERT INTO items (user_id, item_name)' . 
    ' VALUES (' . intval($userid) . ', ' .
    '"' . mysql_real_escape_string($item_name) . '")';

That makes it clearer what is happening and why, but it still makes the SQL code harder to read. Did you notice the syntax error in above SQL statement above?

If it’s in a separate function , you can then escape the variables and use inline variable expansion:


function addItem($link, $userid, $item_name)
{
    $userid = intval($userid);
    $item_name = mysqli_real_escape_string($item_name);
    $sql = 'INSERT INTO items (user_id, item_name) ' .
        "VALUES ($userid, '$item_name')";
    ...
}

but that then disguises and hides the variables, plus there a small performance cost as PHP searches for and expands them.

So that’s why I prefer to use sprintf. Not only because it’s in the documentation for when working with SQL strings, but because it makes the SQL statements very clear, you can easily escape the variables inline, and you can easily tell that all of the appropriate variables have been properly escaped.


$sql = sprintf('INSERT INTO items (user_id, item_name) VALUES (%d, "%s")',
    intval($userid),
    mysql_real_escape_string($item_name)
);

Wow thank you. I don’t quite know what to say, but that’s a very nice recommendation indeed. Thanks.

Okay, well just to clarify, the filter helps to remove tags and other nasties that might come in (XSS attacks), and the escape helps to protect the database from SQL injection attacks.

Without the filter, your database can then contain strings with those XSS attacks, which are then time-bombs waiting to be read and inappropriately displayed.
Without the database escape, SQL injection attacks can easily get through. Read SQL Injection Attacks by Example for good examples of what you’re protecting against.

Protecting both the input and the output leads to the best safety results with a minimum of thinking required. Whenever something is coming in to your code you defend, and when it’s leaving your code you protect. That way, it’s much easier for you to recognise when it needs to be done, and it’s harder for mistakes to occur.

Does that make sense?

That all makes sense, thank you. I’ve taken to liking sprintf, especially since I will be converting my code to this far prettier/more readable layout, sprintf just suits that flow better I think.

re the $date issue

if my form had three POST options

$_POST[‘year’], $_POST[‘month’], $_POST[‘day’]

is it possible to combine them in the initial filtering?

$date = filter_input(INPUT_POST, 'year . - . month . - . day', FILTER_SANITIZE_NUMBER_INT);

or something similar? can’t seem to find anything about multiples in filter_input.

I eventually want $date to contain the following exactly when it is inserted in the database:

$_POST['year'] . '-' . $_POST['month'] . '-' . $_POST['day']

I think I’m good to go, at least for now. I have a lot of code to clean up!

It often seems to me that a lot of people starting out using php seem to love to open and close wrappers for no good reason – be it <?php and ?>, or doing double quotes with single echo’s per line… though personally I’m in favor of getting rid of <?php and ?> altogether and making it behave like a REAL programming language… this popping in and out of code nonsense might be the wet dream of ASP programmers, but it’s hardly something I’d consider good coding practice…

Even simple things like oknow’s example:


 echo 'Welcome'; 
     echo htmlspecialchars($username); 
     echo '.';

leaves me going “what the?” – though I’ve seen far, far worse:


<?php echo 'Welcome'; ?>
<?php echo htmlspecialchars($username); ?>
<?php echo '.'; ?>

Which is treading into Carlos Mencia grey territory.

I mean is:


 echo 'Welcome',htmlspecialchars($username),'.';

Really so hard to work with?!?

Though it gets worse with the folks using double quotes for EVERYTHING, and then not even bothering to use ANY of the reasons to use double quotes – inlined vars or escape codes. That one just leaves me shaking my head thinking “what the devil?”

Though I suspect I’ve been at this stuff for too long… after three and a half decades of coding in everything from ADA to ZPL, a lot of what I consider common sense and good practice most coders today have no chance of ever even learning.

Probably why so many poo-poo over simple easy to practice things like consistent code formatting or verbose element names.