When to escape output?

Whenever I display data that comes from a User or a Form, I always use htmlentities().

But is it necessary to use this function on data that comes from other places?

Here is a scenario that I am not sure about…

I have a script (i.e. “send-pm.php”) which allows a logged in Member to send a PM to another Member.

When the Form is submitted, I grab the $_POST array and sanitize all of the data including to whom the PM is going.

Then after sanitizing things, I put the “SendTo” username in $_SESSION so that another script (i.e. “outcome.php”) can echo a message like this after the PM is successfully sent…


Your Private Message was sent to "DoubleDee".

In my “outcome.php” script, do I need to wrap the Username - which was sanitized and then placed in $_SESSION['sendToUsername'] - with the htmlentities() function?? :-/

For the “PM Subject” and the “PM Body” I do this because they could obviously be prone to XSS Attacks, but since I ran the “SendToUsername” variable through a regular expression and made sure it was a valid user in the database, it seems like I don’t need htmlentities() for that variable…

Hope you can follow that?!

Thoughts?

Sincerely,

Debbie

You should probably escape it anyway. Let’s say the username is Double<D> - that would not display as you’d want. It isn’t the source of the data, it is the contents.

Escaping your text will help against XSS and such but really as a side effect. Escaping is how you prepare your text for the destination.
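To make the username example concrete, here is a minimal sketch of escaping at output time. The variable name and the hard-coded value are assumptions for illustration; in the original scenario the value would come from $_SESSION['sendToUsername'].

```php
<?php
// Hypothetical value standing in for $_SESSION['sendToUsername'].
$sendToUsername = 'Double<D>';

// Escape for the HTML destination at the moment of output.
$safeUsername = htmlentities($sendToUsername, ENT_QUOTES, 'UTF-8');

// The markup sent to the browser contains Double&lt;D&gt;,
// which the browser then displays as Double<D>.
echo 'Your Private Message was sent to "' . $safeUsername . '".';
```

Without the htmlentities() call, the browser would most likely treat <D> as an unknown tag and show only "Double".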


SELECT * FROM users WHERE last_name = 'O'Brien';

This query is not a security violation, but it is not what was intended: the quote was not escaped, which led to an invalid query. Similarly, HTML output that isn’t what is intended can lead to a security risk but can also just display improperly (like my username example above).

That’s a bad example given that database calls don’t need to be escaped any more now that you can use prepare/bind to keep the query and the data separate.

Outputting to HTML does need to be escaped if the content is allowed to contain characters that might be misinterpreted as HTML tags - < and & being the two most likely. It doesn’t hurt to escape all the content you are outputting to HTML.

I wouldn’t say it’s a bad example. That quote still needs to be escaped. If you have a system in place to ensure that it does, that just makes your day simpler. A similar system can be made for HTML output.

I used a db example because people tend to think that not escaping will lead to truncating tables or something horrible (and it could). My point is that you don’t escape to protect yourself from security risks, you escape because it just needs to be done. Security is a result of doing things right.

Necessary? Technically, no. But is it a good idea? Yes. For two reasons:

  1. The rules of your application may change over time. A value that you know to be safe now might turn not-safe in the future.

  2. As applications get bigger and bigger, the way we manage that complexity is by making each piece be as independent and small-in-scope as possible. Your template code shouldn’t assume to have any knowledge of the larger application, which means it shouldn’t assume to know whether a certain value is supposed to be safe or not.
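One way to keep the template ignorant of where a value came from is to route every dynamic value through a single escaping helper. This is only a sketch: the helper name e() is an assumption (it is not a PHP built-in), and the $session array stands in for $_SESSION in “outcome.php”.

```php
<?php
// Hypothetical helper: escapes any string for output into HTML.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Stand-in for $_SESSION, so the sketch runs on its own.
$session = ['sendToUsername' => 'DoubleDee'];

// The template escapes unconditionally; it does not need to know
// whether the value was validated somewhere else.
$message = 'Your Private Message was sent to "' . e($session['sendToUsername']) . '".';
echo $message;
```

If the username rules are later loosened to allow characters such as < or &, the template keeps working without any change.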

You’re talking oranges when Debbie wants to know about apples. Technically, your example is about sanitizing input, not output. The attack Debbie is most interested in isn’t SQL injection but cross-site scripting (XSS). htmlentities() is a good answer to this type of attack, as is limiting the HTML your users can provide to either 1) nothing or 2) a limited set of tags, both of which can be done using strip_tags().
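For reference, a rough sketch of the two strip_tags() options mentioned here. The input string is made up for illustration.

```php
<?php
// Made-up user input containing one harmless tag and one script tag.
$input = '<b>Hello</b> <script>alert(1)</script>';

// Option 1: remove every tag. Only the tags go; the text inside them stays.
$noTags = strip_tags($input);

// Option 2: keep a limited set of tags, here only <b>.
$limitedTags = strip_tags($input, '<b>');

// Even after stripping, escape the tag-free version when printing it.
echo htmlspecialchars($noTags, ENT_QUOTES, 'UTF-8');
```

Note that strip_tags() leaves attributes on the allowed tags untouched, so something like an onclick attribute on a permitted tag would survive. Keeping a limited tag list is therefore not a complete XSS defense on its own.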

I whole-heartedly agree with Jeff’s assessment. Yes, it is definitely necessary for input from external sources, users, form input, etc. Never trust anything that is external. You never know if their site got compromised and is now feeding you malicious data.

No, it is output to the database and it needs to be escaped for that destination. My point was that you always escape but not for the purpose of avoiding an attack. You escape because you have to. Otherwise you can end up with either invalid (my example) or undesired (XSS) results. Understanding this will be helpful with any future questions she may have.

If you think in terms of “can I trust the source” you will fail. I can trust my database and can safely assume it will give me a valid name when I ask for it, but I will still need to prepare that text for my destination.

Her question was do I need to escape “sanitized” data. The answer is yes because “clean” data is not necessarily prepared for the destination.
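To illustrate “prepared for the destination”, here is a small sketch that takes one perfectly valid value and prepares it for three different destinations. The sample value and variable names are assumptions chosen for illustration.

```php
<?php
// A legitimate, "clean" value that still contains characters
// with special meaning in some destinations.
$name = "O'Brien & <Sons>";

// Destination: HTML text or attribute value.
$forHtml = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');

// Destination: a URL path segment or query value.
$forUrl = rawurlencode($name);

// Destination: a JSON document.
$forJson = json_encode($name);

echo $forHtml, "\n", $forUrl, "\n", $forJson, "\n";
```

The same string comes out differently each time, which suggests the escaping belongs to the output step rather than to the point where the data was validated.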

No it doesn’t. If you use the correct database calls then there is no need to escape it. Escaping is only needed when data can be misinterpreted as code, and that isn’t possible with the prepared statements that modern database interfaces such as mysqli_ and PDO provide. Only the now dead mysql_ interface didn’t provide a way to keep data and code separate.

$last_name = "O'Brien";
if ($stmt = $mysqli->prepare("SELECT * FROM users WHERE last_name = ?")) {
    $stmt->bind_param("s", $last_name);
    $stmt->execute();
}

So in your example, you are referring to the SELECT * rather than the ‘O’Brien’, yes? Because if you are referring to O’Brien, that is sanitizing/escaping your input, not your output. The result of the SELECT * would be your output, and depending on the data it contains (it is likely user-entered data), it would need to be encoded using htmlentities(), or run through strip_tags() to limit the HTML that can be output.

When you think of “trusted”, don’t think of any external component. Most database data and the like comes from external sources and is entered into your data store, so it is not trusted. Trusted data would be anything static that you have 100% control over: templates, design, etc. Those don’t need any work done to them, even if you load them into a variable to be written out later (well, okay, there “could” be some very unusual cases, and if you are relying on them, you should already know about it).

I would never imply (nor did I mean to imply, if I inadvertently did) a Database is a trusted data source. That’s just opening yourself to problems.

I agree with this. I answered that it wasn’t technically necessary because I think I read more into Dee’s question than was there. I was thinking that the scenario she was presenting was that of a sanitized username, which (in her case) would always be alphanumeric. And if we know that a value is guaranteed to be alphanumeric, then we can safely skimp on escaping. But long term, that skimping is likely to bite us.