Escaping + Filtering on input and output

If I already escaped and filtered HTML tags before inserting them into the database, do I still need to escape and filter when retrieving them?

Let’s say I inserted something like

This is just a test. Don't let HTML tags go through.

And I escape and filter the HTML tags, it becomes

This is just a test. Don&#39;t let HTML tags go through.

Do I still need to escape when grabbing that? I know I read somewhere that you should always escape when outputting, but what if you escaped while inputting? Does outputting still matter? I’m asking this because when I escape twice (inputting + outputting), the HTML tags get escaped twice and therefore, when I output the data like so

This is just a test. Don't let HTML tags go through.

It actually becomes

This is just a test. Don&#39;t let HTML tags go through.

So the & sign gets filtered when I escape and filter twice, but if I only escape and filter on input and don’t escape and filter on output, I get the single quote to appear on output.
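To make the double-escaping effect concrete, here is a minimal sketch in Python (the thread never names a language, so Python and its `html.escape` function are stand-ins for whatever escaping routine is actually in use). Note that `html.escape` writes the apostrophe as `&#x27;`, the hexadecimal form of the same character reference.

```python
import html

raw = "This is just a test. Don't let HTML tags go through."

# First pass: the apostrophe becomes a character reference.
once = html.escape(raw, quote=True)

# Second pass: the "&" that starts the reference is itself escaped,
# so a browser will display the reference literally instead of "'".
twice = html.escape(once, quote=True)

print(once)
print(twice)
```

Escaping once at output time avoids this, because the stored value never contains a character reference that a second pass could mangle.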

On inputs you VALIDATE or SANITIZE.

When outputting to somewhere that doesn’t allow data to be kept separate from code (e.g. HTML) you ESCAPE.

When outputting to somewhere that allows you to keep data separate from code you do that (e.g. SQL - prepare = code, bind = data).
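As a rough illustration of "prepare = code, bind = data", the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `comments` table and its `body` column are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")

# Raw user input, stored exactly as typed: no HTML escaping, no manual quoting.
body = "Don't let <b>HTML</b> tags go through."

# Code: the SQL statement with a "?" placeholder.
# Data: the tuple bound to that placeholder, sent separately from the SQL text.
conn.execute("INSERT INTO comments (body) VALUES (?)", (body,))
conn.commit()

stored = conn.execute("SELECT body FROM comments").fetchone()[0]
print(stored)
```

The apostrophe in the input cannot break the statement, because the value is never spliced into the SQL string.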

Never escape data on input as escaping is specific to where you are outputting to and breaks the data for any other use…

A rule of thumb that I go by is NEVER TRUST A USER’S INPUT THAT IS GOING TO BE USED AS OUTPUT. Sanitize the output… validation is making sure that what the user enters is what was intended to be entered (valid data).

You mean sanitize the input - the last thing you want to do is to actually process whatever junk they entered and only sanitize it at the end of the processing when you are ready to output it somewhere.

Sanitizing is an INPUT function.

So does that mean that I am doing it all wrong? The thing is, I am saving these in a database and outputting them later when I grab them from the database. I don’t want to allow HTML characters, so that is why I want to encode HTML characters as their HTML code numbers, such as < being &#60;, > being &#62;, double quotes being &quot;, etc.

Do I need to flip the process around so that inputs are validated and, on output, I escape all the HTML characters?

I am already validating on input, so that isn’t a worry. I first check whether every field is filled in; if it isn’t, I give the user an error, but I also save all of their inputs in sessions and display them on the page so whatever they typed won’t be lost.
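The flow described here (reject empty fields, then show the user's input again) might look roughly like the following Python sketch. The helper names `validate` and `render_input` are hypothetical, and a plain dictionary stands in for the session. The point is that the saved values stay raw and are escaped only when written back into the form.

```python
import html


def validate(form: dict[str, str]) -> list[str]:
    """Return one error message per empty field; an empty list means valid."""
    return [f"{name} is required" for name, value in form.items() if not value.strip()]


def render_input(name: str, value: str) -> str:
    # Escape here, at the moment the raw value is placed inside an HTML attribute.
    return f'<input name="{html.escape(name, quote=True)}" value="{html.escape(value, quote=True)}">'


# Raw values as the user typed them (what would be kept in the session).
form = {"title": 'He said "hi"', "body": "   "}

errors = validate(form)
field = render_input("title", form["title"])

print(errors)
print(field)
```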

Sadly, yes. :-/ Ideally, you want to escape at the latest possible moment. And there are two reasons for that:

  1. You won’t always want to escape for HTML. If instead you use your database content to send an email, or if you want to send a JSON response, then you won’t want HTML escaping.
  2. It’s easier to verify a program as correct and secure if you escape at the latest possible moment. I should be able to look at your templates and see every variable wrapped in an escape function. Otherwise, the way you do it now, if I were to audit your program to check for security holes, I’d have to trace the life of every variable through your program to discover where it might have been escaped.
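Both reasons can be seen in a small Python sketch, assuming the database holds the raw, unescaped text. The variable names are illustrative only. Each output target applies its own encoding at the last moment, and the escape call sits right next to the place where the value is used.

```python
import html
import json

# The value as stored in the database: raw, exactly as the user typed it.
stored = "Tom & Jerry say: don't trust <script>"

# HTML page: escape for HTML, right where the value enters the markup.
as_html = f"<p>{html.escape(stored)}</p>"

# JSON response: the JSON encoder handles quoting; HTML escaping would corrupt the data.
as_json = json.dumps({"comment": stored})

# Plain-text email: no HTML escaping wanted at all.
as_email = f"New comment:\n{stored}"

print(as_html)
print(as_json)
print(as_email)
```

Had the value been HTML-escaped before storage, the JSON and email outputs would carry character references that their readers never asked for.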

It seems like I’ll have to filter on input and escape on output then.
