Htmlentities breaking session serialization

Hello,

I have created my own script to store session information and session data in a database. My script uses MySQL 5.1, PDO and prepared statements for database interactions. My PHP version is 5.2.14.

The prototype works great.

I’ve started adding the proper escaping of data. The problem starts when I wrap the session data in htmlentities during the write function. Somehow htmlentities is breaking the serialization of the session data. When the write function stores data that has been passed through htmlentities, the data gets nulled.

On my index page I start a session and set a session variable to ‘Hello World’:

Here are the results from the write function on the index page:

ID: 3mcib19jgg43l9rmov2esskrv5
DATA: test|s:11:"Hello World";
USERAGENT: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2.10) Gecko/20100914 Firefox/3.6.10

I click a link to the next page which starts a session but does not set a session variable. The results from the read function on the second page:

ID: 3mcib19jgg43l9rmov2esskrv5
DATA: test|s:11:"Hello World";
USERAGENT: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2.10) Gecko/20100914 Firefox/3.6.10

The results from the write function on the second page:

ID: 3mcib19jgg43l9rmov2esskrv5
DATA: test|N;
USERAGENT: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2.10) Gecko/20100914 Firefox/3.6.10

If I remove the htmlentities call, it works like a charm. Has anyone else had this problem? Does anyone have a suggestion for a fix?

Here’s part of the write function:

$session_data = htmlentities($session_data, ENT_QUOTES, 'UTF-8');
$sql = 'REPLACE INTO table (session_id, session_data, session_expires, httpUserAgent) VALUES (:sessionid, :sessiondata, :sessionexpires, :httpUserAgent)';

TIA,
Noob

Yes, obviously, nobody has a first name that contains <, > or other non-alphabetic characters. Using the first name field as an example was a bad choice. The example should have been a text area of some sort.

The only correct solution is to validate the input fields correctly in the first place. How many people do you know whose first name is “<script>alert(‘XSS’);</script>”? I don’t know anyone who has anything other than letters in their first name, so a letters-only check such as ctype_alpha() is the obvious way to validate that field.

htmlentities is an output function that should only ever be used immediately before writing content into the HTML and where that content is allowed to contain characters such as < > & " etc that need to be converted to entity codes in order to display properly in the HTML. This function has nothing whatsoever to do with security.

Also, I solved the problem I was having in the original post.

If I move the htmlentities call to the place where I set the session variable, instead of running it inside the session class’s write function, it works. That sort of defeats my original idea, but it works. I will continue to work on it until I develop the optimum solution. On to my next task, encrypting the session data…

Why are you using htmlentities in the first place?
You only use that function when outputting to HTML.

Furthermore, since you are using prepared statements, there is nothing more you need to do concerning “escaping” for the database.

You’d still only call that function when writing that field into HTML where you don’t want those characters getting confused with HTML tags in the page. It isn’t necessary if you are not generating a web page to display the value in.
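To illustrate the "escape only when writing into HTML" advice, here is a minimal sketch. The helper name escape_html() is hypothetical and not from the thread, and the value is assumed to have been read back from the database exactly as the user typed it.

```php
<?php
// Hypothetical helper (not from the thread): escape a value for HTML output.
function escape_html($value)
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Assume this came back from the database unmodified.
$firstName = "<script>alert('XSS');</script>";

// Escape at the point where the value is written into the page, not before.
$html = '<p>' . escape_html($firstName) . '</p>';
echo $html, "\n";
```

The stored value stays untouched, so it can still be used in non-HTML contexts (email, CSV export, and so on) without having to undo any escaping first.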

Blanket escaping… I almost used that exact phrase in my original post. :)

It comes down to what you need to achieve. However, for your example I would say the best approach is validation. Should “firstName” be allowed to contain characters beyond the ABCs? If not, you should reject the data and require the user to resubmit. If you escape on input instead, you also need to remember what you did to “escape” the data in case you ever need the unescaped version. Whether that is an issue is, of course, subjective.

But using blanket escaping* is a very bad idea.

* Blanket escaping is not a real term, just my own name for applying one method of data escaping to everything, whether it fits the data or not (for example, using SQL escaping methods on data destined for a regular expression).

So, I’ve run some tests. Here’s the snippet of code I used:

// Simulated form input, including a script tag in the first name.
$_POST['firstName'] = "<script>alert('XSS');</script>";
$_POST['lastName'] = "Noob";
$_POST['lastUpdate'] = "2006-02-14 04:34:33";
// Insert the values through a prepared statement with named placeholders.
$sql = "INSERT INTO actor (first_name, last_name, last_update) VALUES (:firstName, :lastName, :lastUpdate)";
$sth = $dbh->prepare($sql);
$sth->bindParam(':firstName', $_POST['firstName']);
$sth->bindParam(':lastName', $_POST['lastName']);
$sth->bindParam(':lastUpdate', $_POST['lastUpdate']);
$sth->execute();

// Read the row back and echo it with no output escaping.
$id = 233;
$sql = "SELECT * FROM actor WHERE actor_id=:id";
$sth = $dbh->prepare($sql);
$sth->bindParam(':id', $id);
$sth->execute();
while ($row = $sth->fetch()) {
    echo "<p>";
    echo $row[0] . "<br />";
    echo $row[1] . "<br />";
    echo $row[2] . "<br />";
    echo $row[3] . "</p>";
}

Prepared statements will in fact put the <script> into the database as is.

I understand completely that your point was htmlentities should be used on the retrieval/display portion. I agree, this will disarm the script. I see your point.

I was using htmlentities to disarm the <script> prior to insertion.

What is the best way to disarm the <script> before inserting it? Or, is the current ‘best practice’ to insert it and only worry about it on the retrieval/display portion?

Prepared statements don’t just mitigate SQL injection, they avoid it completely, as long as you don’t use the input to build the query itself. htmlentities is for HTML. What you are doing is 100% pointless: it is not cleaning or escaping anything that would harm PHP or MySQL. As you found out, it can in fact break things. It is the wrong tool for the job.
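As a rough illustration of how htmlentities can break serialized data, the sketch below runs a serialized string through the same htmlentities call as the original write function and then tries to unserialize it. It uses plain serialize()/unserialize() rather than the session handler itself, so it shows one plausible mechanism only; the exact path that produced test|N; in the original post is not known.

```php
<?php
// A serialized value of the same shape as the session payload in the thread
// (the "test|" key prefix used by the session format is left out here).
$serialized = serialize('Hello World');

// The same call the original write function applies before storing the data.
$mangled = htmlentities($serialized, ENT_QUOTES, 'UTF-8');

// The double quotes have become &quot;, so the string no longer parses.
// The @ suppresses the notice/warning that a failed unserialize() raises.
$restored = @unserialize($mangled);

// Decoding the entities first gives back a string PHP can parse again.
$decoded = html_entity_decode($mangled, ENT_QUOTES, 'UTF-8');

var_dump($mangled);
var_dump($restored);
var_dump(unserialize($decoded));
```

If the stored session data is entity-encoded and handed back to PHP without decoding, the session cannot be rebuilt from it, which would be consistent with the variable being lost on the second page.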

To “clean” data, you do not escape it. You validate and filter the data; deny the use of the data or remove the bad parts from the data.
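The two options named here, denying the data or removing the bad parts, might look roughly like this for the first-name example. Both function names are hypothetical, and the letters-only rule is the thread's own simplification; real names with hyphens, apostrophes, or non-ASCII letters would need a more permissive rule.

```php
<?php
// Hypothetical validator: accept only non-empty strings of ASCII letters.
function is_valid_first_name($value)
{
    return is_string($value) && $value !== '' && ctype_alpha($value);
}

// Hypothetical filter: remove everything that is not an ASCII letter.
function strip_non_letters($value)
{
    return preg_replace('/[^A-Za-z]/', '', $value);
}

$input = "<script>alert('XSS');</script>";

// Option 1: validate and deny. The caller would ask the user to resubmit.
var_dump(is_valid_first_name($input));
var_dump(is_valid_first_name('Noob'));

// Option 2: filter. Keep only the characters the field is allowed to hold.
echo strip_non_letters($input), "\n";
```

Validation is usually the safer of the two, since filtering silently changes what the user submitted.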

Thank you for your time and input. I will revisit the code.

I was using htmlentities prior to writing to the database for the following reasons:

  1. To me, it makes sense to “clean” input the moment you get your hands on it. To me, storing raw input is like setting a trap for someone.

  2. I never know who is going to work with this code down the road and there’s no guarantee they will “clean” it before displaying it.

  3. After working with mysql_real_escape_string for years, it seemed that I should do something before sending potentially harmful data to the database. A good bad habit or a bad good habit I guess.

I understand prepared statements can mitigate SQL injection. But I can’t bring myself to rely on that by itself. Is it really that horrible an idea to run htmlentities on input before storing it?

HTML Purifier is a good choice.