When to use htmlentities and other confusing functions?

I am creating a web form for me to enter articles into my MySQL database. No one else will ever use this form, and it is an “admin” form.

The form will also be used to copy and paste existing articles which were previously PHP files.

My articles are Plain-Text that has been marked up in HTML so there is the original Copy plus the HTML tags.

Because my articles will contain single and double quotes, I plan on using mysqli_real_escape_string to properly escape problematic characters.

My question centers on when, and if, I should be using htmlentities or htmlspecialchars or html_entity_decode?! :-/

So people tell me that if I don’t save my marked up articles using htmlentities (and then decode it using html_entity_decode) that my browser won’t properly output things?!

If I Google, I get 50/50 on this topic.

I do know that currently my PHP files are just the article which was marked up in HTML like this…

<h1>Postage Meters Save Money</h1>
<p>Do you have to me a large company to benefit from a Postage Meter? <b>Absolutey not?!</b></p>
<p>Leasing a postage meter can be a good idea for several reasons:
<li>Reason 1: ----</li>
<li>Reason 2: ----</li>
<li>Reason 3: ----</li>
and so on…
</p>

I want to store the article - marked up in proper HTML - like above so that when it is queried from MySQL, that I can use a simple…


<?php
   echo $body;
?>

So what is the correct way to do all of this?! :-/

(Again, this is just a back-end form to help me hopefully avoid the issues I was having pasting data into the web form on phpMyAdmin. The main problem being that phpMyAdmin/MySQL didn’t like my single and double quotes!!)

Debbie

Functions like htmlentities and htmlspecialchars effectively “break down” the html so it can be displayed instead of marking up text. Since you want html to work properly should just save it as is.

Anyone else agree or disagree? :-/

Debbie

Because you are writing and storing the HTML directly, you won’t need to call these escaping functions. You’ll be responsible for ‘escaping’ it as you write it. (e.g. writing " in your markup instead of " if you need to quote something).

Using htmlentities wouldn’t work here, because your HTML tags would also get encoded.

You do need to use these functions when data will be displayed inside HTML.

HTML Purifier - Filter your HTML the standards-compliant way!

Or using mysqli_real_escape_string or Prepared Statements to do the same thing…

Debbie

Nope. Escaping is for a specific context or purpose. The escaping you do to get data ready for an SQL query is different to what you need to display it in HTML, or use it in a an email header.
You should be escaping or encoding for whatever it is you are about to do with the data.

real escape string (or equivalent) escapes characters that can be confused as part of the SQL instruction, so that the data is inserted correctly. It doesn’t know that you plan on displaying this as HTML later. [fphp]htmlentities[/fphp] is about escaping characters which can break HTML. It’s not concerned with how that data is stored.

Suppose you wanted to manually write HTML consisting of a paragraph, which looked like this when rendered in a browser:

It’s <wednesday> today

The HTML could look like this: (NB the single quote could also be encoded)


<p>It's <wednesday> today</p>

See how I had to ‘encode’ the <> around wednesday, because that is content, not markup?

And if that were a string, you’d need to escape or use PDO


$value = mysqli_real_escape_string($link, "<p>It's <wednesday> today</p>");

So to do SQL INSERTs, I should be escaping single quotes to preserve the query. Thus mysqli_real_escape_string would be the way to go.

real escape string (or equivalent) escapes characters that can be confused as part of the SQL instruction, so that the data is inserted correctly. It doesn’t know that you plan on displaying this as HTML later. [fphp]htmlentities[/fphp] is about escaping characters which can break HTML. It’s not concerned with how that data is stored.

Okay.

Suppose you wanted to manually write HTML consisting of a paragraph, which looked like this when rendered in a browser:

The HTML could look like this: (NB the single quote could also be encoded)


<p>It's <wednesday> today</p>

See how I had to ‘encode’ the <> around wednesday, because that is content, not markup?

That was a great example. And, yes, I follow that example completely!

And if that were a string, you’d need to escape or use PDO


$value = mysqli_real_escape_string($link, "<p>It's <wednesday> today</p>");

So, since I am storing single quotes…

Debbie’s Mother’s Sister’s (i.e. Debbie’s Aunt’s) name is also Debbie!!

… then I’ll want to use mysqli_real_escape_string to properly escape the ’ before doing an INSERT.

And since I am also marking up my Text with HTML that can then be echoed in PHP…

<p>Postage Meters can save your company money!</p>

…then I could leave it as it is, because by default, my browser will know what to do with <p> and </p> (whereas if I have “<wednesday>” then that isn’t necessarily so).

Does that sound correct?

Debbie

Yeah, you’re on the right track. You need to use mysql_real_escape_string whenever data is going to be used in an SQL query (unless you’ve cast it to be an int or float or something). It doesn’t matter what you plan on doing with that data later (displaying in browser, sending in email, writing a PDF, whatever). Once it’s in the DB, data intact, then job done as far as m_r_e_s is concerned.

Later, when you pull that data out and display in the browser, it needs to be in a form where the content won’t break the markup, which you understood with the <wednesday> example.

If you were storing simple values, such as name = Jimbo O’lachlan you could just use [fphp]htmlentities[/fphp] when displaying, to avoid that text breaking the HTML of the page around the value. I.E. if HTML were not allowed in your data, then htmlentities will ensure that the output is treated by the browser only as text, and not HTML.

However, because you are writing chunks of HTML, you can’t just use htmlentities on display, because it will also encode your markup (the <p> as well as the <wednesday>). This is why, as the author, you’ll need to manually write correct HTML ready for the browser. So you’ll be responsible for typing < if you actually want to see a < that is not part of a tag.

You can think of escaping/encoding as being like ‘dressing’ your data for a specific purpose. Just as you don’t dress the same way to go to a wedding as you do to go scuba diving, you don’t consider mysql_real_escape_string to be helpful in encoding data for HTML output, or htmlentities to prevent SQL injection.

Because my articles will contain single and double quotes, I plan on using mysqli_real_escape_string to properly escape problematic characters.

It was or mentioned before but I just wanted to emphasize one point. Do not use mysql_ functions. Purge them from your code and your brain. I know there are a gazillion articles, books and examples but they all all obsolete and should be trashed.

Spend a couple of days and learn to use PDO. It’s really quite simple and offers the same functionality as the mysql_ functions. The huge difference is that you no longer need any of the mysql escaping routines. No need to worry about single vs double quotes. No need to worry about sql injection. All of the database specific concerns just go away.

It’s only one piece of the puzzle. You still need to make sure you send properly formatted html back the browser but it’s a big piece with a solid solution.

All very interesting!

Okay.

Later, when you pull that data out and display in the browser, it needs to be in a form where the content won’t break the markup, which you understood with the <wednesday> example.

If you were storing simple values, such as name = Jimbo O’lachlan you could just use [fphp]htmlentities[/fphp] when displaying, to avoid that text breaking the HTML of the page around the value. I.E. if HTML were not allowed in your data, then htmlentities will ensure that the output is treated by the browser only as text, and not HTML.

So if some day I build a database-driven website about learning HTML, then I’ll want to use htmlentities bunches since I’ll need that to display the HTML mark-up like it is normal text (versus telling the browser to literally interpret the HTML tags as formatting queues), right?

However, because you are writing chunks of HTML, you can’t just use htmlentities on display, because it will also encode your markup (the <p> as well as the <wednesday>). This is why, as the author, you’ll need to manually write correct HTML ready for the browser. So you’ll be responsible for typing < if you actually want to see a < that is not part of a tag.

Well, since I’m not writing articles on Math, that shouldn’t be too much of an issue, but I’ll have to keep that in mind.

(I suppose if I’d use one of these fancy-smancy Text Editors like I’m typing into right now, that would solve a lot of these problems, right?)

You can think of escaping/encoding as being like ‘dressing’ your data for a specific purpose. Just as you don’t dress the same way to go to a wedding as you do to go scuba diving, you don’t consider mysql_real_escape_string to be helpful in encoding data for HTML output, or htmlentities to prevent SQL injection.

So having a database full of \’ is of no concern, right?

I guess it means I will have to use PHP’s stripslashes everytime I want to echo a particular field, right?

Debbie

If you use mysql_real_escape_string() then your database doesn’t end up full of \

It would only end up full of \ if you used addslashes() to add them.

The escaping is only used to determine what is data and what is SQL while running the actual SQL, the escaped data is automatically unescaped before being stored.

Of course with modern database calls that keep the SQL and data separate (PDO or mysqli_) you don’t need to escape the data to distinguish it from the SQL.

Escaping is only necessary where the data is allowed to contain something that could be misinterpreted as a command and where there is no facility for keeping the data and the commands separate.

Consider if you have office keys and house keys. If you keep them all on the same keyring then you need to mark one of the sets of keys in some way to distinguish which keys belong to which location - eg escaping all the house keys by placing a white dot on the side of each - that way anyone you hand the keys to who knows that white dots means house keys can tell for each key whether it is an office key or a house key just by looking at the key. If instead you place them on separate keyrings then if you have the office keyring then you know none of the keys are house keys without having to look for white dots and you don’t need white dots on the house keys either because you know none of them are office keys.

In the same way you know that everything in the SQL PREPARE statement is SQL (office keyring) and everything in the BIND statement is data (house keyring) and so don’t need to escape the data as you would have needed if you used the antiquated QUERY statement (single keyring) instead.

I’m not sure why people keep saying that, because it isn’t true.

When I look in my database field, I see…

Debbie\\'s Mother\\'s Sister\\'s i.e. Debbie\\'s Aunt\\'s name is Debbie also!

Of course with modern database calls that keep the SQL and data separate (PDO or mysqli_) you don’t need to escape the data to distinguish it from the SQL.

Escaping is only necessary where the data is allowed to contain something that could be misinterpreted as a command and where there is no facility for keeping the data and the commands separate.

Consider if you have office keys and house keys. If you keep them all on the same keyring then you need to mark one of the sets of keys in some way to distinguish which keys belong to which location - eg escaping all the house keys by placing a white dot on the side of each - that way anyone you hand the keys to who knows that white dots means house keys can tell for each key whether it is an office key or a house key just by looking at the key. If instead you place them on separate keyrings then if you have the office keyring then you know none of the keys are house keys without having to look for white dots and you don’t need white dots on the house keys either because you know none of them are office keys.

In the same way you know that everything in the SQL PREPARE statement is SQL (office keyring) and everything in the BIND statement is data (house keyring) and so don’t need to escape the data as you would have needed if you used the antiquated QUERY statement (single keyring) instead.

Prepared Statements don’t address this issue because this is all data…

Debbie\\'s Mother\\'s Sister\\'s i.e. Debbie\\'s Aunt\\'s name is Debbie also!

Debbie

As felgall says, you don’t end up with extra quotes in your data.

It’s a SQL parser issue. Ultimately you send the database something like:
INSERT INTO person VALUES(‘Joe’,‘Brown’);
The SQL parser looks for the first quote, then the second quote and treats everything in between as a string.

But then
INSERT INTO person VALUES(‘Pat’,‘O’Malley’);
Now you see how the ’ mark which is actually part of Pat’s name can mess up the parser. So we do something called escaping:
INSERT INTO person VALUES(‘Pat’,‘O’‘Malley’);
The parser knows that two ‘’ should be treated as one quote within the data. It strips out the second one so the name O’Malley is what is actually stored and is what you actually get when you retrieve it. This is basically all that mysql_escape does.

But consider the alternative using prepared statements:
INSERT INTO person VALUES( :firstName,:lastName);
$params = array(‘firstName’ => $postedFirstName, ‘lastName’ => $postedLastName);
Instead of data in our sql, we put in place holders. Which also makes it easier to actually write the sql in the first place. We then prepare the statement and send it the values in the array. We don’t need to worry if $postedLastName has quotes in it. It’s not actually being parsed so it’s no problem.

Then something is wrong. Somewhere you are double escaping things. Maybe post a bit of code. But with prepared statements you should be sending values directly from $_POST. It appears that you still have your posted date being escaped somewhere along the line.

And you are looking directly at the data using the mysql command line interface or perhaps something like phpmyadmin? If you are looking at through your browser then it’s possible you might be escaping it on the way out.

Here is my PHP…


//********************************************
// HANDLE FORM.							 *
//********************************************
if ($_SERVER['REQUEST_METHOD']=='POST'){
	// Form was Submitted (Post).

	// Trim all form data.
	$trimmed = array_map('trim', $_POST);


	// ********************
	// CHECK FORM DATA.	*
	// ********************

	// Check Article Body.
	if (empty($trimmed['body'])){
		$errors['body'] = 'Please enter an Article Body.';
	}else{
		// Escape problematic characters.
		$body = mysqli_real_escape_string($dbc, $trimmed['body']);
	}


<!-- Body -->
<li>
	<label for="body"><span class="required">*</span>Body:</label>
	<textarea id="body" name="body" class="text" cols="83" rows="15" wrap="soft"><?php if(isset($body)){echo $body;} ?></textarea><!-- Sticky Field -->
	<?php
		if (!empty($errors['body'])){
			echo '<span class="error">' . $errors['body'] . '</span>';
		}
	?>
</li>

If I insert this text in my form field…

Debbie’s Mother’s Sister’s (i.e. Debbie’s Aunt’s) name is Debbie also!!

Then in phpMyAdmin I see…

Debbie\'s Mother\'s Sister\'s (i.e. Debbie\'s Aunt\'s) name is Debbie also!!

Debbie

So obviously this line:
$body = mysqli_real_escape_string($dbc, $trimmed[‘body’]);
Shows that you are still escaping your incoming data. Replace it with:
$body = $trimmed[‘body’];
And see what you get. If something blows up then show your prepared statement code where $body is actually being inserted into the database.

It really will work eventually:
mysql> update person set mname = ‘O’‘Malley’ where person_id = 1;
Query OK, 1 row affected (0.00 sec)

mysql> select mname from person where person_id = 1;
±---------+
| mname |
±---------+
| O’Malley |
±---------+

You lost me as to what is the problem.

I changed my code to…

		$body = $trimmed['body'];

When I enter…

O’Reilly’s book

In the database I now see…

O’Reilly’s book

I don’t get it?!

Debbie

What exactly don’t you get? Why it works or did you expect something different? Are you using PDO to insert the date or mysql functions?

Huh?

Umm… You claim that mysqli_real_escape_string is supposed to escape quotes, but not show them in MySQL.

And $trimmed = array_map(‘trim’, $_POST); is just supposed to trim beginning and ending spaces?!

So why would removing mysqli_real_escape_string and just using $trimmed = array_map(‘trim’, $_POST); apparently escape my quotes but not leave escape characters in my database?

(I posted pretty explicitly what was happening in each scenario.)

Debbie