HTML entities?

Hi

Just a question about what you normally do. I am busy with a site and have realised that what I have been doing probably isn’t necessary. When I take a user’s input into my database, I generally use htmlentities, and if not that then mysql_real_escape_string. Is it necessary to convert all special characters to HTML entities? I don’t really know why I started doing it. When I started I was very new to this and I think I just read it somewhere on the internet. It’s working fine, but now when I’m populating dropdown boxes and such, the data needs to be decoded so that it displays correctly.

My database collation is set to utf8_general_ci, so I’m guessing I don’t need the HTML entities?

How do you go about it?

Thanks!

You do not need to use HTML entities when putting data into the database. When outputting to HTML you really only need to escape HTML’s special characters, which htmlspecialchars does: http://us3.php.net/manual/en/function.htmlspecialchars.php
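As a rough sketch of what that looks like for the dropdown case from the question: the raw text is stored as-is and only escaped at the moment it is written into HTML. The helper names h() and render_options() and the sample values are made up for illustration, and the page is assumed to be UTF-8.

```php
<?php
// Hypothetical helper: escape a value for use in HTML text or attribute values.
function h($value) {
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper: build <option> elements from raw (unescaped) database values.
function render_options(array $values) {
    $html = '';
    foreach ($values as $value) {
        $html .= '<option value="' . h($value) . '">' . h($value) . "</option>\n";
    }
    return $html;
}

// Sample values stand in for rows fetched from the database.
echo render_options(array('Fish & Chips', 'Café "Bleu"'));
```

Because the database holds the raw text, nothing needs decoding before display; the ampersand and the quotes are escaped on the way out, and the accented character is left alone.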

It’s not necessary to convert them, but it does help.

The two main rules are:

  • Never trust database input
  • Always escape database output

Getting Consistent Inputs

When values come in from $_GET or $_POST, they might have magic quotes applied to them, or they might not. If they do, the magic quotes aren’t enough protection, so any magic quotes need to be removed, and replaced with a stronger form of protection.

You can remove the ineffective magic quotes by using a common function to get user input.


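function get($key, $source = null) {
    // A superglobal cannot be a default parameter value, so default to $_GET here.
    $source = ($source === null) ? $_GET : $source;
    $item = isset($source[$key]) ? $source[$key] : '';
    // Strip slashes only if magic quotes actually added them.
    return get_magic_quotes_gpc() ? stripslashes($item) : $item;
}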

If you think you’ll want to deal with array-like structures, such as multiple checkboxes with the same name from a form, then the above function can be updated to handle that too.
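One possible way to handle the array case is sketched below. The names magic_quotes_enabled(), strip_slashes_deep() and get_deep() are invented for this example, and the function_exists() check is only there so the sketch also runs on PHP versions where get_magic_quotes_gpc() no longer exists.

```php
<?php
// Hypothetical helper: true only where magic quotes exist and are switched on.
function magic_quotes_enabled() {
    return function_exists('get_magic_quotes_gpc') && @get_magic_quotes_gpc();
}

// Hypothetical helper: remove magic-quote slashes from a string or a nested array.
function strip_slashes_deep($value) {
    if (is_array($value)) {
        return array_map('strip_slashes_deep', $value);
    }
    return stripslashes($value);
}

// Hypothetical array-aware variant of get(). $strip can be forced for testing;
// when left as null it follows the server's magic quotes setting.
function get_deep($key, array $source, $strip = null) {
    if (!isset($source[$key])) {
        return null;
    }
    if ($strip === null) {
        $strip = magic_quotes_enabled();
    }
    return $strip ? strip_slashes_deep($source[$key]) : $source[$key];
}
```

A call such as get_deep('colours', $_POST) would then return either a single string or an array of strings, depending on how the form field was named.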

Now that magic quotes have been stripped, we have the same value regardless of whether magic quotes are enabled or not. This is important because magic quotes are deprecated as of PHP 5.3 and were removed entirely in PHP 5.4.

Protect the Database

Protecting the database is a matter of using mysql_real_escape_string on all of the values for your mysql database queries (http://www.php.net/manual/en/function.mysql-query.php). If you use mysqli instead, there are other techniques such as binding parameters (http://www.php.net/manual/en/mysqli.prepare.php) that can help you to protect the database from those inputs.

Both of those links above provide good example code that demonstrates how to protect your database from user input.
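For the mysqli route, a minimal sketch of binding a parameter might look like this. It assumes $db is an already-open mysqli connection; the function name, the users table and its id and name columns are invented for the example.

```php
<?php
// Sketch only: the `users` table and its `id` and `name` columns are made up.
function find_user_ids_by_name(mysqli $db, $name) {
    // The ? placeholder keeps the value out of the SQL text entirely.
    $stmt = $db->prepare('SELECT id FROM users WHERE name = ?');
    if ($stmt === false) {
        return array();
    }
    $stmt->bind_param('s', $name); // 's' marks the parameter as a string
    $stmt->execute();
    $stmt->bind_result($id);
    $ids = array();
    while ($stmt->fetch()) {
        $ids[] = $id;
    }
    $stmt->close();
    return $ids;
}
```

Since the value travels separately from the query, there is no escaping step to forget.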

Escape From the Database

When retrieving data from your database, the technique you use to escape the data depends on how it’s going to be used. htmlentities is a good standard technique to use.

I’m not sure what the consensus is between using htmlentities and htmlspecialchars (http://php.net/manual/en/function.htmlspecialchars.php). Some devices don’t understand htmlentities, but if you’re only outputting to html devices then there should be no trouble.
If you’re intending to output a value as part of a URL, such as in a link, then urlencode is the function to use there.
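To show how the two kinds of escaping fit together in a link, here is a small sketch. The search_link() helper and the /search.php address are assumptions made up for the example.

```php
<?php
// Hypothetical helper: build a search link from a user-supplied term.
function search_link($term) {
    // urlencode() makes the value safe inside the query string...
    $url = '/search.php?q=' . urlencode($term);
    // ...and htmlspecialchars() makes the URL and the link text safe inside HTML.
    return '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">'
         . htmlspecialchars($term, ENT_QUOTES, 'UTF-8') . '</a>';
}

echo search_link('fish & chips');
// prints: <a href="/search.php?q=fish+%26+chips">fish &amp; chips</a>
```

The same value ends up escaped two different ways because it appears in two different contexts.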

The mysql_real_escape_string function is not and should not be used for outputting values. It is only of good use for escaping data, in order to protect the database from bad values.

Would including this in every page where SQL is used effectively escape all POST and GET variables, assuming it’s included before any SQL is done?

<?php 
  //This stops SQL Injection in POST vars 
  foreach ($_POST as $key => $value) { 
    $_POST[$key] = mysql_real_escape_string($value); 
  } 

  //This stops SQL Injection in GET vars 
  foreach ($_GET as $key => $value) { 
    $_GET[$key] = mysql_real_escape_string($value); 
  } 
?>

Source: http://nz.php.net/manual/en/function.mysql-real-escape-string.php

This just seems like a much easier way than escaping the variables in each database call.

That’s not a good idea for a number of reasons.

  • Where magic quotes are enabled on the server (still a common occurrence) you will end up with doubly-escaped strings.
  • POST and GET values are not all used exclusively with databases. Many times they are intended for some other purpose.
  • POST and GET values may not all be strings. Sometimes they are arrays, and on other occasions they can be file references.
  • Working with POST and GET values should not depend on first creating a database connection.
  • And lastly, it’s not good mixing together the jobs of retrieving POST and GET information, and database connection. Keeping a certain degree of separation between the two is often a very good idea.

I agree with pmw57. Magic quotes should be turned off, and coding practice should move away from them. Doing any automatic escaping of GPC is basically rewriting MQ in PHP, and brings the same problems as having the PHP setting turned on.

The following is a very good resource on disabling magic quotes.

The advice is best summarised as:

Edit php.ini to disable magic quotes. Then you won’t have to worry about stripping slashes.

If you cannot edit php.ini, place a directive in .htaccess to disable magic quotes. Then you won’t have to worry about stripping slashes.

If you cannot disable magic quotes at the server level, there is code at the above link that disables magic quotes. You should only use that code in situations where you cannot tell the server to disable them.
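For reference, the two server-level settings usually look something like the following. This assumes a PHP version that still has magic quotes, and the .htaccess form assumes PHP is running as an Apache module; check with your host if unsure.

In php.ini:

```ini
; Turn off automatic escaping of GET, POST and COOKIE data.
magic_quotes_gpc = Off
```

In .htaccess:

```apache
# Same setting, applied per directory.
php_flag magic_quotes_gpc Off
```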

I fully agree with you. The code that was quoted is the ravings of a mad man. Someone who is fortunately not from around here.

mysql_real_escape_string can be just as unsafe to use as addslashes. See http://ilia.ws/archives/103-mysql_real_escape_string-versus-Prepared-Statements.html for a great example of how mysql_real_escape_string is used to gain access on a database using latin1.

Basically, you are trying to protect your website against two basic attacks:

  • SQL injection
  • XSS

Against SQL injection, use mysql_real_escape_string() when inserting data into database. Or, even better, use prepared statements.

Against XSS, what you need to do is escape any data created by the user when outputting HTML. The standard way to do this is to use htmlentities(). In theory, you only need to escape the database information you have no control over. So you don’t need to escape fields that are generated by you in PHP without any user input taken into consideration (like timestamps etc). But many developers use htmlentities() on all data to make sure they don’t make a mistake and forget to escape something that needs to be escaped.

The purpose of mysql_real_escape_string is to escape characters that would otherwise be confused with the SQL itself. It has NOTHING to do with security of the data - you need to validate the data first before it comes anywhere near the database to ensure it contains data that is meaningful for the particular field. For example if the data is supposed to be numeric then use the PHP is_numeric function to validate it. Since numbers don’t contain characters that can be confused with the SQL it doesn’t then need to go through mysql_real_escape_string as well.
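A small sketch of that validation step, using is_numeric as suggested. The helper name numeric_or_null() is made up; note that is_numeric also accepts decimals and exponent notation, so a field that must be a whole number may want a stricter check.

```php
<?php
// Hypothetical helper: return the input as a number, or null if it is not numeric.
function numeric_or_null($raw) {
    // Adding 0 converts a numeric string into an int or float.
    return is_numeric($raw) ? $raw + 0 : null;
}

var_dump(numeric_or_null('42'));        // int(42)
var_dump(numeric_or_null('1 OR 1=1'));  // NULL
```

Anything that comes back as null can be rejected before a query is ever built.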

To filter inputs coming from a form, use the filter_input function. It gives you the value without the slashes that magic quotes would add, and you can use a validation filter to be sure you get what you expect (like an int with FILTER_VALIDATE_INT: http://www.php.net/manual/en/filter.filters.validate.php).
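As an illustration, reading a page number from the query string could be sketched like this. The parameter name page, the helper name and the fallback to 1 are assumptions for the example.

```php
<?php
// Hypothetical helper: read a positive integer "page" parameter from the query string.
function page_from_query() {
    $page = filter_input(INPUT_GET, 'page', FILTER_VALIDATE_INT,
        array('options' => array('min_range' => 1)));
    // filter_input() gives null when the key is missing and false when validation fails.
    return ($page === null || $page === false) ? 1 : $page;
}

// filter_var() applies the same filter to a value you already hold.
var_dump(filter_var('42', FILTER_VALIDATE_INT));    // int(42)
var_dump(filter_var('42abc', FILTER_VALIDATE_INT)); // bool(false)
```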

To insert things in your database, the best way is to use parameterized queries with PDO or mysqli (http://www.php.net/manual/en/book.mysqli.php), but if you don’t have access to that, use mysql_real_escape_string (http://www.php.net/manual/en/function.mysql-real-escape-string.php).

When outputting HTML, to be sure the user cannot insert any HTML tags, use htmlentities.

When using user input to send mails, you need to escape any \n, \r, %0D and %0A characters, which may be used to insert headers: str_replace to the rescue! http://www.nyphp.org/PHundamentals/8_Preventing-Email-Header-Injection
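A minimal sketch of that clean-up, assuming the value is headed for a mail header such as From or Subject. The helper name strip_header_breaks() is made up, and it uses str_ireplace (the case-insensitive sibling of str_replace) so that %0d and %0D are both caught. It loops because removing one sequence can leave another behind.

```php
<?php
// Hypothetical helper: remove raw and URL-encoded CR/LF from a header value.
function strip_header_breaks($value) {
    do {
        $before = $value;
        $value = str_ireplace(array("\r", "\n", '%0d', '%0a'), '', $value);
    } while ($value !== $before); // repeat until nothing more is removed
    return $value;
}

echo strip_header_breaks("someone@example.com\r\nBcc: victim@example.com");
// prints: someone@example.comBcc: victim@example.com
```

The injected Bcc text is still there, but without the line break it can no longer start a header of its own.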

Another one: when doing redirects using header('Location: '.$uri), if the URI contains user input, you have to escape \n, \r, %0D and %0A characters too, because they can be used to add more headers. And don’t forget to add an exit (http://www.php.net/manual/en/function.exit.php) after this header to be sure the script stops even if the client does not follow the redirect. http://www.securiteam.com/unixfocus/5ZP022A8AW.html
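Put together, a redirect along those lines might be sketched as follows. Both function names are invented for the example; the cleaning is kept in its own function so it can be checked without actually sending a header.

```php
<?php
// Hypothetical helper: build a Location header line with CR/LF sequences removed.
function location_header($uri) {
    do {
        $before = $uri;
        $uri = str_ireplace(array("\r", "\n", '%0d', '%0a'), '', $uri);
    } while ($uri !== $before); // repeat until nothing more is removed
    return 'Location: ' . $uri;
}

// Hypothetical helper: send the redirect and stop the script straight away.
function redirect_to($uri) {
    header(location_header($uri));
    exit;
}
```

Calling redirect_to($uri) then covers both points: the header cannot be split, and nothing after the call runs.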