Creating a site search engine with SQL injection prevention

I mentioned in an older thread that I have been creating a site search engine for a client, which searches across table of products in a mysql database. I know mysql_real_escape_string will prevent SQL injection, but since it adds slashes to any search queries that contain quotes, it hampers the result set because of the slashes. For example, if the user types in a search query “microphones” (surrounded by quotations), it generates a statement like so:

SELECT * FROM producttable WHERE productname LIKE ‘%\“microphone\”%’

I know some people actually like to search using quotes to better refine their search (you can do this on google right?). What if I want to add this feature to my search function? I know I could just strip quotations from the string, but that wouldn’t be as strong as far as defending against SQL injection goes. Any advice on SQL injection prevention in a search engine situation where a user might want to use quotations?

Thank you in advance. I am new to all of this!
Jesse

Agree.
But one example is enough.
I can’t say this article is valuable. Not even because half of the examples just won’t work with our scripts. But because it is boosting on one single vulnerability.

But as long as your PHP sends an integer as the LIMIT operator is expecting, it will run fine.

For the purpose of my own understanding, let me present a simple example using the LIMIT operator. Soon I need to add a paging system to my client’s site, where it limits each page to let’s say 20 products and then user would have to click “next page” or “page 3” to view more results.

If the user is just clicking on a link that says “Page 3”, then it could easily be set as an integer. In the event someone tries to attack the site by creating their own form and submitting to my PHP file, binding the PDO parameter as an integer will throw the error and stop injection (well, stop everything, but that’s okay). Otherwise there should be no reason the variable would ever be anything but an integer, assuming that’s the way I’ve programmed it.

If for whatever reason you want to give the user the ability to type the LIMIT value in by hand (eg. GO TO PAGE: _______<–with an html input box) and there is the possibility of them entering a string (that’s not an integer), then you could validate the input and echo a more friendly error to the user and bring them back to the same page they were on BEFORE even running the statement, preventing any PDO related errors.

Am I right?

The slashes are not a problem. Your query will match any product with the word “microphone” (including the quotes) in its productname. It won’t be searching for the slashes, they’re just part of the syntax of SQL to tell the server those are part of the literal string and not delimiters.

That’s totally separate from the real problem you’ll have – which is that you don’t want to look for quotes in the name, you want the quotes to have the special meaning of grouping words in an order. You can either use a FULL TEXT index and use those features to do your searching, otherwise you need to do some more intelligent parsing of the user’s query to produce a SQL query that matches their intent. It won’t have quotes in it.

Hmmm, maybe I am misunderstanding you, but when I search “microphone” WITHOUT quotes I get results. However, when I search “microphone” INCLUDING the quotes, it returns no results. Here’s my search engine in progress: http://www.horizonsolutions.org/_new/search.php

You lost me there. What do you mean full text index?

That’s because none of your product names have quotes in them. When you search for quotes, you’re really only going to get rows that have quotes in the values. Computers are very predictable – they don’t know you mean for the quotes to mean something and not just be another character like x, y or z.

Quotes have special meaning at a search engine like Google because they wrote code to give them that meaning. You haven’t written any code like that, you just put the user query into your SQL query. It’s not a minor tweak either, you have to flesh out some logic.

Here’s what I was referring to with full text indexes:
http://dev.mysql.com/doc/refman/5.0/en//fulltext-search.html

Alternatively, you can integrate someone else’s search system into the site. Other people have built good search engine software so you don’t have to. You can even use Google Custom Search to create a Google search that only searches your client’s site.

First of all… thank you for your time, I am learning a lot!

Ahhh, I see. I just tried adding a test product to the database with “microphone” (including quotes), and now when I search microphone (with quotes) it does return that row; even though the quotes were escaped like so: \“microphone\”. I also tried adding other \ slashes into an insert product query, and i notice that SQL does not store them in the database. So am I right in saying that the \ blackslash’s purpose is to escape characters like quotes?

Now I understand mysql_real_escape_string makes SQL queries safe. How liberal should I be with it?

  1. I have read that some people use it on ALL user input. I found this code on user comments section of the php.net website (see below). Is this a smart? Or should I be applying it to each variable separately.
<?php 
  //This stops SQL Injection in POST vars 
  foreach ($_POST as $key => $value) { 
    $_POST[$key] = mysql_real_escape_string($value); 
  } 

  //This stops SQL Injection in GET vars 
  foreach ($_GET as $key => $value) { 
    $_GET[$key] = mysql_real_escape_string($value); 
  } 
?>
  1. If I should be applying it to each variable separately, should it go on ALL user input? Or just user input that gets inserted into SQL? For example, can someone inject SQL through a simple PHP mail form (which does not access a database)?

  2. To take this further, can someone inject SQL into the address bar? If so how would I protect against that?

  3. Lastly, the PHP.net article on SQL injection says “Never trust any kind of input, especially that which comes from the client side, even though it comes from a select box, a hidden input field or a cookie.” …so I should be using mysql_real_escape_string even on select boxes and radio buttons?

Thank you, I will look into this.

I know about google site search, but I want a more custom engine with advanced searching options that cater to fields in my clients database (ie. to search by rating, or search by model name). I also like how the result set (in this case a list of products) is displayed using my CSS design, which keeps the search engine very consistent with the rest of the site. I assume that google’s site search will search the entire website and output a google-like result page?

I apologize for the long post!

  1. I only use it on user input which is a string and is going to be used in a database. If the user input is supposed to be a number, I don’t run it through a string escaping function, I make sure it’s a number. If it’s an integer, an easy way is to just cast it to an integer.
$number = (int)$_POST['something'];

If $_POST[‘something’] had a string in it instead of a number, it’ll be turned into a harmless 0. There’s also functions in PHP like is_numeric you can use.

  1. SQL injection is an attack that involves someone crafting a string that you insert into a SQL query in a way that changes the meaning of your query. You can’t have SQL injection in code where you’re not creating a SQL query to send to a database. So don’t go crazy with it. You have other types of attacks to worry about when it comes to form mailers (e-mail header injection).

  2. There’s no difference between the address bar and a form. Both end up populating $_GET or $_POST. All you need to worry about is escaping any strings that originate from a user before putting them into a database query.

  3. Yes, ALL user input is questioned. Just because you created a select box doesn’t mean the end user has used it. They may have created their own form and pointed it at your PHP script.

So am I right in saying that the \ blackslash’s purpose is to escape characters like quotes?

yes

I have read that some people use it on ALL user input. Is this a smart?

No.
I’ve explained that in your previous topic, but it can be my bad english that prevent you from understand.

Or should I be applying it to each variable separately.

It doesn’t matter
As long as you treat these variables not as user input but as SQL data.

can someone inject SQL through a simple PHP mail form (which does not access a database)?

Try to think over it.
HOW can one make SQL injection if no SQL involved? How can one stole a car from you if you have no car?
If you still don’t want to realize whole picture but want a plain answers - NO, THEY CAN’T

To take this further, can someone inject SQL into the address bar?

I assume you re talking of another SQL injection with no SQL.

Lastly, the PHP.net article on SQL injection says "Never trust any kind of input,

Another stupid article.
I told you 1000 times: user input does not matter.
Any data that goes to the query must be escaped. No matter where it came from. There can be no user input at all, but data should not be trusted anyway. There is no such thing like “trusting”. Escaping is technical thing. You should not think of it as of input data.

…one more quick (i promise) question. While testing these quotes with mysql_real_escape_string, I have successfully added products to the database containing quotes. However, when I go to edit that entry using my PHP edit form, it grabs the text (including the quotes) and puts it into an input field for editing and I get:

<input name="modelname" type="text" size="60" value="test "mic" test" />

As you can see the extra quotes create a problem with the HTML. Should I be converting the text to HTML using htmlspecialchars(), and then converting back with htmlspecialchars_decode() before submitting to the database? Or is there an easier way to do this?

Hey Dan.
You’re great or even gross man in PHP :slight_smile:
But I have to argue this point.
Worrying about escaping only strings that originate from a user leads to an error and danger.

I begin to understand enormous advantage of prepared statements.
Because one have no possibility to think of data source, because they have only one way to do. Right way. No need to think - no possibility to loss

Should I be converting the text to HTML using htmlspecialchars()

yes

nd then converting back with htmlspecialchars_decode()

No

So escaping all user input, like so:

<?php 
  //This stops SQL Injection in POST vars 
  foreach ($_POST as $key => $value) { 
    $_POST[$key] = mysql_real_escape_string($value); 
  } 

  //This stops SQL Injection in GET vars 
  foreach ($_GET as $key => $value) { 
    $_GET[$key] = mysql_real_escape_string($value); 
  } 
?>

…is overkill. I guess doing this would unnecessarily escape quotes in places like a simple PHP mail form entry, right? What you are trying to say is, use mysql_real_escape_string only before data (ANY data, user input, or not) is inserted into SQL. it’s not needed anywhere else right? correct me if i am wrong. (thank you for being patient with me as i try to understand this into my thick head.)

You can’t even run that code on a page that doesn’t talk to a database. mysql_real_escape_string() won’t work if there’s no open database connection.

What you are trying to say is, use mysql_real_escape_string only before data (ANY data, user input or not) is inserted into SQL. it’s not needed anywhere else right?

Exactly.

But!

Also bear in mind that mysql_real_escape_string is not a magic wand, but merely part of syntax. And it has it’s limitations:

  1. It works only with string literals (though mysql will silently convert strings to numbers, so, you can use if for any data)
  2. And only in conjunction with surrounding quotes.

So escaping all user input, like so:
…is overkill.

Yes. This code is nothing else than just… manual magic quotes!

And then converting back with htmlspecialchars_decode()
NO

…do all special characters should exist as HTML entities within a database table? what’s wrong when them existing as normal quotes, as long as they have been escaped in the insert statement?

…right, I should have known that. Thanks.

By limitations, not being a magic wand, are you saying it is not safe by itself? Are you referring to vulnerabilities against SQL injection using /n /r, etc.? Or am I way off here?

As though mysql_real_escape_string deals only with string literals, all other parts of the query require special attention. Here goes the real injection prevention.
LIMIT arguments must be cast to integer by some function or math operations.
identifiers (table and field names) must be hardcoded in the script and used only from that source, for example:

$order_fields=array("name","price","qty");
if (isset($order_fields[$_GET['order']])) $orderby=$order_fields[$_GET['order']];
else $orderby=$order_fields[0]];
$query="SELECT * FROM `table` ORDER BY `$orderby`";

Same for operators and functions.

By limitations, not being a magic wand, are you saying it is not safe by itself? Are you referring to vulnerabilities against SQL injection using /n /r, etc.? Or am I way off here?

It is safe by any means. there is no SQL injection at all, any kind of it. None. It is bulletproof.
As long as you’re talking of data.

$data=mysql_real_escape_string($data);
$sql="SELECT * FROM table WEHRE field='$data'";

No injection can pass this kind of query.
Though it is side effect only. Because you should escape your strings anyway. Not as protection but as syntax part.

But there are other parts of a query which can be dynamic. Of them my previous post.

.do all special characters should exist as HTML entities within a database table?

No.
You’ve lack of practice.
Try to do it and look. No entity you’ve found. I’s like quotes in the query. Exists only in the form’s field.

Just reading back, I missed your post Dan.

This makes sense.

Oh no, you really need to put up an electric fence when you build a website eh. I need to look into this too.

So if a variable is being passed through a URL, it does NOT need to be escaped? Can’t SQL injection be used at the end of: http://www.url.com/view.php?model=microphones

Oh really? Never thought of that. But again this only applies to variables that will be used in an SQL statement right? So as long as I escape every string variable right before entering it into SQL, I am safe. OR, if it is supposed to be a number, validate the data to ensure that it’s a number (even if it came from a check box).

So if a variable is being passed through a URL, it does NOT need to be escaped?

Amazing.