mysql_real_escape_string Anomaly?

I am trying to store the following string to a mySQL table, using an Ajax call (the mySQL database is set up to use utf-8):

Academia-Gate — the Nanny State & The Professors: My Brief Email Exchange With The Co-Chair of the “Cry Wolf” Project

The string arrives at my php Ajax handler function looking like this:

Academia-Gate — the Nanny State & The Professors: My Brief Email Exchange With The Co-Chair of the “Cry Wolf” Project

When I try to store it to my mySQL table, the string is stored like this:

Academia-Gate ??" the Nanny State & The Professors: My Brief Email Exchange With The Co-Chair of the “Cry Wolf” Project

I.e. with ??" in place of the long dash near the beginning.

I have tracked this down to something that happens during a call to mysql_real_escape_string.

The following code:

$str = 'Academia-Gate — the Nanny State & The Professors: My Brief Email Exchange With The Co-Chair of the “Cry Wolf” Project ';
$str = mysql_real_escape_string($str, $conn_id);

…changes the contents of $str to:

Academia-Gate ‚Ä\\" the Nanny State & The Professors: My Brief Email Exchange With The Co-Chair of the ‚ÄúCry Wolf‚Äù Project

I.e. it replaces >>>‚Äî<<< with >>>‚Ä\"<<<, so that it is then stored incorrectly in the mySQL table.

How can I correct this?

Thanks very much in advance to all for any info.

Make sure to set the headers in your html page to utf-8 as well.

Also, are you using any function like htmlentities(), htmlspecialchars(), encoding, etc etc ?

Vik_R, according to your quote:

| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 | <– this one here
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 | <– and this one here
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
| collation_connection | utf8_general_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |

the character sets for both the database and server are set for latin1, try setting them to utf8

StarLion, if you’re not careful, I’m gonna start stalking you! :lol:

Thanks, again.

It appears that mysql_set_charset(‘utf8’); is a PHP command rather than a mySQL command. Running it does not appear to permanently change the values of the mySQL variables referenced by SpacePhoenix.

Good question. That was my first thought as well. I asked about this on the jQuery forum. I was advised to send the same string back to my Javascript calling routine via Ajax response. I did so and found that when the string came back to my Javascript code, via Ajax response, it was perfectly correct. I was told that this means that the string was not, in fact, corrupted in transit.

Firefox reports (View->Character Encoding) that the page is UTF-8.

Also, are you using any function like htmlentities(), htmlspecialchars(), encoding, etc etc ?

I am not currently using these functions.

I am using UTF-8. My framework (CodeIgniter) is configured to use UTF-8 for all database transactions. Using the mySQL “Show Variables;” command shows that the database is also set up as UTF-8:

| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
| collation_connection | utf8_general_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |

I have had my CodeIgniter config files set up correctly prior to encountering this anomaly. In my database config file, I have:

$db['default']['char_set'] = "utf8";
$db['default']['dbcollat'] = "utf8_general_ci";

…and in config.php I have:

$config['charset'] = "UTF-8";

It sounded like SpacePhoenix was suggesting that editing the mySQL database itself could be a solution.

I will try mysql_set_charset(‘utf8’); now.

The three characters in question are beyond the scope of latin1 (being 3 byte characters), so UTF-8 should be used to encode this string.

The odd thing is that everything works fine with the encoding of the double quotes around “Cry Wolf” - it’s just the encoding of the emdash that’s causing an anomaly.

The data isn’t even UTF-8 dependent is it? :confused:

I’m not sure that’d be a bad thing for an unemployed coder :stuck_out_tongue:

What is the correct way to change the contents of these mySQL variables?

CodeIgniter stores it’s database settings in application/config/database.php , and the relevant directive is $db[‘default’][‘char_set’] = “utf8”;

As far as your script, it depends on what you’re using as a database connector. mysql and mysqli have the mysql(i)_set_charset(‘utf8’); command…

Do you use mysql_set_charset to tell MySQL you will be sending it UTF-8 data? If not, you should :slight_smile:

You’ve already stated it arrives at your AJAX handler corrupt, so surely the problem lies prior to(or with) this.

No?

This sounds like it could be the answer. :slight_smile:

I’ve just done a good bit of Googling and haven’t yet found the correct way to implement your advice. For example, I tried the mySQL query:

alter table 'links' default  character_set_database=utf8;

What is the correct way to change the contents of these mySQL variables?