Check user input text is UTF-8

I have set up a MySQL DB with a UTF-8 collation.
I have set the content type in a PHP header: header('Content-Type: text/html; charset=UTF-8');
And I’ve added the following to all my web pages: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

My site has forms which take user input and insert/update DB rows with this data. How do I make sure the user input is UTF-8 and not some other charset?


Try the alternative.

You might also want to try W3C’s regex:

You can detect the encoding with mb_detect_encoding, if you hadn’t seen that. However, users should be submitting UTF-8 all the time so conversion should not be required.
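For a plain yes/no check, mb_check_encoding is usually a better fit than detection. A minimal sketch, assuming the mbstring extension is loaded; the helper name is_valid_utf8 is made up for illustration:

```php
<?php
// Hypothetical helper: returns true only if $value is well-formed UTF-8.
function is_valid_utf8(string $value): bool
{
    // mb_check_encoding() validates the byte sequence against the named encoding.
    return mb_check_encoding($value, 'UTF-8');
}
```

You could call this on each submitted field, e.g. is_valid_utf8($_POST['name']), and reject or convert anything that fails.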

Thanks for your reply.

So to clarify, do I use the mb_check_encoding function, or the alternative provided by javalc6, on every user form input value to make sure the value is UTF-8?

If it’s not UTF-8, am I correct in saying the next step would be to detect the encoding used and then convert the string to UTF-8 using mb_convert_encoding?
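The check-then-convert flow described here could look roughly like the sketch below. Keep in mind that mb_detect_encoding only guesses from a candidate list, so the result is not guaranteed to be what the user actually typed. The function name to_utf8 and the default candidate list are assumptions for illustration:

```php
<?php
// Hypothetical helper: returns $value as UTF-8, or null if no candidate encoding matches.
function to_utf8(string $value, array $candidates = ['UTF-8', 'ISO-8859-1']): ?string
{
    // Already valid UTF-8: leave it untouched.
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }

    // Strict mode (third argument) only accepts encodings in which the bytes are valid.
    $detected = mb_detect_encoding($value, $candidates, true);
    if ($detected === false) {
        return null; // could not guess; the caller should reject the input
    }

    return mb_convert_encoding($value, 'UTF-8', $detected);
}
```

Because almost any byte sequence is valid ISO-8859-1, that candidate acts as a catch-all; if you would rather reject bad input than guess, pass only the encodings you really expect.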

Many thanks for the help

Are you using mysql_set_charset() right after establishing the connection to the database to inform MySQL that you’ll be using utf-8?



I have:
mysql_query("SET NAMES utf8", $connection);
mysql_query("SET CHARACTER SET utf8", $connection);

Does this convert everything the user inputs into UTF-8?

Thanks for the reply
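Note that SET NAMES only tells MySQL which charset the connection uses; it does not convert bytes a user submitted in some other encoding. Also, the mysql_* functions were removed in PHP 7. A rough mysqli equivalent is sketched below, where the helper name connect_utf8 and all connection details are placeholders, and utf8mb4 (MySQL's full 4-byte UTF-8) is my suggestion rather than something from this thread:

```php
<?php
// Hypothetical helper: host, user, password and database name are placeholders.
function connect_utf8(string $host, string $user, string $password, string $database): mysqli
{
    $db = new mysqli($host, $user, $password, $database);

    // set_charset() sets the connection charset and also informs the client
    // library, which matters for real_escape_string().
    if (!$db->set_charset('utf8mb4')) {
        throw new RuntimeException('Could not set connection charset: ' . $db->error);
    }

    return $db;
}
```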

That function only works with text entered in the ISO-8859-1 charset.

I’m wondering whether I need to do anything at all: I have told the browser, PHP and MySQL that the site is UTF-8, so when a user enters text in an input box, does the conversion happen automatically?

use utf8_encode($string)
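As the earlier reply points out, utf8_encode assumes its input is ISO-8859-1, so running it on text that is already UTF-8 double-encodes it. It is also deprecated as of PHP 8.2. The sketch below uses mb_convert_encoding to perform the same conversion and adds a guard; both function names are made up for illustration:

```php
<?php
// Same conversion utf8_encode() performs: treat every byte as ISO-8859-1.
function latin1_to_utf8(string $value): string
{
    return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
}

// Hypothetical guard: convert only when the input is not already valid UTF-8.
function latin1_to_utf8_once(string $value): string
{
    return mb_check_encoding($value, 'UTF-8') ? $value : latin1_to_utf8($value);
}
```

For example, the UTF-8 bytes for "é" (C3 A9) passed through latin1_to_utf8 come out as C3 83 C2 A9, which displays as "Ã©", while latin1_to_utf8_once leaves them unchanged.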