How to remove non UTF-8 characters?

attuk · March 31, 2010, 5:01am

Hi all :),
I am having a problem with non-UTF-8 characters being stored in and read from a database; they show up as �.
For example, I get � for spaces:
when I check the database it’s just a space, but when displayed in HTML it’s �!
In particular, when the � is at the end of a string it does not go away when I trim().

It would be great to:

detect all non-UTF-8 characters
convert/replace them
etc.
Thanks for any advice.

Ripe · March 31, 2010, 6:57am

Haha, I just had this problem. Here’s a nice function:

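    // Treat $str as ISO-8859-1, turn special characters into HTML entities,
    // then reduce accented-letter entities to the bare letter (&eacute; becomes e).
    function special_chars($str)
    {
        $str = htmlentities($str, ENT_COMPAT, 'iso-8859-1');
        $str = preg_replace('/&(.)(acute|cedil|circ|lig|grave|ring|tilde|uml);/', '$1', $str);
        return $str;
    }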

attuk · March 31, 2010, 1:38pm

Thanks, I will give this a try…

logic_earth · March 31, 2010, 5:49pm

Your problem is one of three things: you are not storing the data as Unicode, you are manipulating the string with PHP functions that are not Unicode-aware, or you are sending it to HTML without declaring a proper encoding. I assume it is the last one, a missing encoding declaration.
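The second cause is easy to hit with trim(), which works on bytes and only strips ASCII whitespace by default. A minimal sketch, assuming the stray "space" from the first post is a non-breaking space (U+00A0); the actual bytes were never posted, so that is a guess, and trim_nbsp is a hypothetical helper name:

```php
<?php
// Hypothetical helper: strips ordinary whitespace and U+00A0 (non-breaking
// space) from both ends of a string that is already valid UTF-8.
function trim_nbsp($utf8)
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8, so
    // \x{A0} is the code point U+00A0, not the single byte 0xA0.
    return preg_replace('/^[\s\x{A0}]+|[\s\x{A0}]+$/u', '', $utf8);
}

// "hello" followed by U+00A0 encoded as UTF-8 (bytes C2 A0).
$value = "hello\xC2\xA0";

// Plain trim() leaves the non-breaking space in place.
$still_dirty = trim($value);

// The helper removes it.
$clean = trim_nbsp($value);
```

If the data is not valid UTF-8, preg_replace with /u returns null, so convert the string first. Avoid adding the raw byte "\xA0" to trim()'s character list on UTF-8 data: that byte is also the second half of characters such as à (C3 A0).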

attuk · March 31, 2010, 10:41pm

"Your problem is that you are not storing as Unicode"
That is correct :).
Is there a way to detect the current encoding and, if it’s not UTF-8, convert it?

Mal_Curtis · April 1, 2010, 12:12am

It’s difficult, as you don’t really know what encoding the current string is in.

E.g., if UTF-8 is stored in a latin1 table in a DB, when it comes out it’ll often be reported as latin1, even though you know it’s UTF-8.
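To make that concrete, here is a rough sketch of what happens to a single "é" in that situation and how it can sometimes be reversed. It uses ISO-8859-1 as a stand-in for the table's latin1 (an assumption; MySQL's latin1 is closer to Windows-1252, which differs for some bytes), and the variable names are only for illustration:

```php
<?php
// "é" as UTF-8 is the two bytes C3 A9.
$original = "\xC3\xA9";

// If those bytes are believed to be latin1 and converted to UTF-8 again,
// each byte becomes its own character: "Ã" and "©".
$garbled = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');
$garbled_hex = bin2hex($garbled); // c383c2a9

// Converting back from UTF-8 to ISO-8859-1 undoes the extra step and
// restores the original UTF-8 bytes.
$repaired = mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');
```

This only works when you know the text went through exactly one extra conversion; applied to text that was fine to begin with, it does damage instead.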

You can try converting using mb_convert_encoding, but I’ve had bad experiences.
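For what it's worth, a check-then-convert version could look roughly like this. It is only a sketch: to_utf8 is a made-up name, and the fallback of ISO-8859-1 is an assumption you would have to replace with whatever your data really is, because a string that fails the UTF-8 check tells you nothing about its true encoding:

```php
<?php
// Hypothetical helper: returns $str unchanged if it is already valid UTF-8,
// otherwise converts it from $fallback (assumed source encoding) to UTF-8.
function to_utf8($str, $fallback = 'ISO-8859-1')
{
    // mb_check_encoding() reports whether the bytes are valid in the
    // given encoding; it does not guess what the encoding is.
    if (mb_check_encoding($str, 'UTF-8')) {
        return $str;
    }
    // Not valid UTF-8: reinterpret the bytes as $fallback and convert.
    return mb_convert_encoding($str, 'UTF-8', $fallback);
}

// "café" with a single-byte é (0xE9), as ISO-8859-1 would store it.
$latin1 = "caf\xE9";
$converted = to_utf8($latin1);

// Already-valid UTF-8 passes through untouched.
$unchanged = to_utf8("caf\xC3\xA9");
```

The weak point is the one described above: some latin1 byte sequences also happen to be valid UTF-8, so the check can let wrongly encoded text through.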
