How do I convert these kinds of characters?

¹ù¸»³Ç - ËÑË÷

I am loading a webpage using DOMDocument, and the title came back as those characters. Here is what I’ve tried, without success:


// Apparently setting UTF-8 here does nothing :(
$DOM = new DOMDocument('1.0', 'UTF-8');

// Not working
mb_convert_encoding( $title, 'utf-8', mb_detect_encoding($title) );

Does anyone have any ideas?

I also tried this:


// mb_detect_encoding reports the string as ISO-8859-1
$title = mb_convert_encoding( $title, 'utf-8', mb_detect_encoding($title, "ascii, cp1252, iso-8859-1, utf-8") );

echo $title;

I got a new kind of garbage characters:
¹ù¸»³Ç - Google ËÑË÷

If you have access to the code of the web page open it in Notepad++, go to Encoding and click “Convert to UTF-8 without BOM”.

I may be wrong, but every time I’ve found those characters in my code it’s been because of BOM.

Thanks for your suggestion, ultimate. I was reading the page from a random Google results page:


$url = 'http://www.google.com.sg/search?hl=zh-CN&biw=1366&bih=636&q=%E9%83%AD%E5%AF%8C%E5%9F%8E&oq=%E9%83%AD%E5%AF%8C%E5%9F%8E&aq=f&aqi=g10&aql=undefined&gs_sm=e&gs_upl=6545l6545l0l1l1l0l0l0l0l295l295l2-1l1a';


$DOM = new DOMDocument('1.0', 'UTF-8');
@$DOM->loadHTMLFile($url);
echo $DOM->saveHTML();
exit;

When I echo from there, I get this:

<title>¹ù¸»³Ç - Google ËÑË÷</title>

I want to convert the title to a readable format: if the page is in Russian, it should display proper Russian, and if it is in Chinese, proper Chinese.
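
One way to get a readable title is to transcode the raw HTML to UTF-8 before DOMDocument sees it, then turn every non-ASCII character into a numeric entity so that libxml cannot misread the encoding. The sketch below assumes the page is GBK (`CP936` in mbstring) and uses a hard-coded string in place of the downloaded page; `titleFromHtml` is a hypothetical helper name, not part of any library.

```php
<?php
/**
 * Hypothetical helper: parse HTML written in a known legacy charset and
 * return the <title> text as UTF-8. $charset must be a name mbstring knows.
 */
function titleFromHtml(string $html, string $charset): string
{
    // 1. Transcode the raw bytes to UTF-8.
    $utf8 = mb_convert_encoding($html, 'UTF-8', $charset);

    // 2. Replace every non-ASCII character with a numeric entity, so the
    //    markup is pure ASCII and any charset declaration becomes irrelevant.
    $ascii = mb_encode_numericentity($utf8, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of using @
    $dom->loadHTML($ascii);
    libxml_clear_errors();

    $titles = $dom->getElementsByTagName('title');
    return $titles->length > 0 ? $titles->item(0)->textContent : '';
}

// Stand-in for the downloaded page: the title bytes from the question,
// assumed to be GBK.
$page = "<html><head><title>\xB9\xF9\xB8\xBB\xB3\xC7 - Google \xCB\xD1\xCB\xF7</title></head><body></body></html>";

echo titleFromHtml($page, 'CP936'), "\n";
```

For a live page, the HTML would first be fetched as a string (for example with file_get_contents) and the charset taken from the response rather than hard-coded.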

I’ve got this BOM-removal function here, but the page is generated dynamically by Google itself, so I believe this is not a BOM issue?

function removeBOM( $str ) {
    // Strip a leading UTF-8 byte order mark (EF BB BF) if present
    if ( substr($str, 0, 3) === pack( 'CCC', 0xef, 0xbb, 0xbf) ) {
        $str = substr($str, 3);
    }
    return $str;
}

Does anyone have advice on whether the following class would help?

http://blog.nairus.fr/public/scripts/php/HTMLEntities.php.txt

Tried the class, nothing works :(