Convert encoding (ISO-8859-1 to UTF-8)

Hello,

I’m having problems with converting ISO-8859-1 encoded text to UTF-8 encoding with some special chars.

For example:

$str = 'blah – - test';
echo utf8_encode($str);

will output:

blah – - test

The problem appears if I have one of those chars in the text: –, ’, “, etc…

If I try converting it using iconv (with the TRANSLIT or IGNORE option), I get the same result:

echo iconv('ISO-8859-1', 'UTF-8//IGNORE', $str);

Does anyone have any idea what is going on?

Perhaps those characters don’t exist in UTF-8? Is there a way to find out what to substitute for them?

Thanks a lot for the help!

$str is probably not really encoded as ISO-8859-1. Those curly quote characters, and probably that weird hyphen character (an en dash), don’t exist in that character set.

For starters, if your goal is UTF-8, and you’re using a web browser and web page to view and test this, you need to set the proper HTTP header so the browser understands the encoding.


header('content-type: text/plain;charset=utf-8');

Be aware that the encoding you set your text editor to plays a part here if you’re pasting string literals into the file. The editor might be doing some conversions. Some editors behave differently in how and when they convert, so I would try to avoid testing like this and just get the data from wherever it comes from, like the db.

Personally, I usually look at the byte values in the string:


print_r(array_map('ord', str_split($str)));

And then from there try to match it up with an encoding by looking at the byte values, and how many/which bytes are used to represent certain characters.
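
As a rough sketch of that inspection step, the snippet below prints each byte of a string in hex, which can be easier to compare against a code page table than decimal values. The helper name hexBytes and the sample string are made up for illustration; the \x96 byte assumes the text is really Windows-1252, where that byte is an en dash.

```php
<?php
// Hypothetical helper: show each byte of a string as two hex digits.
function hexBytes(string $s): string
{
    return implode(' ', str_split(bin2hex($s), 2));
}

// Assumed sample: "blah", a space, then byte 0x96 (en dash in Windows-1252).
$sample = "blah \x96";
echo hexBytes($sample), "\n"; // 62 6c 61 68 20 96
```

A single byte in the 0x80 to 0x9F range where a dash or curly quote should be points towards Windows-1252. A three-byte sequence starting with e2 80 would suggest the text is already UTF-8.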

In your case, I think you’ll probably want to be looking at Windows-1252: http://en.wikipedia.org/wiki/Windows-1252
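
If the bytes do turn out to be Windows-1252, a minimal sketch of the conversion could look like this. The sample string is an assumption built from raw byte escapes so the editor's own encoding does not interfere, and the charset name is passed to iconv as 'Windows-1252' (some iconv builds may prefer 'CP1252').

```php
<?php
// Assumed input: Windows-1252 bytes (0x96 = en dash, 0x92 = right single quote).
$cp1252 = "blah \x96 it\x92s a test";

// Convert from Windows-1252 rather than ISO-8859-1.
$utf8 = iconv('Windows-1252', 'UTF-8', $cp1252);

echo $utf8, "\n";
```

With ISO-8859-1 as the source charset, the same bytes would be treated as control characters instead of punctuation, which is why the dash and quotes come out wrong.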

This has already been answered. Nonetheless, if you already know what you want to see in place of these special characters, you could replace them with your own text, words, or plain non-UTF-8 characters. If you would rather not do that, you can take the other angle and work out what the characters are from their byte values, as already suggested.
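
One way to sketch the replacement idea is a lookup table passed to strtr. The table below is only an illustration: it assumes the input is Windows-1252 and picks ASCII stand-ins that you may want to change.

```php
<?php
// Hypothetical substitution table, assuming the input is Windows-1252.
$replacements = [
    "\x91" => "'",   // left single quote
    "\x92" => "'",   // right single quote
    "\x93" => '"',   // left double quote
    "\x94" => '"',   // right double quote
    "\x96" => '-',   // en dash
    "\x97" => '-',   // em dash
    "\x85" => '...', // ellipsis
];

// Assumed sample built from raw bytes.
$input = "\x93blah\x94 \x96 it\x92s\x85";
$ascii = strtr($input, $replacements);

echo $ascii, "\n"; // "blah" - it's...
```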

Set your page headers like mentioned above and try something like this:

echo iconv('UTF-8', 'ISO-8859-1//TRANSLIT//IGNORE', $str);

That should take care of most things.