Once in a while I get strange characters appearing on my client's site. We are mostly copying and pasting the synopsis descriptions into the database from multiple places on the web. I have never personally seen this happen, but apparently the web manager will copy content into our custom-built CMS, and when we view the content later we get strange characters appearing.
In the middle of Brendell’s middle name I see strange characters after the s and p, and the E in Metis also seems to be missing. Is this because I am not using the right character set on the column, table or database?
I have done minimal data entry for the clients but it has happened from time to time. For the most part they are copying the descriptions from sites on the web. Apparently, the bizarre characters only show up in the body of copy text after they’ve been added into the database which is what leads me to believe this is a character encoding problem.
For this specific column, “book_description”, which is a MEDIUMTEXT, phpMyAdmin reports the collation as “utf8_unicode_ci”, which appears to be the default for any table I create. Are you thinking that should be something different?
It appears to work for most of it, but there is still one very strange character in the 3rd paragraph, after the word "threatened":
Then her beloved daughter, Zoë, is threatened � and Brendell takes matters into her own hands. To save Zoë, Brendell searches for the stalker and confronts not just a depraved madman but her own fears and prejudices.
I have tried to find out exactly what that character is, but to no avail. I can’t seem to make sense of the vast scope of character encoding. I even read SitePoint's article on it and it confused me even more lol.
Here is a link (http://software.hixie.ch/utilities/cgi/unicode-decoder/character-identifier?characters=�) to a website I found that apparently tries to identify characters. I can’t seem to make sense of it, but perhaps someone more skilled in the ways of encoding-kung-fu can shed some light. PS: The site says “(this script is currently broken)”, so perhaps all of that data is garbage.
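On identifying that character: the � glyph is usually U+FFFD, the Unicode replacement character, which a decoder substitutes when it meets bytes that are not valid in the encoding it was told to use. The sketch below is only one plausible way to end up with it. It assumes (this is a guess, not something confirmed in the thread) that the pasted text contained a Windows-1252 en dash, which is the single byte 0x96, and that the page was then read as UTF-8.

```python
import unicodedata

# Assumption: the copied synopsis used a Windows-1252 en dash (byte 0x96).
raw = "threatened \u2013 and".encode("cp1252")

# Byte 0x96 is not valid UTF-8 on its own, so a UTF-8 decoder swaps in U+FFFD.
shown = raw.decode("utf-8", errors="replace")

# Pick out the character that replaced the dash and ask Unicode what it is.
mystery = shown[len("threatened ")]
print(hex(ord(mystery)), unicodedata.name(mystery))
```

If that is what happened here, the original character is already lost by the time � is displayed, so the fix has to happen before that point, not afterwards.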
One last thing. Do you tell MySQL you want to communicate with it in UTF-8? There are functions to set that, depending on what connection type you’re using (mysql, mysqli, pdo). Google knows.
If that doesn’t work, you can try iconv(), and if that doesn’t work I don’t know anymore …
What do you mean by telling MySQL to communicate in UTF-8? How would I go about setting that up? The field is set to utf8_unicode_ci… is there something more to it than that?
Also, I just realized that this particular body of text that is giving me grief actually appears correctly (strange characters and all) in phpMyAdmin and also in our custom-built CMS. So I’m starting to think perhaps the font we are using doesn’t support these strange characters that are in the synopsis. Is that possible?
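On what "telling MySQL to communicate in UTF-8" means: the column collation only describes how the text is stored. The connection has its own character set, and the server converts results into that character set before sending them. In PHP this is typically set with mysqli_set_charset() or by running the SQL statement SET NAMES utf8 right after connecting. The sketch below is a simplified simulation, not real database code: send_over_connection and render are hypothetical stand-ins for the server and the browser, and "latin-1" stands in for a non-UTF-8 connection default.

```python
def send_over_connection(stored_text: str, connection_charset: str) -> bytes:
    """Hypothetical stand-in: the server encodes a result for the client."""
    return stored_text.encode(connection_charset)


def render(page_bytes: bytes, page_charset: str) -> str:
    """Hypothetical stand-in: a browser decodes the bytes it received."""
    return page_bytes.decode(page_charset, errors="replace")


stored = "Zoë"  # stored correctly in a utf8 column

# Connection left on a Latin-1 default, page read as UTF-8: the ë is damaged.
mismatched = render(send_over_connection(stored, "latin-1"), "utf-8")

# Connection set to UTF-8, page read as UTF-8: the text survives.
matched = render(send_over_connection(stored, "utf-8"), "utf-8")

print(mismatched)
print(matched)
```

This would also explain why the same row can look right in one tool and wrong in another: each one may be using a different connection or page character set, which is independent of the font.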
Success! Kind of … I have set the connection charset to utf8 and the descriptions now appear to display correctly on the public site, but for some reason they display even worse in the <textarea> boxes in the CMS… Any ideas? Before, it was the other way around.
I first considered the different fonts. The front end uses Arial, I believe, and the back end uses Tahoma. I switched the back end to Arial and it still looked messed up. I’m sure the clients will be happier with the front end displaying correctly, but I would really just like to know what is going on.
Thanks for your help so far, anyway.
Edit:
I checked out the page encoding on the back end and it was set to ISOxxxxx (I dunno, some number). I presume this is some kind of default for Firefox: despite the fact that the database is storing data in utf8 and it’s being delivered in utf8, the browser still doesn’t choose utf8 by default. So I’m guessing the problem is fully solved now. Thank you so much for taking the time to explain that the data has to be set to utf8 at every stage of the game.
Rant:
In all my years of experience with web programming, I can’t help but feel this is one of the silliest problems I’ve ever encountered. If utf8 is widely considered the best choice, wouldn’t all web browsers default to it? Am I missing something in my final conclusion?
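On the browser default: a browser only falls back to a guess when the page does not declare its own encoding, so the usual remedy is for the page to say which charset it uses, for example in the Content-Type response header or a meta charset tag. The sketch below shows, under that assumption, what the back end was probably doing with UTF-8 data while the page was being read as ISO-8859-1. The header string is an illustrative example, not taken from the actual site.

```python
# UTF-8 bytes as delivered once the connection charset is utf8.
body = "Zoë".encode("utf-8")

# Read as ISO-8859-1, each byte becomes its own character: classic mojibake.
as_iso = body.decode("iso-8859-1")

# Illustrative header a page could send to declare its encoding explicitly.
header = "Content-Type: text/html; charset=utf-8"
declared = header.split("charset=")[1]

# Decoding with the declared charset gives the text back intact.
as_declared = body.decode(declared)

print(as_iso)
print(as_declared)
```

With the charset declared on every page, the database, the connection, and the browser all agree, which is the "utf8 at every stage" point made above.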