Can't get Hebrew to work with mySQL & PHP

Hi everyone at sitepoint,

I have set up a database using collation utf8_bin and all the fields are utf8_bin. When I enter Hebrew text into the mySQL database it appears fine in phpMyAdmin, but when I retrieve this text and echo it on my PHP site it just comes out as a series of ???. What can I do about this? I searched for answers and many people have asked the same question on different PHP forums, but I have yet to find a solution.

Many thanks,

Leao

I think the problem is that all my entries are being stored as blobs instead of text, despite the fact that I have set the type to be mediumtext.

I contacted the company that hosts my site and they have no idea why mySQL is behaving like this. They suggest that I try inserting the data into mySQL a different way, but I have already tried entering it directly via phpMyAdmin and also via a php site.

Ideas anyone?

Leao

Does the outputted page have this in the head section:

<meta http-equiv='Content-Type' content='text/html; charset=utf-8' />

Hi,

I managed to stop the mediumtext turning into BLOBs by changing the mySQL collation to utf8_unicode_ci and reimporting my tables (the collation was utf8_bin originally). But STILL all the Hebrew text is rendered as ??? when echoed onto my PHP page.

Hi, SpacePhoenix - The PHP page does have charset=utf-8 in the head section. You can see the page here: http://www.goatswiththewind.com/noamtest/beit.php?goatswiththe=cheese

Any possible solutions before I go insane?

Much appreciated,

Leao

The page validator gives as one of the errors:

  1. Mismatch between Public and System identifiers in the DOCTYPE declaration
    This document uses an inconsistent DOCTYPE declaration. The Public Identifier -//W3C//DTD HTML 4.0 Transitional//EN declares the HTML 4.0 Transitional document type, but the associated System Identifier http://www.w3.org/TR/REC-html40/strict.dtd does not match this document type.
    The recommended System Identifier for HTML 4.0 Transitional is http://www.w3.org/TR/1998/REC-html40-19980424/loose.dtd.
    The safest way to use a correct DOCTYPE declaration is to copy and paste one from the recommended list and avoid editing that part of your markup by hand.

You need to correct the doctype. For HTML strict it’s

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

For HTML transitional it’s

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

The broken doctype might be confusing web browsers.

Hi SpacePhoenix, thank you for your advice. My site is now XHTML valid.

I managed to get PHP to display Hebrew retrieved from mySQL correctly. My web server is set to ‘character_set_server: latin’. I cannot change this as I am on shared hosting, so I used the PHP function ‘mysql_set_charset’ to set the character set of the connection to utf8 for visitors to my site. I have put the following code on each page with Hebrew text after the </html> tag:

<?php
$link = mysql_connect('mysql2.steadfast.net', 'goatswi_leo', 'password') or die(mysql_error());
mysql_set_charset('utf8',$link);
?>

This seems to work perfectly. I am new to PHP, so I need to know whether putting the above code after the </html> tag on every page with Hebrew is a good idea, or whether it belongs somewhere else in my code. Maybe I don’t even need it on every page, perhaps just on the first page the viewer enters? At first I put the code in the main body of the page, so that it ran each time the PHP retrieved some Hebrew from the mySQL database and echoed it onto the main body of my site. That worked the first time I viewed the page, but afterwards the Hebrew text turned into gibberish ‘◊ô◊¢◊ô◊®◊™ ◊ß◊©◊®’.

Any advice would be appreciated!

Leao

It appears to be displaying the Hebrew fine. Have you tried clearing your browser cache?

Hi Mr Space Phoenix, it works now, but my only concern is that I don’t really understand why the script works if I put it at the end of a page after the </html> tag, while if I put it anywhere else it just renders the Hebrew as gibberish. See this example where I put the script just before the </head> tag. Is there any logic to this? As it stands I can leave it after the </html> tag, but I would prefer to understand how the script works a bit better.

Thanks a lot,

Leao

Did you intend to post an example?

Hi,

I have now taken out all the mysql_set_charset calls from all the pages and the Hebrew still displays fine. I don’t get it: it only started working once I began using this script, and now that I’ve taken it out it still works??? However, when I try to add the Hebrew text directly to the database via phpMyAdmin, it displays the text as gibberish. If I do it via a PHP CMS form I made, it works. But I was using this form from the start, so that doesn’t explain why the Hebrew suddenly displays now and not before. It’s great but a little concerning.

Any ideas SpacePhoenix?

Many thanks again,

Leao

Possibly the hosting company might have tweaked something on the server. What CMS are you using?

Edit: I just tried copying and pasting a sample of the Hebrew into phpMyAdmin. It shows it as a load of ?, whilst a test setup of phpBB shows the Hebrew, so possibly phpMyAdmin is not set up to display and process Hebrew.

Hi Phoenix, I made my own simple PHP CMS.

What I’ve noticed is if I try and put the Hebrew text directly onto the mySQL database via phpMyAdmin the Hebrew appears fine on phpMyAdmin but as a lot of ??? on my website. Conversely if I upload the text via my simple PHP CMS the text appears as gibberish ◊ô◊¢◊ô◊®◊ on phpMyAdmin but as Hebrew on my website. Strange? Maybe my hosting company did change something but it seems doubtful as I was in contact with them and they seem equally baffled.

Thanks again mr Phoenix,

Thank you for all the time you have put into this!

Leao
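The two symptoms described above (a row of ??? in one place, gibberish in the other) are what you would expect when the same UTF-8 text passes through a connection that assumes a different character set. Here is a rough sketch of both effects in plain PHP, with no database involved. It assumes the mbstring extension is available, and it uses ISO-8859-1 as a stand-in for MySQL's latin1, which is close but not identical. The sample word is just an illustration, not text from the site.

```php
<?php
// Sample Hebrew word ("shalom") as UTF-8: 4 letters, 2 bytes each.
$hebrew = "\u{05E9}\u{05DC}\u{05D5}\u{05DD}";

// Use '?' for characters the target encoding cannot represent
// (this is also mbstring's default; set here to be explicit).
mb_substitute_character(0x3F);

// Case 1: the text is correctly understood as UTF-8 but has to be
// converted to latin1 for the client. latin1 has no Hebrew letters,
// so every letter is replaced by '?'.
$asLatin1 = mb_convert_encoding($hebrew, 'ISO-8859-1', 'UTF-8');

// Case 2: the UTF-8 bytes are misread as latin1 and re-encoded.
// Each of the 8 bytes becomes its own character: gibberish.
$misread = mb_convert_encoding($hebrew, 'UTF-8', 'ISO-8859-1');

// Reversing the misreading recovers the original bytes, which is why
// text stored this way can still look right through the same connection.
$roundTrip = mb_convert_encoding($misread, 'ISO-8859-1', 'UTF-8');

echo $asLatin1, "\n";                    // ????
echo mb_strlen($misread, 'UTF-8'), "\n"; // 8
echo $roundTrip === $hebrew ? "round trip ok\n" : "round trip failed\n";
```

That would fit the pattern in this thread: text written through one kind of connection only reads back correctly through the same kind of connection, so phpMyAdmin and the site disagree about it.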

After a little playing with the character encoding for the field in a test, I changed it to utf8_bin and added

mysql_set_charset('utf8',$link_id);

right after I connect to the db.

It now displays the Hebrew properly on a test page. It seems that unless you tell PHP explicitly to use utf-8 for the connection, it reverts to whatever its default is. I read up on it a bit more and it’s possible to change PHP’s default encoding directive to utf-8, but as you’re on shared hosting, chances are that you’ll have no control over what the directive is set to.
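On the earlier question of where the charset call should go: it needs to run once per request, right after connecting and before the first query, so one shared include is a natural place for it rather than the bottom of every page. Below is a minimal sketch of that idea. Note that the mysql_* functions used in this thread were removed in PHP 7, so the sketch uses mysqli instead and assumes that extension is installed. The file name db.php, the function name db_connect_utf8, and all connection details are made up for illustration.

```php
<?php
// db.php: hypothetical shared include. The function name and the
// host/user/database values passed to it are placeholders.
function db_connect_utf8(string $host, string $user, string $password, string $database): mysqli
{
    $link = new mysqli($host, $user, $password, $database);
    if ($link->connect_error) {
        throw new RuntimeException('Connect failed: ' . $link->connect_error);
    }

    // Set the connection character set before any query runs, so the
    // server knows what encoding this client sends and expects back.
    if (!$link->set_charset('utf8')) {
        throw new RuntimeException('Could not set charset: ' . $link->error);
    }

    return $link;
}

// Usage at the top of each page, before any output or queries:
//   require_once 'db.php';
//   $link = db_connect_utf8('db.example.com', 'some_user', 'secret', 'some_db');
```

Every page that reads or writes Hebrew would then go through the same connection settings, including the CMS form that inserts the text.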