SitePoint Sponsor

User Tag List

Results 1 to 10 of 10
  1. #1
    SitePoint Enthusiast
    Join Date
    Aug 2008
    Posts
    37
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Exclamation Urgent! Pulling Thai data using PHP/IIS/MSSQL returns question marks ...

    Alright, hopefully someone here can give me a clue as to what's going on. I've searched high and low for a solution to this issue, but I'm at wit's end.

    Alright, here's the deal. We have a homegrown CRM for our company. We have employees all over the world, but mainly Australia, UK and Thailand.

    If I log into the CRM the homepage show, basically, a dashboard with company news items and such. If the user is Thai, the news from their department is also, most likely, written in Thai as well.

    This data is stored in a Sql Server 2005 database. When I view the data within the sql server management studio, it's clearly all in Thai.

    Now, when I use PHP to pull the data directly from the database, all I get in return is question marks where the Thai characters should appear. Everything else is readable.

    For instance, instead of:

    ปริยสุทธุ์ อินทสุวรรณ แก้มยุ้ย Pariyasoot Intasuwan is now working at KHONKAEN MENTAL HOSP..

    I get:

    ????? ?????? ????????????? Pariyasoot Intasuwan is now working at KHONKAEN MENTAL HOSP..

    The funny thing is, there are a few areas on the page that are generated by AJAX calls. Some of these calls are pointed to ASP pages. These will return the text just fine.

    All of our encoding is UTF-8 for our HTML output, so double-byte chars are supported. This is happening between PHP and MSSQL.

    Why is this working properly in ASP and I get gobbledy-gook in PHP? How can I fix this?

    Note: Yes, I've tried iconv, and it isn't working.
    TheDrunkenEpic - my ramblings

  2. #2
    SitePoint Evangelist hexburner's Avatar
    Join Date
    Jan 2007
    Location
    Belgium
    Posts
    591
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Try utf8_encode() when outputting the string.
    FOR SALE: 1 set of morals, never used, will sell cheap

  3. #3
    SitePoint Wizard frank1's Avatar
    Join Date
    Oct 2005
    Posts
    1,392
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    its unicode problem Mr larry ulman has whole chapter dedicated for these issue in book php mysql for dynamic websites...
    for now may be u can check the coliation of mssql table...i last time sorted out this problem using a software that change mssql table to mysql

    but firefox has problem displaying unicode properly(at least for our language) it is not displayed correctly..i think its problem with firefox...

  4. #4
    SitePoint Enthusiast
    Join Date
    Aug 2008
    Posts
    37
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hey, thanks for the replies.

    No, it's definitely something between PHP and MsSql. I tried all of your suggestions before even posting here. I already mentioned that all of our output is encoded with UTF-8. The raw data that our database abstraction layer, ADO PEAR, returns the data in this format. Doesn't matter if it's displayed via browser or saved to a file.

    It isn't returning anything other than question marks and there is no way to translate them into something intelligible.

    Unfortunately, switching MySQL is NOT an option for a company this large. Even if we did, it doesn't really solve the issue, it just avoids it altogether.

    I've already tried changing the collation of the specified table; doesn't work.
    Changing the collation of the column that stores the text; doesn't work.
    I tried iconv; doesn't work.
    I tried a homebrewed transliteration process; doesn't work.
    I tried utf8_encode(); nada.
    UTF-8 meta tag in HTML; obvious first step, but no-go.
    Tried sending raw header to change character encoding to UTF-8; nope.
    Even changing the encoding via the browser's menu's and still nothing.
    Tried to connect to the database and execute this particular query using 'mssql_*()' functions rather than the abstraction layer and, wouldn't you know it, NOTHING!

    I've tested in every browser and attempted to display this data in it's proper format using almost every method imaginable.

    All I know is that ASP can simply run the same query and pull the data with NO problems. PHP with it's mssql extension, for some odd reason, cannot. I also did a bit of searching around and it seems a few others on the net have had the same issue and no easy solution seems to exist. For them anyways.

    I've had plenty of experience with double-byte characters in Oracle, Sybase and MySQL and have NEVER had a problem displaying the data properly with PHP.

    Double-You-Tee-Eff.... MATE? Argh.
    TheDrunkenEpic - my ramblings

  5. #5
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2008
    Posts
    5,757
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Doesn't mssql use some windows uc2 char encoding or something like that? I bet php's library is not compatible and garbles it.

    Now, you say it works fine when asp pulls it from the db. Was this data also inserted with asp, or did php insert this data into the db, and asp still pulls it ok?

    Is there any way you could change the select to return the data in some other format? (I'm not really familiar with mssql) like, base64 encoded or some other encoding that won't have trouble surviving the transport into php. Of course then decode the string in php.

  6. #6
    SitePoint Enthusiast
    Join Date
    Aug 2008
    Posts
    37
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Yeah, I was discussing this with a colleague and we think we might be able to use casting within the stored proc on that particular column and change the collation on the fly. If not, I'll have to modify the data entry logic to store just the entities of the characters instead.

    But, that means all the old Thai data is unreadable. This change would not be retroactive.

    Merpflakes.

    Thanks for the help so far guys!
    TheDrunkenEpic - my ramblings

  7. #7
    SitePoint Enthusiast
    Join Date
    Aug 2008
    Posts
    37
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Ok, found the solution!!!

    I changed the collation of the specified column that stores the news item data from Latin1_blah_blah to Thai_CI_AS. When I pulled data directly from the table in PHP, I'd get a bunch of garbled characters. This was a good thing because they are no longer a shitload of '????????? ????? ??????'.

    I then ran a thai-to-utf8 conversion function I found in some newsgroup and BOOYAH, I got Thai.

    Now, I tried pulling from the stored proc like I was originally supposed to and I got back, again, question marks. I took a look at the stored proc and noticed that it's creating a temporary table for the news item data. The news item column in the temporary table ALSO needed to be collated to Thai.

    I did this and then ran the stored proc via PHP, got the data, converted it and SHAZAAM it worked!

    You'd think MS would have some sort of UTF8 or UTF16 collation support in SQL Server, but I couldn't find anything. Meh.

    Here's the function I used. Maybe it'll help others. Don't remember where I got it from. I was reading so many articles on the problem yesterday:

    PHP Code:
    function ThaiToUtf8($string)
    {
        if(
    false == preg_match('#[\241-\377]#i'$string))
        {
            return 
    $string;
        }

        return 
    strtr($string, array("\xa1" => "\xe0\xb8\x81",
                                    
    "\xa2" => "\xe0\xb8\x82",
                                    
    "\xa3" => "\xe0\xb8\x83",
                                    
    "\xa4" => "\xe0\xb8\x84",
                                    
    "\xa5" => "\xe0\xb8\x85",
                                    
    "\xa6" => "\xe0\xb8\x86",
                                    
    "\xa7" => "\xe0\xb8\x87",
                                    
    "\xa8" => "\xe0\xb8\x88",
                                    
    "\xa9" => "\xe0\xb8\x89",
                                    
    "\xaa" => "\xe0\xb8\x8a",
                                    
    "\xab" => "\xe0\xb8\x8b",
                                    
    "\xac" => "\xe0\xb8\x8c",
                                    
    "\xad" => "\xe0\xb8\x8d",
                                    
    "\xae" => "\xe0\xb8\x8e",
                                    
    "\xaf" => "\xe0\xb8\x8f",
                                    
    "\xb0" => "\xe0\xb8\x90",
                                    
    "\xb1" => "\xe0\xb8\x91",
                                    
    "\xb2" => "\xe0\xb8\x92",
                                    
    "\xb3" => "\xe0\xb8\x93",
                                    
    "\xb4" => "\xe0\xb8\x94",
                                    
    "\xb5" => "\xe0\xb8\x95",
                                    
    "\xb6" => "\xe0\xb8\x96",
                                    
    "\xb7" => "\xe0\xb8\x97",
                                    
    "\xb8" => "\xe0\xb8\x98",
                                    
    "\xb9" => "\xe0\xb8\x99",
                                    
    "\xba" => "\xe0\xb8\x9a",
                                    
    "\xbb" => "\xe0\xb8\x9b",
                                    
    "\xbc" => "\xe0\xb8\x9c",
                                    
    "\xbd" => "\xe0\xb8\x9d",
                                    
    "\xbe" => "\xe0\xb8\x9e",
                                    
    "\xbf" => "\xe0\xb8\x9f",
                                    
    "\xc0" => "\xe0\xb8\xa0",
                                    
    "\xc1" => "\xe0\xb8\xa1",
                                    
    "\xc2" => "\xe0\xb8\xa2",
                                    
    "\xc3" => "\xe0\xb8\xa3",
                                    
    "\xc4" => "\xe0\xb8\xa4",
                                    
    "\xc5" => "\xe0\xb8\xa5",
                                    
    "\xc6" => "\xe0\xb8\xa6",
                                    
    "\xc7" => "\xe0\xb8\xa7",
                                    
    "\xc8" => "\xe0\xb8\xa8",
                                    
    "\xc9" => "\xe0\xb8\xa9",
                                    
    "\xca" => "\xe0\xb8\xaa",
                                    
    "\xcb" => "\xe0\xb8\xab",
                                    
    "\xcc" => "\xe0\xb8\xac",
                                    
    "\xcd" => "\xe0\xb8\xad",
                                    
    "\xce" => "\xe0\xb8\xae",
                                    
    "\xcf" => "\xe0\xb8\xaf",
                                    
    "\xd0" => "\xe0\xb8\xb0",
                                    
    "\xd1" => "\xe0\xb8\xb1",
                                    
    "\xd2" => "\xe0\xb8\xb2",
                                    
    "\xd3" => "\xe0\xb8\xb3",
                                    
    "\xd4" => "\xe0\xb8\xb4",
                                    
    "\xd5" => "\xe0\xb8\xb5",
                                    
    "\xd6" => "\xe0\xb8\xb6",
                                    
    "\xd7" => "\xe0\xb8\xb7",
                                    
    "\xd8" => "\xe0\xb8\xb8",
                                    
    "\xd9" => "\xe0\xb8\xb9",
                                    
    "\xda" => "\xe0\xb8\xba",
                                    
    "\xdf" => "\xe0\xb8\xbf",
                                    
    "\xe0" => "\xe0\xb9\x80",
                                    
    "\xe1" => "\xe0\xb9\x81",
                                    
    "\xe2" => "\xe0\xb9\x82",
                                    
    "\xe3" => "\xe0\xb9\x83",
                                    
    "\xe4" => "\xe0\xb9\x84",
                                    
    "\xe5" => "\xe0\xb9\x85",
                                    
    "\xe6" => "\xe0\xb9\x86",
                                    
    "\xe7" => "\xe0\xb9\x87",
                                    
    "\xe8" => "\xe0\xb9\x88",
                                    
    "\xe9" => "\xe0\xb9\x89",
                                    
    "\xea" => "\xe0\xb9\x8a",
                                    
    "\xeb" => "\xe0\xb9\x8b",
                                    
    "\xec" => "\xe0\xb9\x8c",
                                    
    "\xed" => "\xe0\xb9\x8d",
                                    
    "\xee" => "\xe0\xb9\x8e",
                                    
    "\xef" => "\xe0\xb9\x8f",
                                    
    "\xf0" => "\xe0\xb9\x90",
                                    
    "\xf1" => "\xe0\xb9\x91",
                                    
    "\xf2" => "\xe0\xb9\x92",
                                    
    "\xf3" => "\xe0\xb9\x93",
                                    
    "\xf4" => "\xe0\xb9\x94",
                                    
    "\xf5" => "\xe0\xb9\x95",
                                    
    "\xf6" => "\xe0\xb9\x96",
                                    
    "\xf7" => "\xe0\xb9\x97",
                                    
    "\xf8" => "\xe0\xb9\x98",
                                    
    "\xf9" => "\xe0\xb9\x99",
                                    
    "\xfa" => "\xe0\xb9\x9a",
                                    
    "\xfb" => "\xe0\xb9\x9b"));

    TheDrunkenEpic - my ramblings

  8. #8
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2008
    Posts
    5,757
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I sense extreme happiness

  9. #9
    SitePoint Enthusiast
    Join Date
    Aug 2008
    Posts
    37
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks for all the helps, yo!

    Awwwww yeeeeeeaaaaah!!!!
    TheDrunkenEpic - my ramblings

  10. #10
    SitePoint Wizard frank1's Avatar
    Join Date
    Oct 2005
    Posts
    1,392
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    ya i said it was collation u should look in to any way once again have a look in to how it is being displayed in Firefox...i guess not 100% correct...


Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •