  1. #1
    SitePoint Wizard Busch's Avatar
    Join Date
    Jan 2004
    Posts
    1,072

    html_entity_decode Not Working For Foreign Text

    When a user submits data on my site, I use htmlentities() to convert ", ', <, >, etc. into HTML entities. If the user wants to edit a submission, I use html_entity_decode() to turn it back into readable form. Everything works fine unless Korean text is entered, and my site must be able to handle Korean. If there is Korean text, the characters are not converted back and look like this: & #12615;& #12629; (spaces added so the text is not converted).

    What functions can I use to make sure the data can be safely inserted into the database and displayed back in a form for editing, with all Korean characters appearing as they should?

  2. #2
    SitePoint Enthusiast asp.da's Avatar
    Join Date
    Nov 2004
    Location
    forest
    Posts
    39
    Hello Busch,

    I don't know if this makes sense to you. I haven't dealt with Korean characters, but I have some experience with other non-English languages. I found that if you simply output those HTML entities in the HTML, the browser displays them as the proper characters of the language, provided the browser is set to an encoding that can display them. Just put those & #12615;& #12629; as a value of, say, a text field and see what the browser will show.

  3. #3
    SitePoint Wizard Busch's Avatar
    Join Date
    Jan 2004
    Posts
    1,072
    Quote Originally Posted by asp.da
    Just put those & #12615;& #12629; as a value of, say, a text field and see what the browser will show.
    I did put them in a text field and it didn't work, but when I echo them to the screen outside of a text field, the characters display properly. That's what's so weird.

    BTW, I also tried htmlspecialchars, but no luck...
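
One possible explanation for the text-field behaviour, offered only as an assumption since the thread never shows the code: if a string that already contains numeric references is escaped a second time before it is written into the value attribute, each & becomes &amp; and the browser shows the reference literally. A minimal sketch, with hypothetical variable names and assuming UTF-8:

```php
<?php
// Hypothetical stored value: the two references from the first post.
$stored = '&#12615;&#12629;';

// Escaping again turns each '&' into '&amp;', so a browser would display
// the literal text "&#12615;&#12629;" instead of the Hangul letters.
$twice = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');

// Passing false as the fourth argument (double_encode) leaves existing
// entities untouched.
$once = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8', false);

echo $twice, "\n"; // &amp;#12615;&amp;#12629;
echo $once, "\n";  // &#12615;&#12629;
```

Whether this is what happened on Busch's site cannot be told from the posts; it is only one way to get the symptom described.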

  4. #4
    SitePoint Enthusiast asp.da's Avatar
    Join Date
    Nov 2004
    Location
    forest
    Posts
    39
    Yeah, strange. I tried Cyrillic characters, which when encoded look like Солн, and they are shown correctly both ways...

    Did you specify the charset when calling htmlspecialchars? The prototype is

    string htmlspecialchars ( string string [, int quote_style [, string charset]]).

    However, the PHP manual lists the charsets supported by htmlspecialchars and htmlentities, and there is no Korean one among them...
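
As a rough sketch of what passing the charset argument looks like: UTF-8 is on the manual's supported list and can represent Korean, so the example below assumes the submitted text is UTF-8. That is an assumption, since the thread never states the site's encoding, and the variable names are hypothetical.

```php
<?php
// Assumption: the submitted text (and this source file) is UTF-8 encoded.
$submitted = "한글 <b>\"test\"</b>";

// htmlspecialchars() escapes only &, <, > and quotes; with 'UTF-8' as the
// charset the Korean characters pass through unchanged.
$escaped = htmlspecialchars($submitted, ENT_QUOTES, 'UTF-8');

// htmlspecialchars_decode() reverses exactly those escapes.
$restored = htmlspecialchars_decode($escaped, ENT_QUOTES);

echo $escaped, "\n"; // 한글 &lt;b&gt;&quot;test&quot;&lt;/b&gt;
```

Unlike htmlentities(), this never turns the Korean characters into entities, so there is nothing language-specific to convert back later.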

  5. #5
    SitePoint Enthusiast asp.da's Avatar
    Join Date
    Nov 2004
    Location
    forest
    Posts
    39
    Here is another idea. What if the problem is the ability of the form containing the text field to display Korean characters? If you see them echoed correctly outside the text field, the browser can handle them.

    There is a form attribute, accept-charset. What if you set it to a Korean charset, which might be iso-ir-149 (as listed at http://www.iana.org/assignments/character-sets)?

  6. #6
    SitePoint Enthusiast
    Join Date
    Feb 2003
    Location
    Leuven, Belgium
    Posts
    78
    FWIW, Derick Rethans gave a talk on multilingual development at a PHP conference, which is available on his site: http://derickrethans.nl/files/wereld...nd-ffm2004.pdf . He shows how things such as strlen() might not work in multilingual environments, and how you can use the iconv functions.
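
To illustrate the strlen() point, here is a small sketch. It is not taken from the talk; it assumes the text is UTF-8 and that the iconv extension is available. The sample string is the two Hangul letters from the first post, written as raw UTF-8 bytes.

```php
<?php
// U+3147 and U+3155 (the characters behind &#12615; and &#12629;),
// written out as their UTF-8 byte sequences.
$text = "\xE3\x85\x87\xE3\x85\x95";

$bytes = strlen($text);                 // counts bytes
$chars = iconv_strlen($text, 'UTF-8');  // counts characters

echo "bytes: $bytes, characters: $chars\n"; // bytes: 6, characters: 2
```

Any length limit or truncation applied with the byte-based functions can therefore cut a Korean character in half.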

  7. #7
    SitePoint Wizard Busch's Avatar
    Join Date
    Jan 2004
    Posts
    1,072
    Quote Originally Posted by asp.da
    Here is another idea. What if the problem is the ability of the form containing the text field to display Korean characters? If you see them echoed correctly outside the text field, the browser can handle them.

    There is a form attribute, accept-charset. What if you set it to a Korean charset, which might be iso-ir-149 (as listed at http://www.iana.org/assignments/character-sets)?
    That seems like exactly what I need! I'll check this out and see if it solves the problem. It seems like it should work. I'll let you know how it goes. Signing off for now... thanks for the info!

  8. #8
    SitePoint Wizard Busch's Avatar
    Join Date
    Jan 2004
    Posts
    1,072
    Still no luck. I got this error: Warning: html_entity_decode(): charset `RFC1557' not supported, assuming iso-8859-1. I also tried many different values for the charset, not just this one.
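
The warning indicates that html_entity_decode() accepts only the charset names PHP itself supports. A sketch of the same call with UTF-8, which is on the supported list, might look like the following. It assumes the page is served as UTF-8; the field name and variable names are hypothetical, and nothing in the thread confirms this is how the problem was eventually solved.

```php
<?php
// The two numeric references from the first post, plus some escaped markup.
$stored = '&#12615;&#12629; &lt;b&gt;';

// With 'UTF-8' as the target charset, references above the Latin-1 range
// can be converted to real characters.
$decoded = html_entity_decode($stored, ENT_QUOTES, 'UTF-8');

// Re-escape only the markup-significant characters before writing the
// text back into an edit form.
$fieldValue = htmlspecialchars($decoded, ENT_QUOTES, 'UTF-8');

echo '<input type="text" name="body" value="', $fieldValue, '">', "\n";
```

With ISO-8859-1 as the target, which is what the warning says PHP fell back to, there is no byte that can represent a Hangul letter, which would be consistent with the references being left undecoded.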

