  1. #1
    SitePoint Addict
    Join Date
    Jan 2007
    Location
    Romania
    Posts
    203

    Problems with non-ASCII characters such as ü

    Hello,

    I have a problem with non-ASCII characters such as ü when I insert strings into a MySQL database table. I get the strings by parsing an XML file and insert them into the DB, but when I display them I get "Altm??nster" (collation ascii_general_ci) or "Altmünster" (collation latin1_general_ci). If I insert the same string, Altmünster, from SQLyog, it is stored and displayed correctly. So could the problem come from PHP? If I print the same string from PHP right after parsing the XML, I get the same strange output. And if it does come from PHP, why does the result differ when I change collations?

  2. #2
    SitePoint Addict
    I found out why this is happening: the XML file does not specify an encoding. How can I resolve this?

  3. #3
    SitePoint Wizard kyberfabrikken
    Join Date
    Jun 2004
    Location
    Copenhagen, Denmark
    Posts
    6,157
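    PHP's XML parser uses UTF-8 as its internal encoding, so the strings it hands back are UTF-8. If the rest of your application works in ISO-8859-1 (Latin-1), you have to call utf8_decode() on all data that you retrieve from XML elements.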
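
    To make the byte-level effect concrete, here is a minimal sketch in Python (not the PHP code from this thread). It assumes the city name from the first post and shows how UTF-8 bytes read as Latin-1 produce the garbled output, and what a UTF-8 to ISO-8859-1 conversion, which is what utf8_decode() performs, does to the bytes. Note that utf8_decode() is deprecated as of PHP 8.2.

```python
# Illustration only: Python standing in for what happens to the bytes.
name = "Altm\u00fcnster"  # "Altmünster"

# The XML parser returns the string as UTF-8: "ü" becomes two bytes, C3 BC.
utf8_bytes = name.encode("utf-8")

# Displaying those UTF-8 bytes as if they were Latin-1 gives the garbled form.
misread = utf8_bytes.decode("latin-1")

# Converting UTF-8 to ISO-8859-1 collapses "ü" to the single byte FC.
# errors="replace" substitutes "?" for characters Latin-1 cannot represent.
latin1_bytes = utf8_bytes.decode("utf-8").encode("latin-1", errors="replace")

print(misread)       # Altmünster
print(latin1_bytes)  # b'Altm\xfcnster'
```

    The two "??" seen with the ascii collation fit the same picture: each of the two UTF-8 bytes for "ü" is outside ASCII and gets replaced separately.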

  4. #4
    SitePoint Addict
    Thanks, it works if I decode the data using utf8_decode().

    So it has nothing to do with whether an encoding is specified in the XML file?

  5. #5
    SitePoint Wizard kyberfabrikken
    No. When PHP's XML parser reads a document, it converts it to UTF-8 in memory, regardless of what encoding the file was in originally. I would assume that if you don't specify an encoding, the parser defaults to UTF-8, although I'm not 100 percent sure about that.
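
    The general XML rule behind this can be tried out with a short sketch. It uses Python's xml.etree.ElementTree rather than PHP's parser, so it only illustrates the behaviour the XML 1.0 specification describes (a document without an encoding declaration or byte order mark must be UTF-8), not what any particular PHP version does. The sample documents and helper names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Latin-1 bytes with a matching declaration: the parser can decode them.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?><city>Altm\xfcnster</city>'
# UTF-8 bytes without a declaration: UTF-8 is assumed, so this also works.
undeclared_utf8 = "<city>Altm\u00fcnster</city>".encode("utf-8")
# Latin-1 bytes without a declaration: the lone FC byte is not valid UTF-8.
undeclared_latin1 = b"<city>Altm\xfcnster</city>"


def city_name(xml_bytes):
    """Return the text of the root element as a Unicode string."""
    return ET.fromstring(xml_bytes).text


def parses(xml_bytes):
    """Report whether the parser accepts the document at all."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


print(city_name(declared) == city_name(undeclared_utf8))
print(parses(undeclared_latin1))
```

    Whatever the source encoding, the parsed text comes out in one internal form, which is why the conversion step is needed afterwards if the rest of the application is not UTF-8.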

  6. #6
    Non-Member
    Join Date
    Jan 2003
    Posts
    5,748
    You may find this interesting...

    http://minutillo.com/steve/weblog/20...-and-data-loss

    If you want my advice: if the XML document in question doesn't declare an encoding and comes from an untrusted source, refuse it if you can. If it comes from a trusted source, make a noise about the missing encoding.

    Stamp your feet if you have to...
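
    One way to act on that advice is to check for an encoding declaration before parsing. The following is a rough sketch in Python, with hypothetical helper names (declared_encoding, require_encoding). It assumes the document is in an ASCII-compatible encoding, so it will not recognise UTF-16 input, and it only looks at the XML declaration at the very start of the data.

```python
import re
from typing import Optional

# Matches an XML declaration at the start of the data and captures the
# value of its encoding attribute, if there is one.
_DECLARATION = re.compile(
    rb'^<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, or None."""
    if data.startswith(b"\xef\xbb\xbf"):  # skip a UTF-8 byte order mark
        data = data[3:]
    match = _DECLARATION.match(data)
    return match.group(1).decode("ascii") if match else None


def require_encoding(data: bytes) -> str:
    """Refuse documents that do not say which encoding they use."""
    encoding = declared_encoding(data)
    if encoding is None:
        raise ValueError("XML document does not declare an encoding")
    return encoding
```

    For example, require_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>') returns "ISO-8859-1", while a document that starts with b'<?xml version="1.0"?>' raises ValueError.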

