Thread: About encodings

  1. #1
    joaquin_win

    Question About encodings

    Hi everyone,

    I'm currently developing a web app. The encoding of the app (every file) is ISO-8859-1. At some point the app will receive files from users, who usually know less about encodings than I do, so I don't want to ask them to save their files in a specific encoding. I have several questions:
    1. Do you recommend changing the app to UTF-8?
    2. Is there an easy way to port this app to UTF-8? (The database would also need to be ported to UTF-8.)
    3. What recommendations can you give me on this matter?
    4. When I upload UTF-8 files to the app, any special characters in them are displayed incorrectly. Will this problem be solved if I switch the app to UTF-8?
    5. Can you point me to any resources about this encoding issue?
    I think that is all for now. I hope I've made my point clear.

    Thanks in advance,

    Joaquín

  2. #2
    kyberfabrikken
    Quote, originally posted by joaquin_win:
    Do you recommend changing the app to UTF-8?
    That depends on your needs. ISO-8859-1 covers a lot of characters: Spanish and most other Western European languages, though not, for example, Greek, Cyrillic or the euro sign. So it may be enough for you. If you were starting a new application from scratch, I would definitely recommend UTF-8, but changing an existing app may be a lot of trouble.
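    To check whether ISO-8859-1 is enough for the text you expect, you can test whether that text encodes into the charset at all. A minimal Python sketch (the thread is about PHP; Python is used here only to illustrate the idea, and `fits_latin1` is a hypothetical helper name):

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character in text can be stored in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


# Spanish text, including ñ, accents and inverted punctuation, fits.
print(fits_latin1("¿Qué año? ¡Mañana, Joaquín!"))  # True
# The euro sign and Greek letters do not.
print(fits_latin1("Precio: 10 €"))  # False
print(fits_latin1("Ωμέγα"))  # False
```

    Running a sample of real user content through a check like this gives a rough idea of whether staying on ISO-8859-1 is viable.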
    Quote, originally posted by joaquin_win:
    Is there an easy way to port this app to UTF-8? (The database would also need to be ported to UTF-8.)
    Not really. It would be nice to have a checklist you could run through, but I don't think anybody has made one.
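    Even without a checklist, one mechanical step of such a port is re-encoding the existing ISO-8859-1 source files. A minimal Python sketch, assuming every file that is not already valid UTF-8 really is ISO-8859-1; `latin1_to_utf8`, `convert_tree` and the suffix list are hypothetical names chosen for illustration. It does not touch the database, and charset declarations in HTTP headers, <meta> tags or database connection settings would likely still need updating by hand. Run it on a copy of the code, not the only one.

```python
from pathlib import Path


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8.

    Every byte sequence decodes as ISO-8859-1, so this step alone cannot
    detect a file that was never ISO-8859-1 in the first place.
    """
    return data.decode("iso-8859-1").encode("utf-8")


def convert_tree(root: str, suffixes=(".php", ".html", ".inc")) -> None:
    """Hypothetical helper: rewrite matching files under root in place."""
    for path in Path(root).rglob("*"):
        if not (path.is_file() and path.suffix in suffixes):
            continue
        data = path.read_bytes()
        try:
            data.decode("utf-8")
            # Already valid UTF-8 (plain ASCII included): leave it alone so
            # it is not double-encoded.
            continue
        except UnicodeDecodeError:
            pass
        path.write_bytes(latin1_to_utf8(data))
```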
    Quote, originally posted by joaquin_win:
    When I upload UTF-8 files to the app, any special characters in them are displayed incorrectly. Will this problem be solved if I switch the app to UTF-8?
    Yes and no. The problem is that your users have uploaded their files in a different charset from the one you use. If you use UTF-8 internally but the uploaded files are ISO-8859-1, you're going to have the same discrepancy you have now with ISO-8859-1 internally and UTF-8 uploaded files. You need to convert the uploaded files to the charset you use internally, whichever charset that is. Since you don't know which charset an uploaded file is in, your best bet is to try to guess it. The mbstring extension has functions for this, such as mb_detect_encoding() and mb_convert_encoding().
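    The guess-then-convert approach can be sketched in Python, doing roughly the job that mb_detect_encoding() and mb_convert_encoding() do in PHP. This is a minimal sketch assuming uploads are either UTF-8 or ISO-8859-1; `guess_and_decode` and `to_internal` are hypothetical names. An upload in some other charset would be silently misread as ISO-8859-1, because that charset accepts any byte sequence.

```python
# Order matters: try the strict charset first. Non-ASCII ISO-8859-1 text is
# rarely valid UTF-8, while ISO-8859-1 accepts any bytes and acts as catch-all.
CANDIDATES = ("utf-8", "iso-8859-1")


def guess_and_decode(data: bytes) -> tuple[str, str]:
    """Return (text, guessed_charset) for uploaded bytes."""
    for charset in CANDIDATES:
        try:
            return data.decode(charset), charset
        except UnicodeDecodeError:
            continue
    # Not reached while iso-8859-1 is the last candidate.
    raise ValueError("no candidate charset matched")


def to_internal(data: bytes, internal: str = "utf-8") -> bytes:
    """Convert an upload to the charset the app uses internally.

    With internal="iso-8859-1" this raises UnicodeEncodeError if the upload
    contains characters that charset cannot represent.
    """
    text, _ = guess_and_decode(data)
    return text.encode(internal)


utf8_upload = "Joaquín".encode("utf-8")
latin1_upload = "Joaquín".encode("iso-8859-1")
print(guess_and_decode(utf8_upload))    # ('Joaquín', 'utf-8')
print(guess_and_decode(latin1_upload))  # ('Joaquín', 'iso-8859-1')
# Without conversion, UTF-8 bytes shown as ISO-8859-1 become mojibake:
# the í turns into 'Ã' followed by an invisible soft hyphen.
print(repr(utf8_upload.decode("iso-8859-1")))  # 'JoaquÃ\xadn'
```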

    Quote, originally posted by joaquin_win:
    What recommendations can you give me on this matter?
    A better solution would be to get away from file uploads altogether. I'm assuming the uploaded files are text files of some sort? If that's the case, can't you use a text field (a <textarea> tag) and let your users type or paste their text into it? Browsers normally submit form data in the encoding of the page containing the form, so if you serve that page as UTF-8, you can check that the submitted content is indeed UTF-8 and act accordingly.
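    On the receiving side, that check can be a strict UTF-8 decode. A minimal Python sketch, assuming the form page is served with `Content-Type: text/html; charset=UTF-8` (the form can also carry `accept-charset="UTF-8"`); `read_submitted_text` is a hypothetical helper, and rejecting the submission is just one way to "act accordingly".

```python
def read_submitted_text(raw: bytes) -> str:
    """Decode a submitted textarea value, rejecting anything that isn't UTF-8.

    If the form page is served as UTF-8, a well-behaved browser submits
    UTF-8, so invalid bytes point to a broken or misbehaving client.
    """
    try:
        return raw.decode("utf-8")  # strict by default: raises on bad bytes
    except UnicodeDecodeError as exc:
        raise ValueError(f"submission is not valid UTF-8: {exc}") from exc


print(read_submitted_text("¡Hola!".encode("utf-8")))  # ¡Hola!
try:
    read_submitted_text(b"\xa1Hola!")  # "¡Hola!" sent as ISO-8859-1
except ValueError as err:
    print("rejected:", err)
```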

    Quote, originally posted by joaquin_win:
    Can you point me to any resources about this encoding issue?
    http://www.phpwact.org/php/i18n/charsets is a very good read on charset issues.

  3. #3
    joaquin_win
    Thank you very much. I'll read that and do some testing with mbstring. The idea of replacing the file upload with a textarea could work.

    Thanks again

  4. #4
    SitePoint Wizard
    Quote, originally posted by kyberfabrikken:
    If you use UTF-8 internally but the uploaded files are ISO-8859-1, you're going to have the same discrepancy you have now with ISO-8859-1 internally and UTF-8 uploaded files. You need to convert the uploaded files to the charset you use internally, whichever charset that is. Since you don't know which charset an uploaded file is in, your best bet is to try to guess it. The mbstring extension has functions for doing this.
    Last edited by cheesedude; Dec 30, 2006 at 22:30. Reason: Put in another, more recent topic.

