  1. #1
    SitePoint Enthusiast
    Join Date
    Jan 2005
    Location
    Israel
    Posts
    73

    How to convert scripts to UTF-8

    Hi,

    I need to convert the theme and language files of my Invision Power Board forum from windows-1255 (cp-1255) to UTF-8. What is the right way to do it? Should I convert the PHP script files to UTF-8 too?

    How should I convert my MySQL database to UTF-8? I am currently using MySQL v4.0.27; when I tried to move to a newer version, it messed up the encoding. One approach I thought of was to back up the database, convert the .sql backup file to UTF-8, and then load it into MySQL v4.1.

    What do you think? Need advice.

    Thanks.

  2. #2
    Dan Grossman (Twitter: @djg)
    Join Date
    Aug 2000
    Location
    Philadelphia, PA
    Posts
    20,578
    - Your database needs to be UTF8-encoded
    - Your database connections need to be UTF8-encoded (SET NAMES utf8)
    - You can't use any non-multibyte-safe string functions in your code
    - The encoding of your code files is not relevant to how PHP will handle strings during execution
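
    To make the third point concrete, here is a minimal sketch (not from the thread) of how byte-oriented functions such as strlen() and substr() behave on UTF-8 data compared with their mb_ counterparts. It assumes the mbstring extension is loaded; the sample word is only an illustration.

```php
<?php
// The Hebrew word "shalom" written as UTF-8 bytes: 4 characters, 2 bytes each.
$word = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";

// strlen() counts bytes; mb_strlen() counts characters.
$byteLength = strlen($word);
$charLength = mb_strlen($word, 'UTF-8');

// substr() cuts on a byte boundary and can split a character in half;
// mb_substr() cuts on a character boundary.
$brokenCut = substr($word, 0, 3);
$cleanCut  = mb_substr($word, 0, 2, 'UTF-8');

printf("bytes: %d, characters: %d\n", $byteLength, $charLength);
printf("substr() result is valid UTF-8: %s\n",
    mb_check_encoding($brokenCut, 'UTF-8') ? 'yes' : 'no');
printf("mb_substr() result is valid UTF-8: %s\n",
    mb_check_encoding($cleanCut, 'UTF-8') ? 'yes' : 'no');
```

    The byte-based cut leaves a dangling lead byte, which is the kind of damage that shows up as broken characters in truncated post previews.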

  3. #3
    SitePoint Enthusiast
    Join Date
    Jan 2005
    Location
    Israel
    Posts
    73
    Thanks,

    but can you recommend a tool to convert multiple files to UTF-8?

  4. #4
    SitePoint Wizard triexa
    Join Date
    Dec 2002
    Location
    Canada
    Posts
    2,476
    Quote Originally Posted by Dan Grossman View Post
    - You can't use any non-multibyte-safe string functions in your code
    How do you know for sure if a function is multibyte-safe?
    AskItOnline.com - Need answers? Ask it online.
    Create powerful online surveys with ease in minutes!
    Sign up for your FREE account today!
    Follow us on Twitter

  5. #5
    SitePoint Enthusiast
    Join Date
    Jan 2005
    Location
    Israel
    Posts
    73
    Quote Originally Posted by triexa View Post
    How do you know for sure if a function is multibyte-safe?
    I don't, but I want to try.
    I've already seen this done on other forums, so I suppose it's possible without changing the code of the forum.

    Is there any win32 app that can convert multiple files to utf-8?

  6. #6
    Dan Grossman (Twitter: @djg)
    Join Date
    Aug 2000
    Location
    Philadelphia, PA
    Posts
    20,578
    Quote Originally Posted by triexa View Post
    How do you know for sure if a function is multibyte-safe?
    It starts with mb_ and is part of the multibyte string library.

  7. #7
    SitePoint Wizard triexa
    Join Date
    Dec 2002
    Location
    Canada
    Posts
    2,476
    Quote Originally Posted by Dan Grossman View Post
    It starts with mb_ and is part of the multibyte string library.
    Ohh!

    What about something like, for example, mysql_[real_]escape_string?

  8. #8
    Dan Grossman (Twitter: @djg)
    Join Date
    Aug 2000
    Location
    Philadelphia, PA
    Posts
    20,578
    Quote Originally Posted by tounano View Post
    Is there any win32 app that can convert multiple files to utf-8?
    What would you use that for? Your code doesn't need to be UTF-8 (it just gets compiled to opcode), what your code *does* needs to be safe for UTF-8 data.

    It's all a mess 'til PHP 6

  9. #9
    SitePoint Wizard triexa
    Join Date
    Dec 2002
    Location
    Canada
    Posts
    2,476
    Quote Originally Posted by Dan Grossman View Post
    What would you use that for? Your code doesn't need to be UTF-8 (it just gets compiled to opcode), what your code *does* needs to be safe for UTF-8 data.

    It's all a mess 'til PHP 6
    Yeah, but things have already improved greatly in PHP 5... and for anything we create that isn't just for ourselves or a single person, we almost always have to support PHP 4!

  10. #10
    SitePoint Enthusiast
    Join Date
    Jan 2005
    Location
    Israel
    Posts
    73
    Thanks, but I think you didn't understand me.

    I know the function mb_convert_encoding, and if I wanted to do it in code I could use preg_replace("/([\xE0-\xFA])/e","chr(215).chr(ord('\\1')-80)",$something);
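
    A note for readers on newer PHP versions: the /e modifier used in that one-liner was deprecated in PHP 5.5 and removed in PHP 7.0. The same byte mapping can be sketched with preg_replace_callback, as below. The function name hebrew_cp1255_to_utf8 is hypothetical, the type declarations assume PHP 7 or later, and the mapping only covers the Hebrew letters in the range 0xE0-0xFA; ASCII passes through untouched, and any other cp-1255 bytes (vowel points, punctuation) are left unconverted.

```php
<?php
// Hypothetical helper: maps the cp-1255 Hebrew letters (bytes 0xE0-0xFA)
// to their two-byte UTF-8 form, 0xD7 followed by (byte - 80).
// It must only be run on cp-1255 input, never on text that is already UTF-8.
function hebrew_cp1255_to_utf8(string $bytes): string
{
    return preg_replace_callback(
        '/[\xE0-\xFA]/',
        function (array $match): string {
            return "\xD7" . chr(ord($match[0]) - 80);
        },
        $bytes
    );
}

// "shalom" in cp-1255 is the four bytes F9 EC E5 ED.
$converted = hebrew_cp1255_to_utf8("\xF9\xEC\xE5\xED");
echo bin2hex($converted), "\n";
```

    For whole files, a general-purpose converter such as iconv is probably the safer choice, since it knows the full cp-1255 table.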

    But I need an application that can change the encoding of a file, or actually of multiple files. I'm familiar with iconv (for Linux), but I need something win32-based that works on multiple files.

    Thanks.

  11. #11
    SitePoint Wizard kyberfabrikken
    Join Date
    Jun 2004
    Location
    Copenhagen, Denmark
    Posts
    6,157
    Quote Originally Posted by Dan Grossman View Post
    What would you use that for? Your code doesn't need to be UTF-8
    The HTML templates will need to be (I don't know Invision, but I assume that it uses templates).

    Quote Originally Posted by tounano View Post
    But I need an application that can change the encoding of a file, or actually of multiple files. I'm familiar with iconv (for Linux), but I need something win32-based that works on multiple files.
    PHP Code:
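    // Convert each file named on the command line to UTF-8, in place.
    for ($ii = 1, $ll = count($argv); $ii < $ll; ++$ii) {
      $filename = $argv[$ii];
      if (!is_file($filename)) {
        die("Not a file: " . $filename);
      }
      // utf8_encode() assumes the input is ISO-8859-1.
      file_put_contents($filename,
        utf8_encode(
          file_get_contents($filename)));
    }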

    Edit:

    Whoops, I just saw that you stated the input files are cp-1255, not ISO-8859-1. The above code won't work then, but you could use mb_convert_encoding in lieu of utf8_encode.
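
    As a rough sketch of that adjustment: I am not certain that mb_convert_encoding accepts Windows-1255 on every build (mb_list_encodings() will tell you), so the version below uses PHP's iconv() function instead. It assumes the iconv extension is available and that the underlying library recognises the name "CP1255"; the helper name convert_file_to_utf8 is hypothetical and the type declarations assume PHP 7 or later. Work on copies of the files, because running it twice on the same file would double-encode it.

```php
<?php
// Hypothetical helper: rewrites one file in place as UTF-8.
// $from is the source encoding name as understood by iconv.
function convert_file_to_utf8(string $path, string $from = 'CP1255'): bool
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        return false;
    }
    // iconv() returns false if the input has bytes that are invalid in $from.
    $utf8 = iconv($from, 'UTF-8', $raw);
    if ($utf8 === false) {
        return false;
    }
    return file_put_contents($path, $utf8) !== false;
}

// Command-line usage: php convert.php file1.html file2.html ...
foreach (array_slice($argv ?? [], 1) as $filename) {
    if (!is_file($filename)) {
        fwrite(STDERR, "Not a file: " . $filename . "\n");
        continue;
    }
    if (!convert_file_to_utf8($filename)) {
        fwrite(STDERR, "Could not convert: " . $filename . "\n");
    }
}
```

    Since the PHP command-line binary also runs on Windows, a script along these lines could cover the "win32, multiple files" requirement without a separate application.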

  12. #12
    SitePoint Wizard triexa
    Join Date
    Dec 2002
    Location
    Canada
    Posts
    2,476
    Quote Originally Posted by tounano View Post
    Thanks, but I think you didn't understand me.

    I know the function mb_convert_encoding, and if I wanted to do it in code I could use preg_replace("/([\xE0-\xFA])/e","chr(215).chr(ord('\\1')-80)",$something);

    But I need an application that can change the encoding of a file, or actually of multiple files. I'm familiar with iconv (for Linux), but I need something win32-based that works on multiple files.

    Thanks.
    But as Dan pointed out, WHY would you need to convert the encoding of a file? It gets compiled into code a computer can actually understand...?

  13. #13
    SitePoint Wizard triexa
    Join Date
    Dec 2002
    Location
    Canada
    Posts
    2,476
    Quote Originally Posted by kyberfabrikken View Post
    The HTML templates will need to be (I don't know Invision, but I assume that it uses templates).




    Does it even matter what encoding the file is? The server fetches the file and returns it... it is up to the client's web browser to handle the encoding?

  14. #14
    SitePoint Wizard kyberfabrikken
    Join Date
    Jun 2004
    Location
    Copenhagen, Denmark
    Posts
    6,157
    Quote Originally Posted by triexa View Post
    Does it even matter what encoding the file is? The server fetches the file and returns it... it is up to the client's web browser to handle the encoding?
    Yes. The server tells the client which encoding the content is going to be in. It then sends the content. If the server (~the PHP script) has said that the content is UTF-8, it had better send UTF-8 encoded data.

    Note that PHP will still read the UTF-8 encoded file as if it were ISO-8859-1 encoded, but that doesn't matter as long as the PHP parts of the file don't use non-ASCII characters.
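
    A minimal sketch of the "server tells the client" step, assuming a plain PHP page rather than Invision's own output code (which I don't know): the script declares the charset in the Content-Type header and then emits bytes that really are UTF-8. The helper name content_type_header is hypothetical.

```php
<?php
// Hypothetical helper: builds the header line that declares the charset.
function content_type_header(string $charset = 'UTF-8'): string
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// Headers have to be sent before any output.
if (!headers_sent()) {
    header(content_type_header());
}

// The body must match the declared encoding: these bytes are the
// UTF-8 form of a Hebrew word, not its cp-1255 form.
echo "<p>\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D</p>\n";
```

    If the template files were still cp-1255 while this header said UTF-8, the browser would decode the Hebrew bytes incorrectly, which is why the templates need converting even if the PHP logic files do not.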

