  1. #1
    SitePoint Member
    Join Date: Nov 2005
    Posts: 12

    How to save text with proper encoding?

    I'm hoping someone can help. I'm having trouble with character encoding and with how to save a file.

    For example, I used file_get_contents() and file_put_contents() to grab this page from Wired, http://www.wired.com/news/culture/0,71720-0.html, and saved it to http://commoncache.com/e0ff2c322032/

    When I view the original website, all of the special (non-ASCII) characters display correctly. But when I view my copy, I get all kinds of weird characters.

    What should I be doing to save HTML like the Wired page properly, so that it displays correctly when it is later sent back to a browser?

    There are lots of character encoding functions on php.net, like utf8_encode and utf8_decode, but I don't think either of them does what I need.
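A minimal sketch of one approach, written in Python rather than PHP: take the charset from the response's Content-Type header, decode the raw bytes with it, and re-encode everything as UTF-8 before saving, so every cached page can then be served with a single UTF-8 declaration. The function names and the ISO-8859-1 fallback are assumptions for illustration, and a page whose <meta> tag declares a different charset would also need that tag updated.

```python
from email.message import Message
from pathlib import Path


def charset_from_content_type(content_type, default="iso-8859-1"):
    """Return the lowercased charset parameter of a Content-Type header value.

    Falls back to `default` when no charset parameter is present
    (assumption: ISO-8859-1, the old HTTP/1.1 default for text/* types).
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


def to_utf8(body, content_type):
    """Decode raw response bytes with the declared charset, re-encode as UTF-8."""
    charset = charset_from_content_type(content_type)
    # errors="replace" swaps undecodable bytes for U+FFFD instead of raising.
    return body.decode(charset, errors="replace").encode("utf-8")


def save_page_as_utf8(body, content_type, path):
    """Hypothetical helper: write the fetched page to disk as UTF-8."""
    Path(path).write_bytes(to_utf8(body, content_type))
```

For example, `save_page_as_utf8(raw_bytes, "text/html; charset=ISO-8859-1", "cache/page.html")`. Normalising to one encoding on the way in means the server only ever has to declare UTF-8 for the cache.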

  2. #2
    kyberfabrikken, SitePoint Wizard
    Join Date: Jun 2004
    Location: Copenhagen, Denmark
    Posts: 6,157
    You need to set your server up to serve the file as UTF-8, since that's the encoding it's in.
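Before declaring a saved file as UTF-8, it can be worth confirming that its bytes really are UTF-8. A rough Python check, assuming a strict decode is a good-enough test (pure ASCII also passes, and a successful decode is strong evidence rather than proof):

```python
def is_valid_utf8(data):
    """Return True if the bytes in `data` decode as strict UTF-8."""
    try:
        data.decode("utf-8")  # strict by default: invalid sequences raise
    except UnicodeDecodeError:
        return False
    return True
```

For example, `is_valid_utf8(open("page.html", "rb").read())`. ISO-8859-1 text with accented letters usually fails this check, because a lone byte such as 0xE9 is not a valid UTF-8 sequence.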

  3. #3
    SitePoint Member
    Join Date: Nov 2005
    Posts: 12
    Ah, now that makes sense. I'm in a hosted environment; what would the .htaccess line look like?

  4. #4
    SitePoint Addict
    Join Date: Jun 2005
    Posts: 262
    Quote, originally posted by kyberfabrikken:
    "You need to set your server up to serve the file as UTF-8, since that's the encoding it's in."

    Interesting. I had to do the opposite. My apache httpd.conf file specified UTF-8, but I had to use the following in a .htaccess to get special characters to display properly.

    Code:
    # prevents conflicts with accent characters
    AddDefaultCharset Off
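One plausible reading of why this helped (an assumption, since the post doesn't say how the files were stored): a charset in the HTTP header takes precedence over a page's own <meta> declaration, so a server-wide UTF-8 default mislabels ISO-8859-1 pages. With AddDefaultCharset Off the header carries no charset, and the browser falls back to the page's <meta> tag or its own default. A simplified Python sketch of that lookup order; the real HTML algorithm also checks for a byte order mark first and has stricter prescan rules, and the windows-1252 default is an assumption about a typical Western-locale browser.

```python
import re
from email.message import Message

# Simplified pattern for <meta charset="..."> or
# <meta http-equiv="Content-Type" content="text/html; charset=...">
META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?([A-Za-z0-9._:-]+)""", re.IGNORECASE
)


def effective_charset(content_type, html, default="windows-1252"):
    """Pick a charset: HTTP header first, then a <meta> tag, then `default`."""
    msg = Message()
    msg["Content-Type"] = content_type
    header_charset = msg.get_content_charset()  # None if the header has no charset
    if header_charset:
        return header_charset
    # Only scan the start of the document, roughly as browsers do.
    match = META_CHARSET.search(html[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    return default
```

With a server-wide `charset=UTF-8` in the header, the first branch wins and any `<meta>` declaration is ignored; once the header is plain `text/html`, the page's own declaration is used.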

  5. #5
    kyberfabrikken, SitePoint Wizard
    Join Date: Jun 2004
    Location: Copenhagen, Denmark
    Posts: 6,157
    I think that would be something like
    Code:
    AddCharset UTF-8 .html
    More info at: http://www.w3.org/International/ques...access-charset
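For experimenting locally without Apache, Python's built-in http.server can be made to do roughly what `AddCharset UTF-8 .html` does, i.e. add `charset=utf-8` to the Content-Type of .html files. This is only a stand-in for testing, not a replacement for the Apache directive; the class and function names are hypothetical.

```python
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler


class Utf8HtmlHandler(SimpleHTTPRequestHandler):
    """Serve .html/.htm files as 'text/html; charset=utf-8' (similar in spirit to AddCharset)."""

    # guess_type() consults this mapping before falling back to the mimetypes module.
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".html": "text/html; charset=utf-8",
        ".htm": "text/html; charset=utf-8",
    }


def serve(directory=".", port=8000):
    """Serve `directory` on localhost until interrupted with Ctrl+C."""
    handler = partial(Utf8HtmlHandler, directory=directory)
    with HTTPServer(("127.0.0.1", port), handler) as httpd:
        httpd.serve_forever()
```

Calling `serve("cache")` and opening http://127.0.0.1:8000/ lets you check in the browser's developer tools that the response header now carries the charset.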

  6. #6
    kyberfabrikken, SitePoint Wizard
    Join Date: Jun 2004
    Location: Copenhagen, Denmark
    Posts: 6,157
    Quote, originally posted by champ:
    "Interesting. I had to do the opposite. My apache httpd.conf file specified UTF-8, but I had to use the following in a .htaccess to get special characters to display properly."

    Charset issues can drive any programmer insane. If your document was in fact stored as ISO-8859-1 but the server sent it as UTF-8, you'd get a similar problem. If the document is a PHP script, things get even more complicated. PHP defaults to sending data as ISO-8859-1, unless you explicitly utf8_encode it and send a Content-Type header with charset=UTF-8.

    Furthermore, input data can also arrive in different encodings. Most (all?) browsers submit form data in the same charset as the page the form was on. So if your page is served as UTF-8, the $_POST array will contain UTF-8 encoded data, which you need to explicitly decode with utf8_decode if the rest of your script works in ISO-8859-1.
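To see both mismatches described above, and what utf8_encode/utf8_decode do to the bytes, here is a Python sketch. The two helpers are rough equivalents written for illustration (utf8_encode converts ISO-8859-1 to UTF-8, and utf8_decode converts UTF-8 back to ISO-8859-1, turning characters it can't represent into "?"); they are not PHP's implementation.

```python
def php_utf8_encode(data):
    """Rough equivalent of PHP's utf8_encode(): ISO-8859-1 bytes -> UTF-8 bytes."""
    # Every byte is a valid ISO-8859-1 character, so this decode never fails.
    return data.decode("iso-8859-1").encode("utf-8")


def php_utf8_decode(data):
    """Rough equivalent of PHP's utf8_decode(): UTF-8 bytes -> ISO-8859-1 bytes.

    Characters outside ISO-8859-1 (and invalid UTF-8 sequences) become b"?".
    """
    return data.decode("utf-8", errors="replace").encode("iso-8859-1", errors="replace")


stored_utf8 = "café".encode("utf-8")         # b"caf\xc3\xa9"
stored_latin1 = "café".encode("iso-8859-1")  # b"caf\xe9"

# UTF-8 bytes read as ISO-8859-1: the two-byte "é" turns into two junk characters.
mojibake = stored_utf8.decode("iso-8859-1")  # "cafÃ©"
# ISO-8859-1 bytes read as UTF-8: the lone 0xE9 byte is invalid and gets replaced.
broken = stored_latin1.decode("utf-8", errors="replace")  # "caf\ufffd"
```

Applying `php_utf8_decode` to UTF-8 form data mirrors decoding $_POST values in a script that works internally in ISO-8859-1; anything outside that charset (the euro sign, for instance) is lost as "?".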

