  1. #1
    SitePoint Wizard DoubleDee's Avatar
    Join Date
    Aug 2010
    Location
    Arizona
    Posts
    3,531
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)

    Encoding Web Pages for UTF-8

    How do I encode (??) my web pages so they support UTF-8 ?


    Debbie

  2. #2
    billycundiff{float:left;} silver trophybronze trophy RyanReese's Avatar
    Join Date
    Oct 2008
    Location
    Whiteford, Maryland, United States
    Posts
    13,564
    Mentioned
    6 Post(s)
    Tagged
    0 Thread(s)
    Twitter-@Ryan_Reese09
    http://www.ryanreese.us -Always looking for web design/development work

  3. #3
    It's all Geek to me silver trophybronze trophy
    ralph.m's Avatar
    Join Date
    Mar 2009
    Location
    Melbourne, AU
    Posts
    23,620
    Mentioned
    413 Post(s)
    Tagged
    7 Thread(s)
    Quote Originally Posted by DoubleDee View Post
    How do I encode (??) my web pages so they support UTF-8 ?
    As Ryan said, you can specify an encoding in the meta tag, but it is really your server that decides the page encoding: the charset in the HTTP Content-Type header takes precedence over the meta tag. So you need to find out what encoding is being sent to the browser and change it if that's not what you want. You can check this with the dev tools supplied with each browser. The W3C HTML Validator will also indicate the server encoding.
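To illustrate what "the encoding being sent to the browser" looks like, here is a minimal Python sketch that pulls the charset out of a Content-Type header value. The function name `charset_from_content_type` is made up for this example; it only parses a header string you have already copied from your browser's dev tools and does not contact any server.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lower-cases the charset and returns None
    # when the header carries no charset parameter.
    return msg.get_content_charset()


if __name__ == "__main__":
    print(charset_from_content_type("text/html; charset=UTF-8"))
    print(charset_from_content_type("text/html"))
```

A result of None means the server is not declaring any charset, in which case the browser falls back to the meta tag or its own guess.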

  4. #4
    SitePoint Wizard DoubleDee's Avatar
    Quote Originally Posted by ralph.m View Post
    As Ryan said,
    Ryan who?


    you can specify an encoding in the meta tag, but it is really your server that decides the page encoding, so you need to find out what encoding is being sent to the browser and change it if that's not what you want. You can find out what encoding is being sent to the browser by the various dev tools supplied with each browser.
    Would Firebug tell me that?

    Can you be a little more specific how this works and what I need to check?

    (I'm considering switching my web pages and database to UTF-8, but am starting to see that this is much more involved than I originally thought!!)

    Thanks,


    Debbie

  5. #5
    It's all Geek to me silver trophybronze trophy
    ralph.m's Avatar
    Quote Originally Posted by DoubleDee View Post
    Ryan who?
    Only one other person has replied to this thread, called RyanReece.

    Would Firebug tell me that?
    I don't think so. But in Firefox, navigate to your site and go to View > Character encoding, and the server encoding of the current page should have a tick beside it.

    Or just run the page through the validator and it will indicate the encoding too. It may be that your server is already set to UTF-8 anyhow, meaning you won't need to do anything. But if you do need to, the page I linked to shows several ways to change the encoding, including putting a line of PHP at the top of your pages or just adding a line to a .htaccess file.

  6. #6
    Robert Wellock silver trophybronze trophy xhtmlcoder's Avatar
    Join Date
    Apr 2002
    Location
    A Maze of Twisty Little Passages
    Posts
    6,316
    Mentioned
    60 Post(s)
    Tagged
    0 Thread(s)
    Off Topic:

    Debbie put Ryan on ignore hence why she missed the answer to the question.

  7. #7
    It's all Geek to me silver trophybronze trophy
    ralph.m's Avatar
    Off Topic:

    Quote Originally Posted by xhtmlcoder View Post
    Debbie put Ryan on ignore hence why she missed the answer to the question.
    Debbie who?

  8. #8
    billycundiff{float:left;} silver trophybronze trophy RyanReese's Avatar
    Quote Originally Posted by ralph.m View Post
    Only one other person has replied to this thread, called RyanReece.



    I don't think so. But in Firefox, navigate to your site and go to View > Character encoding, and the server encoding of the current page should have a tick beside it.

    Or just run the page through the validator and it will indicate the encoding too. It may be that your server is already set to UTF-8 anyhow, meaning you won't need to do anything. But if you do need to, the page I linked to shows several ways to change the encoding, including putting a line of PHP at the top of your pages or just adding a line to a .htaccess file.
    Who is RyanReece?
    Quote Originally Posted by xhtmlcoder View Post
    Off Topic:

    Debbie put Ryan on ignore hence why she missed the answer to the question.
    This isn't the first thread I've given the answer to. Because she is ignoring me, she doesn't get the answer: no one else felt the need to respond in her thread after me, and thus the thread (in her eyes) died without "anyone" responding.

  9. #9
    SitePoint Wizard DoubleDee's Avatar
    Off Topic:

    Quote Originally Posted by xhtmlcoder View Post
    Debbie put Ryan on ignore hence why she missed the answer to the question.
    Kind of makes you wonder why he insists on continuing to post in my threads...


    Debbie

  10. #10
    SitePoint Wizard DoubleDee's Avatar
    Quote Originally Posted by ralph.m View Post
    you can specify an encoding in the meta tag, but it is really your server that decides the page encoding, so you need to find out what encoding is being sent to the browser and change it if that's not what you want. You can find out what encoding is being sent to the browser by the various dev tools supplied with each browser. The W3C HTML Validator will also indicate the server encoding.
    All of my php pages start off like this...
    PHP Code:
    <?php
        // Initialize a session.
        session_start();

        // Set current Script Name.
        $_SESSION['returnToPage'] = $_SERVER['SCRIPT_NAME'];
    ?>

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">

    <head>
        <!-- HTML Metadata -->
        <title>Debbie</title>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

    Is it that simple, or do I need to do more?


    Debbie

  11. #11
    It's all Geek to me silver trophybronze trophy
    ralph.m's Avatar
    Quote Originally Posted by DoubleDee View Post
    do I need to do more?
    Yes. You need to read the discussion and links above.

  12. #12
    billycundiff{float:left;} silver trophybronze trophy RyanReese's Avatar
    Quote Originally Posted by DoubleDee View Post
    Off Topic:



    Kind of makes you wonder why he insists on continuing to post in my threads...


    Debbie
    On the off chance you take me off that list, you will be able to see my helpful advice/answers.

  13. #13
    om nom nom nom Stomme poes's Avatar
    Join Date
    Aug 2007
    Location
    Netherlands
    Posts
    10,234
    Mentioned
    47 Post(s)
    Tagged
    1 Thread(s)
    Also how you save the document. Save some document as Windows-1252 and then have the server say "hey this is utf-8" == happy happy fun times.

    They MUST agree with each other. The text editor that saves the document MUST save in the same encoding the server's HTTP header states. The browser meta tag is just a cute extra.
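The "fun times" can be reproduced in a few lines. This is only a sketch with a made-up sample word; it encodes text one way and decodes it the other way, which is what happens when the editor and the server disagree.

```python
text = "café"

# Case 1: the editor saved Windows-1252, but the bytes are read as UTF-8.
saved_as_cp1252 = text.encode("cp1252")
read_as_utf8 = saved_as_cp1252.decode("utf-8", errors="replace")

# Case 2: the editor saved UTF-8, but the bytes are read as Windows-1252.
saved_as_utf8 = text.encode("utf-8")
read_as_cp1252 = saved_as_utf8.decode("cp1252")

# ascii() keeps the output readable on any terminal encoding.
print(ascii(read_as_utf8))
print(ascii(read_as_cp1252))
```

In the first case the accented letter turns into the replacement character; in the second it turns into two unrelated characters. The plain ASCII letters survive both ways, which is why the problem often goes unnoticed on English-only pages.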

  14. #14
    Non-Member bronze trophy
    Join Date
    Nov 2009
    Location
    Keene, NH
    Posts
    3,760
    Mentioned
    23 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Stomme poes View Post
    Also how you save the document. Save some document as Windows-1252 and then have the server say "hey this is utf-8" == happy happy fun times.

    They MUST agree with each other.
    EXACTLY. If you save it from your editor as one type, then serve it from the server as another, bad things tend to happen with any characters outside the 7-bit ASCII set.

    Though honestly, that's part of why for English-language websites I only use 7-bit ASCII and, if fancy characters are needed, I use the named entities - because then it doesn't matter what character encoding the server is trying to send; true ASCII characters (0..127) are the same in almost every character encoding you'll meet on the web... the only legitimate reason to need more than that being the stupid 'styled quotes', or foreign languages.
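As a rough illustration of the ASCII-plus-entities approach, the sketch below rewrites any character above 127 as an HTML entity. The helper name `to_ascii_with_entities` is invented for this example; it uses Python's table of HTML 4 entity names and falls back to a numeric reference when no name exists. It does not escape `&`, `<` or `>`.

```python
from html.entities import codepoint2name


def to_ascii_with_entities(text: str) -> str:
    """Replace every non-ASCII character with a named or numeric entity."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            # Plain 7-bit ASCII passes through unchanged.
            out.append(ch)
        elif cp in codepoint2name:
            # A named entity exists, e.g. eacute or ldquo.
            out.append("&" + codepoint2name[cp] + ";")
        else:
            # No HTML 4 name: fall back to a numeric character reference.
            out.append("&#" + str(cp) + ";")
    return "".join(out)


if __name__ == "__main__":
    print(to_ascii_with_entities("caf\u00e9 \u201cquoted\u201d"))
```

The output contains only bytes 0..127, so it reads the same whether the server declares UTF-8, ISO-8859-1 or Windows-1252.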

    An easy way to make sure the server is sending UTF-8 is with .htaccess -- though ideally the server software should be configured directly for it, you can't guarantee all users will want (or even understand) it -- so servers as a rule continue to default to ISO-8859-1 globally and then let users declare it themselves.

    Code:
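    # AddDefaultCharset only affects text/html and text/plain responses;
    # AddCharset covers the other file types by extension.
    <FilesMatch "\.(htm|html|css|js|php)$">
       AddDefaultCharset UTF-8
       DefaultLanguage en-US
    </FilesMatch>
    AddCharset UTF-8 .css .js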
    If you're on an Apache or Apache-compatible server, put that in your .htaccess and you're good to go... assuming you saved the file from your editor as UTF-8 as well. You can also declare it from PHP by outputting the proper 'header' before you start echoing out markup.
    Code:
    <?php
      // header() needs the full header line, including the field name.
      header('Content-Type: text/html; charset=utf-8');
    ?>
    Notice that the header value is identical to what should be in the content attribute of your meta http-equiv="Content-Type"... because that's EXACTLY what http-equiv means. Both it and content-language are there so that, should you be opening the file directly from disk or should the HTTP headers be missing, the user agent can still make sense of things.
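The pairing described above (declare UTF-8 in the header, and actually send UTF-8 bytes) is not specific to PHP. As a hedged illustration only, here is the same idea as a tiny Python WSGI application; the name `app` and the sample markup are assumptions made for this sketch.

```python
def app(environ, start_response):
    """Minimal WSGI app whose declared charset matches its actual bytes."""
    # Encode the markup as UTF-8 ...
    body = "<p>Caf\u00e9 \u0103</p>".encode("utf-8")
    # ... and say so in the Content-Type header, before any body is sent.
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` will serve it, and the browser's dev tools should then show the charset in the response headers.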

    ... and you want to save UTF-8 without the 'byte order mark' (BOM). A number of browsers (guess which ones) screw up if the BOM is present -- a three-byte signature (EF BB BF) at the start of the file that merely marks it as UTF-8, since UTF-8 has no byte order to indicate. In PHP files the BOM also counts as output, so it can break any header() or session_start() call that comes after it. In Notepad2 it's under File -> Encoding; normal UTF-8 is labelled just "UTF-8", while the BOM-prefixed version is called "with signature". Most editors let you set the encoding you are saving as in a similar manner.

    She dropped the BOM on me... baby... She dropped the BOM on me...
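If you are unsure whether your editor added a BOM, you can check the first bytes of the file yourself. The following is a small sketch; the helper names `has_utf8_bom` and `strip_utf8_bom` are invented here, and the sample bytes stand in for the contents of a real file.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 signature EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


if __name__ == "__main__":
    sample = codecs.BOM_UTF8 + b"<?php session_start(); ?>"
    print(has_utf8_bom(sample))
    print(strip_utf8_bom(sample))
```

For a real file you would read it in binary mode, for example `open(path, "rb").read()`, and pass the result to these functions.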

  15. #15
    Non-Member
    Join Date
    Feb 2012
    Posts
    892
    Mentioned
    10 Post(s)
    Tagged
    0 Thread(s)
    The difference between the ASCII encoding and the UTF-8 encoding is like the difference in how a shipping company that handles small, same-size, anonymous little boxes decides to deliver the packages to its customers.

    One package for one customer: that's ASCII. That is, if your customer is 'a', it will get one box, one encoded package (a small numeric value that fits in a byte).

    Now, the bigger customers, like 'ă', can't fit their bigger stuff in just one little package, so they need two or more little boxes, two or more bytes (bigger numeric values that take more bytes to store).
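The box counts are easy to see for yourself. This short sketch (the function name `utf8_length` is made up for the example) counts how many bytes a single character needs in UTF-8.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes ("little boxes") one character needs in UTF-8."""
    return len(ch.encode("utf-8"))


for ch in ("a", "ă", "€"):
    encoded = ch.encode("utf-8")
    # Show the character (escaped), its byte count, and the bytes in hex.
    print(ascii(ch), utf8_length(ch), encoded.hex())
```

The plain 'a' takes one byte, 'ă' takes two, and the euro sign takes three.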

    The problem the shipping company has to sort out is how to distinguish among these little packages to correctly give 'a' just one little package and to give 'ă' more than one. Also, more importantly, where do the little packages for one customer start and where do they stop, since all the packages look the same and customers have different numbers of them.

    That problem has been sorted out by the encoding, which sorts out the packages.
    It's the same way we separate the luggage at the airport: every traveler declares how many suitcases are his. That is, how many he checks in at the start of the journey must equal how many he gets back at the end.

    It's a sort of putting stops in the flow of boxes. Each encoding has a different way to put those stops in the byte flow, and each encoding can handle specific luggage sizes: one suitcase for one traveler, three suitcases for another.
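In UTF-8 the "stops" are carried by the first byte of each character: its leading bits say how many bytes belong to that character. Below is a simplified sketch of that rule; the names `sequence_length` and `split_characters` are invented for the example, and the code assumes well-formed input (it does not validate the continuation bytes).

```python
from typing import List


def sequence_length(lead_byte: int) -> int:
    """How many bytes the character starting with this byte occupies."""
    if lead_byte < 0x80:           # 0xxxxxxx: single-byte (ASCII) character
        return 1
    if 0xC0 <= lead_byte < 0xE0:   # 110xxxxx: start of a 2-byte character
        return 2
    if 0xE0 <= lead_byte < 0xF0:   # 1110xxxx: start of a 3-byte character
        return 3
    if 0xF0 <= lead_byte < 0xF8:   # 11110xxx: start of a 4-byte character
        return 4
    # 10xxxxxx is a continuation byte and cannot start a character.
    raise ValueError("not a valid UTF-8 lead byte: 0x%02x" % lead_byte)


def split_characters(data: bytes) -> List[bytes]:
    """Cut a UTF-8 byte stream into one chunk of bytes per character."""
    chunks = []
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        chunks.append(data[i:i + n])
        i += n
    return chunks


if __name__ == "__main__":
    print(split_characters("aă€".encode("utf-8")))
```

A single-byte encoding such as ISO-8859-1 needs no such rule, because every box is its own character; that is also why it cannot represent as many characters.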

    Another parallel: cars. Let's say that ASCII only handles minis while UTF-8 can also handle up to lorries.

    Another parallel: if the server and the browser were in a water gun fight, they'd talk in squirts. A little squirt for 'a', a longer squirt for 'ă'. When the water gun is empty, that's the end of the file.


    Now, what poes and Jason are trying to say, is that when it comes to files you create, you're actually the head of that shipping company, and you have to make all the decisions. You get to decide what the encoding is, hands on, by specifying this option for the files you create. I repeat, for the files you create.

    How do you do that? The same way you take control and specify the formats for the content in a word processor: using the options the editor of your choice gives you. Just as you have to know where to go to set the font to Arial instead of Times New Roman, you have to know where to go to specify UTF-8 instead of ASCII.

    For example:

    If you use Notepad++, you have the Encoding entry in the menu bar. If you use other editors and you can't find the encoding option, I'm sure we can help you.

    I thought you needed a little more insight into what encoding is about:
    - files are streams of little anonymous small, same size boxes (bytes)
    - interpreting a stream of bytes: a way to select how many boxes (bytes) belong to a character, how many bytes it takes to hold the numeric value (encoding) of a character, and where those boxes start and where those boxes end.


    Finally, the part about being kind and letting the browser know that the file you're sending it, a stream of bytes, is one you've decided to encode as UTF-8: that is covered by the meta declaration. But if you didn't actually take care on your part to create a file where you knowingly set the UTF-8 encoding, that remains just a false declaration.
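One way to catch a "false declaration" is to test whether the bytes of a file really decode as UTF-8. This is a sketch under stated assumptions: the function name `is_valid_utf8` is made up, and passing the check only shows that the bytes are well-formed UTF-8, not that the editor intended it (pure ASCII files always pass).

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes can be decoded as UTF-8 without errors."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(is_valid_utf8("ă".encode("utf-8")))
    print(is_valid_utf8("é".encode("cp1252")))
```

For a real page you would read the file in binary mode and pass its contents to the function before trusting the charset=utf-8 declaration.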

    So, to answer your question: no, it's not that simple, and yes, you need to do more.

  16. #16
    SitePoint Guru bronze trophy TheRaptor's Avatar
    Join Date
    Jul 2011
    Location
    New York
    Posts
    710
    Mentioned
    40 Post(s)
    Tagged
    0 Thread(s)
    Off Topic:

    Quote Originally Posted by DoubleDee View Post
    Off Topic:

    Kind of makes you wonder why he insists on continuing to post in my threads...
    I'm not sure why he is on your ignore list to begin with, but he is a very knowledgeable guy that provides useful information. I'm sure you'd benefit from taking him off your block/ignore list.
    TheRaptor - Joe

  17. #17
    om nom nom nom Stomme poes's Avatar
    Quote Originally Posted by mitica
    One package for one customer: that's ASCII. That is, if your customer is 'a', it will get one box, one encoded package (a small numeric value that fits in a byte).

    Now, the bigger customers, like 'ă', can't fit their bigger stuff in just one little package, so they need two or more little boxes, two or more bytes (bigger numeric values that take more bytes to store).
    Creative way to put it!

  18. #18
    billycundiff{float:left;} silver trophybronze trophy RyanReese's Avatar
    Quote Originally Posted by TheRaptor View Post
    Off Topic:



    I'm not sure why he is on your ignore list to begin with, but he is a very knowledgeable guy that provides useful information. I'm sure you'd benefit from taking him off your block/ignore list.
    http://www.sitepoint.com/forums/showthread.php?835184-DIV-with-Padding&highlight=

    Due to that thread.

