  1. #1
    SitePoint Enthusiast

    Odd characters on top of page

    Hello all, would anyone know what would cause these characters to render at the top of a page:

    ï»¿

    Thanks in advance for looking.

  2. #2
    JamesColin (SitePoint Guru)
    Here's what I've found on another forum:
    "
    Yes, "ï»¿" is the Byte Order Mark (BOM) of the Unicode Standard. Specifically, it is the hex bytes EF BB BF, which form the UTF-8 representation of the BOM, misinterpreted as ISO 8859-1 text instead of UTF-8.

    Probably what it means is that you are using a text editor that is saving files in UTF-8 with the BOM, when it should be saving without the BOM. It could be PHP files that have the BOM, in which case they'd appear as literal text on your page. Or it could be translated text you pasted into Joomla! edit windows.

    The Unicode Consortium's FAQ on the Byte Order Mark is at http://www.unicode.org/faq/utf_bom.html#BOM .
    "
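    The mis-decoding described in the quote can be reproduced in a few lines. This is a minimal Python sketch (Python 3.8+ for `bytes.hex` with a separator), where Python's codecs stand in for what a browser does when it reads the three BOM bytes as ISO 8859-1 instead of UTF-8.

```python
# The BOM is the single code point U+FEFF.
bom = "\ufeff"

# Saved as UTF-8, it becomes the three bytes EF BB BF.
bom_bytes = bom.encode("utf-8")
print(bom_bytes.hex(" ").upper())  # EF BB BF

# Read back as ISO 8859-1, each byte turns into its own visible character.
misread = bom_bytes.decode("iso-8859-1")
print(misread)  # ï»¿
```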

  3. #3
    gary.turner (Resident curmudgeon)
    You have saved your file as unicode, with its byte order mark (BOM). Open the file in your editor, and save as utf-8 (no BOM), with the same filename.

    cheers,

    gary
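    For the "save as utf-8 (no BOM)" step, a rough Python equivalent of what the editor does might look like this. It assumes the file is already UTF-8 and only drops the three leading BOM bytes; it does not change which encoding the server or the page declares. The function names are illustrative, not taken from any particular tool.

```python
import codecs
from pathlib import Path


def without_bom(data: bytes) -> bytes:
    """Return `data` minus a leading UTF-8 BOM (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def resave_without_bom(path: str) -> bool:
    """Rewrite the file at `path` without a leading UTF-8 BOM.

    Returns True if a BOM was removed. All other bytes are left
    untouched, so this assumes the file is already UTF-8.
    """
    p = Path(path)
    data = p.read_bytes()
    cleaned = without_bom(data)
    if cleaned != data:
        p.write_bytes(cleaned)
        return True
    return False
```

    Usage would be something like `resave_without_bom("index.php")`, where the filename is hypothetical.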

  4. #4
    AutisticCuckoo (SitePoint Author)
    Quote Originally Posted by gary.turner:
      You have saved your file as unicode, with its byte order mark (BOM). Open the file in your editor, and save as utf-8 (no BOM), with the same filename.

    That probably won't help. If the BOM is displayed, it's because the encoding is declared as something other than UTF-8. Probably ISO 8859-1 or Windows-1252 if it's a Western site. Re-saving it as UTF-8 w/o BOM won't help then (unless the text only contains US ASCII characters).

    The declared encoding must match the encoding under which the file was saved. Either change the declared encoding or re-save the file with the encoding you have declared.
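    A small Python illustration of the mismatch, assuming a page saved as UTF-8 but declared as Windows-1252 (which is also how browsers decode a page labelled ISO-8859-1). Python's codecs stand in for the browser's decoder here.

```python
# The author's editor saves the text as UTF-8.
saved = "café".encode("utf-8")  # b'caf\xc3\xa9'

# Declared encoding does not match: the two bytes of "é" show up as "Ã©".
print(saved.decode("windows-1252"))  # cafÃ©

# Declared encoding matches the saved one: the text comes through intact.
print(saved.decode("utf-8"))  # café
```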

  5. #5
    gary.turner (Resident curmudgeon)
    Quote Originally Posted by AutisticCuckoo:
      That probably won't help. If the BOM is displayed, it's because the encoding is declared as something other than UTF-8. Probably ISO 8859-1 or Windows-1252 if it's a Western site. Re-saving it as UTF-8 w/o BOM won't help then (unless the text only contains US ASCII characters).

    Tommy, if a BOM shows, the source must have been saved as unicode or utf-8+BOM, which is redundant, since utf-8 is self-describing.

    Now, the server or the document's meta http-equiv may be configured to set the char-set to something other than utf-8, but that's a different issue. There is no sane reason for the document to have a BOM in the first place.

    Quote Originally Posted by AutisticCuckoo:
      The declared encoding must match the encoding under which the file was saved. Either change the declared encoding or re-save the file with the encoding you have declared.

    True, but the author must decide which encoding to use, and make sure that the server response header either reflects that choice or says nothing about the char-set, and that the meta http-equiv agrees with the chosen encoding.

    Since a choice to use unicode or utf-8 seems to have been made, if without understanding, the OP should check that the configurations are correct. There is no doubt in my mind that the decision to go utf-8 would be the correct one.

    cheers,

    gary
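    One way to check that the server response header "reflects that choice or says nothing" is sketched below in Python. It assumes you already have the `Content-Type` header value (for example from the browser's developer tools) and borrows the standard-library `email.message.Message` parser, whose `get_content_charset()` returns the lower-cased `charset` parameter or `None`. The helper names are hypothetical.

```python
import codecs
from email.message import Message


def declared_charset(content_type):
    """Return the charset parameter of a Content-Type value (lower-cased),
    or None if the header says nothing about the character set."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def header_matches(content_type, saved_encoding):
    """True if the header is silent or names the codec the file was saved in.

    codecs.lookup() normalises aliases such as 'latin-1' and 'ISO-8859-1';
    an unknown encoding name raises LookupError.
    """
    declared = declared_charset(content_type)
    if declared is None:
        return True  # silent header: the document's own meta must declare it
    return codecs.lookup(declared).name == codecs.lookup(saved_encoding).name
```

    Note that Python treats ISO-8859-1 and Windows-1252 as distinct codecs, whereas browsers decode the ISO-8859-1 label as Windows-1252, so this simple check is stricter than a browser would be.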

  6. #6
    Stomme poes (SitePoint Wizard)
    Quote Originally Posted by gary.turner:
      Tommy, if a BOM shows, the source must have been saved as unicode or utf-8+BOM, which is redundant, since utf-8 is self-describing.

    I thought Tommy meant that you could save the document with a BOM and NOT see it (in some browsers it shows up only when viewing the source, not on the page), so that might indicate the OP wants a different charset anyway.
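    The idea that a BOM can be present without showing can be seen with Python's two UTF-8 codecs. This is an analogy, not browser code: plain `utf-8` keeps the BOM as U+FEFF, a zero-width character that is normally invisible, while `utf-8-sig` strips it, roughly what happens once the page is decoded as UTF-8.

```python
data = b"\xef\xbb\xbf<p>Hello</p>"

# "utf-8" keeps the BOM as U+FEFF, a zero-width, normally invisible character.
with_bom = data.decode("utf-8")
print(repr(with_bom[0]))  # '\ufeff'

# "utf-8-sig" consumes a leading BOM, so the text starts at "<p>".
without = data.decode("utf-8-sig")
print(without)  # <p>Hello</p>
```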

  7. #7
    AutisticCuckoo (SitePoint Author)
    Quote Originally Posted by gary.turner:
      Tommy, if a BOM shows, the source must have been saved as unicode or utf-8+BOM, which is redundant, since utf-8 is self-describing.

    Agreed. But if a BOM shows, that means the browser isn't being told that the encoding is UTF-8 (or doesn't support UTF-8). Either way, re-saving as UTF-8 without BOM is unlikely to be a satisfactory solution. Yes, it will get rid of the BOM, but there'll still be a lot of garbled characters and suchlike (depending on which language the text is in).

    If the BOM is visible, the declared encoding isn't UTF-8. You then have two choices:
    1. declare the encoding as UTF-8 (optionally re-saving as UTF-8 without BOM)
    2. re-save using the declared encoding.

    The encoding that is used and the encoding that is declared must match.
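    Choice 2 (re-saving in the declared encoding) amounts to decoding the UTF-8 bytes and encoding them again. A minimal Python sketch, assuming the declared encoding is ISO 8859-1; `resave_as` is a hypothetical helper, and it raises `UnicodeEncodeError` for characters the target encoding cannot hold (for instance "€" in ISO 8859-1).

```python
def resave_as(data: bytes, declared: str, source: str = "utf-8-sig") -> bytes:
    """Re-encode `data` from `source` into the `declared` encoding.

    The default source, "utf-8-sig", reads UTF-8 and drops a leading BOM.
    Characters that `declared` cannot represent raise UnicodeEncodeError.
    """
    text = data.decode(source)
    return text.encode(declared)


# A UTF-8 file with a BOM, re-saved for a page declared as ISO 8859-1.
utf8_with_bom = "\ufeffcafé".encode("utf-8")
print(resave_as(utf8_with_bom, "iso-8859-1"))  # b'caf\xe9'
```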

    Quote Originally Posted by gary.turner:
      Now, the server or the document's meta http-equiv may be configured to set the char-set to something other than utf-8, but that's a different issue.

    It's very much the issue here. If it's declared as something other than UTF-8, re-saving as UTF-8 without BOM will not help. (Unless the text only contains US ASCII characters.)

    Quote Originally Posted by gary.turner:
      There is no sane reason for the document to have a BOM in the first place.

    Agreed. But sometimes a reasonable author may be forced to use an inferior text editor that insists on saving UTF-8 with a BOM.

    Quote Originally Posted by gary.turner:
      Since a choice to use unicode or utf-8 seems to have been made, if without understanding, the OP should check that configurations are correct. There is no doubt in my mind that the decision to go utf-8 would be the correct one.

    If the BOM is visible, it's highly likely that the choice is not to use UTF-8. Therefore re-saving without a BOM, but still as UTF-8, is unlikely to be very helpful. At worst it can lull the author into a false sense of security, if the text currently consists exclusively of characters in the US ASCII repertoire. Re-saving as UTF-8 w/o BOM could then appear to be successful… until the day when the author decides to add a dash, an ellipsis or some typographically correct quotation marks.
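    The "false sense of security" case is easy to reproduce. A Python sketch, assuming a page saved as UTF-8 but declared as ISO-8859-1; browsers decode that label as Windows-1252, so that codec is used here.

```python
text = "Hello world"

# Pure US-ASCII: UTF-8 and Windows-1252 produce identical bytes,
# so the wrong declaration goes unnoticed.
print(text.encode("utf-8") == text.encode("windows-1252"))  # True

# Later the author adds an ellipsis (U+2026), three bytes in UTF-8.
text += "\u2026"
shown = text.encode("utf-8").decode("windows-1252")
print(shown)  # Hello worldâ€¦
```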

  8. #8
    gary.turner (Resident curmudgeon)
    I don't think we're in disagreement; maybe just a different emphasis.

    Quote Originally Posted by AutisticCuckoo:
      <snip>

      The encoding that is used and the encoding that is declared must match.

    Not strictly true. If the server response header set utf-8, then ASCII and iso-8859-x are legitimate subsets, and are not an issue.

    Quote Originally Posted by AutisticCuckoo:
      It's very much the issue here. If it's declared as something other than UTF-8, re-saving as UTF-8 without BOM will not help. (Unless the text only contains US ASCII characters.)

    True, but if the server says something other than utf-8, that's a seriously fragile declaration; I'd even say a mal-configuration. It would be better to remain silent than to declare a non-utf-8 encoding.

    Quote Originally Posted by AutisticCuckoo:
      Agreed. But sometimes a reasonable author may be forced to use an inferior text editor that insists on saving UTF-8 with a BOM.

    Wow! I'd hate that. (Example?) On the other hand, all *nix servers are guaranteed to have (n)vi(m) that the author could use via ssh. I have no idea about Windows servers, and if I remain lucky, I won't have to bump up against it.

    Quote Originally Posted by AutisticCuckoo:
      If the BOM is visible, it's highly likely that the choice is not to use UTF-8. Therefore re-saving without a BOM, but still as UTF-8, is unlikely to be very helpful. At worst it can lull the author into a false sense of security, if the text currently consists exclusively of characters in the US ASCII repertoire. Re-saving as UTF-8 w/o BOM could then appear to be successful… until the day when the author decides to add a dash, an ellipsis or some typographically correct quotation marks.

    This is still an authoring error. If the server declares an encoding other than utf-8, then the author should either change the document encoding or change the server header. If the server does not declare an encoding, then it is the author's responsibility to do so in the document head. See Character encodings, declaring encodings.

    cheers,

    gary
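    For the "declare it in the document head" case, here is a rough Python sketch that reports which encoding a page's own `<meta>` element declares, using the standard-library `html.parser`. It handles both `<meta charset="...">` and the older `http-equiv="Content-Type"` form; `MetaCharsetFinder` and `meta_charset` are made-up names, and the sketch ignores the other steps a browser uses to pick an encoding.

```python
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Records the first encoding declared by <meta charset="..."> or
    <meta http-equiv="Content-Type" content="...; charset=...">."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        # html.parser lower-cases attribute names; valueless attributes give None.
        attrs = {name: (value or "") for name, value in attrs}
        if "charset" in attrs:
            self.charset = attrs["charset"].strip().lower()
        elif attrs.get("http-equiv", "").lower() == "content-type":
            _, sep, value = attrs.get("content", "").lower().partition("charset=")
            if sep:
                self.charset = value.strip().strip("\"'").split(";")[0].strip()


def meta_charset(html):
    """Return the lower-cased charset declared in `html`, or None."""
    finder = MetaCharsetFinder()
    finder.feed(html)
    return finder.charset
```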

  9. #9
    AutisticCuckoo (SitePoint Author)
    Quote Originally Posted by gary.turner:
      Not strictly true. If the server response header set utf-8, then ASCII and iso-8859-x are legitimate subsets, and are not an issue.

    That's incorrect. Yes, if all characters are in the US ASCII range, then you'll be safe. But using ISO 8859 with a declared encoding of UTF-8 will not work. You'll either get an 'invalid UTF-8 character' error or a lot of question marks if your ISO 8859-1 document contains the word 'blåbärsgröt' and you declare the encoding to be UTF-8.

    ISO 8859-1 as a repertoire is a subset of ISO/IEC 10646 ('Unicode'). But from U+0080 to U+00FF the encoding differs from UTF-8: ISO 8859-1 uses one byte per character there, while UTF-8 uses two.

    Quote Originally Posted by gary.turner:
      True, but if the server says something other than utf-8, that's a seriously fragile declaration; I'd even say a mal-configuration. It would be better to remain silent than to declare a non-utf-8 encoding.

    What? There are millions and millions of web sites that use ISO 8859-1 as the character encoding!

    Quote Originally Posted by gary.turner:
      Wow! I'd hate that. (Example?)

    Microsoft Notepad (at least older versions).

    Quote Originally Posted by gary.turner:
      On the other hand, all *nix servers are guaranteed to have (n)vi(m) that the author could use via ssh.

    Yes, but how many among the average Dreamweaver/Frontpage point-and-click designers would even know how to save a file in vim, let alone change the character encoding?

    Quote Originally Posted by gary.turner:
      This is still an authoring error. If the server declares an encoding other than utf-8, then the author should either change the document encoding, or change the server header.

    That's exactly what I've been saying.

    Quote Originally Posted by gary.turner:
      If the server does not declare an encoding, then it is the author's responsibility to do so in the document head. See Character encodings, declaring encodings.

    I'm aware of that. I've even written an article about character encoding for SitePoint.

  10. #10
    Stomme poes (SitePoint Wizard)
    Quote Originally Posted by AutisticCuckoo:
      Yes, but how many among the average Dreamweaver/Frontpage point-and-click designers would even know how to save a file in vim, let alone change the character encoding?

    I'd argue that's as essential a skill belonging to "web developer" as knowing HTML and CSS.

    Off Topic:

    So was that a blueberry BUSH or blueberry MUSH? Just curious. You keep saying Dutch and Swedish are so close, but between not knowing what's a false cognate and just plain different root-stems, I can't make heads or tails of Swedish : ) Even if I'm listening to Familjen at the same time.

  11. #11
    AutisticCuckoo (SitePoint Author)
    Quote Originally Posted by Stomme poes:
      I'd argue that's as essential a skill belonging to "web developer" as knowing HTML and CSS.

    And we both know how well the aforementioned group masters those skills...

    Off Topic:

    Quote Originally Posted by Stomme poes:
      So was that a blueberry BUSH or blueberry MUSH?

    Blueberry porridge.