  1. #1
    SitePoint Addict
    Join Date
    Oct 2005
    Posts
    255
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Byte-Order Mark found in UTF-8 File

    I just got the following warning when trying to validate my website:

    Byte-Order Mark found in UTF-8 File.

    The Unicode Byte-Order Mark (BOM) in UTF-8 encoded files is known to cause problems for some text editors and older browsers. You may want to consider avoiding its use until it is better supported.

    This also seems to be related to the funny characters I see in the top-left corner of the browser as the page loads.

    Can someone tell me how to remove the BOM? I don't have Microsoft Expression, only Dreamweaver.

    Is there any other way to fix this error?

    Thank you

  2. #2
    Guru in training bronze trophy SoulScratch's Avatar
    Join Date
    Apr 2006
    Location
    Maryland
    Posts
    1,838
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Was it made from scratch in DW? When you save it out, there should be an option whether to include or not include a BOM. You could paste this into a new file and try resaving it.
    Cross browser css bugs

    Dan Schulz you will be missed

  3. #3
    SitePoint Enthusiast
    Join Date
    Nov 2008
    Posts
    65
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Most text editors have options in the save dialog related to character encoding and the BOM. Look for an option to save "without BOM", "without Byte Order Mark", or "without Unicode signature."
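
    If your editor has no such option, the BOM can also be removed with a few lines of script. This is only a minimal sketch in Python, assuming the file really is UTF-8; the helper name `strip_utf8_bom` and the sample markup are made up for illustration.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte-order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):  # BOM_UTF8 is the bytes EF BB BF
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical page content saved "with Unicode signature".
with_bom = codecs.BOM_UTF8 + "<!DOCTYPE html>".encode("utf-8")
cleaned = strip_utf8_bom(with_bom)
print(cleaned)
```

    To fix a real file you would read its bytes, pass them through the helper, and write the result back. Opening the file with Python's `utf-8-sig` codec has a similar effect when reading, since that codec skips a leading BOM.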

  4. #4
    SitePoint Addict
    Join Date
    Oct 2005
    Posts
    255
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I'd never noticed the "Include Unicode Signature (BOM)" checkbox in Dreamweaver's save window!

    Thank you, I will always check that now. Not sure how it got ticked, probably a slip of the keyboard.

    Topic Solved!
    Last edited by mitcho; Jan 6, 2009 at 19:38.

  5. #5
    Programming Since 1978 silver trophybronze trophy felgall's Avatar
    Join Date
    Sep 2005
    Location
    Sydney, NSW, Australia
    Posts
    16,799
    Mentioned
    25 Post(s)
    Tagged
    1 Thread(s)
    The BOM is only needed where you use UTF-16 or UTF-32. UTF-8 always uses the same byte order for those characters that need more than one byte.
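
    A small Python sketch of that point, assuming only the standard codecs: the same character comes out as different byte sequences in the two UTF-16 byte orders, while UTF-8 has exactly one form.

```python
import codecs

# "A" (U+0041) as two bytes in each UTF-16 byte order.
utf16_be = "A".encode("utf-16-be")  # high byte first
utf16_le = "A".encode("utf-16-le")  # low byte first

# The BOM is the character U+FEFF; its bytes reveal which order was used.
bom_be = "\ufeff".encode("utf-16-be")
bom_le = "\ufeff".encode("utf-16-le")

# UTF-8 has a single fixed byte sequence per character, so there is nothing to signal.
e_acute = "\u00e9".encode("utf-8")  # the letter e with an acute accent

print(utf16_be, utf16_le, bom_be, bom_le, e_acute)
```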
    Stephen J Chapman

    javascriptexample.net, Book Reviews, follow me on Twitter
    HTML Help, CSS Help, JavaScript Help, PHP/mySQL Help, blog
    <input name="html5" type="text" required pattern="^$">

  6. #6
    SitePoint Wizard Stomme poes's Avatar
    Join Date
    Aug 2007
    Location
    Netherlands
    Posts
    10,276
    Mentioned
    50 Post(s)
    Tagged
    2 Thread(s)
    And since this thread should appear in searches re BOM, Notepad users are pretty much stuck with a BOM in any UTF-8 pages they create. Saving the file in any other text editor (including Notepad++) would fix this.

  7. #7
    Resident curmudgeon bronze trophy gary.turner's Avatar
    Join Date
    Jan 2009
    Location
    Dallas
    Posts
    990
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    When saving the first time in Notepad, you're given the choice of character encodings, ANSI (ASCII), Unicode, Unicode big endian, and utf-8. Either of the Unicode choices will prepend the BOM. Use utf-8 as your choice.

    If you've already saved with the BOM, open the file and 'save as' to get a new shot at setting the proper encoding.

    cheers,

    gary

  8. #8
    billycundiff{float:left;} silver trophybronze trophy RyanReese's Avatar
    Join Date
    Oct 2008
    Location
    Whiteford, Maryland, United States
    Posts
    13,564
    Mentioned
    6 Post(s)
    Tagged
    0 Thread(s)

    Quote Originally Posted by gary.turner
    cheers,

    gary
    KK5st from Devshed?

    On a semi-related note, isn't UTF-8 the only byte order required by XHTML?
    Twitter-@Ryan_Reese09
    http://www.ryanreese.us -Always looking for web design/development work

  9. #9
    SitePoint Wizard Stomme poes's Avatar
    Join Date
    Aug 2007
    Location
    Netherlands
    Posts
    10,276
    Mentioned
    50 Post(s)
    Tagged
    2 Thread(s)
    Ryan, yup that's the same Gary : )

    UTF-8 doesn't require a BOM at all. Setting a byte order lets computers know where the ones, tens, and hundreds places are in a number. We all write our numbers the same way (big endian, since the biggest place, the hundreds, comes first), but computers can go either way. 813 could be read as three hundred eighteen if the BOM said it should be read in the other direction (except my example sucks because those are digits, not bytes, but oh well).
    And when either order is possible, you need to tell the computer which way it should read the number. That is why UTF-16 and UTF-32 need a BOM, or some other declaration of the byte order. Lucky for us, we don't ever need to use UTF-anything higher than 8 : )
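
    The same idea with actual bytes rather than decimal digits, as a rough Python sketch (the number 813 is borrowed from the example above, everything else is illustration): a two-byte value read in the wrong order turns into a completely different number.

```python
n = 813  # 0x032D in hexadecimal

big = n.to_bytes(2, "big")        # most significant byte first: 03 2D
little = n.to_bytes(2, "little")  # least significant byte first: 2D 03

# Reading the bytes back with the order they were written in recovers 813.
same_order = int.from_bytes(big, "big")

# Reading big-endian bytes as if they were little-endian gives another number.
wrong_order = int.from_bytes(big, "little")

print(big.hex(), little.hex(), same_order, wrong_order)
```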

    XHTML doesn't require anything anyway-- unless it were real XHTML (XML) which does require UTF-something (XML 1.0 requires unicode). But fake XHTML could be in a Windows charset for all it cares, cause it's really just HTML anyway.

    Thanks Gary for the info, I thought people were getting the BOM when they were selecting utf-8, rather than just with unicode.

  10. #10
    Programming Since 1978 silver trophybronze trophy felgall's Avatar
    Join Date
    Sep 2005
    Location
    Sydney, NSW, Australia
    Posts
    16,799
    Mentioned
    25 Post(s)
    Tagged
    1 Thread(s)
    Quote Originally Posted by gary.turner View Post
    When saving the first time in Notepad, you're given the choice of character encodings, ANSI (ASCII), Unicode, Unicode big endian, and utf-8. Either of the Unicode choices will prepend the BOM. Use utf-8 as your choice.

    If you've already saved with the BOM, open the file and 'save as' to get a new shot at setting the proper encoding.

    cheers,

    gary
    Since UTF-8 is one variant of Unicode the Unicode options presumably use either UTF-16 or UTF-32 (the other two unicode variants) with the two alternate orders in which the bytes that make up each character can occur.
    Stephen J Chapman

    javascriptexample.net, Book Reviews, follow me on Twitter
    HTML Help, CSS Help, JavaScript Help, PHP/mySQL Help, blog
    <input name="html5" type="text" required pattern="^$">

  11. #11
    SitePoint Author silver trophybronze trophy

    Join Date
    Nov 2004
    Location
    Ankh-Morpork
    Posts
    12,158
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by RyanReese View Post
    On a semi-related note, isn't UTF-8 the only byte order required by XHTML?
    I'm not sure I understand what you're asking, but I think you may be thinking of the fact that an XML parser is only required to support UTF-8 and UTF-16 (if I remember correctly).


    Quote Originally Posted by felgall View Post
    Since UTF-8 is one variant of Unicode
    That isn't strictly correct (you're comparing apples to oranges).
    Unicode (or, strictly speaking, the variant standardised as ISO/IEC 10646) is the character repertoire used with HTML and XML.
    UTF-8 is an encoding: a method for specifying Unicode code positions using 1, 2, 3 or 4 octets.

    So Unicode is the whole set of available characters, where every character has an index number (code position). UTF-8 is one of many ways to represent those code positions, numerically.

    US-ASCII (ISO 646), ISO 8859-1, Windows-1252, etc. are both repertoires and encodings. Since the repertoires are very limited, any character can be represented with a single octet.
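
    A short Python sketch of the distinction, using four sample characters picked purely for illustration: `ord()` gives the Unicode code position, and encoding to UTF-8 shows how many octets that position takes.

```python
samples = [
    "A",           # U+0041, Latin capital letter A
    "\u00e9",      # U+00E9, e with acute accent
    "\u20ac",      # U+20AC, euro sign
    "\U0001F600",  # U+1F600, an emoji outside the Basic Multilingual Plane
]

# Code position (the index number in the repertoire) for each character.
code_positions = [ord(ch) for ch in samples]

# Number of octets UTF-8 needs to represent each code position.
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]

# ISO 8859-1 covers only 256 characters, so its e-acute fits in one octet.
latin1_e_acute = "\u00e9".encode("iso-8859-1")

for ch, cp, size in zip(samples, code_positions, utf8_lengths):
    print(f"U+{cp:04X} takes {size} octet(s) in UTF-8")
```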
    Birnam wood is come to Dunsinane

  12. #12
    billycundiff{float:left;} silver trophybronze trophy RyanReese's Avatar
    Join Date
    Oct 2008
    Location
    Whiteford, Maryland, United States
    Posts
    13,564
    Mentioned
    6 Post(s)
    Tagged
    0 Thread(s)
    I'm not sure I understand what you're asking, but I think you may be thinking of the fact that an XML parser is only required to support UTF-8 and UTF-16 (if I remember correctly).
    That's what I was thinking. I knew there was an X in there :}.
    Twitter-@Ryan_Reese09
    http://www.ryanreese.us -Always looking for web design/development work

  13. #13
    One website at a time mmj's Avatar
    Join Date
    Feb 2001
    Location
    Melbourne Australia
    Posts
    6,282
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    As others have said, the byte-order mark for UTF-8 is not necessary, and I would even recommend against it (for the same reason that the W3C does).

    Unfortunately getting rid of it may be tricky, depending on which text editor you use. If the text editor supports UTF-8 (which most do now) then you won't even see the byte-order mark in the file when you open it. You would have to rely on there being a menu item to choose between character encodings, one of which may be "UTF-8 (no byte-order mark)" or even just "UTF-8".

    As a quick and dirty hack you could try opening it in a program that does NOT support UTF-8 and then you'll see the byte order mark as three strange characters and you'll be able to just delete them. Take care that it doesn't mess up any special characters elsewhere in the page though. And I can't suggest a non-UTF-8 aware application off the top of my head, but most full featured text editors like Notepad++ or PSPad will allow you to switch between UTF-8 and other modes.
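
    To see where those three strange characters come from, here is a small Python sketch. It assumes the non-UTF-8 program interprets the bytes as Windows-1252, which is only one possibility; the `<html>` fragment is a stand-in for real page content.

```python
import codecs

bom = codecs.BOM_UTF8  # the three bytes EF BB BF

# Interpreted as Windows-1252, each byte becomes its own visible character.
as_cp1252 = bom.decode("cp1252")

# Interpreted as UTF-8, the same bytes are the single invisible character U+FEFF.
as_utf8 = bom.decode("utf-8")

# The "utf-8-sig" codec drops a leading BOM while decoding.
page = (bom + b"<html>").decode("utf-8-sig")

print(repr(as_cp1252), repr(as_utf8), page)
```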

    The purpose of the byte-order mark is to make sure that your computer system is not reading every sequence of 2 bytes in the wrong order. However, that is not relevant to UTF-8, because UTF-8 is encoded to a stream of single bytes and thus it has no byte-order issues. The byte-order mark is therefore useless (except as a "hint" that the document uses UTF-8 encoding, which is unnecessary).

    Edit: oops, missed the fact that the OP's problem has already been solved. Oh well, looks like I wasn't the only one

    Quote Originally Posted by Stomme poes
    XHTML doesn't require anything anyway-- unless it were real XHTML (XML) which does require UTF-something (XML 1.0 requires unicode). But fake XHTML could be in a Windows charset for all it cares, cause it's really just HTML anyway.
    Technically, that's not completely correct - while some implementations may determine the default character encoding of an HTML and XML document differently in the absence of any indication, normally you would indicate the character encoding somewhere in the document. It is indeed legitimate to have ISO-8859-1 or even CP-1251 (Microsoft) in an XML document as with HTML. The statement "XML 1.0 requires unicode" is misleading here. Both HTML and XML always represent characters internally by their Unicode code point, and all XML implementations need to be able to support UTF-8 and UTF-16, but that doesn't mean the document needs to use either of those encodings where the intended implementation supports others.
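
    As a rough illustration, here is a Python sketch using the standard library's ElementTree parser, which happens to support ISO-8859-1 in addition to the required encodings. The tiny documents are invented for the example.

```python
import xml.etree.ElementTree as ET

# The same one-element document, once declared as ISO-8859-1 and once as UTF-8.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\xe9</p>'
utf8_doc = b'<?xml version="1.0" encoding="UTF-8"?><p>caf\xc3\xa9</p>'

latin1_text = ET.fromstring(latin1_doc).text
utf8_text = ET.fromstring(utf8_doc).text

# After parsing, both yield the same Unicode string, whatever the file encoding was.
print(latin1_text == utf8_text)
```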
    [mmj] My magic jigsaw
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    The Bit Depth Blog Twitter Contact me
    Neon Javascript Framework Jokes Android stuff

  14. #14
    SitePoint Wizard Stomme poes's Avatar
    Join Date
    Aug 2007
    Location
    Netherlands
    Posts
    10,276
    Mentioned
    50 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by mmj
    Technically, that's not completely correct - while some implementations may determine the default character encoding of an HTML and XML document differently in the absence of any indication, normally you would indicate the character encoding somewhere in the document. It is indeed legitimate to have ISO-8859-1 or even CP-1251 (Microsoft) in an XML document as with HTML. The statement "XML 1.0 requires unicode" is misleading here. Both HTML and XML always represent characters internally by their Unicode code point, and all XML implementations need to be able to support UTF-8 and UTF-16, but that doesn't mean the document needs to use either of those encodings where the intended implementation supports others.
    Thanks for that. It made me go back to XML and re-read it. Where I saw that processors needed to accept Unicode, I read that as XML needed to have Unicode set.
    : )

