  1. #1
    SitePoint Member
    Join Date
    Jun 2012
    Posts
    17

    Website Validation Problem - Special Characters in UTF-8 Charset

    Dear users,

    I am new to website design and just wanted to validate my first page with validator.w3.org.

    I am getting the following message:

    "Sorry, I am unable to validate this document because on line 6 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.

    The error was: utf8 "\x9F" does not map to Unicode"


    In every HTML file of the website, line 6 looks as follows:

    <title> Krebsk&#246;nig - Verein f&#252;r Edelkrebszucht und Wiederansiedlung des Edelkrebs &#40;Astacus Astacus&#41; </title>


    The whole doctype declaration looks as follows:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">


    <head>
    <title> Krebsk&#246;nig - Verein f&#252;r Edelkrebszucht und Wiederansiedlung des Edelkrebs &#40;Astacus Astacus&#41; </title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
    <link href="style1.css" rel="stylesheet" type="text/css"/>
    </head>

    I would be really grateful for advice on what I have to change in the markup to make validation work. I think it is a problem with the encoding of special characters in UTF-8.

    Many thanks in advance !

  2. #2
    Robert Wellock xhtmlcoder
    Join Date
    Apr 2002
    Location
    A Maze of Twisty Little Passages
    Posts
    6,316
    You would probably have been better off saving the file itself as UTF-8. Perhaps the error is actually on another line, or there is a mismatch in encoding. The following should work: "Krebskönig - Verein für Edelkrebszucht und Wiederansiedlung des Edelkrebs (Astacus Astacus)". In decimal, the ü would be &#252; and the ö &#246;. The \x9F refers to neither of those two. If the file is not saved as UTF-8, both of those would obviously need character escapes in the TITLE element, as in post #1.
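
To see why the validator rejects that byte, here is a small Python sketch (not something from the original posts). It only shows that a lone 0x9F is not valid UTF-8 and what the same byte would mean in two legacy encodings. Which encoding the editor really used is a guess; cp1252 and mac_roman are just two common candidates.

```python
# The single byte the validator complained about.
bad = b"\x9f"

# 0x9F is a UTF-8 continuation byte, so on its own it cannot be decoded.
try:
    bad.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# The same byte read in two legacy single-byte encodings.
# Both are assumptions about what the editor might have saved.
as_cp1252 = bad.decode("cp1252")        # Windows-1252
as_mac_roman = bad.decode("mac_roman")  # classic Mac OS Roman

# For comparison: a real UTF-8 file stores "ü" as two bytes, not one.
u_umlaut_utf8 = "ü".encode("utf-8")
```

If one of the legacy readings gives a character that actually appears on the page, that is a hint the file was saved in that encoding rather than in UTF-8.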

  3. #3
    SitePoint Wizard Stomme poes
    Join Date
    Aug 2007
    Location
    Netherlands
    Posts
    10,276
    I also always put my encoding line before anything else in the <head>, for the times when the server does not send any charset header (the browser then checks your meta charset line and tries to match that with how you actually saved your document).

    You should not need to hand-write numeric character references such as &#40; or &#41; for symbols such as (, ), or @.
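
As a rough illustration in Python (using the standard `html` module, with a shortened version of the title from post #1), the references decode to ordinary characters, and only a handful of characters ever need escaping in markup:

```python
import html

# Shortened title from post #1, with its numeric character references.
title = "Krebsk&#246;nig &#40;Astacus Astacus&#41;"

# Decoding the references gives the plain characters back.
decoded = html.unescape(title)

# html.escape only touches &, <, > and quotes; parentheses and
# umlauts pass through untouched.
escaped = html.escape("Krebskönig (a & b) < c")
```

So in a file that really is saved as UTF-8, the title could simply contain ö, ü, ( and ) as typed.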

    It is likely that the text editor you are saving your document in is not actually saving it as UTF-8. It might also be adding a Byte Order Mark to the beginning of the document (this is usually a setting in the text editor, so you could check whether it is switched on somewhere).
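
Both of those can be checked on the raw bytes of the file. A minimal Python sketch, assuming you read the file in binary mode; `check_utf8` is a made-up helper name, not part of any library:

```python
import codecs


def check_utf8(data: bytes):
    """Return (has_bom, bad_line); bad_line is None if data is valid UTF-8."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    try:
        data.decode("utf-8")
        return has_bom, None
    except UnicodeDecodeError as err:
        # err.start is the byte offset of the first undecodable byte;
        # counting the newlines before it gives a 1-based line number.
        return has_bom, data.count(b"\n", 0, err.start) + 1


# Sample bytes with a stray 0x9F on the third line (made up for the demo).
sample = b"<html>\n<head>\n<title>f\x9fr</title>\n"
result = check_utf8(sample)

# For a real file you would pass open(path, "rb").read() instead.
```

The line number it reports should correspond to the one the validator names, which helps confirm whether the bad byte really is where you think it is.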

    Ideally all three things should match: your document saved correctly as UTF-8, your server sending UTF-8 as the charset in its headers, and the meta tag. If they don't match, the server's header overrides the meta tag: the browser will try to read the document using the charset the server sent, and may fail if the document is actually saved in another charset (usually you still get a working page, but with goofy text characters).
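
The three-way comparison can be sketched in Python as below. The helper names are invented for this example, the regular expressions are deliberately simple rather than a full HTML parser, and the 1024-byte window reflects the HTML spec's guidance on where the encoding declaration should appear. The header value is passed in as a plain string, so fetching it from your server is left out.

```python
import re


def charset_from_header(content_type: str):
    """Pull the charset out of a Content-Type header value, if present."""
    m = re.search(r"charset=\s*[\"']?([\w.:-]+)", content_type, re.I)
    return m.group(1).lower() if m else None


def charset_from_meta(data: bytes):
    """Find a charset declared in a <meta> tag near the start of the file."""
    m = re.search(rb"<meta[^>]*charset=\s*[\"']?([\w.:-]+)", data[:1024], re.I)
    return m.group(1).decode("ascii").lower() if m else None


def decodes_as(data: bytes, charset: str) -> bool:
    """True if the bytes are valid in the named charset."""
    try:
        data.decode(charset)
        return True
    except (UnicodeDecodeError, LookupError):
        return False


# Made-up page: declares utf-8 in the meta tag but contains a stray 0x9F.
page = (b'<head>\n<meta http-equiv="Content-Type" '
        b'content="text/html; charset=utf-8"/>\n<title>f\x9fr</title>\n</head>')

header = charset_from_header("text/html; charset=UTF-8")
meta = charset_from_meta(page)
saved_ok = decodes_as(page, meta)
```

Here the header and the meta tag agree, but `saved_ok` comes out false, which is the situation where the validator complains even though the markup looks right.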
    Also
    http://i.imgur.com/4J7Il0m.jpg

