  1. #1
    Stomme poes (SitePoint Wizard)
    Join Date: Aug 2007
    Location: Netherlands

    Help me remove this invalid Unicode character!?

    Hello gurus,

    I rebuilt a page and decided I'd like a special bullet, one that isn't an image... so I thought I'd be all sneaky and clever and use :before. But it has become my doom. Before I switch to yet another image bullet (ugh), maybe someone knows another way around this?

    I have a menu, and in place of bullets I have this:

    Code:
    #menu {
      margin: 1em 0;
    }
    	#menu li {
    	  margin-bottom: .3em;
    	  padding-left: 1em;
    	  font-size: 1em;
    	}
    	#menu li:before {
    	  content: "\00bb" " "; /* U+00BB » (raquo), then a space */
    	  color: #d1b248;
    	  font: .8em georgia, serif;
    	}
    	* html #menu li {display: block; width: 99%;} /* star-html hack: seen only by IE6 and older */
    This gives me the » (right-pointing double angle quote) character. I should probably also test this in JAWS: it's possible that I'm really still adding content, in which case a decorative bullet shouldn't be content. But I've seen this technique used in forms for a decorative "hey, look here" image on error messages... so I'm not sure about that.

    As I understand it, the CSS "content" property requires special characters to be written in hex or as a code point? Wikipedia gives the Unicode code point as U+00BB, so I wrote it as you see above. That is also how it was done in the form I saw, since I can't actually write it in hex with the x... Maybe there's a way to make it plain hex that I don't know about?

    It validates as HTML4 with no problem. But I wanted to run the page through the W3C semantic extractor just for fun. Apparently it uses the Xerces XML parser, which I think is choking on that character. I'm not sure, but after some Googling I found that other people who had this problem with this parser were also using Unicode code points instead of decimal character references, so that's why I think my » is the issue.

    Here is the error:
    Using org.apache.xerces.parsers.SAXParser
    Exception net.sf.saxon.trans.XPathException: org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x1d) was found in the comment.
    org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x1d) was found in the comment.
    As far as I can tell I don't have THAT character (0x1d) anywhere. But it seems to stand for any "control character", which apparently means everything in the range 0000 to 00-something... and that includes 0x00bb :( I'm definitely not using it as a control character.

    So, before I give up and switch to an image (Yet Another GET Request is, I guess, my only reason for avoiding the image...), is there some other equivalent hex code for this character? Its decimal number is low (187), though strictly speaking it isn't ASCII, which stops at 127.

    Or, better yet, is there a page that lists valid hex equivalents of the decimal codes? Long ago I found a few Unicode sites that wanted me to type the character in and then showed me other representations of it. But usually I can't type these characters because they're not on my keyboard. I've always used decimal character references to produce characters, even for the euro sign (there's a key for it on my keyboard, but it doesn't seem to do anything).
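    On the decimal-to-hex question: converting a decimal code point into the other notations used in this thread is mechanical, so a few lines of Python can print them all at once. This is a sketch. The helper name `char_forms` is hypothetical; only `chr`, f-string hex formatting and `unicodedata.name` are real standard-library features.

```python
import unicodedata

def char_forms(code_point: int) -> dict:
    """Hypothetical helper: show one code point in several notations."""
    ch = chr(code_point)
    return {
        "char": ch,
        # Control characters have no Unicode name, so fall back to a label.
        "name": unicodedata.name(ch, "<unnamed>"),
        "unicode": f"U+{code_point:04X}",
        "html_decimal": f"&#{code_point};",
        "html_hex": f"&#x{code_point:X};",
        # CSS escape: a backslash followed by 1-6 hex digits.
        "css_escape": f"\\{code_point:x}",
    }

if __name__ == "__main__":
    for cp in (187, 0x20AC):  # raquo and the euro sign
        print(char_forms(cp))
```

    For example, `char_forms(187)` gives `&#xBB;` as the hex HTML reference and `\bb` as the CSS escape.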

    Any Unicode gurus out there?

    Thanks,
    poes

  2. #2
    AutisticCuckoo (SitePoint Author)
    Join Date: Nov 2004
    Location: Ankh-Morpork
    Quote originally posted by Stomme poes:
    Code:
    	#menu li:before {
    	  content: "\00bb" " "; /*raquo*/
    	  color: #d1b248;
    	  font: .8em georgia, serif;
    	}
    That's correct. You can shorten it to "\bb" if you like.

    Quote originally posted by Stomme poes:
    As I understand it, CSS "content" requires special characters to be written in hex or in a code point???
    You can use a literal '»' character as long as it's correctly encoded. (The '»' is available in most encodings you're likely to use, e.g., UTF-8, ISO 8859-1 and Windows-1252).
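    Here is what "correctly encoded" means in practice. The same '»' becomes two bytes in UTF-8 but a single byte in the two single-byte encodings named above. This Python sketch simply asks the standard codecs for the bytes; the helper name `encoded_bytes` is hypothetical. It also shows the mojibake you get when UTF-8 bytes are read back as ISO 8859-1.

```python
RAQUO = "\u00bb"  # '»', RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK

def encoded_bytes(text: str, encodings=("utf-8", "iso-8859-1", "cp1252")) -> dict:
    """Hypothetical helper: the bytes `text` becomes in each encoding, as hex."""
    return {enc: text.encode(enc).hex(" ").upper() for enc in encodings}

if __name__ == "__main__":
    for enc, hexbytes in encoded_bytes(RAQUO).items():
        print(f"{enc:>10}: {hexbytes}")
    # UTF-8 bytes misread as ISO 8859-1 turn into two characters.
    print(RAQUO.encode("utf-8").decode("iso-8859-1"))
```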

    You can also express it with a character escape as you've done.

    Character escapes in CSS consist of a backslash ('\') followed by 1-6 hexadecimal digits. A single whitespace character after an escape is ignored, which lets you write "\bb !" to produce '»!' instead of having to write "\0000bb!".
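    A rough Python model of the escape rule above may help. It is a simplified sketch, not a real CSS parser, and `decode_css_escapes` is a hypothetical name. It follows the CSS 2.1 escape grammar, plus the CSS Syntax Level 3 rule that NUL, surrogate and out-of-range values become U+FFFD.

```python
import re

# A CSS escape is either
#   1. a backslash, 1-6 hex digits, then an optional single whitespace
#      character (CR LF counts as one) that ends the escape, or
#   2. a backslash followed by any other character, which stands for itself.
_ESCAPE = re.compile(r"\\(?:([0-9a-fA-F]{1,6})(?:\r\n|[ \t\r\n\f])?|(.))")

def decode_css_escapes(value: str) -> str:
    """Decode backslash escapes inside a CSS string (simplified sketch).

    Not handled: a backslash before a newline (a line continuation in
    CSS strings) is left untouched.
    """
    def replace(match):
        hex_digits, literal = match.groups()
        if hex_digits is None:
            return literal
        cp = int(hex_digits, 16)
        # CSS Syntax Level 3: NUL, surrogates and values above U+10FFFF -> U+FFFD.
        if cp == 0 or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            return "\ufffd"
        return chr(cp)

    return _ESCAPE.sub(replace, value)

if __name__ == "__main__":
    for src in (r"\00bb", r"\bb", r"\bb !", r"\0000bb!", r"\bb a", r"\bba"):
        print(f"{src!r:>12} -> {decode_css_escapes(src)!r}")
```

    The last two inputs show where the terminating space really matters. 'a' is a hex digit, so without the space it is read as part of the escape.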

    Quote originally posted by Stomme poes:
    I don't have THAT character anywhere I can tell 0x1d, but it seem to stand for any "control character" which, apparently, is everyone in the range of 0000 to 000something... this includes x00bb :
    No, U+00BB is not a control character. The C0 range of control characters is U+0000 to U+001F, and the C1 range is U+0080 to U+009F.
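    To make the ranges concrete, here is a small Python check; `describe` and `allowed_in_xml10` are illustrative names. `unicodedata.category` reports 'Cc' (control) for U+001D but 'Pf' (final punctuation) for U+00BB. The second function encodes the XML 1.0 `Char` production, which is presumably the rule behind the Xerces message: U+001D is forbidden there, and U+00BB is not.

```python
import unicodedata

def describe(cp: int) -> str:
    """Label a code point with its Unicode category and C0/C1 membership."""
    if cp <= 0x1F:
        group = "C0 control"
    elif 0x80 <= cp <= 0x9F:
        group = "C1 control"
    else:
        group = "not C0/C1"
    return f"U+{cp:04X} {unicodedata.category(chr(cp))} {group}"

def allowed_in_xml10(cp: int) -> bool:
    """XML 1.0 'Char' production.

    C0 controls other than tab, LF and CR are excluded; note that the
    C1 range (0x80-0x9F) is still allowed in XML 1.0.
    """
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

if __name__ == "__main__":
    for cp in (0x1D, 0xBB, 0x85):
        print(describe(cp), "| allowed in XML 1.0:", allowed_in_xml10(cp))
```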

    This cannot have anything to do with your '»' character (which should be encoded as C2 BB in UTF-8). You must have a U+001D character somewhere in your source, or it may be some oddity with the software you're using.

    Try searching for it (you have vim, don't you?) in your source file(s). Since it's a control character you won't be able to see it (it's unprintable), but you should be able to search for it.
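    If searching in an editor is awkward, a short script can list every character that XML 1.0 forbids, with its line and column. This is a sketch under stated assumptions: the file is UTF-8, and the script name and command line are hypothetical. It splits on '\n' only, because Python's `str.splitlines()` itself treats U+001D as a line break.

```python
import sys
from pathlib import Path

def find_bad_chars(text: str):
    """Yield (line, column, code point) for each character XML 1.0 forbids."""
    # Split on "\n" only: str.splitlines() also breaks on U+001C-U+001E,
    # which would silently drop the very character we are looking for.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            cp = ord(ch)
            allowed = (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
                       or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)
            if not allowed:
                yield lineno, col, cp

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_bad_chars.py page.html
    path = Path(sys.argv[1])
    text = path.read_text(encoding="utf-8")  # assumes the page is saved as UTF-8
    for lineno, col, cp in find_bad_chars(text):
        print(f"{path}:{lineno}:{col}: U+{cp:04X}")
```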
    Birnam wood is come to Dunsinane

  3. #3
    Stomme poes
    Quote:
    You can use a literal '»' character as long as it's correctly encoded. (The '»' is available in most encodings you're likely to use, e.g., UTF-8, ISO 8859-1 and Windows-1252).
    Since I can't actually type it, I don't really dare to copy and paste it... though I suppose I could try.

    Actually, stupid me: I should have commented that whole section out and then tried the Semantics Extractor again to verify the problem was in that section. Unfortunately, I just tried that and got this:
    is locally blacklisted
    Argh! This sucks: my entire domain is suddenly blacklisted, so I can't test any of my pages! The only contact options listed are about further development of the extractor :( I wonder if my illegal character set it off :(

    Quote:
    Try searching for it (you have vim, don't you?) in your source file(s). Since it's a control character you won't be able to see it (it's unprintable), but you should be able to search for it.
    I do have vim, but I'm not sure how to search for something unprintable... More importantly, I'm not sure how I could have created something unprintable. This is just a plain, static HTML file written in my text editor, which has never caused this problem before.

    *edit: I wonder if there's another XML parser out there, Xerces or similar, that I can use to check whether I've gotten rid of it... Setting the list option in vi only shows my line ends ($).
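    One way to re-test locally while the extractor is unavailable: Python's `xml.etree.ElementTree` uses the Expat parser rather than Xerces, but Expat also rejects characters outside the XML 1.0 `Char` production. This is a sketch; `check_well_formed` is a hypothetical helper. An HTML4 page with unclosed elements such as `<li>` or `<br>` will fail for unrelated reasons, so this is most useful on XHTML or on a suspect fragment.

```python
import xml.etree.ElementTree as ET

def check_well_formed(xml_text: str):
    """Return None if xml_text parses as XML, else the parser's error message."""
    try:
        ET.fromstring(xml_text)  # Expat under the hood, not Xerces
    except ET.ParseError as err:
        return str(err)
    return None

if __name__ == "__main__":
    good = "<ul><!-- \u00bb --><li>Home</li></ul>"
    bad = "<ul><!-- \x1d --><li>Home</li></ul>"
    print(check_well_formed(good))
    print(check_well_formed(bad))
```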

    Quote:
    No, U+00BB is not a control character. The C0 range of control characters is U+0000 to U+001F, and the C1 range is U+0080 to U+009F.
    Hm, I forgot to thank you for this one. I had only come across a list of the C0 range on Google... good to know.

    ...holy ****, Google is fast at picking up these pages!!!
    Last edited by Stomme poes; Jul 28, 2009 at 02:19.

  4. #4
    AutisticCuckoo
    Quote originally posted by Stomme poes:
    Since I can't actually type it, I don't really dare to copy-pasta it... though I suppose I could try it.
    In vim you can enter characters by code position.

    In insert mode, press Ctrl+V 187 to enter '»' using the decimal value (187). This only works for code positions up to 255, I believe, and you need to type exactly three digits.

    In insert mode, press Ctrl+V u 00bb to enter '»' using the hexadecimal value (BB). You need to type exactly four hex digits. This works for any character up to U+FFFF; for code points beyond that, use Ctrl+V U followed by up to eight hex digits.

    Quote originally posted by Stomme poes:
    I do have vim but I'm not sure how I search for something unprintable...
    You can use escapes in your search pattern: type /\%d29<CR> to search for a character with code position 29 decimal, or /\%x1d<CR> to search for a character with code position 1D hexadecimal. (<CR> means 'press Enter'.)

    You'd probably spot it anyway, since such a character would show up as '^]' in a colour that's different from normal text.
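    The '^]' comes from caret notation. A C0 control is shown as '^' followed by the character whose code differs from it by 0x40 (an XOR with 0x40), so DEL (0x7F) becomes '^?'. Here is a small Python sketch of that mapping; the helper name is hypothetical.

```python
def caret_notation(cp: int) -> str:
    """Hypothetical helper: render a C0 control or DEL in caret notation.

    0x1D XOR 0x40 = 0x5D, which is ']', so U+001D is shown as '^]'.
    Printable characters are returned unchanged.
    """
    if cp <= 0x1F or cp == 0x7F:
        return "^" + chr(cp ^ 0x40)
    return chr(cp)

if __name__ == "__main__":
    for cp in (0x1D, 0x00, 0x0D, 0x7F, 0xBB):
        print(f"U+{cp:04X} -> {caret_notation(cp)}")
```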

    Quote originally posted by Stomme poes:
    more importantly, I'm not sure how I could have created something unprintable..
    You probably haven't. I think there's a glitch somewhere else.

    Quote originally posted by Stomme poes:
    hm I forgot to thank you for this one, I had only run into a list of the c0 range on teh googles... good to know
    The C1 range is just reserved in the ISO 8859 series (and Unicode). No standardised meaning is assigned to these characters, unlike those in the C0 range.

    Windows-1252 uses the C1 code positions (0x80 to 0x9F) for a number of useful characters, like dashes and curly quotes. There should be problems if you use Windows-1252 and declare the encoding as ISO 8859-1, since those characters are actually invalid (reserved) in the ISO encoding. But this is so common (because people blithely use Windows software without knowing what they're doing) that browsers actually assume Windows-1252 when you declare ISO 8859-1. The W3C validator will warn you, though, e.g., if you accidentally use code position 151 for an em dash (U+2014).
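    You can see the difference by decoding the same bytes both ways in Python. In this sketch (`decode_both` is a hypothetical name), byte 0x97 is a C1 control under ISO 8859-1 but an em dash (U+2014) under Windows-1252. Python's cp1252 codec leaves five bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, so they are replaced with U+FFFD here.

```python
def decode_both(raw: bytes):
    """Decode the same bytes as ISO 8859-1 and as Windows-1252."""
    # ISO 8859-1 maps every byte straight to U+0000-U+00FF,
    # so 0x80-0x9F become C1 controls.
    latin1 = raw.decode("iso-8859-1")
    # cp1252 maps most of 0x80-0x9F to punctuation; undefined bytes -> U+FFFD.
    cp1252 = raw.decode("cp1252", errors="replace")
    return latin1, cp1252

if __name__ == "__main__":
    raw = b"\x93quoted\x94 \x97 dash"  # bytes a Windows editor might write
    latin1, cp1252 = decode_both(raw)
    print([f"U+{ord(c):04X}" for c in latin1 if 0x80 <= ord(c) <= 0x9F])
    print(cp1252)
```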

  5. #5
    gary.turner (Resident curmudgeon)
    Join Date: Jan 2009
    Location: Dallas
    Quote:
    I do have vim, but I'm not sure how to search for something unprintable... More importantly, I'm not sure how I could have created something unprintable. This is just a plain, static HTML file written in my text editor, which has never caused this problem before.
    Could you have typed ^] (ctl-]), trying for a closing curly brace?

    See the Unicode character map. Click the character you want, then click "make html" to get the numeric character reference, which is of course decimal (e.g. &#187;). Or you could copy and paste the character directly into your file.

    cheers,

    gary
    Anyone can build a usable website. It takes a graphic
    designer to make it slow, confusing, and painful to use.

    Simple minded html & css demos and tutorials

  6. #6
    AutisticCuckoo
    Quote originally posted by gary.turner:
    Could you have typed ^] (ctl-]), trying for a closing curly brace?
    Yes, that works too (in this case). Although I think you mean 'square bracket' (']') rather than 'curly brace' ('}').

  7. #7
    Stomme poes
    I did :set list and confirmed there are only line ends ($).

    I'm 100% sure the HTML file is clean.

    Since I'm not a Windows user (except for testing IE in VirtualBox) and I don't import from Word files or anything, I don't have to worry about Windows-1252 characters here :) I always suspect them when people's quotes turn into ?'s.

    In any case, I can't go back and check my page again until my server gets unblocked. Thanks also for the vim help: I've only used ^v in searches before, and always for actual strings :)

  8. #8
    gary.turner
    Quote originally posted by AutisticCuckoo:
    Yes, that works too (in this case). Although I think you mean 'square bracket' (']') rather than 'curly brace' ('}').
    I was thinking of the original entry. Perhaps she was trying to type "}" and hit <ctl> instead of <shift>. 'Twas just a thought.

    cheers,

    gary

  9. #9
    AutisticCuckoo
    Quote originally posted by gary.turner:
    I was thinking of the original entry. Perhaps, were she trying to type "}", and hit <ctl> instead of <shift>. 'Twas just a thought.
    Ah, I see what you mean. I don't know what Dutch keyboards look like (or whether she's even using one), but on an American keyboard that could explain things. (On a Swedish keyboard ']' is AltGr+9 and '}' is AltGr+0, so it wouldn't quite apply.)

  10. #10
    Stomme poes
    Lord, I have no clue whether my keyboard is Dutch or not. Maybe not: I don't have an ë key, but nobody in the office has one. My Alt key is useless because Linux hijacks it, which means I miss out on some GIMP and Inkscape commands too, even when I try AltGr :( But as on US keyboards, I do have } and ] on the same key.

  11. #11
    AutisticCuckoo
    There's a good entry about keyboard layouts on Wikipedia. It seems as if you're not using a Dutch layout, and the article says they are uncommon and that Dutch users normally use the US layout.

