XML declaration in HTML/XHTML

Stomme_poes · July 25, 2011, 1:45pm

the spec (XHTML 1.0) does not dictate that an agent has to display or break on error.

Well now I’m completely confused.
XHTML1.0 spec says

A conforming user agent must meet all of the following criteria:

In order to be consistent with the XML 1.0 Recommendation [XML], the user agent must parse and evaluate an XHTML document for well-formedness…

A user agent who is conforming is one who parses XHTML under XML rules. Anyone who doesn’t do this is then not-conforming. So IE isn’t conforming.
So it says a well-formedness error is a fatal error.
So I look at XML rules regarding fatal errors and it says

Definition: An error which a conforming XML processor MUST detect and report to the application. After encountering a fatal error, the processor MAY continue processing the data to search for further errors and MAY report such errors to the application. In order to support correction of errors, the processor MAY make unprocessed data from the document (with intermingled character data and markup) available to the application. Once a fatal error is detected, however, the processor MUST NOT continue normal processing (i.e., it MUST NOT continue to pass character data and information about the document’s logical structure to the application in the normal way).

Are you saying that because XHTML1.0 says what a conforming UA is, that it doesn’t say anything about UAs breaking on error because they might not be conforming? That IE doesn’t have to stop when it hits an error because it’s not conforming in the first place?

Because if it is conforming, then it’s parsing with an XML parser; and if it’s parsing with an XML parser, then it’s stopping when it hits a fatal error (you would call this “breaking on error” correct?).
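
To make the "fatal error" rule concrete, here is a minimal sketch using the XML parser in Python's standard library as a stand-in for a conforming XML processor. The sample markup and the `parse_or_report` helper are invented for illustration; the sketch only shows the general XML behaviour (report the error, hand over no normal document tree), not what any particular browser does.

```python
import xml.etree.ElementTree as ET

WELL_FORMED = "<html><body><p>Hello</p></body></html>"
# The <p> element is never closed, which violates XML well-formedness.
MALFORMED = "<html><body><p>Hello</body></html>"


def parse_or_report(markup):
    """Return (root_tag, None) on success, or (None, error_message) on a fatal error."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError as err:
        # The processor reports the error and does not deliver a document tree.
        return None, str(err)
    return root.tag, None


print(parse_or_report(WELL_FORMED))
print(parse_or_report(MALFORMED))
```

What the application does with the reported error (show a message, show nothing, show what came before) is a separate question from whether the parser carries on normally.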

This makes absolutely no sense to me.

Either way, IE 9 conforms to XHTML 5 not previous versions of XHTML.

What’s the difference regarding parsing following XML rules? How can you be a conforming UA to “XHTML5” and not conforming to “previous versions of XHTML” if both say you’re conforming if you parse with an XML parser and follow XML parsing rules??

xhtmlcoder · July 25, 2011, 1:58pm

It’s saying that if you use an XML processor that follows the [XML 1.0] rules, then it follows the well-formedness constraints of XML and must process the data as such (if served with the correct MIME type, etc.).

Thus IE9 should obey well-formedness; so it doesn’t matter if [IE9] uses Fred or the normative XHTML because both versions follow the XML 1.0 Recommendations for parsing well-formed markup and that will suffice for our simple FATAL ERROR - well-formedness violation test suite.

C_Ankerstjerne · July 25, 2011, 2:24pm

I tried changing the DOCTYPE to XHTML5, which does work:
XHTML Test Page parses as text/html in Internet Explorer 9
XHTML Test Page parses as application/xhtml+xml in Internet Explorer 9

This is, to put it mildly, absolute rubbish.

system · July 25, 2011, 2:30pm

Try opening them in Opera

PARENT is not a valid XHTML tag unless you’re in 1.1 – XHTML 5 is based on 1.0, kinda… which means only HTML 4.01/HTML5 tags are supposed to be present.

C_Ankerstjerne · July 25, 2011, 2:39pm

Fixed

It’s just never enough for you, is it?

gary_turner · July 25, 2011, 2:47pm

Christian,

Using your first test case, above, the following was the end of all rendering in IE9. Have I misunderstood your conclusion? This appears to indicate IE9 is xhtml+xml conforming.

If your browser displays any contents below this paragraph, or cannot display this page at all, it means your browser is not compatible with XHTML.
And, the svg smiley face does render.

FF, of course, shows only an error message.

cheers,

gary

C_Ankerstjerne · July 25, 2011, 2:54pm

I can see that it does work once the erroneous parent element is removed. Strange, as unknown elements should still be ignored. Anyway, this brings us to:

- XHTML 1.0, sent as application/xhtml+xml works
- XHTML 1.0, sent as application/xml works
- XHTML 1.0, sent as text/html fails
- XHTML5, sent as application/xhtml+xml works
- XHTML5, sent as application/xml works
- XHTML5, sent as text/html fails

Since desc works properly as an alternate text, this will make a valid case for beginning to use XHTML when Internet Explorer 9 becomes a little more popular, assuming vector graphics are needed.
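
The pattern in these results is that the MIME type, not the DOCTYPE, decides which parser handles the page. A rough sketch of that decision, assuming only the three media types tested above; `parser_for` is a hypothetical helper, not a browser API, and real user agents recognise more types than this.

```python
# Media types from the test results above that were handled as XML.
XML_TYPES = {"application/xhtml+xml", "application/xml"}


def parser_for(content_type):
    """Return 'xml', 'html', or None for a Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in XML_TYPES:
        return "xml"
    if media_type == "text/html":
        return "html"
    return None  # anything else is outside the scope of this sketch


for header in ("application/xhtml+xml", "application/xml", "text/html"):
    print(header, "->", parser_for(header))
```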

logic_earth · July 25, 2011, 4:42pm

(X)HTML5 has a defined error-handling method. That is what makes it different. There are no fatal errors in (X)HTML5. Those are the specs IE 9 follows.

Also, IE does stop the parser, it doesn’t move past the point of the error. But the specs do not say IE has to show a message or stop rendering of what it has.

system · July 25, 2011, 6:47pm

STILL bombing in Opera, and it’s not strange. Opera processes XHTML as XML the moment it sees the “prologue”, so invalid tags make it bomb – as per the rules for XML.

Which is why it’s still bombing, you can’t say “SVG” either. SVG has to be included via OBJECT or some other means to be used in an HTML document – regardless of doctype.

Is there some browser where it’s actually rendering that SVG? Because there shouldn’t be. Or should I say, there shouldn’t be because the change to the HTML 5 Draft that allows for it is too new for anyone to have implemented it… excepting perhaps FF which is who originated the proposed change.

See “the problem with deploying or even testing DRAFT”.

– edit – wait, even with the 5 spec on the table, “D” is an invalid/nonexistent attribute for SVG. In fact, NONE of the attributes you are trying to use exist under 5… or SVG 1.1 for that matter.
http://www.w3.org/TR/SVG/struct.html#SVGElement

Stomme_poes · July 26, 2011, 7:00am

As I understand it, they’re currently going for HTML5’s no-need-for-SVG-namespace setup, only dealing with a subset of SVG: SVGTiny. So I thought we’re only expecting a subset of SVG to work cross-browser in this manner.

Which is why it’s still bombing, you can’t say “SVG” either. SVG has to be included via OBJECT or some other means to be used in an HTML document – regardless of doctype.

They’re going for
<svg>
blah blah blah, no namespace… (also with MathML)
</svg>
as a Plain Old Tag… the browser/UA is supposed to “know” that it belongs to the SVG namespace simply because that’s the name of the tag. I stumbled into a guy who is the one to follow if you want to keep track of who’s doing what with SVG… Erik Dahlström.

Then why is there an X in the name if it doesn’t use an XML parser but a modified HTML (HTML5) parser? If the document is sent as application/xhtml+xml it’s supposed to be parsed like XML, whether someone is calling it HTML5 or Fred or anything else. So I am still quite confused there: XML parsing rules haven’t changed, and XML parsing has fatal errors. What am I missing?

Also, IE does stop the parser, it doesn’t move past the point of the error.

Good.

But the specs do not say IE has to show a message

Correct, they don’t

or stop rendering of what it has.

You mean not show what it has already parsed without error.
However your original text was “don’t have to … break on error”. This was what confused me. So if you are only saying IE doesn’t have to show an error and may show everything it parsed up to the error, then I think I’m clear on that part, thanks.
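
The behaviour being agreed on here (stop at the error, but keep what was parsed before it) can be sketched with an incremental XML parser. This is only an illustration using Python's standard library, not a model of IE9's internals; the `CHUNKS` data and the `parse_until_error` helper are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: two good paragraphs, then a mismatched end tag.
CHUNKS = [
    "<html><body>",
    "<p>first</p>",
    "<p>second</p>",
    "<p>broken</div>",       # well-formedness error here
    "<p>never reached</p>",
    "</body></html>",
]


def parse_until_error(chunks):
    """Feed chunks to an incremental XML parser.

    Returns (texts of completed <p> elements, error message or None).
    """
    parser = ET.XMLPullParser(events=("end",))
    completed = []
    for chunk in chunks:
        try:
            parser.feed(chunk)
            for _event, elem in parser.read_events():
                if elem.tag == "p":
                    completed.append(elem.text)
        except ET.ParseError as err:
            # Fatal error: stop feeding, but keep what was already parsed.
            return completed, str(err)
    return completed, None


paragraphs, error = parse_until_error(CHUNKS)
print(paragraphs)
print(error)
```

The paragraphs before the error are still available to the caller, and nothing after the error is ever processed. Whether to display them or an error message is left to the application.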

Stomme_poes · July 26, 2011, 8:41am

@logic_earth:
ah, did you mean XML5?

The goal of this project is develop a proposal for XML5. A revised of XML that no longer follows the well-formedness principle but instead has a way of error handling that is closer to HTML, except that it is more predictable. This new version will be backwards compatible with XML 1.0 and XML 1.1 in a way that documents written in those languages (and don’t rely on external DTD validation) will still work in XML5. The other way around is not necessarily true of course.

The idea is that having a non “draconian” version of XML would allow it be adopted more easily on the web as people tend to generate content using string concatenation which makes it very hard to guarantee “well-formed XML”. It also allows for better competition in the mobile space where XML content is not always parsed using an XML parser by all players.

So far as I know though this isn’t implemented anywhere yet.

C_Ankerstjerne · July 26, 2011, 10:59am

deathshadow
XHTML allows other XML-based languages to be used in conjunction with the XHTML, as long as their namespace is defined. You can see an example in the specification.

In the document, I enable the qualified name svg, so that I can use the svg: prefix, with the reference to the SVG namespace (xmlns:svg="http://www.w3.org/2000/svg"). Having done this, any tag with the svg: prefix will be treated as the respective SVG tags without the prefix, so that e.g. svg:desc becomes desc when used as SVG.

For the sake of argument, and to demonstrate that the browser is actually interpreting the namespace correctly, I swapped the prefix svg: with charlietheunicorn:. This still gives the correct rendering of the SVG image in Chrome (and it serves to confuse my enemies :p).
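
The prefix swap can be checked outside a browser as well. Below is a small sketch with Python's standard XML parser, assuming a stripped-down document rather than the actual test page; `desc_tag` is a helper invented for the example. A namespace-aware parser resolves the prefix to the namespace URI, so the name of the prefix itself should make no difference.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def desc_tag(prefix):
    """Build a tiny XHTML+SVG document that binds `prefix` to the SVG
    namespace, parse it, and return the expanded name of the desc element."""
    doc = (
        f'<html xmlns="{XHTML_NS}" xmlns:{prefix}="{SVG_NS}"><body>'
        f"<{prefix}:svg><{prefix}:desc>A smiley</{prefix}:desc></{prefix}:svg>"
        "</body></html>"
    )
    root = ET.fromstring(doc)
    desc = root.find(f".//{{{SVG_NS}}}desc")
    return desc.tag


print(desc_tag("svg"))                # {http://www.w3.org/2000/svg}desc
print(desc_tag("charlietheunicorn"))  # {http://www.w3.org/2000/svg}desc
```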

I suspect that Opera may be tripping over the syntax error, so I’ve made a version without the error.

logic_earth · July 26, 2011, 1:43pm

Where do you get XML5? I’m talking about HTML5, that thing everyone is talking about. You know, that HTML5 thing?

Stomme_poes · July 27, 2011, 7:09am

Where do you get XML5? I’m talking about HTML5, that thing everyone is talking about. You know, that HTML5 thing?

Because it’s the only thing I can find on teh entire freaking interwebs that matches what you’ve been saying. XHTML5 must be parsed as XML, not some other special new manner of parsing, sorry. What you’re talking about sounds an awful lot like this little project of Anne’s that’s been going on since 2007, which is XML without draconian error handling (and therefore without fatal errors). However it does not exist in reality (yet).

Yes, there are fatal errors, since it must be parsed as XML and XML has fatal errors. Or, you could explain what you’re talking about so I can get up to speed. What you mention about unified error handling is HTML parsing only. The WHATWG spent 5 years documenting how browsers parsed HTML (not XML or XHTML), and in 2009 they could release what they had documented along with a new parsing method with defined (and unified) error handling, which previously did not exist in HTML at all. But this has nothing to do with XHTML-anything, which is still parsed with an XML parser.
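
The contrast between the two parsing models can be sketched as follows. One loud caveat: Python's `html.parser` is a lenient tokenizer and does not implement the HTML5 parsing algorithm, so it stands in here only for "an HTML-style parser that recovers from errors". The `TAG_SOUP` sample and both helper functions are invented for the example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Unclosed <p> and <b> elements: common in HTML, not well-formed as XML.
TAG_SOUP = "<p>one<p>two <b>bold</p>"


class TagCollector(HTMLParser):
    """Record every start tag the HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def html_start_tags(markup):
    """Run the lenient HTML tokenizer and return the start tags it saw."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags


def xml_accepts(markup):
    """Return True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(html_start_tags(TAG_SOUP))  # the HTML side keeps going
print(xml_accepts(TAG_SOUP))      # the XML side rejects the input
```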

Enlighten me please.

xhtmlcoder · July 27, 2011, 11:53am

For both XML 1.0 and XML 1.1, validating and non-validating XML processors alike (the latter being what is typically seen in mainstream browsers) MUST report violations of the XML 1.x specifications’ well-formedness constraints.

From what I understand Fred (XML syntax) version [XHTML5] mainly uses the rules of 1.0 and it MUST follow XML well-formedness constraints: http://www.w3.org/TR/xml/

logic_earth · July 27, 2011, 3:46pm

Stomme poes, Section 8, and Section 9.
Pay close attention to “8.2.8 An introduction to error handling and strange cases in the parser” which applies to both HTML 5 and XHTML 5, all the rules that apply to HTML 5 parsing apply to XHTML 5 as well.

When an XML parser reaches the end of its input, it must stop parsing, following the same rules as the HTML parser. An XML parser can also be aborted, which must again by done in the same way as for an HTML parser.

Stomme_poes · July 28, 2011, 2:17pm

The URL you posted only takes me to the top of the page, but I did scroll down to Section 8.2.8.

I still am not convinced that Section 8.2.8 applies to XHTML5, rather than simply the small part at the top about what a UA must do after stopping with parsing.

Section 9 discusses XHTML which is parsed as XML. I see no contradiction there from any of the other sources I’ve referenced. I see nowhere that XHTML must use an HTML parser rather than an XML parser?

The entirety of your earlier quote is

w3c:

When an XML parser creates a script element, it must be marked as being “parser-inserted” and its “force-async” flag must be unset. If the parser was originally created for the XML fragment parsing algorithm, then the element must be marked as “already started” also. When the element’s end tag is parsed, the user agent must provide a stable state, and then prepare the script element. If this causes there to be a pending parsing-blocking script, then the user agent must run the following steps:

Block this instance of the XML parser, such that the event loop will not run tasks that invoke it.

Spin the event loop until the parser’s Document has no style sheet that is blocking scripts and the pending parsing-blocking script’s “ready to be parser-executed” flag is set.

Unblock this instance of the XML parser, such that tasks that invoke it can again be run.

Execute the pending parsing-blocking script.

There is no longer a pending parsing-blocking script.

Since the document.write() API is not available for XML documents, much of the complexity in the HTML parser is not needed in the XML parser.

Certain algorithms in this specification spoon-feed the parser characters one string at a time. In such cases, the XML parser must act as it would have if faced with a single string consisting of the concatenation of all those characters.

When an XML parser reaches the end of its input, it must stop parsing, following the same rules as the HTML parser. An XML parser can also be aborted, which must again by done in the same way as for an HTML parser.

For the purposes of conformance checkers, if a resource is determined to be in the XHTML syntax, then it is an XML document.

I could be wrong, but it seems to me that “end of its input” is not referring to errors but the end of the document, or whenever there is nothing left to parse. By “following the same rules as the HTML parser” it is only talking about the steps a user agent must follow in order to signal that a document is “ready” and “loaded”, which does not translate to “XHTML5 gets parsed with an HTML parser and therefore does not have fatal errors”. That they “ready” and “load” a document following the same steps is not the same as “XHTML5 parses using an HTML parser” and none of the parsing rules in Section 8 pertain to XHTML5, since XHTML5 is using an XML parser. The claim that XHTML5 is parsed with an HTML parser is directly the opposite of what the WHATWG states.

XHTML syntax and parsing from the official WHATWG wiki states that for XHTML:

XML parsing rules are used. There is only one mode.

and

Well-formedness errors are fatal

Note that near the top it says

Please note that the information in here is based upon the current spec for (X)HTML5. Some of the issues technically do not apply to previous versions of HTML.

and it mentions in several places (just as W3C’s HTML5 page) that these documents are moving targets and information may change or be outdated. However today I am looking at them and see nowhere anything about XHTML5 being parsed with an HTML parser or following the new HTML5 parsing rules, which apply to HTML(5) and not X-anything.

That they both use the same steps after parsing to set the document to “ready” does not convince me that XHTML5 does not use an XML parser and therefore does not have fatal errors.

Everything I read states XHTML is parsed with an XML parser, which means well-formedness errors are fatal errors, which means when the parser hits them it must stop further parsing of the document, instead of going on and switching to error-parsing or HTML parsing or new HTML5 parsing.

I’ve been wrong often enough that I am more than willing to ask anyone working on HTML5 which is correct: if XHTML5 is parsed with an XML parser and follows all the rules of XML parsing including abortion of parsing upon well-formedness errors (fatal errors), or if it uses the new parser for HTML5 which does not have fatal errors.

*edit I’ve sent a question to Anne, tho he may direct me to someone with more knowledge on this part of the spec

system · July 28, 2011, 5:00pm

I REALLY have to say you’re mis-reading the intent and purpose of that entire section – SP really hit the key points.

I think such misunderstandings just further illustrate the problems with all of the HTML specifications; it’s this vague, hard-to-interpret legalese that takes something which should be moronically simple to understand and implement, and turns it into just another convoluted mess that two people can read and interpret to have two radically different meanings.

Though BOTH the SGML and XML legacies certainly don’t help in that department - as both specifications are also needlessly complex for no good reason, filled with tons of things nobody should ever need or even care about.

I often think it would be better if all the ‘extra crap’ from both specifications that aren’t needed to actually build a website was just dropped altogether from HTML.

Stomme_poes · July 29, 2011, 6:42am

I linked both Bruce and Anne (http://twitter.com/annevk/status/96597186657271808) to the thread but both just answered the basic question: XHTML5 must be parsed as XML with an XML parser.

Boy I feel like a retard linking to tweets. But anyway I did send a follow-up question to Anne regarding the end-of-input thing, because it does seem weird to place “what to do when done parsing” in the section called “HTML parsing”. You would think that would come after the XHTML parsing section.

Stomme_poes · August 1, 2011, 6:53am

diannaa: yeah, but the question was, did that change with XHTML5 (since it’s not XHTML1.1)? I got more confirmation from Anne that all those rules of the XML parser are necessary with XHTML5… but it does seem some small things were changed in the XML spec regarding stuff like DOM events. Stuff I can’t follow since I’ve never done XML DOM scripting or anything similar.