Six Months Later: The New HTML Working Group

The following is republished from the Tech Times #164.

Because I just wasn’t getting enough email (ha!), I joined the W3C’s new HTML Working Group last month.

Nearly six months ago, now, Tim Berners-Lee announced that the W3C would form a new working group to develop the next version of the HTML specification, alongside renewed efforts towards finalizing XHTML 2.0.

With the new working group now well underway, this blog post will look at what progress has been made, what issues have arisen, and just what we should expect from the next version of HTML.

The New Working Group

Unlike most W3C working groups, the new HTML working group’s charter welcomes the scrutiny and participation of the general public. Anyone can join the working group, post to the mailing list, chime in on teleconferences, and vote on what goes into the final spec.

Better yet, there is no minimum level of participation, so if all you have time to do is monitor the mailing list and vote on issues of interest to you, that’s fine too.

The working group is currently led with good humor by its co-chairs, Chris Wilson (Microsoft/Internet Explorer) and Dan Connolly (W3C), who do their best to squeeze consensus out of the roughly 1,000 email messages per week that are posted by the working group’s membership.

Also participating in the group are representatives from the Mozilla Foundation, Apple’s Safari team, Opera’s browser team, and familiar faces from the Web Hypertext Application Technology (WHAT) Working Group, which had undertaken the task of updating HTML on its own before the new W3C working group was created.

Officially, the group’s first target is to produce a working draft of some description (even if it’s just a roadmap for further development) by the end of June. This is to be the first step on the way to a finished specification by the end of the year 2010. To make this happen, the group needed one or more editors to compile and maintain the documents produced by the group.

A Surprise Proposal

Before the search could begin, however, representatives of Mozilla, Apple, and Opera came forward with a proposal to adopt the WHAT Working Group’s HTML5 draft specification as a starting point for further development of HTML within the W3C.

After no small amount of discussion, the W3C’s HTML WG today voted to accept the proposal, with these specific outcomes:

The WHAT Working Group’s HTML5 (Web Applications 1.0 and Web Forms 2.0) will become the current working draft, and an extensive review by the new working group will now take place.
The final W3C specification will be named “HTML 5”.
The W3C specification will be edited by Ian Hickson (Google), editor of the WHAT-WG’s HTML5, and David Hyatt (Apple/Safari).

And there we have it: the harmful division that had come to exist between the major browser vendors and the W3C seems to be a thing of the past! So far so good, right?

Of course, there are still big challenges ahead, not least of which will be getting this large and open working group to agree on a seemingly endless list of technical minutiae.

HTML 5 Issues on the Table

As you would expect from a public group, open to anyone with a passion for web standards, the W3C’s new HTML Working Group is a noisy place. Despite the relative ease with which the WHAT Working Group’s HTML5 was adopted, there are already a number of divisive issues on the table.

As respected web standards blogger Roger Johansson wrote earlier this week, proposals are routinely posted on the Working Group’s mailing list that would make a standards-aware developer break out in a cold sweat. One recent example was a very serious proposal for an <indent> tag, with the argument that you shouldn’t have to learn CSS to indent something in HTML.

Here are just a few of the issues currently being discussed by the Working Group…

Breaking the Web

I would expect that most readers of this blog would agree that Internet Explorer 7 is a distinct improvement over IE6, and that the minor changes required to make our sites compatible with the new browser were a small price to pay for that improvement. Chris Wilson’s boss at Microsoft doesn’t necessarily agree.

In a lengthy post to the working group, Wilson explained the realities of developing a browser that hundreds of millions of users rely on:

The reality of it is, when a major browser is released, that is a point singularity on the web; it has the ability to, but cannot be allowed to, cause widespread disruption. […] IE7 did cause widespread disruption, as a case in point. I championed making those widespread changes to improve our standards compliance. In all seriousness, I’ve managed to hang on to my job, but sometimes I think only just. I cannot go to my team and say “hey, we’re gonna break the web again (and again and again), but it’s okay because it’s for a good cause.” The world doesn’t work that way. I wouldn’t be responsibly doing my job—that one where half a billion web users rely on my team to not hose compatibility with their banking web site, even if their bank doesn’t know how to properly use CSS ‘float’.

Consequently, he went on to say, Internet Explorer 8 will be making very few changes, if any, to the default rendering of HTML content. In order to take advantage of new features and standards compliance fixes, developers will have to include in their code a “switch” that tells the browser to use its new, non-backwards-compatible rendering mode.

This was done once before, in Internet Explorer 6, which switched into “standards compliance mode” when it saw a newer DOCTYPE declaration at the top of an HTML file. As Microsoft does not expect to add support for all of HTML 5 in a single release, however, the new switch in IE8 promises to be more granular.
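For reference, the existing DOCTYPE switch looks roughly like this in practice. The sketch below uses the HTML 4.01 Strict DOCTYPE, which is one of the declarations that triggers standards compliance mode; a page with no DOCTYPE at all is rendered in the older, backwards-compatible mode. The page content is made up for illustration.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<!-- The DOCTYPE above must come first; it is what selects standards compliance mode. -->
<!-- Delete it, and the same page falls back to the backwards-compatible rendering. -->
<html>
<head>
  <title>DOCTYPE switching</title>
</head>
<body>
  <p>This paragraph is laid out using the browser's standards-compliant rules.</p>
</body>
</html>
```

Note that the switch here is all-or-nothing for the whole page, which is why a more granular mechanism would need a different kind of marker.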

Is this type of switch something that should be defined in the HTML specification? Debate in the working group rages on.

The Fate of Presentational Elements

<b> and <i> seem to be the presentational tags that just won’t die. While their cohorts <tt> and others were deprecated in HTML 4 in favor of semantically meaningful tags like <strong> and <em>, <b> and <i> remain tags “in good standing,” up to and including the current HTML 5 draft.

The case for keeping these tags, according to several WHAT Working Group members now participating in the W3C Working Group, is that if they were to be removed, naïve content authors would start to misuse semantically meaningful tags. For example, someone looking for “the italics tag”, upon finding that <i> was no longer allowed in HTML 5, might just start using <em> to generically apply italics to text, which would damage the relatively strong semantic meaning that this tag currently enjoys.

This same logic led to the re-inclusion of the <font> tag in the WHAT Working Group’s HTML 5 draft spec. Yes, really! Of course, the tag’s description makes it clear that using it is usually a bad idea:

The font element doesn’t represent anything. It must not be used except by WYSIWYG editors, which may use it to achieve presentational affects. Even WYSIWYG editors, however, should make every effort to use appropriate semantic markup and avoid the use of media-specific presentational markup.

At this stage I would expect that, if they do remain in the final HTML 5 spec, all presentational elements will be described in similar terms.

How To Treat Broken/Deprecated Markup

Something most web developers don’t realize is that the W3C’s HTML 4 specification didn’t do a very good job of describing what browsers should do when they encounter broken code. Here’s the canonical example:

<strong>This line <em>contains</strong> some words</em>

In XHTML, this type of overlapping tag structure is flat-out illegal, and browsers that currently support XHTML will simply display an error message rather than attempt to make sense of this code.

HTML, meanwhile, takes a softer stance, merely stating that the interpretation of structures like that above is up to the individual browser.
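As an illustration of the kind of guesswork involved, here is one plausible way a browser could repair the overlapping tags above into a properly nested tree. This is a sketch of a single possible interpretation, not something the HTML 4 spec prescribes.

```html
<!-- Original (overlapping): -->
<!-- <strong>This line <em>contains</strong> some words</em> -->

<!-- One possible repair: close <em> along with <strong>, then reopen it -->
<strong>This line <em>contains</em></strong><em> some words</em>
```

Another browser could just as legitimately keep the emphasis inside a single element and split the strong text instead, which is exactly the sort of disagreement a complete parsing specification would remove.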

In the real world, vagaries like this have meant that every new browser on the market has had to figure out the behavior expected of them by laboriously reverse-engineering dominant competitors like Internet Explorer.

In addition to extending and enhancing the capabilities of HTML, the HTML 5 spec also aims to fill in these blanks so that, for the first time, there will exist a complete specification for the interpretation of today’s web content.

In this sense, the HTML 5 spec is serving two masters—browser vendors that want a complete description of the idiosyncrasies of today’s browsers, and developers that want a better language for building the web sites of the future.

Some would argue that the best way to build a solid foundation for the future of the Web is to leave behind the messiness of HTML and build a new markup language from scratch. This is exactly what the XHTML2 Working Group is doing, but because it has much less buy-in from browser vendors, many believe its work will be without practical value for many years to come.

Extending HTML Semantics

Currently, the class attribute is commonly used to extend the range of semantic meaning that may be represented in an HTML document. HTML has no tag for the title of a book, for example, but you could mark up book titles with a class name of your own choosing in all of your documents and style them appropriately with CSS.
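As a rough illustration of that approach, the snippet below marks up a book title with a class and styles it with CSS. The class name booktitle and the choice of a span element are assumptions made for this example, not names taken from any spec.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>Class-based semantics</title>
  <style type="text/css">
    /* "booktitle" is a hypothetical, author-defined class name */
    .booktitle { font-style: italic; }
  </style>
</head>
<body>
  <p>I just finished reading <span class="booktitle">Moby-Dick</span>.</p>
</body>
</html>
```

The meaning of the class lives only in the author’s head and stylesheet; a browser or screen reader has no way to know that it denotes a book title.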

In a bid to standardize a small number of common class names so that they may be used by browsers and assistive technologies to infer greater meaning from HTML documents, the WHAT Working Group proposed the following predefined classes: copyright, error, example, issue, note, search, and warning.

Naturally, many of these class names are already in use on the Web today. The WHAT Working Group’s hope was that, where they are used, their use will be overwhelmingly in agreement with the meaning defined in the HTML 5 spec.

Not everyone in the new HTML Working Group agrees. Some argue that the new spec should not suddenly define reserved values for an attribute that was previously open for arbitrary use by content authors. Proposed alternatives include prefixing predefined class names in the same way that was once done for the target attribute (e.g. class="_copyright"), or adopting a variation of XHTML 2.0’s role attribute to sit alongside class.
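To make the three options concrete, here is how the same copyright notice might be marked up under each. The company name is made up, and the role value and the "legal" class are assumptions for illustration, since the exact vocabulary is still under debate.

```html
<!-- 1. Predefined class name, as proposed by the WHAT Working Group -->
<p class="copyright">Copyright 2007 Example Corp.</p>

<!-- 2. Prefixed class name, in the style of target="_blank" -->
<p class="_copyright">Copyright 2007 Example Corp.</p>

<!-- 3. A role attribute alongside an author-defined class (hypothetical values) -->
<p class="legal" role="copyright">Copyright 2007 Example Corp.</p>
```

Options 2 and 3 leave existing uses of class="copyright" on the Web untouched, at the cost of asking authors to learn a new convention.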

As you can see, building the next version of the “simple” markup language that powers the Web is no simple matter. But with the HTML Working Group open to all, it has never been easier to put your thoughts in front of the people that matter.

If any of the issues I discussed above are important to you, I’d encourage you to join up and have your say!