Six Months Later: The New HTML Working Group

The following is republished from the Tech Times #164.

Because I just wasn’t getting enough email (ha!), I joined the W3C’s new HTML Working Group last month.

Nearly six months ago, now, Tim Berners-Lee announced that the W3C would form a new working group to develop the next version of the HTML specification, alongside renewed efforts towards finalizing XHTML 2.0.

With the new working group now well underway, this blog post will look at what progress has been made, what issues have arisen, and just what we should expect from the next version of HTML.

The New Working Group

Unlike most W3C working groups, the new HTML working group’s charter welcomes the scrutiny and participation of the general public. Anyone can join the working group, post to the mailing list, chime in on teleconferences, and vote on what goes into the final spec.

Better yet, there is no minimum level of participation, so if all you have time to do is monitor the mailing list and vote on issues of interest to you, that’s fine too.

The working group is currently led with good humor by its co-chairs, Chris Wilson (Microsoft/Internet Explorer) and Dan Connolly (W3C), who do their best to squeeze consensus out of the roughly 1,000 email messages per week that are posted by the working group’s membership.

Also participating in the group are representatives from the Mozilla Foundation, Apple’s Safari team, Opera’s browser team, and familiar faces from the Web Hypertext Application Technology (WHAT) Working Group, which had undertaken the task of updating HTML on its own before the new W3C working group was created.

Officially, the group’s first target is to produce a working draft of some description (even if it’s just a roadmap for further development) by the end of June. This is to be the first step on the way to a finished specification by the end of the year 2010. To make this happen, the group needed one or more editors to compile and maintain the documents produced by the group.

A Surprise Proposal

Before the search could begin, however, representatives of Mozilla, Apple, and Opera came forward with a proposal to adopt the WHAT Working Group’s HTML5 draft specification as a starting point for further development of HTML within the W3C.

After no small amount of discussion, the W3C’s HTML WG today voted to accept the proposal, with these specific outcomes:

  • The WHAT Working Group’s HTML5 (Web Applications 1.0 and Web Forms 2.0) will become the current working draft, and an extensive review by the new working group will now take place.
  • The final W3C specification will be named “HTML 5”.
  • The W3C specification will be edited by Ian Hickson (Google), editor of the WHAT-WG’s HTML5, and David Hyatt (Apple/Safari).

And there we have it: the harmful division that had come to exist between the major browser vendors and the W3C seems to be a thing of the past! So far so good, right?

Of course, there are still big challenges ahead, not least of which will be getting this large and open working group to agree on a seemingly endless list of technical minutiae.

HTML 5 Issues on the Table

As you would expect from a public group, open to anyone with a passion for web standards, the W3C’s new HTML Working Group is a noisy place. Despite the relative ease with which the WHAT Working Group’s HTML5 was adopted, there are already a number of divisive issues on the table.

As respected web standards blogger Roger Johansson wrote earlier this week, proposals are routinely posted on the Working Group’s mailing list that would make a standards-aware developer break out in a cold sweat. One recent example was a very serious proposal for an <indent> tag, with the argument that you shouldn’t have to learn CSS to indent something in HTML.

Here are just a few of the issues currently being discussed by the Working Group…

Breaking the Web

I would expect that most readers of this blog would agree that Internet Explorer 7 is a distinct improvement over IE6, and that the minor changes required to make our sites compatible with the new browser were a small price to pay for that improvement. Chris Wilson’s boss at Microsoft doesn’t necessarily agree.

In a lengthy post to the working group, Wilson explained the realities of developing a browser that hundreds of millions of users rely on:

The reality of it is, when a major browser is released, that is a point singularity on the web; it has the ability to, but cannot be allowed to, cause widespread disruption. [...] IE7 did cause widespread disruption, as a case in point. I championed making those widespread changes to improve our standards compliance. In all seriousness, I’ve managed to hang on to my job, but sometimes I think only just. I cannot go to my team and say “hey, we’re gonna break the web again (and again and again), but it’s okay because it’s for a good cause.” The world doesn’t work that way. I wouldn’t be responsibly doing my job, that one where half a billion web users rely on my team to not hose compatibility with their banking web site, even if their bank doesn’t know how to properly use CSS ‘float’.

Consequently, he went on to say, Internet Explorer 8 will be making very few changes, if any, to the default rendering of HTML content. In order to take advantage of new features and standards compliance fixes, developers will have to include in their code a “switch” that tells the browser to use its new, non-backwards-compatible rendering mode.

This was done once before, in Internet Explorer 6, which switched into “standards compliance mode” when it saw a newer DOCTYPE declaration at the top of an HTML file. As Microsoft does not expect to add support for all of HTML 5 in a single release, however, the new switch in IE8 promises to be more granular.

Is this type of switch something that should be defined in the HTML specification? Debate in the working group rages on.

The Fate of Presentational Elements

<b> and <i> seem to be the presentational tags that just won’t die. While their cohorts <s>, <u>, and others were deprecated in HTML 4 in favor of semantically meaningful tags like <strong> and <em>, <b> and <i> remain tags “in good standing,” up to and including the current HTML 5 draft.

The case for keeping these tags, according to several WHAT Working Group members now participating in the W3C Working Group, is that if they were to be removed, naïve content authors would start to misuse semantically meaningful tags. For example, someone looking for “the italics tag”, upon finding that <i> was no longer allowed in HTML 5, might just start using <em> to generically apply italics to text, which would damage the relatively strong semantic meaning that this tag currently enjoys.

This same logic led to the re-inclusion of the <font> tag in the WHAT Working Group’s HTML 5 draft spec. Yes, really! Of course, the tag’s description makes it clear that using it is usually a bad idea:

The font element doesn’t represent anything. It must not be used except by WYSIWYG editors, which may use it to achieve presentational affects. Even WYSIWYG editors, however, should make every effort to use appropriate semantic markup and avoid the use of media-specific presentational markup.

At this stage I would expect that, if they do remain in the final HTML 5 spec, all presentational elements will be described in similar terms.

How To Treat Broken/Deprecated Markup

Something most web developers don’t realize is that the W3C’s HTML 4 specification didn’t do a very good job of describing what browsers should do when they encounter broken code. Here’s the canonical example:

<strong>This line <em>contains</strong> some words</em>

In XHTML, this type of overlapping tag structure is flat-out illegal, and browsers that currently support XHTML will simply display an error message rather than attempt to make sense of this code.

HTML, meanwhile, takes a softer stance, merely stating that the interpretation of structures like that above is up to the individual browser.
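
The contrast between the two stances can be sketched with Python's standard library. This is only an illustration, not how any browser works: xml.etree.ElementTree stands in for a strict XML parser, and html.parser stands in for a lenient HTML tokenizer. Note that html.parser only reports tags as it sees them; it builds no tree and performs no error recovery, so it says nothing about how a browser would ultimately render the text. The names MARKUP, parse_as_xml, TagLogger, and tokenize_as_html are made up for this example.

```python
# Sketch: strict XML parsing vs. lenient HTML tokenizing of overlapping tags.
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# The canonical overlapping-tag example from the article.
MARKUP = "<strong>This line <em>contains</strong> some words</em>"


def parse_as_xml(markup):
    """Return None if the markup is well-formed XML, else the error message."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as exc:
        return str(exc)
    return None


class TagLogger(HTMLParser):
    """Records start/end tag events without checking that they nest properly."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def tokenize_as_html(markup):
    """Feed the markup to the lenient tokenizer and return the tag events."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.events


xml_error = parse_as_xml(MARKUP)
html_events = tokenize_as_html(MARKUP)

print("XML parser error:", xml_error)
print("HTML tokenizer events:", html_events)
```

The XML parser refuses the input outright, while the HTML tokenizer happily reports the tags in the order they appear and leaves the question of what the overlap means to whoever consumes the events.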

In the real world, vagaries like this have meant that every new browser on the market has had to figure out the behavior expected of it by laboriously reverse-engineering dominant competitors like Internet Explorer.

In addition to extending and enhancing the capabilities of HTML, the HTML 5 spec also aims to fill in these blanks so that, for the first time, there will exist a complete specification for the interpretation of today’s web content.

In this sense, the HTML 5 spec is serving two masters—browser vendors that want a complete description of the idiosyncrasies of today’s browsers, and developers that want a better language for building the web sites of the future.

Some would argue that the best way to build a solid foundation for the future of the Web is to leave behind the messiness of HTML and build a new markup language from scratch. This is exactly what the XHTML2 Working Group is doing, but with much less buy-in from browser vendors, many believe its work will be without practical value for many years to come.

Extending HTML Semantics

Currently, the class attribute is commonly used to extend the range of semantic meaning that may be represented in an HTML document. HTML has no tag for the title of a book, for example, but you could use <span class="booktitle"> in all of your documents and style them appropriately with CSS.

In a bid to standardize a small number of common class names so that they may be used by browsers and assistive technologies to infer greater meaning from HTML documents, the WHAT Working Group proposed the following predefined classes: copyright, error, example, issue, note, search, and warning.

Naturally, many of these class names are already in use on the Web today. The WHAT Working Group’s hope was that, where they are used, their use will be overwhelmingly in agreement with the meaning defined in the HTML 5 spec.

Not everyone in the new HTML Working Group agrees. Some argue that the new spec should not suddenly define reserved values for an attribute that was previously open for arbitrary use by content authors. Proposed alternatives include prefixing predefined class names in the same way that was once done for the target attribute (e.g. class="_copyright"), or adopting a variation of XHTML 2.0’s role attribute to sit alongside class.
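
To make the objection concrete, here is a rough sketch of how a tool might look for the predefined class names, and why existing content could be misread. It is written in Python with the standard html.parser module; the sample markup, the PredefinedClassFinder class, and the find_predefined_classes function are all hypothetical and are not taken from any draft or browser.

```python
# Sketch: scanning markup for the class names the WHAT Working Group
# proposed to predefine. All sample markup here is invented.
from html.parser import HTMLParser

PREDEFINED_CLASSES = {
    "copyright", "error", "example", "issue", "note", "search", "warning",
}


class PredefinedClassFinder(HTMLParser):
    """Collects (tag, class name) pairs for classes in the predefined set."""

    def __init__(self):
        super().__init__()
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # The class attribute is a space-separated list of names.
        class_value = dict(attrs).get("class") or ""
        for name in class_value.split():
            if name in PREDEFINED_CLASSES:
                self.matches.append((tag, name))


def find_predefined_classes(markup):
    finder = PredefinedClassFinder()
    finder.feed(markup)
    finder.close()
    return finder.matches


SAMPLE = (
    # Author intent agrees with the proposed meaning.
    '<p class="copyright">Copyright 2007 Example Corp.</p>'
    # An existing page using "note" for a musical note: a false positive.
    '<span class="note booktitle">C sharp</span>'
    # The prefixed alternative is a different name, so it is not matched.
    '<p class="_copyright">Prefixed alternative</p>'
)

matches = find_predefined_classes(SAMPLE)
print(matches)
```

The second element is the case critics worry about: a page that chose "note" for its own purposes would suddenly be treated as carrying the predefined meaning, whereas a prefixed name or a separate attribute would keep the two vocabularies apart.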

As you can see, building the next version of the “simple” markup language that powers the Web is no simple matter. But with the HTML Working Group open to all, it has never been easier to put your thoughts in front of the people that matter.

If any of the issues I discussed above are important to you, I’d encourage you to join up and have your say!

  • jeff

    I will only join if I can take complete control of the project and all those involved answer to me and only me. After a few weeks of power I will cease to contribute anything and will take the role of ‘Spiritual leader’.

  • malikyte

    Am I the only one who doesn’t really want more markup tags? As it is, rarely do even some seasoned developers even know of the uses for the following (somewhat) useful tags: DFN, SAMP, KBD, VAR, CITE.

    I fail to understand what adding a multitude of tags will do to help any situation. I’m sure the argument has already been made that learning CSS text styling would be easier (and more beneficial) than learning new HTML tags, so I won’t even comment on that one (more than I just have). ;)

  • singer

    I don’t agree. Who wants more mark-up data?

  • Adam A Flynn

    Great post, Kevin. I’m nowhere near brave enough to even try to glance over the thousand e-mails per week that make up the HTML5 WG mailing list, but I hope that you keep us posted on major developments as the project unfolds.

  • Etnu

    The idea of adding non-semantic tags is absurd. If people want to use a word processor, they should use a word processor. HTML was never intended to be a storage medium for word processors. We’ve had standards for that sort of document for years — we don’t need to muck up HTML just because some people think everything and anything should be done in HTML.

    Ultimately, this kind of sloppiness is just going to lead towards people abandoning html in favor of WPF/E, XUL, or Flash (flex), all of which are quickly becoming more appealing due to the stagnation of html over the last 8 years.

    Honestly, who benefits from re-introducing the font tag and adding stuff like indent? “Ordinary users”? Come on, “ordinary” users don’t know HTML as is, and the only people who “know html” who don’t know CSS are the people who occasionally put bold tags in their message board posts or the occasional sysadmin who wants to build web pages to show some system stats. That pretty much leaves three groups:

    – People trying to implement word processors on the internet more easily
    – People who will never be bothered to learn CSS in the first place, and probably don’t actually use html on a day to day basis in the first place.
    – People who fear the damage that high-quality web languages will do to their proprietary software businesses (microsoft, adobe, and apple are especially suspect here), and therefore want to make the standards worse.

    HTML is already more than sufficient as a general-purpose document markup language. It could remain unchanged and nobody would complain that it wasn’t sufficient for this purpose.

    What the web really needs right now can not (and should not) be solved by HTML. The biggest holes in the ecosystem are with javascript and CSS — not with html.

    Specifically:

    – Javascript needs many improvements, and not just the ones that make it look like Java (EcmaScript 4). Packages, socket libraries, standardization, offline support, local storage, e4x, etc. Get these things and then we’ll talk.

    – CSS support in most browsers is far behind. There is only one modern browser that can render rounded corners without using images decently (Safari). Many layout tools that have already had specs defined aren’t supported. This needs to be fixed, and we need more tools available to us.

    What good is a new HTML spec going to actually do for the web?

  • xxdesmus

    I completely agree that it is sad (and borderline stupid) to include semantic markup such as <b> and <i> just because of less intelligent people who might not use the alternative markup correctly.

    Since when do we design/create for the least intelligent? How can we possibly expect to move forward and create an efficient and useful HTML 5 spec if we continue with this backwards kind of thinking? If someone uses the <em> incorrectly then that is too bad, but do not punish all of us by including semantic markup just to suit these fools.

    For lack of a better example, a while back senators were trying to push through a law that would ban all social networking sites from public library (federally funded) computers. Their logic was that online predators might attack children using these social networking sites (via these computers). Since when do we give up individual personal freedom just so that bad people won’t do bad things? That is not the solution; the solution is to stop the bad people without interfering with the good people’s rights. (I told you this wasn’t a great example.) In this case, we’re giving up the “right” to a cleaner, more efficient, more semantic language just to “protect against” the people who will use it incorrectly. It is a stretch, but maybe I made my point (somehow I doubt that).

  • h3h

    Good writeup and fair analysis. I’m hoping many of the issues raised in the comments will be addressed by the end user tutorials that will give people friendly and manageable insights into the use of HTML 5, rather than just saying “go read the spec.”

    We’ll see.

  • Kelly

    I like the role attribute idea versus the predefined class names.

    I also don’t like the predefined class names being used by the emerging Microformat culture.

    We need a clear distinction in what is being “semantically marked up” from what is being “styled”. The class attribute should not be used for both purposes. And since it is already being used to describe presentation, the Microformat and Semantic folks need a new attribute. But hey… that’s just my opinion.
    -Kelly

  • calinder

    maybe they should just make it simple for us and call it “reserved” or “res” so… res="copyright"

    I guess role is pretty straightforward though too….

    the simpler the better for me….

  • Xavier Badosa

    I agree with Kelly: do not use class for semantical purposes (that’s for the microformat folks, too). BTW, why should we be forced to use a class name in English (like “warning”) when all our style names are, for example, in Swahili? Please, no special rules for class names (like reserved words or “_”)!

    On the other hand…

    The cost for seeking universality in HTML was poverty: let’s face it, as a semantical language, HTML is really poor, which forces us to abuse the small set of tags (Example: dl). Adding new elements for very common and complex structural units, it’s not a sin (rel won’t do the job) but a necessity (less divitis, accessibility, etc.) and will ensure backward-compatibility of the existing sites.

    On a different issue…

    Yes, the HTML specification should define a standard switch to choose the rendering mode of the browser (BTW, browsers could standardize the info on the user-agent field, including data about the device like handheld, thank you).

    Very informative article! Thank you for saving us from the 1,000 email messages per week!

  • Chris

    Personally, I’m in favor of NOT adding more tags to HTML.

    Why?

    Very simply because HTML should be a simplistic markup language. Kind of like a stepping stone to using things like CSS.

    In all honesty, I think it would be in the best interest of the web page creation software products to create markup that is simple to choose.
    Something like a web SITE setting that would generate a web page/site for things like “Ensure compatibility with browser XYZ”.

    One of the biggest headaches I’ve ever found was using Microsoft Word to save an HTML file, only to find a TON of additional markup for areas that are duplicated. After all, how many times have you seen a reference to “MsoNormal” in an HTML file that was generated by most of the MS products?
    Can’t the products be smart enough to look for similarities, and generate HTML/CSS markup so that whole sections aren’t duplicated (by default of course)?

  • wheeler

    Dare I say, some of these proposals are opening up web design to the uneducated. Why would we possibly want to stagnate standards when everything else on the web is speeding ahead?

    It’s pretty frustrating dedicating years to learning a craft, only to have it “dumbed down” on account of one horrible browser, and the dwindling legions of web professionals that still use it.

    @Chris – have you never used Adobe Dreamweaver? I couldn’t think of anything worse than generating HTML in Word.

  • Sue

    WOW! This is new to me as I’ve been out of the loop. I will surely keep an eye on this issue and would like to see what W3C can come up with. I am curious of one issue though. Since XHTML needs a CSS for the styles to show, would the new coding not need it?

    A side thought regarding programs, Dreamweaver is one of the easiest ways to create a markup language rather than using Word. Dreamweaver is straight on with code and it does not add excess coding such as Word has done when creating an HTML document.
