Long Term Web Semantics: thoughts by Alex Russell

An interesting attempt to explain what “semantic” means, especially in “semantic HTML”: http://infrequently.org/2013/11/long-term-web-semantics/

An excerpt, taken entirely out of context

Consider tables.

You’ve likely been told all of your professional career that using <table> for things that aren’t tabular data is EVIL (or at least “wrong”). Yet you also observe that many of the world’s computers have not caught fire due to the misapplication of <table>. Or <li>. Or <dt>/<dd>.

On the one hand, yes, tables do help you visually format things in a tabular way. But the history of the web’s layout-system hi-jinx led to a situation where the only compatible way to get full constraint programming power was to use tables for non-tabular data…as a layout container. This had some practical problems: some browsers were bad/slow at it. Incremental layout wasn’t “a thing” yet. There were bugs. All of this served to stretch the canonical use of <table> well beyond tabular data. Without a common meaning through use, the UI value was diluted. Whatever lingering hope HTML purists have harbored for a world in which putting data in a <table> element suddenly makes it machine extractable is certainly far from today’s state of the art; for 3 reasons:

  • Having tabular data in a table doesn’t make it useful to anyone else. The <a> element at least gives users some way to use the relationship between the UI and the things it references beyond “pure presentation”.
  • Nothing else in HTML knows how to consume that data. Even if you put all of your tabular data in a <table>, it doesn’t make any other bit of your UI more powerful.
  • People lie

Forget machine extraction; <table> isn’t a semantics failure because “people used it wrong”, it never turned into a “semantic” thing because it never evolved far enough to have meaningful relationships with others; either users or other bits of the system.

Interesting article, but I lost the plot somewhere in the middle :)

People are still arguing about this? ;)

Great. Because then I don’t feel alone

I think his points are:

  • “semantics” in HTML can only mean something if the communicators involved (let’s say authors, human readers, screen-scrapers/readers/bots, etc) all agree on the meaning so that a conversation is possible.

In other words, HTML is not “semantic” because a spec somewhere says “this means that”. Implementation is a bigger determinant, even if specs should show how to implement. Russell claims it’s better to let the users of a language evolve its meaning, because this is what happens in practice anyway. XHTML tried to define a lot more rules and, through custom DTDs, pushed the idea of “let’s create whole new tags to say new things”. That amounts to inventing a language from nearly the ground up, and this, in Russell’s opinion, doesn’t work and is not what actually happens.

  • What HTML means is changing, because meanings change over time: either authors start to use an element to represent something different from what it used to represent, or some software starts using an element as a hook for something and that use gets popular. WordPress, for a time, encouraged the idea that “title attributes are SEO” by automatically inserting titles every which way.

  • It seems his examples of the a and table elements are supposed to contrast fairly unambiguous meaning (the a) with undeveloped meaning (the table). I dunno if I fully agree with this: for a long time now the a element has also been treated as “the element people click on”, and so it gets used in place of button tags to trigger scripts. If it’s href=“#”, it’s usually not actually a link. It’s a clickable something that hopefully does something, though it might not.
    And tables have indeed been abused for layout so much that, for every data table you find, you’ll probably find a matching non-data table too (it’s a tag you can’t trust unambiguously to mean “this is a table. it’s probably full of boring figures and payrolls and stuff”). Still, someone mentioned in the comments that tables can transfer data, still in tabular form, to other software. This suggests a table-speaking program could speak Table and recognise HTML’s tables as such.
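To make the “table-speaking program” idea concrete, here is a rough sketch using Python’s standard html.parser module. The names TableExtractor, extract_rows, DATA_TABLE and LAYOUT_TABLE are made up for illustration, and the two sample tables are invented, not taken from the article or its comments. It ignores nested tables, colspan/rowspan and everything else a real scraper would have to deal with.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Hypothetical helper: collect the text of each table row as a list of cells."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row being built, or None when outside a <tr>
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def _close_cell(self):
        # Store the current cell's text (if any cell is open) in the current row.
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr" and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return every table row found in the given HTML as a list of cell strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented sample markup: one "honest" data table, one layout table.
DATA_TABLE = (
    "<table><tr><th>Name</th><th>Salary</th></tr>"
    "<tr><td>Ann</td><td>100</td></tr></table>"
)
LAYOUT_TABLE = (
    "<table><tr><td><img src='logo.png'></td>"
    "<td>Welcome to my site!</td></tr></table>"
)

print(extract_rows(DATA_TABLE))
print(extract_rows(LAYOUT_TABLE))
```

Both calls succeed, and that is sort of the point: the parser happily “speaks Table” with the layout table too, and nothing in the markup tells it that the second result is a logo and a greeting rather than data. The program can consume the structure, but it still has to trust the author.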

There is Yet Another Thread On How To Mark Up Breadcrumbs over at the W3C: http://lists.w3.org/Archives/Public/public-html/2013Nov/0003.html