Is Generated Content Actually Content?

The CSS2.1 specification summarizes generated content as “[rendered] content that does not come from the document tree” — in other words, text and images defined in CSS, rather than in markup.

Perhaps the most common type of generated content is text or textual glyphs defined with the CSS content property, and added to the page using :before or :after pseudo-elements. For example, I often like to add a right-pointing arrow after “more” links, which is implemented as a Unicode symbol in generated content:

a.more:after
{
  content:"\2192";
  margin-left:0.25em;
}

Or for print-media stylesheets, it’s nice to expand link hrefs into full URLs:

@media print
{
  a[href]:after
  {
    content:" (" attr(href) ")";
  }
}
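
The arrow in the first example is written as a CSS escape: a backslash followed by the character's hexadecimal Unicode code point. As a rough sketch of that mapping, the JavaScript below converts between a character and its escape. The function names are hypothetical and exist only for illustration.

```javascript
// Hypothetical helpers, for illustration only.

// Convert a single character to a CSS hex escape, e.g. the arrow to \2192.
function toCssEscape(character) {
  var codePoint = character.codePointAt(0);
  return "\\" + codePoint.toString(16).toUpperCase();
}

// Convert a CSS hex escape such as \2192 back to the character it names.
function fromCssEscape(escape) {
  var hex = escape.replace(/^\\/, "");
  return String.fromCodePoint(parseInt(hex, 16));
}

console.log(toCssEscape("\u2192"));   // \2192
console.log(fromCssEscape("\\2714")); // ✔
```

One caveat: if the escape is followed directly by another hexadecimal digit in the CSS string, it needs a trailing space so the parser knows where the code point ends.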

But even though we refer to this as generated content, I think that’s misleading — because generated content is not content at all, it’s presentation.

The specification is not particularly explicit on this point. It does refer to it as “content” in examples and descriptions, yet many of the examples it gives are edge-case (as we’ll see later). And it is part of CSS after all, so doesn’t that imply presentation?

I’ve come to the conclusion that it shouldn’t be treated as content, by looking at different examples of how generated content is used, and more importantly, how it’s interpreted by browsers and assistive technologies. See if you agree.

An Ideal Case

A recent and ideal example of how generated content can be used is Craig Buckler’s CSS3 Toggle Switches. Generated content is used to add a tick or cross to the switch, which provides an extra visual indication of its state:

input.switch:empty ~ label:before
{
  content:"\2718";
  text-indent:2.4em;
  color:#900;
}

input.switch:checked ~ label:before
{
  content:"\2714";
  text-indent:0.5em;
  color:#6f6;
}

The information this conveys is visual and supplementary, since the underlying semantics are conveyed by the checked-state of the form control. The generated content merely supplements the red and green colors, so that people who have a red-green color deficiency will be able to differentiate the states more easily.

You wouldn’t rely on color, fonts, or borders to convey important semantics, because many groups of users don’t perceive that information. And so it is for generated content.

Some Edge-Cases

The edge-cases are situations where generated content is used to supplement semantic information that exists already but would otherwise be less clear. The specification describes two such examples in its introduction:

“authors may want the user agent to insert the word ‘Figure’ before the caption of a figure, or ‘Chapter 7’ before the seventh chapter title”

To take the chapters example: assuming that all the chapters are on a single page, the user can count the headings to know which chapter number they’re reading. So the information it conveys is already available, but made more obvious by adding “Chapter 7” before the title.

I think the critical factor in cases like this is to decide how important the additional information is. Does it really matter to the user that they’re reading the 7th chapter? Would important information be lost if the extra text were not shown?

If the information is important then it should not be defined using generated content. But if the text is purely decorative or presentational, then generated content is an appropriate choice.

So what about the numbering of lists? In many cases, numbering is arbitrary and presentational, even with an ordered list: the order of items might be important, but whether they’re numbered 1, 2, 3 or A, B, C might not matter at all. In that case, generated content (or for that matter, native list numbering) is an appropriate choice.

But with some kinds of document, particularly legal documents such as contracts, the numbering of chapters and sections is integral to the meaning. A contract clause might refer to another, specifically-numbered clause, and in that case I would say that the numbering is content, not presentation, and that therefore generated content should not be used. In fact I’d go so far as to say, such documents should not even use native list numbering — the numbers should be hard-coded into the markup.
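
As a minimal sketch of what hard-coded numbering could look like when the markup is produced by script, the JavaScript below writes each clause number into the text of its paragraph. The clause data, the id scheme and the function names are all hypothetical, not taken from any real document.

```javascript
// Hypothetical data: each clause carries its own number as content.
var clauses = [
  { number: "1", text: "Definitions." },
  { number: "2", text: "Payment is due as described in clause 1." }
];

// Escape the characters that are significant in HTML text and attributes.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build paragraphs whose numbers are real text in the markup,
// so they are still there when the stylesheet is not applied.
function renderClauses(list) {
  return list.map(function (clause) {
    var number = escapeHtml(clause.number);
    return '<p id="clause-' + number + '">' +
      number + ". " + escapeHtml(clause.text) + "</p>";
  }).join("\n");
}

console.log(renderClauses(clauses));
```

Because the number is part of the paragraph text, a reference such as “clause 1” still points at something that exists in the document itself.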

Practical Differences

Generated content is not the same as normal text, and this can be seen in several key functional differences.

Only the most recent screenreaders will speak generated content, and therefore older devices will miss any content defined this way (as will older browsers, such as IE7). There will be cases where you don’t want screenreaders to pronounce the text anyway. For example, with the “more” link arrow I mentioned at the start, it’s not helpful for readers to say something like “more. right-pointing arrow”. James Craig has proposed additional ARIA CSS properties that would allow authors to control such cases; but personally, I think it would be better if screenreaders didn’t speak generated content at all.

For all browsers except Opera, generated content is not included in text-selections and clipboard data. And in all browsers, it doesn’t create text-nodes in the document, and it cannot be the target of an event. As far as the DOM is concerned, generated content doesn’t exist.

But it does exist in the presentation layer — i.e. in the CSS DOM — and there’s a simple way to get the text from generated content, by referring to the content property of the pseudo-element’s computed style:

var element = document.getElementById('whatever');
var text = window.getComputedStyle(element, ':before').content;
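
The computed value is CSS source text rather than plain text. Browsers generally return a string wrapped in quotes, and a keyword such as none or normal when there is no generated content. A minimal sketch of a helper that tidies this up might look like the following; the name generatedText is hypothetical, and it assumes the value is either a keyword or a single quoted string.

```javascript
// Hypothetical helper: turn a computed `content` value into plain text.
// Assumes the value is a keyword (none/normal) or one quoted string;
// anything else is returned unchanged, and escapes are not decoded.
function generatedText(computedContent) {
  if (!computedContent ||
      computedContent === "none" ||
      computedContent === "normal") {
    return "";
  }
  // Match one string in matching quotes with no unescaped quote inside it
  var match = /^(["'])((?:\\.|(?!\1)[^\\])*)\1$/.exec(computedContent);
  return match ? match[2] : computedContent;
}

console.log(generatedText('"\u2192"')); // →
console.log(generatedText("none"));     // (empty string)

// In a browser, it could be combined with the lookup shown above:
// var element = document.getElementById('whatever');
// var text = generatedText(window.getComputedStyle(element, ':before').content);
```

Values built from several parts, such as a string combined with attr(), are passed through untouched here, since how they are serialized is not something this sketch tries to handle.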

Broad Conclusions

For me, these practical differences underline the key conceptual difference between generated content and normal text — generated content is not content text, just as background images are not content images.

Text can still convey information without conveying semantics — just as colors and other design choices convey information, yet the document still makes sense when that information is missing. And that’s the real point — generated content might not be seen or perceived by the user at all, so if it’s important, it should be in the markup.


  • http://theheatexchange.wordpress.com/ H.E.A.T.

    I understand your point.

    If the title of your article was, “Should Generated Content Be Used As Actual Content?”, then you hit the nail on the head.

    Generated content (I will use GC to keep this short) is actual content. Though maybe non-conforming (accessibility-wise) to use critical information in GC, if GC contains such information that a sighted person can see, then it is actual content. You make this point of practice yourself in this article.

    Say, for instance, the word “WARNING” was added as GC to the beginning of a block of text. Regardless of the right or wrong use of this GC, would the visual reader “see” that word as critical and maybe read the block of text where he or she would have otherwise bypassed?

    Suppose the word “WARNING” was misspelled as “WANING”? Would this error receive criticism from a visual reader just as much as any other content on the page?

    I understand your point on how GC “should” be used to express non-critical information, especially for accessibility purposes. Still, if GC is added to a page that contains critical information that is missing from the rest of the page, then it must be treated as actual content because this is how visual readers will regard it.

    • http://www.brothercake.com/ James Edwards

      If GC is added to a page that contains critical content, and that content is not represented in any other way, then this is an error and an inappropriate use of GC.

      To take your warning example — if the word “WARNING” is the only way in which the important nature of the following text is indicated, then this is also an error. For example, form validation errors are typically indicated with red, bold text, but that alone is not sufficient to describe them as form errors; they should also be next to the field they refer to, and there should be an “aria-invalid” attribute on the field itself. Perhaps in this case it could also have the warning “WARNING” or “ERROR” added before the error text using GC, and that would convey information, but it wouldn’t be content.

      I guess the distinction is between “information” and “content” — content is information, but information is not necessarily content. Colors and fonts convey information, but not content, because the content itself has the same meaning irrespective of its color or font.

      So by that distinction, GC is information but it’s not content.

  • http://pluslion.com Jamie Knight

    Hiya,

    This discussion recently came up in work. Specifically, around font icons and UTF PAE space characters.

    From the AT implementer’s side of the equation I can see a good case for reading the content. There is much poorly authored content out there, and in practice it seems it’s worth being more verbose rather than missing core content. For example, font icons are often added to the DOM with only the content property and an empty tag. It’s wrong, but people do it. I guess it’s better to have content which may be verbose than to possibly miss content which carried importance.

    I would like to see some stats about how authors in the wild are using generated content. Perhaps this is something where ATs could define a range of UTF which they won’t attempt to read to the user.

    Just my 2c, interesting discussion.

    Cheers,

    Jamie + Lion

    • http://www.brothercake.com/ James Edwards

      I agree that from the AT vendor point of view, it’s better to be too verbose than to miss important information. I imagine that’s why they chose to read it and not ignore it. Ideally, web authors would have the ability to specify whether generated content is treated as text or content, but I think the most likely scenario is the CSS aria- extensions proposed by James Craig. That way, generated content will be read unless the author says not to.

      But we can do something like that even now, by using empty elements with generated content, and you mention that as a solution you’ve seen people using. So why do you think that’s wrong? People use font icons as icons, not text, and icons are not content per se. If the icon is the only way of representing the significance of something, then that’s an error, but the icon itself is just decoration.

  • http://www.mathewporter.co.uk Mathew Porter

    I concur “if it’s important, it should be in the markup”.

  • http://ww3.co.il Neil Osman

    Thanks for this fantastic read. I totally agree with James’s observations and conclusions. Maybe it’s time for AT providers to join the world of compliance, but first, maybe it’s time for the W3C to address AT implementations of its specs, alongside browsers’ and authors’ implementations.
    In my experience, contrary to James’s note about modern AT, I noticed some modern screen readers do read GC and some skip it.