Is Generated Content Actually Content?

The CSS2.1 specification summarizes generated content as “[rendered] content that does not come from the document tree” — in other words, text and images defined in CSS, rather than in markup.

Perhaps the most common type of generated content is text or glyphs defined with the CSS content property and added to the page using :before or :after pseudo-elements. For example, I often like to add a right-pointing arrow after “more” links, which is implemented as a Unicode escape in generated content:

a.more:after
{
  content:"\2192";
  margin-left:0.25em;
}

Or, in print stylesheets, it’s nice to show each link’s URL after its text (attr(href) prints the attribute as written, so relative URLs stay relative):

@media print
{
  a[href]:after
  {
    content:" (" attr(href) ")";
  }
}

But even though we refer to this as generated content, I think the name is misleading: generated content is not content at all. It’s presentation.

The specification is not particularly explicit on this point. It refers to it as “content” in examples and descriptions, yet many of the examples it gives are edge cases (as we’ll see later). And it is part of CSS, after all, so doesn’t that imply presentation?

I’ve come to the conclusion that it shouldn’t be treated as content, by looking at different examples of how generated content is used, and more importantly, how it’s interpreted by browsers and assistive technologies. See if you agree.

An Ideal Case

A recent and ideal example of how generated content can be used is Craig Buckler’s CSS3 Toggle Switches. Generated content adds a tick or cross to the switch, which provides an extra visual indication of its state:

input.switch:empty ~ label:before
{
  content:"\2718";
  text-indent:2.4em;
  color:#900;
}

input.switch:checked ~ label:before
{
  content:"\2714";
  text-indent:0.5em;
  color:#6f6;
}

The information this conveys is visual and supplementary, since the underlying semantics are conveyed by the checked state of the form control. The generated content merely supplements the red and green colors, so that people who have a red-green color deficiency can differentiate the states more easily.

You wouldn’t rely on color, fonts, or borders to convey important semantics, because many groups of users don’t perceive that information. And so it is for generated content.

Some Edge Cases

The edge cases are situations where generated content supplements semantic information that already exists but would otherwise be less clear. The specification describes two such examples in its introduction:

“authors may want the user agent to insert the word ‘Figure’ before the caption of a figure, or ‘Chapter 7’ before the seventh chapter title”

Take the chapters example. Assuming that all the chapters are on a single page, the user can count the headings to work out which chapter they’re reading. So the information is already available, and adding “Chapter 7” before the title simply makes it more obvious.

I think the critical factor in cases like this is deciding how important the additional information is. Does it really matter to the user that they’re reading the 7th chapter? Would important information be lost if the extra text were not shown?

If the information is important then it should not be defined using generated content. But if the text is purely decorative or presentational, then generated content is an appropriate choice.

So what about the numbering of lists? In many cases, numbering is arbitrary and presentational, even with an ordered list. The order of items might be important, but whether they’re numbered 1, 2, 3 or A, B, C might not matter at all. In that case, generated content (or for that matter, native list numbering) is an appropriate choice.

But with some kinds of document, particularly legal documents such as contracts, the numbering of chapters and sections is integral to the meaning. A contract clause might refer to another, specifically-numbered clause, and in that case I would say that the numbering is content, not presentation, and that therefore generated content should not be used. In fact I’d go so far as to say, such documents should not even use native list numbering — the numbers should be hard-coded into the markup.
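
Hard-coding doesn’t have to mean typing every number by hand. If the document is produced by a build step, that step can write the numbers into the markup as real text. Here is a minimal sketch of the idea, assuming a hypothetical array of clause objects with title and text fields; the function names and the HTML structure are my own illustration, not a recommendation from any specification:

```javascript
// Hypothetical build-step helper: writes clause numbers into the HTML
// itself, so they exist as real text in the markup rather than as list
// markers or generated content.

// Escape characters that are significant in HTML text and attributes.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// `clauses` is an assumed input shape: [{ title: '...', text: '...' }]
function renderClauses(clauses) {
  return clauses.map(function (clause, index) {
    var number = index + 1; // clause numbers start at 1
    return '<section id="clause-' + number + '">\n' +
      '  <h2>' + number + '. ' + escapeHtml(clause.title) + '</h2>\n' +
      '  <p>' + escapeHtml(clause.text) + '</p>\n' +
      '</section>';
  }).join('\n');
}
```

Because the number is part of the heading text, it survives copying, text-only rendering, and any assistive technology that ignores CSS, and a cross-reference such as “see clause 1” keeps pointing at something that is actually in the document.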

Practical Differences

Generated content is not the same as normal text, and this can be seen in several key functional differences.

Only the most recent screenreaders will speak generated content, and therefore older devices will miss any content defined this way (as will older browsers, such as IE7). There will be cases, though, where you don’t want screenreaders to pronounce the text anyway. With the “more” link arrow I mentioned at the start, for example, it’s not helpful for readers to say something like “more. right-pointing arrow”. James Craig has proposed additional ARIA CSS properties that would allow authors to control such cases, but personally, I think it would be better if screenreaders didn’t speak generated content at all.

For all browsers except Opera, generated content is not included in text selections and clipboard data. And in all browsers, it doesn’t create text nodes in the document, and it cannot be the target of an event. As far as the DOM is concerned, generated content doesn’t exist.

But it does exist in the presentation layer — i.e. in the CSS DOM — and there’s a simple way to get the text from generated content, by referring to the content property of the pseudo-element’s computed style:

var element = document.getElementById('whatever');
var text = window.getComputedStyle(element, ':before').content;
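
One wrinkle with the snippet above: the computed value is a CSS value, not plain text. Browsers typically return a string wrapped in quotes, and a keyword such as none or normal when the pseudo-element generates nothing. Here is a minimal sketch that handles the simple case of a single quoted string. The names unquoteContent and readGeneratedText are hypothetical helpers of my own, not part of any API, and the exact serialization may differ between browsers:

```javascript
// Hypothetical helper: turn a computed `content` value into plain text.
// Assumes the value is either a single quoted string or a keyword such
// as `none` or `normal`. Anything more complex (several strings, attr(),
// counters, url()) returns null rather than guessing.
function unquoteContent(value) {
  if (typeof value !== 'string') return null;
  var trimmed = value.trim();
  if (trimmed === '' || trimmed === 'none' || trimmed === 'normal') return '';

  // One opening quote, then escaped characters or anything that is not
  // the same quote or a backslash, then the matching closing quote.
  var match = /^(["'])((?:\\[\s\S]|(?!\1)[^\\])*)\1$/.exec(trimmed);
  if (!match) return null;

  // Undo backslash-escaped quotes and backslashes only; other escapes
  // are left untouched in this sketch.
  return match[2].replace(/\\(["'\\])/g, '$1');
}

// Browser-only wrapper around the call shown in the article.
function readGeneratedText(element, pseudo) {
  var style = window.getComputedStyle(element, pseudo);
  return unquoteContent(style.content);
}
```

In a browser, readGeneratedText(document.getElementById('whatever'), ':before') would then give either the generated text, an empty string when there is none, or null when the value is too complex for this sketch to interpret.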

Broad Conclusions

For me, these practical differences underline the key conceptual difference between generated content and normal text — generated content is not content text, just as background images are not content images.

Text can still convey information without conveying semantics — just as colors and other design choices convey information, yet the document still makes sense when that information is missing. And that’s the real point — generated content might not be seen or perceived by the user at all, so if it’s important, it should be in the markup.