Markup Musings #1: How should you mark up dialog?

Semantic markup. Almost every developer who understands the concept agrees with it, so why do so many of us (myself included) so often have trouble applying its principles?

No doubt it's sometimes garden-variety laziness, but personally there have been times when I've wanted to make the right decision and ended up scratching my head. The HTML standards have done a fairly decent job of mimicking the basic forms and structures we all know from traditional books (pages, paragraphs, headings, tables and lists). Yet more and more often I find myself marking up documents that don't fit so neatly into those structures: restaurant menus, screenplays and comic strips, for instance. Each has a well-established format that doesn't necessarily transfer seamlessly to the web.

While I can't claim to have definitive answers to all these questions, I think it's useful to at least throw them out there and gather a few alternative views, my own included.

Marking up dialog is the first conundrum we’ll look at.

Let's start with a piece of a classic: what is the most sensible way to mark up the final page of the ‘Casablanca’ screenplay for the web?

At a glance, it seems to be more than a standard series of paragraphs (<p>). It has units in a very specific order, but it's certainly not an ordered list (<ol>). Structurally it looks very much like a definition list (<dl>): a list of items, each with an attached block of text. And how do quotes (<q>) and blockquotes (<blockquote>) fit into it all?

Luckily, in this case we're not the first to ask this very question. There has already been a significant discussion of dialog markup in the Accessify Forums that we can draw on. Brothercake, Andrew K, Kev and I have tossed around the options, and this is where we ended up.

The simplest approach would no doubt be to mark up the whole thing as a simple series of paragraphs, with a few spans tossed in to indicate the speaker. While this isn't necessarily indefensible, it doesn't embed much useful information in the document either.
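To make that concrete, a rough sketch of the paragraphs-and-spans version might look like this (the ‘speaker’ class name is my own illustrative choice, not part of any standard). Nothing in it tells a browser or screen reader that the text is quoted speech; the span is just an unlabelled hook.


<p><span class="speaker">Rick:</span> Louie, I think this is the beginning of a beautiful friendship</p>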

Another approach some have suggested is the definition list (<dl>). However, while definition lists are a very close visual fit, they are a clear semantic no-no in this case. Arguing that each speaker is a ‘definition term’ (<dt>) and that each attached passage of text is a ‘definition description’ (<dd>) is hard to sustain.
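For comparison, here's roughly what the definition-list version would look like. It renders much like a script, but semantically it claims that ‘Rick’ is a term being defined and that his line is its definition.


<dl>
<dt>Rick</dt>
<dd>Louie, I think this is the beginning of a beautiful friendship</dd>
</dl>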

Interestingly, there is currently a proposal for a <dialog> tag in HTML5 to be based on the current structure of <dl>. I’m not really convinced that’s the right approach either.
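As I understand the draft, the proposed element would essentially swap the outer <dl> for <dialog> while keeping the <dt>/<dd> pairs, something like the sketch below. The proposal is still in flux, so treat this as an approximation rather than settled markup. Note that the speaker is still marked up as a ‘term’ and the line as a ‘description’, which is the same semantic problem as before.


<dialog>
<dt>Rick</dt>
<dd>Louie, I think this is the beginning of a beautiful friendship</dd>
</dialog>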

Since each block of dialog is, in theory, quoted speech, <blockquote>s seem a meaningful structure to sink each dialog snippet into. Blockquotes also allow us to embed both block-level and inline elements, giving us a flexible base to work with. In fact, the W3C recommends that text within a blockquote should always be contained in a block-level element.

So, we start with something like:


<blockquote>
<p>Louie, I think this is the beginning of a beautiful friendship</p>
</blockquote>

As there will always be a limited number of speakers, I think it makes sense to create a class for each speaker and attach that class to each blockquote. We're not forced to use it, but it gives us the ‘hooks’ to do more useful things at a later date, such as highlighting, emphasizing, hiding or coloring the dialog of particular characters.


<blockquote class="rick">
<p>Louie, I think this is the beginning of a beautiful friendship</p>
</blockquote>
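As a minimal sketch of those hooks in action, a single stylesheet rule can pick out every line Rick speaks. The color is arbitrary, and on a real site the rule would more likely live in an external stylesheet than in a <style> element in the page.


<style type="text/css">
/* Highlight every passage of dialog spoken by Rick */
blockquote.rick {
  background-color: #ffffcc;
}
</style>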

According to the W3C's HTML specification, the <cite> element ‘contains a citation or a reference to other sources’. In our case we're attributing each passage to a speaker, so it makes sense to use <cite> to indicate who is speaking. For clarity I've also given it a ‘speaker’ class, but this is entirely optional.

So, now we have something like:


<blockquote class="rick">
<cite class="speaker">Rick</cite>
<p>Louie, I think this is the beginning of a beautiful friendship</p>
</blockquote>
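Browsers render <cite> in italics by default, so you'll probably want to restyle it. Here's a sketch that assumes you want each speaker's name set in capitals on its own line above the speech, roughly the way a printed screenplay does it; the exact values are just a starting point.


<style type="text/css">
/* Put the speaker's name on its own line, in capitals, like a script */
cite.speaker {
  display: block;
  font-style: normal;      /* undo the default italics */
  text-transform: uppercase;
  text-align: center;
}
</style>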

Finally, stage directions are another element common to almost all scripts. As these are written in a more traditional, descriptive style, they are perfectly suited to plain old paragraphs. If you expect to have other kinds of paragraphs in the same document (credits, a preface or other notes, for instance), you might choose to identify your stage directions with a class of their own. If not, plain paragraph tags will suffice.


<blockquote class="rick">
<cite class="speaker">Rick</cite>
<p>Louie, I think this is the beginning of a beautiful friendship</p>
</blockquote>

<p class="directions">The two walk off together into the night.</p>
<p class="directions">FADE OUT.</p>
<p class="directions">THE END</p>
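Earlier exchanges on the page simply repeat the same pattern, one blockquote per speech. Here's a sketch of two consecutive speakers; Renault's words are elided rather than quoted, and the ‘renault’ class is just my application of the one-class-per-character idea.


<blockquote class="renault">
<cite class="speaker">Renault</cite>
<p>…</p>
</blockquote>

<blockquote class="rick">
<cite class="speaker">Rick</cite>
<p>Louie, I think this is the beginning of a beautiful friendship</p>
</blockquote>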

Here's a visual representation of the same structural markup superimposed over the original page.

A visual representation of the markup superimposed over the original Casablanca script

So, what’s your verdict?

Can you see a better approach?

Are there other document formats that you've had trouble translating to the web?

P.S. Remember to ‘escape’ your code if you're posting HTML snippets (e.g. < becomes &lt;).