The Tragic Comedy that is Rich Text Editing on the Web

    Andrew Tetlaw
    Andrew Tetlaw

    Rich Text Editor

    The problem is this: clients want a word processor, but developers want clean semantic HTML. Web page rich text editors were supposed to be the answer. As far back as IE4 Microsoft have offered a rich text editing component; Mozilla also implemented a similar editor and other browsers have now done so too. These built-in editing components have no user interface and can only be accessed by JavaScript; hence the crop of JavaScript rich text editors that appeared after the Mozilla implementation.

    So, we’ve had rich text editing in browsers for more than ten years now. Problem solved? We’re not even close. Have you seen the HTML that is output from these editors? Let’s start with a really simple task: making text bold. In IE and Opera you’ll generate this HTML:

    <STRONG>some text</STRONG>

    In Mozilla, Safari, and Chrome, this is the result:

    <b>some text</b>

    In Mozilla, Opera, Safari, and Chrome there’s a scriptable property called styleWithCSS. While it makes no difference in Opera, if this is enabled then Mozilla will generate:

    <span style="font-weight: bold;">some text</span>

    Safari and Chrome will generate:

    <span class="Apple-style-span" style="font-weight: bold;">some text</span>

    But, it gets even worse. If you make the text bold in Mozilla, Safari, or Chrome with styleWithCSS enabled, IE and Opera will be unable to remove the bold. If you make the text bold using IE or Opera, and then try removing the bold in Firefox, incredibly the <STRONG> tag will become <strong style="font-weight: normal;">.

    There are so many examples of horrendous HTML it’s hard to know what to pull out as an example. Visit the rich text tests on Browserscope and test a bunch of browsers; it’s very enlightening. You’ll see the use of the font tag, a mixture of uppercase and lowercase letters within the same tag, some attributes quoted and some not, <br><br> used for paragraph breaks, and the use of blockquotes for indenting. You’ll find that browser implementations are wildly different. There’s even a set of circumstances that can lead to this piece of unspeakable horror:

    <SPAN style="BACKGROUND-COLOR: rgb(255,0,0)" class=Apple-style-span><FONT style="BACKGROUND-COLOR: #0000ff">foo bar baz</FONT></SPAN>

    To overcome this incompatibility, modern JavaScript rich text editors, such as TinyMCE, apply the brute force approach. They write boatloads of JavaScript simply to take the garbage HTML produced by the browsers and manipulate it to a state resembling sanity. That they do it so well is laudable, but it may also be why browser makers have little incentive to clean up their own editors.

    Being in this state in 2010 is appalling. What’s HTML5 going to do about it? The following phrase is repeated throughout the section on contentEditable:

    “The exact behavior is UA-dependent, but user agents must not, in response to a request to wrap semantics around some text, or to insert or remove a semantic element, generate a DOM that is less conformant than the DOM prior to the request.”

    While it’s good that the standard emphasizes that user agents should first do no harm, it’s still quite vague and the implementation details remain UA-dependent. I’d settle for a little consistency between browsers. There’s a case for adding a “rich text field” to the list of standard form controls and user agents could display a default toolbar for standard editing tasks.

    What do you think? Should the rich text field become part of the standard, or should we rely on JavaScript to sort it out? Is a rich text editor even the right approach for editing content on the Web?