By Andrew Tetlaw

The Tragic Comedy that is Rich Text Editing on the Web

The problem is this: clients want a word processor, but developers want clean semantic HTML. Web page rich text editors were supposed to be the answer. As far back as IE4 Microsoft have offered a rich text editing component; Mozilla also implemented a similar editor and other browsers have now done so too. These built-in editing components have no user interface and can only be accessed by JavaScript; hence the crop of JavaScript rich text editors that appeared after the Mozilla implementation.

So, we’ve had rich text editing in browsers for more than ten years now. Problem solved? We’re not even close. Have you seen the HTML that is output from these editors? Let’s start with a really simple task: making text bold. In IE and Opera you’ll generate this HTML:

<STRONG>some text</STRONG>

In Mozilla, Safari, and Chrome, this is the result:

<b>some text</b>

In Mozilla, Opera, Safari, and Chrome there’s a scriptable command called styleWithCSS. While it makes no difference in Opera, if this is enabled then Mozilla will generate:

<span style="font-weight: bold;">some text</span>

Safari and Chrome will generate:

<span class="Apple-style-span" style="font-weight: bold;">some text</span>

But it gets even worse. If you make the text bold in Mozilla, Safari, or Chrome with styleWithCSS enabled, IE and Opera will be unable to remove the bold. If you make the text bold using IE or Opera and then try removing the bold in Firefox, the <STRONG> tag will, incredibly, become <strong style="font-weight: normal;">.
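For readers who have not driven the built-in editor from script, here is a minimal sketch of what a toolbar's bold button might do. The helper name `toggleBold` is hypothetical, and `doc` is assumed to behave like a browser `Document` whose content has been made editable (for example via `contentEditable` or `designMode`). Whether the `styleWithCSS` command is honoured, and what markup comes out, varies by browser as described above.

```javascript
// Sketch only: `toggleBold` is an illustrative name, not part of any library.
// `doc` is assumed to be a browser Document with an active editable selection.
function toggleBold(doc, useCss) {
  // Ask the browser to emit CSS-styled markup instead of <b>/<strong>.
  // Some browsers ignore or reject this command, so guard the call.
  let cssCommandAccepted = false;
  try {
    cssCommandAccepted = doc.execCommand('styleWithCSS', false, useCss) === true;
  } catch (e) {
    cssCommandAccepted = false;
  }
  // Toggle bold on the current selection; the resulting HTML is up to the browser.
  const boldAccepted = doc.execCommand('bold', false, null) === true;
  return { cssCommandAccepted, boldAccepted };
}
```

In a page this would typically be called as `toggleBold(document, true)` from a button's click handler, after setting `contentEditable = 'true'` on the element being edited.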

There are so many examples of horrendous HTML it’s hard to know what to pull out as an example. Visit the rich text tests on Browserscope and test a bunch of browsers; it’s very enlightening. You’ll see the use of the font tag, a mixture of uppercase and lowercase letters within the same tag, some attributes quoted and some not, <br><br> used for paragraph breaks, and the use of blockquotes for indenting. You’ll find that browser implementations are wildly different. There’s even a set of circumstances that can lead to this piece of unspeakable horror:

<SPAN style="BACKGROUND-COLOR: rgb(255,0,0)" class=Apple-style-span><FONT style="BACKGROUND-COLOR: #0000ff">foo bar baz</FONT></SPAN>

To overcome this incompatibility, modern JavaScript rich text editors, such as TinyMCE, apply the brute force approach. They write boatloads of JavaScript simply to take the garbage HTML produced by the browsers and manipulate it to a state resembling sanity. That they do it so well is laudable, but it may also be why browser makers have little incentive to clean up their own editors.
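To give a flavour of that clean-up work, here is a deliberately tiny sketch that normalizes only the bold variants quoted earlier. It is not how TinyMCE or any real editor does it; `normalizeBold` is a hypothetical name, and it uses regular expressions on strings where a real editor would work on the DOM. It assumes spans are not nested and only converts spans whose sole style is `font-weight: bold`.

```javascript
// Sketch only: normalizes the bold variants shown in this article to <strong>.
// Assumes no nested <span> elements; real editors operate on the DOM instead.
function normalizeBold(html) {
  return html
    // Attribute-free <b>, <B>, <STRONG> (opening and closing) become lowercase <strong>.
    // Tags that carry attributes, such as <strong style="font-weight: normal;">, are left alone.
    .replace(/<(\/?)(?:b|strong)>/gi, '<$1strong>')
    // A span whose only style declaration is font-weight: bold becomes <strong>,
    // which also drops extras such as class="Apple-style-span".
    .replace(
      /<span\b[^>]*\sstyle\s*=\s*"\s*font-weight:\s*bold\s*;?\s*"[^>]*>([\s\S]*?)<\/span>/gi,
      '<strong>$1</strong>'
    );
}
```

Even this toy version has to make judgment calls (what to do with extra attributes, nested spans, or `font-weight: 700`), which hints at why the real editors need so much code.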

Being in this state in 2010 is appalling. What’s HTML5 going to do about it? The following phrase is repeated throughout the section on contentEditable:

“The exact behavior is UA-dependent, but user agents must not, in response to a request to wrap semantics around some text, or to insert or remove a semantic element, generate a DOM that is less conformant than the DOM prior to the request.”

While it’s good that the standard emphasizes that user agents should first do no harm, it’s still quite vague and the implementation details remain UA-dependent. I’d settle for a little consistency between browsers. There’s a case for adding a “rich text field” to the list of standard form controls and user agents could display a default toolbar for standard editing tasks.

What do you think? Should the rich text field become part of the standard, or should we rely on JavaScript to sort it out? Is a rich text editor even the right approach for editing content on the Web?

  • {RainmakeR}

    Don’t even get me started on rich-text editing. We have used FCKEditor (now CKEditor) in our CMS for some time now, which is a decent editor and has loads of features, but sometimes I think it’s perhaps too much power for the end-user? You can disable a load of features, and I recommend this, because by default the user has the potential to completely destroy your nice site layout and painstakingly crafted CSS in one fell swoop. “With great power comes great responsibility.”

    I’m actually thinking for our next CMS iteration, about giving the end-user less control within the WYSIWYG editor itself, and structuring the content a little differently. I’ve seen too many pages which have a postage stamp sized image that seems to take forever to download – because it does, since they have uploaded a 3MB image and then dragged the resize corners on the image within the WYSIWYG to make it 200×200. Because “that’s how I do it in Word”… and really, who blames them? They are just using borrowed convention from tools which these editors are mimicking.
    Yes, mimicking… they have a long, long way to go, and sad to think how long they’ve been around.

  • Personally, I feel the forward movement of web applications and excellent content management systems is being dragged down by the anchor of so called “Rich Text Editors” on the web. You are so right that in-browser implementations fall far short, and JavaScript editors are doing way too much work for not that much benefit. I’m looking forward to a future of semantic content that is beautifully marked up. I’m hoping that the W3C and contentEditable will be the vanguard of this standardized web, so that we don’t have to constantly rely on the prowess of JavaScript developers to keep our heads above water when allowing our users to do something as simple as add content to our sites!

  • My take on this is that professional web editors should learn HTML. Not all of it, but the parts relevant to article editing (including some accessibility guidelines). If writing for the Web or editing content for the Web is a sizeable chunk of your work, you must learn (the relevant parts of) HTML, no excuse.

    To make the workflow a bit faster, tools such as a simple text syntax (Markdown, Textile) can be used, as well as some JavaScript controls (MarkItUp comes to mind). Integration with a media library is important, too.

    Now most small to medium websites, and especially brochure websites, don’t need a lot of editing. So they don’t justify hiring a web editor, or having one person internally taking this role. The owner of this website is generally a business owner with little time to spare. One simple solution is that they have their web agency do all the editing (plan the right budget for that). Another solution is that they have access to a decently usable CMS, with a rich text editor. That one is not a very good solution; it sounds like the cheaper one, but the time the client needs to invest and the quality of the end result mean the real cost could be more than having trained people do the editing.

  • TinyMCE and a few others make a reasonable job of the resulting mark-up, but none are perfect and it’s too easy for a user to trash the page.

    Personally, I prefer markdown syntax if I need to ensure that the resulting HTML is valid.

  • Chris

    I don’t get it. I always thought styling depends on the editor and not the user agent. Because that would be a much smarter solution, wouldn’t it? These editors are based on JavaScript. If you want to have a word styled bold, why doesn’t the editor just wrap it into a <strong> or a <b> – whatever fits semantically? So the developers of the editor have to make a decision which tag is to be used (inserted, wrapped,…) and that would be applied to the editor – browser independent.
    Or is there a mistake in my reasoning???

  • Inherently it’s flawed – Agree with Craig that TinyMCE is quite good, but the authoring of semantics is a human-driven job.

    Businesses use these WYSIWYG editors as a way to drive down cost, but then end up paying an arm and a leg for developers to fix it.

    Best advice I’ve given for the places I’ve worked is to train the authors up with a few basic HTML concepts, or employ a developer with HTML experience to give it a quick scan.

    Hmmm… idea!

  • Florent, I think the problem is that the vast majority of websites and blogs are run by people who have little or no web expertise and who do not have the budget to get professional web help. *Most* websites do not bring in money, and aren’t designed to. Browsers *should* be robust enough to allow these people to contribute content to the web without creating a complete mess.

    It is possible to limit potential damage caused by users by limiting the options available in RTEs, but I find that you need to put in an additional step and filter everything except a strict selection of tags and attributes. I prefer to do that with PHP, although it can occasionally be a pain to integrate with CMSs.

    I think it’s unrealistic to expect people to learn HTML or markdown before posting content, so RTEs are here to stay, at least until HTML 5 becomes predominant on the web, along with browsers that properly support it. In other words, for a long, long time.

  • mmatsoo

    Are we talking about content actually being created/written in a browser-based rich text editor?

    Where I work, everything is prepared in Word and then it simply gets copied/pasted into the editor. (cue horror music for the resulting horror code) I am struggling with finding the right moment to put my foot down to say, “Everything we are doing is wrong!”

  • Phweeee

    Rich text editors don’t work and shouldn’t be used. End of story. If a client isn’t prepared to learn basic html, something like wymeditor can be used as a compromise.

  • SJH

    I hate rich-text editors with a passion. If I had my own way, none of my clients would have rich-text editing capabilities. Indeed, I’ve even deliberately given a lot of clients *no* rich-text editor, preferring to wait until they ask for it, my logic being that there’s a chance they won’t miss it.

    On the occasions that a client has specifically requested a WYSIWYG editor, I’ve reluctantly enabled it but with very limited features (no underline, no font selector, no colour selector, etc) in order to try my best to preserve the layout of the page. Some clients have gone one step further and have demanded these additional presentation options. What results is, as expected, a horrible mess of code and centred text in different fonts and colours to those specified in the site’s master stylesheet which have completely compromised the integrity of the lovely designs I’ve come up with.

    Don’t even get me started on the sheer horror of the code resulting from clients pasting directly from Word. Not an unreasonable thing for a client to expect to be able to do, but I’ve had so many calls saying “WHY IS THIS IN A WEIRD FONT? WHY ARE ALL THESE WEIRD SYMBOLS ALL OVER MY PAGE?” that have made me want to tear my hair out.

  • Spocke

    Nice to see an article about the mess that the browser vendors currently produce in the contentEditable field.

    I’m the main developer of TinyMCE and I must say that much of the logic in editors like TinyMCE is focused on resolving these differences and trying to make the output as similar as possible. This takes a lot of code; most users who complain about the size and complexity of TinyMCE don’t understand what’s required to get similar editing behavior across all browsers. None of the smaller editors handle these issues, and they produce completely different output depending on the browser. This might work fine if you just use it for personal use or for some simple forum, but when it comes to CMS systems it’s important that Author A with IE can edit content that Author B produced with Safari.

    Editors like TinyMCE often get blamed for these quirks when it’s in fact the browser vendors that have been lazy. A simple bug like the ability to select images in WebKit has been open for years. I guess it’s more fun to add 3d rendering capabilities to canvas than fixing bugs.
    We constantly report new bugs regarding editing in the browser, but it seems that they are just ignored unless they are regressions. I feel like they don’t want to fiddle too much with it, probably since there are workarounds that might break if they do.

    One of the main problems is the lack of a good specification, as stated in this article. HTML 5 doesn’t cover contentEditable in a good way and I don’t think it should. HTML 5 is getting to be a monster specification; it would have been better to separate it into smaller specs at an earlier stage and have editing in its own specification. But even if we have a decent specification, we still have browsers like IE 6, 7, 8 that currently produce old and deprecated content like font tags, and they will be around for years to come.

  • Anonymous

    The original design goal of the WWW was to allow for information dissemination. The goal was not enabling everyone to share rich-text with each other.

    Since that is now the goal, it will require something new. Have the W3C define a new markup standard, “RML,” loosely based on HTML, but separate from it. It should have narrowly-defined, restrictive rules. Then allow for that markup to be embedded within HTML. Browsers will implement rendering it, and integrate an editor component in their own code so that web sites don’t have to provide their own WYSIWYG environment any more.

    Problem solved.

  • jamiemcd

    I chose xStandard for my custom CMS a few years ago and it does very well at generating clean code, spell checking, and image uploading. I haven’t compared it to new versions of TinyMCE (except for using it as part of my WordPress blog I guess). However, xStandard is a browser plugin, so there are problems with that approach. It cannot be placed in an AIR application, for example, whereas TinyMCE can. The development of xStandard seemed stalled and Mac OS X 10.6 (Snow Leopard) broke the plugin. Their website states that they are working on the issue, but there has been no update since last year.

  • Elaine

    I’m seeing a lot of antagonism towards people who just want to get their words on the web! If you’re just adding an announcement or editing a few words, you don’t want to wait for some agency to do it for you, you want to do it NOW. Additionally, for many people it’s not a sizable part of their job: they are a teacher who also posts to a class website, or a secretary who also maintains part of an intranet, or a volunteer sharing upcoming events.

    I’ve been helping non-designer/developer folks put their content on the web (or intranet) for a long time now in a variety of contexts. Seeing a common pain point, like that 3MB file resized in the rich text editor, is a call to develop either site tools or educational tools.

    I find that if you take the time to listen, to respect folks as skilled and intelligent in their own fields of expertise, and to explain in plain language, that most people can understand the most important parts.

    For example, anybody who’s used Word knows that it does strange formatting things sometimes, so saying “Word is weird, here’s how to work around it” is a pretty decent substitute for “OMG DON’T DO THAT!” (Spocke, I LOVE that TinyMCE includes a “paste from Word” button, even if the cleanup behavior is a little odd sometimes.)

    Locking down the options to something manageable is definitely our responsibility as developers. Every site has different needs, and to be honest, in some cases the authors and site visitors want and expect things that we find hideous. (Pink comic sans FTL.) In other cases, all you need are the most basic formatting options. Knowing the site, the author(s) and the audience is important in making those kinds of choices, the same as any other design or development decision.

    Would I like to see more consistency and quality in code produced through rich-text editors? Absolutely! Do I think it should be part of HTML5? I honestly don’t know. In the meantime, we’ll have to just keep muddling along.

  • jerichvc

    If a web content or web page is posted using rich text editors, a lot of people won’t “view source”. Even my dad won’t care about markups. As long as they get the info they want then that’s fine. A web content writer using a rich text editor is writing for client-end users and not for developers looking for bad html markups.

  • Bobby jack

    @jerichvc: That’s all well and good, but when the result is a page with lines that break in the middle when the text size is increased, or a page that harms the site’s SEO ranking, or a page that many people simply cannot conveniently read, it IS a problem. Nothing to do with “developers looking for bad html markups”.

    This subject is a huge bugbear of mine (in case it wasn’t already obvious), and I wish that people would – first and foremost – get it out of their heads that the web is (or should be) WYSIWYG. HTML is a MARKUP language, and any kind of front-end editor should respect that. That means not providing the ability to hand-style fonts, not creating empty paragraphs (or paragraphs containing a single non-breaking space!), and not putting in a bunch of line-breaks everywhere.

    To a great extent, this is about education. If someone couldn’t punctuate their copy properly, would you accept that, or would you teach them some very basic concepts of when to use a comma, when to use a question-mark, etc? HTML is barely anything more than that.

  • What’s the editor that WordPress uses? It seems to do a pretty decent job of removing the garbage code from the posts. That being said, sometimes the removal of “garbage code” results in removing the code we actually wanted to keep in there. Nothing is perfect but I find their editor to be the closest I’ve found thus far.

  • We’ve all seen what happens when you allow unfettered use of WYSIWYG editors by those with little or no knowledge of the underlying (i.e., generated) HTML. It’s not the fault of the user, either – how do they know not to use the tab key or spaces for indentation, or to unravel text formatting in the opposite steps it was created to avoid deep nestings of span, b, em, span, font, em, b, span, etc. that hardly ever do what they intended?

    I viewed the work of two different people using GoDaddy’s (absolutely terrible) WebSite Tonight product. One person created a nearly impenetrable quagmire of deeply nested tags, while the other created a clean, though rudimentary, brochure-type web site. The former was a marketing exec with some design background who simply kept hitting buttons and the space bar until the page looked close to what she wanted. The second was a former mechanical engineer now running a kitchen and bath design company who worked methodically through the design. Absent at least a sense of HTML and page structure, it could have gone differently for either of them.

    We use a limited capability JS-based editor to at least generate valid HTML and offer a tutorial on what works and what doesn’t so that our customers can produce reasonable HTML segments for their course descriptions and e-mail notifications. We believe that you really do have to offer both – a restricted editor that generates valid HTML and some guidance in its use.

  • Pac Ocean

    I think many confuse writing, or typing, with desktop publishing.

    “Is a rich text editor even the right approach for editing content on the Web?”
    I think no.

    In fact, unpopular as it may be, I support everyone learning HTML, CSS, and 3rd grade outlines. I.e., for the Internet, change the way you write. Content logical with minimal format cues.

    For me personally, it’s cheaper in the short run. For “fancier” print stuff, I have FrameMaker 8 and an image editor, PhotoImpact X3.

    For everyday stuff, plain text or HTML. I try to avoid the middle road of wordprocessors like MS Word or OO Write as much as possible. I do use OO Calc and MS Excel, but for the web, I send PDF; I have Acrobat Pro 8.

    Back in the days of Acrobat 4, I read a book about the web, suggesting either PDF or markup. I agree.


    PS. For many properly implemented manual CSS docs, the markup tags may be viewed as lines around boxes on printed forms. At least to me.

  • bryce-m

    Hey I was just wondering how to access the wysiwyg editor features of browsers. I don’t recall hearing of it before.

  • Bobby jack, while you are technically quite right, that’s not what people want the web to be. So, we have to find ways to work around it. Things like disabling large sections of TinyMCE (for example) so that users can’t select fonts, colors and so on definitely help. But there’s still nothing you can do to make sure that headings are properly nested (well, you could probably write a script to give warnings, but I’m not sure it would really help).

    We have to accept that users want to be able to style text (and, really, they should be able to, within limits) and that they want to see the way it’s going to be. To hope that they will all learn HTML is an absurd hope, because it never, ever will happen. As developers, we have to find ways to get around the problems.

  • Hmm… Maybe I just don’t know enough about how RTF editors work, but why do the browsers have anything to do with how an RTF editor works at all? I thought that the editor just used Javascript to generate the code. Not the browser.

  • nathany

    Even when options for fonts/colors are disabled in the editor, people still paste directly from Word. In terms of satisfying developer/designers without people learning code, I think WYMeditor is on the right track.

    Other options might be:
    – upload a set of files (Word, JPG) and parse basic formatting
    – a custom desktop app for writing rich text content as the only way to synchronize to the CMS
    – a combination of MarkItUp+Coderay for Markdown

  • smftre

    Rich text on the web is dangerous. It all really depends on who is using it; I have had to implement different control levels depending on the end user(s) I am creating my custom CMSs/systems for.
    I would love to see a 100% brilliant working idea for my end users to not be able to mess up, so I suppose I will just keep waiting for a few more years then..

  • Pac Ocean

    If rich text is important and particularly pagination (at least until CSS 3 is in all browsers), use PDF.

    I think many people who don’t do their own typing refuse to accept that there are better things to spend money on than a wordprocessor.

    I guess they don’t see it as limiting and more expensive. Email should have made it clear.

    HTML and XML ‘should have’ / ‘could have’ made it much cheaper to reuse data. WYSIWYG continues to retard and cost the everyday user.

    I guess it’s not popular to be frugal by becoming less dependent.

    Had my say. Thanks.

  • bsutton

    I have also spent much time thinking about this issue. I have built a custom cms and use xinha as the editor, chosen at the time by its ability to turn facilities off and add custom styles.

    I wish popular wordprocessors would separate style/layout from content too, and be more like What You See Is What You Mean. Then if that can be taught at schools etc. there would be much productivity gain! Pasting from such a wordprocessor to a web document would be so easy, and great for SEO.

  • iDude (Chaz Scholton)

    This topic gets me fired up in bad ways like it does many other people. Many of my rants about it are similar to those things already posted. So, I’m going to attempt to pose some solutions to this madness. I really got fed up with web-based rich text editors for WordPress, so I explored blogging application software. While it’s not perfect, I’ve settled upon Microsoft Livewriter for the time being. It has issues in the HTML it generates; it’s not perfect. However, what I love about it is that it will take the site CSS and use it.

    Anyways, it’s important for any editor to take the CSS into consideration. Ask me, the site CSS should dictate a lot of what is available or not in an editor. CSS should be considered just as much as HTML is. I do know when I click on the B or I options, I expect the HTML bold or italic tags to be used. Not strong or some other hare-brained notion of this most basic level of HTML. BOLD is BOLD not Strong! The user interfaces that say BOLD and produce STRONG are lying to me and everybody else. Dugh! Where is the common sense? Are Geeks really this stupid? This is 2010 and some Geeks still are lacking common sense in the software they produce. Worse yet, other Geeks with common sense have to suffer along with people that just want to get in and do tasks. Anyways, CSS needs to be brought into the picture. This way, whatever style the designer has established for the site dictates the style of the content being displayed and edited. True What You See Is What You Get.

    Some of us attempt to lock down the features of the editors and give people the basics, and the code generated still sucks! The editors can’t even get basic HTML right. Paragraphs should be paragraphs; that’s what the p tag is for. Break tags should be just that. This ain’t rocket science.

    Now, when it comes down to Word docs or other data being pasted into the editors, conversion needs to occur upon the paste event, or right before the paste event is completed. Conversion to HTML x.x standard using whatever CSS has been set for the website. Convert, Convert, Convert to whatever standards are being used and established. The world does not revolve around web applications alone. People use Open Office, MS Office and other applications too. People just want to do things without having to fight with technology! Technology should be easy to use and make things easier and simpler. It’s 2010 and stupidity still rules. WYSIWYG is there to make things easier for people to do, without disrupting work flow… people want to write documents, not program them. There are a lot of Geeks that lack common sense and are trying to dictate to other people to learn a lot of BS to perform basic tasks.

    JavaScript is already being tasked with doing a lot of things that it should not be bothered with. I’m happy to see some of the features of HTML 5 come to light; ask me, it’s 10 years overdue. Some of this stuff should have been standardized years ago. Adding attributes to form fields such as “required”, or the additional types of input, really is common sense, and if these common sense things had been included years ago, there would not have been all the crazy need for the repetitive JavaScript libs to handle these basic, simple yet complex issues.

    I just love to see how some Geeks proclaim that this or that was never intended to do something. That’s the same mindset that makes everything so damn hard to work with. More difficult for us developers and more difficult for the end users. I’m pretty certain the wheel was not invented to be used on automobiles either… because automobiles were not around at the time. I’m pretty certain wheels were not invented to be tires on a rim either. I despise this stone age mindset some programmers or developers have in regards to the matter. Being overly protective of software that basically sucks. Sorry, I had to say it. Improvements need to be made in a number of areas and too many people are holding onto outdated notions, thoughts and ideas…

  • I think it should be possible to write a rich text editor in JavaScript, without using the rich-text facilities provided by the browser. But last time I checked, getting the cursor and the selection anchor positions inside a div is not any less convoluted than what you just described… quite the opposite (unless working inside a textarea – which is exactly what we’re trying to avoid)… So I left it there. But ideally, I think this is the way to go.
    It was impressive to learn that one of the very first web browsers (from around 1990) already had text editing capability. Sadly, and inexplicably, it seems to have hardly evolved since in comparison to the other web technologies.
