Line endings in Javascript

    Simon Willison

    I spent much of today fighting with line endings in Javascript, and eventually turned up some results which are well worth sharing – if only to save other developers from descending into the same debugging black hole.

    As you may know, the humble line break actually has three forms depending on which operating system is doing the breaking. On Unix machines, a single newline character ‘\n’ does the job. On classic Mac OS, a carriage return ‘\r’ is used. DOS and Windows use both: ‘\r\n’. It’s one of those relatively subtle issues that can bite you hard if you don’t know what to look out for.
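    To make the three conventions concrete, here is a small sketch. The `detectLineEnding` helper is hypothetical, not something from this post; it just reports which convention a string uses. ‘\r\n’ is checked first because a Windows line ending contains both of the single characters.

```javascript
// Hypothetical helper: report the first line-ending convention found.
// "\r\n" must be tested before "\r" and "\n" on their own, because a
// Windows ending contains both of those characters.
function detectLineEnding(text) {
  if (text.indexOf("\r\n") !== -1) return "CRLF"; // DOS / Windows
  if (text.indexOf("\r") !== -1) return "CR";     // classic Mac OS
  if (text.indexOf("\n") !== -1) return "LF";     // Unix
  return "none";
}

console.log(detectLineEnding("one\ntwo"));   // LF
console.log(detectLineEnding("one\rtwo"));   // CR
console.log(detectLineEnding("one\r\ntwo")); // CRLF
console.log("\r\n".length);                  // 2: one line break, two characters
```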

    Today, I was tasked with the simple problem of building a Javascript function to turn single newlines into double newlines within a textarea. My first attempt looked like this:


    var doublenewlinesRE = /([^\n])\n([^\n])/g;
    function doublenewlines(obj) {
        obj.value = obj.value.replace(doublenewlinesRE, "$1\n\n$2");
    }

    The above code uses a simple regular expression which finds every instance of a character that is NOT a newline, followed by a newline, followed by another character that isn’t a newline. Each match is then replaced by the same two characters with two newlines between them instead of one.
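    The expression is easy to watch in action from Node or a browser console. One side effect worth knowing, which isn’t covered here otherwise: because the pattern consumes the character after the newline, a line that is only one character long can’t also act as the character before the next newline, so only the first newline in "a\nb\nc" gets doubled. The lookahead version at the end is an alternative, not part of the original function.

```javascript
// The pattern from the first attempt, with its backslashes intact.
var doublenewlinesRE = /([^\n])\n([^\n])/g;

// A single newline between two lines is doubled; an existing blank line
// ("\n\n") is left alone because neither "\n" sits between two
// non-newline characters.
console.log(JSON.stringify("first\nsecond\n\nthird".replace(doublenewlinesRE, "$1\n\n$2")));
// "first\n\nsecond\n\nthird"

// Side effect: "b" is consumed as $2 of the first match, so it cannot
// serve as $1 for the newline that follows it.
console.log(JSON.stringify("a\nb\nc".replace(doublenewlinesRE, "$1\n\n$2")));
// "a\n\nb\nc"

// Alternative (not the original function): a lookahead checks the next
// character without consuming it, so every single newline is doubled.
var lookaheadRE = /([^\n])\n(?=[^\n])/g;
console.log(JSON.stringify("a\nb\nc".replace(lookaheadRE, "$1\n\n")));
// "a\n\nb\n\nc"
```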

    This worked fine in Firefox on Windows, Linux and Mac because Firefox treats newlines as ‘\n’ no matter what platform it runs on. It broke on IE for Windows and IE for Macintosh because those browsers use ‘\r\n’ and ‘\r’ respectively.
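    The mismatch can be reproduced without a browser by feeding each convention to the same regular expression. This is a string-only sketch (it takes a `text` parameter instead of the textarea object), so it shows what the expression does with each kind of input, not how IE itself displayed the result.

```javascript
var doublenewlinesRE = /([^\n])\n([^\n])/g;

// String-only version of the first attempt, for experimentation.
function doublenewlines(text) {
  return text.replace(doublenewlinesRE, "$1\n\n$2");
}

// Unix-style text: works as intended.
console.log(JSON.stringify(doublenewlines("one\ntwo")));   // "one\n\ntwo"
// Classic Mac text: there is no "\n" at all, so nothing matches.
console.log(JSON.stringify(doublenewlines("one\rtwo")));   // "one\rtwo"
// Windows text: "\r" counts as "not a newline", so the extra "\n" is
// inserted after it, leaving a mixed "\r\n\n" sequence.
console.log(JSON.stringify(doublenewlines("one\r\ntwo"))); // "one\r\n\ntwo"
```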

    Fair enough. The usual solution to this problem is to normalise the line endings before running the conversion, by replacing each of the three combinations with the single ending of your preference (in my case ‘\n’). Here’s my second attempt at the function:


    function doublenewlines(obj) {
        // Normalise every line ending to "\n" and write it straight back
        obj.value = obj.value.replace(/(\r\n|\r|\n)/g, '\n');
        obj.value = obj.value.replace(doublenewlinesRE, "$1\n\n$2");
    }

    That didn’t work either. After much head scratching, debugging and poking around with alert boxes I finally uncovered an undocumented and almost mind-numbingly obscure “feature” of Internet Explorer: when you assign a string to the value attribute of a textarea, IE silently converts your nice ‘\n’ line endings to the platform preference. Microsoft’s documentation fails to note this, but I’ve confirmed that this happens on both the Windows and Mac versions of Internet Explorer.

    Bizarrely, if you assign to the value attribute of a hidden form field, no conversion takes place; the line endings are only changed if you assign to a textarea.

    The following code, although seemingly identical in function to the code just listed, does exactly what I want it to do:


    function doublenewlines(obj) {
        var text = obj.value;
        text = text.replace(/(\r\n|\r|\n)/g, '\n');
        obj.value = text.replace(doublenewlinesRE, "$1\n\n$2");
    }

    This works because the normalised version is assigned to a variable rather than directly to the textarea object’s value attribute, so IE’s automagical line ending conversion doesn’t happen until the final assignment and can’t play havoc with my second regular expression.
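    To see why the intermediate variable matters, the sketch below fakes a textarea whose value setter rewrites every line ending to ‘\r’, roughly the IE-for-Mac behaviour described above. The `makeFakeTextarea` helper is hypothetical and models only that one conversion; exactly how IE on Windows rewrote the string isn’t spelled out here, so it isn’t modelled. The second attempt is shown reading from `obj.value` in its second line.

```javascript
// Hypothetical stand-in for a textarea (not a real DOM object): its value
// setter rewrites every line ending to "\r", emulating the conversion
// described above for IE on the Mac.
function makeFakeTextarea(initial) {
  var stored = "";
  var field = {};
  Object.defineProperty(field, "value", {
    get: function () { return stored; },
    set: function (s) { stored = String(s).replace(/\r\n|\r|\n/g, "\r"); }
  });
  field.value = initial;
  return field;
}

var doublenewlinesRE = /([^\n])\n([^\n])/g;

// Second attempt: the normalised string goes straight back into the
// field, which turns the "\n" back into "\r" before the second replace.
function doublenewlinesDirect(obj) {
  obj.value = obj.value.replace(/(\r\n|\r|\n)/g, "\n");
  obj.value = obj.value.replace(doublenewlinesRE, "$1\n\n$2");
}

// Third attempt: both replaces run on a local string that still uses
// "\n"; only the finished result passes through the setter.
function doublenewlinesViaVariable(obj) {
  var text = obj.value;
  text = text.replace(/(\r\n|\r|\n)/g, "\n");
  obj.value = text.replace(doublenewlinesRE, "$1\n\n$2");
}

var direct = makeFakeTextarea("one\rtwo");
doublenewlinesDirect(direct);
console.log(JSON.stringify(direct.value));      // "one\rtwo" (no change)

var viaVariable = makeFakeTextarea("one\rtwo");
doublenewlinesViaVariable(viaVariable);
console.log(JSON.stringify(viaVariable.value)); // "one\r\rtwo"
```

    In the second case the setter still converts the result, but by then the doubling has already happened, so the two ‘\n’ characters simply become two ‘\r’ characters.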

    Finally, a note on style. If I’d been thinking about code reuse rather than working quickly to solve a problem, I would probably have come up with something like this:


    function doublenewlines(text) {
        text = text.replace(/(\r\n|\r|\n)/g, '\n');
        return text.replace(doublenewlinesRE, "$1\n\n$2");
    }

    Although it requires a bit more code in the onclick handler, abstracting away just the string operation would have let me avoid the weird line ending conversion problem completely. Still, at least I’ve come away with an understanding of another of IE’s little quirks.
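    A sketch of how the string-only version might be exercised, first on its own and then from an onclick handler. The element id `body` in the comment is a made-up example; the point is that the handler does one read and one write, so any conversion the browser applies happens once, at the end.

```javascript
var doublenewlinesRE = /([^\n])\n([^\n])/g;

// The reusable version: it works on a plain string and never touches
// the textarea, so no browser conversion can interfere mid-way.
function doublenewlines(text) {
  text = text.replace(/(\r\n|\r|\n)/g, "\n");
  return text.replace(doublenewlinesRE, "$1\n\n$2");
}

// Every convention is normalised first, so all three give the same result.
console.log(JSON.stringify(doublenewlines("one\ntwo")));   // "one\n\ntwo"
console.log(JSON.stringify(doublenewlines("one\rtwo")));   // "one\n\ntwo"
console.log(JSON.stringify(doublenewlines("one\r\ntwo"))); // "one\n\ntwo"

// In a page, the onclick handler does the read and the write itself
// (the id "body" is hypothetical):
//   var area = document.getElementById("body");
//   area.value = doublenewlines(area.value);
```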