The power of String.prototype.split() … almost


If you feel you’re not getting enough respect as a web developer, here’s a nice pie [profanity warning – don’t click if you’re easily offended] to throw at people.

Actually, I think the “time spent wishing a slow painful death on Bill Gates” segment needs expanding, although Bill isn’t directly to blame. In fact it would be great if the IE team could be more forthcoming and put names to features, so we know exactly who to swear at: “Hi, I’m [insert name] and I’m the guy that put an undefined value at the end of your array every time you leave that trailing comma, resulting in bugs that will keep you amused for hours :)”.

A little bitter at the moment after getting stung by this special while playing with a Javascript version of this. Despite all things AJAXy, writing cross browser code still feels like flying blind. Allow me a moment of complaining…

From the spec (p103 / 104);

If separator is a regular expression that contains capturing parentheses, then each time separator is
matched the results (including any undefined results) of the capturing parentheses are spliced into the
output array. […]
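The “including any undefined results” part is easy to miss, so here is a minimal sketch of it, assuming an engine that follows the spec. The input string and the two-group separator are my own illustration, not taken from the spec text:

```javascript
// Two alternative capture groups: on each match only one of them
// participates, so the other one contributes undefined to the output.
var parts = 'a:b;c'.split(/(:)|(;)/);

console.log(parts);
// In a spec-compliant engine:
// [ 'a', ':', undefined, 'b', undefined, ';', 'c' ]
```

Every match splices in one entry per capturing group, whether or not that group took part in the match.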

In fact this behaviour is nothing special to Javascript.

For example Perl…


use Data::Dumper;
print Dumper(split(/(:)/, 'a:b:c'));

…output…

$VAR1 = 'a';
$VAR2 = ':';
$VAR3 = 'b';
$VAR4 = ':';
$VAR5 = 'c';

…and PHP…


print_r(preg_split('/(:)/', 'a:b:c', -1, PREG_SPLIT_DELIM_CAPTURE));

…output…

Array
(
    [0] => a
    [1] => :
    [2] => b
    [3] => :
    [4] => c
)

…and Python…


import re
print(re.compile('(:)').split('a:b:c'))

…output…

['a', ':', 'b', ':', 'c']

In Javascript this might have been as easy as…


alert( "a:b:c".split(/(:)/) );

…which in Firefox (with help from Firebug) gives me;

["a",":","b",":","c"]

Likewise Opera 9 does the right thing. But in IE (6)…

a,b,c

Where are my captured separators?!

As Simon put it;

Why is this a big deal? Because it suddenly makes writing simple parsers and tokenisers a whole heck of a lot easier.
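To make that concrete, here is a rough sketch of the kind of tokeniser a capturing split makes possible, assuming a spec-compliant engine. The `tokenize` function, the operator set and the sample expression are all hypothetical, made up for illustration:

```javascript
// Hypothetical tokeniser for simple arithmetic expressions.
// The capturing group keeps each operator or parenthesis as its own token;
// the surrounding \s* swallows whitespace without capturing it.
function tokenize(expression) {
  return expression
    .trim()
    .split(/\s*([-+*\/()])\s*/)
    // Adjacent separators (such as "*(") leave empty strings behind.
    .filter(function (token) { return token !== ''; });
}

console.log(tokenize('1 + 22*(3-4)'));
// [ '1', '+', '22', '*', '(', '3', '-', '4', ')' ]
```

Without the captured separators, the same call would only hand back the numbers and you would have to recover the operators some other way.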

Actually blaming the IE Team is probably unfair – this seems to be a “feature” delivered by the JScript team and appears to have crept into JScript.NET as well, for example with a script like split.js containing;


import System.Windows.Forms;
MessageBox.Show("a:b:c".split(/(:)/));

I can compile it with the jsc compiler in DOS like D:\js> C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\jsc.exe /nologo split.js then run the output split.exe to get exactly the same – a,b,c. Sigh.

Anyway – more on that lexer some other time (managed to work around this eventually). BTW, if you need something for serious parsing in Javascript (although Moz only) have a look at this compiler generator in Javascript.
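For the curious, one possible shape for such a workaround is sketched below. It is not necessarily what I ended up with in the lexer: the idea is to avoid split() altogether and walk the string with exec(), splicing the captured groups in by hand. The name `splitKeepingCaptures` is hypothetical, and the sketch simply skips empty matches, which is a simplification compared with what the spec does:

```javascript
// Hypothetical helper: emulates split() with a capturing separator
// using only RegExp.prototype.exec().
function splitKeepingCaptures(text, separator) {
  // Rebuild the regex with the global flag so exec() advances through text.
  var flags = 'g' + (separator.ignoreCase ? 'i' : '') +
              (separator.multiline ? 'm' : '');
  var re = new RegExp(separator.source, flags);
  var parts = [];
  var last = 0;
  var match;

  while ((match = re.exec(text)) !== null) {
    if (match[0].length === 0) {
      // Simplification: ignore empty matches, and step forward
      // so the loop cannot get stuck on the same position.
      re.lastIndex++;
      continue;
    }
    // Text between the previous separator and this one.
    parts.push(text.slice(last, match.index));
    // Then every captured group, in order.
    for (var i = 1; i < match.length; i++) {
      parts.push(match[i]);
    }
    last = match.index + match[0].length;
  }

  // Whatever follows the final separator.
  parts.push(text.slice(last));
  return parts;
}

console.log(splitKeepingCaptures('a:b:c', /(:)/));
// [ 'a', ':', 'b', ':', 'c' ]
```

A separator without capturing parentheses falls through the inner loop untouched, so the helper then behaves like a plain split.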

Harry Fuecks

Harry Fuecks is the Engineering Project Lead at Tamedia and formerly the Head of Engineering at Squirro. He is a data-driven facilitator, leader, coach and specializes in line management, hiring software engineers, analytics, mobile, and marketing. Harry also enjoys writing and you can read his articles on SitePoint and Medium.
