Sophisticated Sorting in JavaScript
JavaScript’s sorting mechanism is a model of simplicity, yet bubbling under the surface is some incredibly flexible and powerful functionality. Using sort
it’s possible to organise arrays, not just into alphabetical or numerical order, but into any bespoke arrangement we can express in conditional logic.
How the sort function works
→ If you already know the basics then you might want to skip ahead.
If sort()
is called with no argument, then the array is sorted lexicographically — in dictionary order, when each value is treated as a string:
var letters = ["R","O","F","L"];
letters.sort();
alert(letters); //produces ["F","L","O","R"]
Otherwise the argument to sort
is a comparison function, which defines the sort behavior according to how it returns. The comparison function itself accepts two arguments, usually referred to as a
and b
, which represent the two values being compared in each operation. Then:
- if the function returns less than zero, sort
a
beforeb
- if the function returns greater than zero, sort
b
beforea
- if the function returns zero, leave
a
andb
unchanged with respect to each other
The JavaScript specification refers to the first sort-condition as sort
. But what this actually means is, “sort b
to a lower index than a
b
lower down the list than a
”, which in terms of numerical indexing is a higher, not lower index. It’s using the word “index” in a very confusing way; how I’ve expressed the conditions above should hopefully be much clearer.
So the normal way of using the comparison function, is to perform and return a simple evaluation which produces the desired sort. For example, if the function returns (a - b)
, then that will produce a numerical sort:
var numbers = [8,5];
numbers.sort(function(a, b)
{
return a - b;
});
alert(numbers); //produces [5,8]
We can work through that with value examples: since a = 8
and b = 5
, then (a - b) == 3
; three is greater than zero, so b
will be sorted before a
, producing the order [5,8]
.
So inverse-numerical order can be produced simply by reversing the equation:
var numbers = [4,3,5,9];
numbers.sort(function(a, b)
{
return b - a;
});
alert(numbers); //produces [9,5,4,3]
We can also create a comparison function that produces a dictionary sort, by defining three comparisons to evaluate each pair of strings — in computational terms, "a"
is less than "b"
, so we can directly compare the strings like that, to then return one of the three sorting values:
var letters = ["R","O","F","L"];
letters.sort(function(a, b)
{
var x = a.toLowerCase(), y = b.toLowerCase();
return x < y ? -1 : x > y ? 1 : 0;
});
Note how we pre-convert each of the strings to lower-case, which ensures we get a case-insensitive sort (if we didn’t do that then lower and upper-case letters would be sorted separately). We also assign the results of those operations to new variables, because some browsers object to having the arguments over-written.
Multi-dimensional sorting
If a
and b
are themselves arrays, well, directly comparing arrays using mathematical evaluation won’t produce the results we want; but we can compare their inner values and do the sort with them. This is how we sort multi-dimensional arrays, using a value from each inner array as the sort criterion. All the other inner values just ‘come along for the ride’, as it were, and this way we can sort arrays containing a mixture of values. The following example will sort the matrix by the number of sides on each shape:
var shapes = [
[5, "Pentagon"],
[3, "Triangle"],
[8, "Octagon"],
[4, "Rectangle"]
];
shapes.sort(function(a, b)
{
return a[0] - b[0];
});
Multi-criteria sorting
If we can sort multi-dimensional arrays using only one of the values, couldn’t we also sort them using both their values, as independent criteria? The answer of course is, yes we can, simply by adding further conditions to the logic inside the comparison function. For example, use value [0]
for primary sort, but if the two values are equal, then use value [1]
for secondary sort. The following example uses shapes again, sorting first by the number of sides, and then by the alphabetical name of the shape, if the number of sides is equal:
var shapes = [
[4, "Trapezium"],
[5, "Pentagon"],
[3, "Triangle"],
[4, "Rectangle"],
[4, "Square"]
];
shapes.sort(function(a, b)
{
if(a[0] === b[0])
{
var x = a[1].toLowerCase(), y = b[1].toLowerCase();
return x < y ? -1 : x > y ? 1 : 0;
}
return a[0] - b[0];
});
The principal can be extended as far as we need it to go — if the primary test is equal, then sort by the secondary test; if the secondary test is equal then sort by the tertiary test; and so on, for as many points of comparison as we have.
Sorting arrays of objects
As the comparisons get more complex, it’s best to abandon the use of multi-dimensional arrays, and favor instead the use of arrays of object-literals. This makes it easier to see what’s going on in the comparison function, simply because we have intuitive names for the criteria. A neat example of this can be seen in the CSSUtilities library, which parses document CSS to create its own collection of rule-objects.
The overall rules collection is stored as an array, and each of its members is an object with properties like specificity
(the “strength” of the rule as determined by its selector and inheritance context), index
(the overall position of the rule within the rules collection), and depth
(a numerical value for inherited rules that indicates the depth of the inheritance chain, ie. a rule inherited from <html>
would have a value that’s greater (by one) than a rule inherited from <body>
). The specificity
itself is also an array of four independent values, one for each of the specificity categories (see Calculating a selector’s specificity in the CSS3 specification, for details).
So how do we sort the rule-objects, considering all those values, to get an array that falls in absolute order of specificity? The first thing, of course, is to have a clear sense of the rules we’re trying to implement:
- sort on specificity if the values are not equal:
- sort on the first category if the values are not equal
- else sort on the second category if the values are not equal
- else sort on the third category if the values are not equal
- else sort on the fourth and final category
- else sort on index if the values are not equal
- else sort on inheritance depth
And then it’s just a case of expressing that in code:
rules.sort(function(a, b)
{
if(a.specificity.toString() === b.specificity.toString())
{
if(a.index === b.index)
{
return b.depth - a.depth;
}
return a.index - b.index;
}
if(a.specificity[0] !== b.specificity[0])
{
return a.specificity[0] - b.specificity[0];
}
if(a.specificity[1] !== b.specificity[1])
{
return a.specificity[1] - b.specificity[1];
}
if(a.specificity[2] !== b.specificity[2])
{
return a.specificity[2] - b.specificity[2];
}
return a.specificity[3] - b.specificity[3];
});
The logic has been jigged around a bit, so that some of the rules are expressed as inverse conditions; this is to improve the efficiency of the function, so it takes less code to implement and it returns as soon as possible. There’s probably several different ways of coding the same conditions.
A note about stable sort
The only real issue with this technique is the question of stable sort, which means — if a
and b
are the same then they don’t change with respect to each other. The problem is that stable sort is for sortable values themselves; but in these examples, a
and b
are not themselves the values we’re evaluating for the sort, they’re merely containers for the values that are. Therefore, a stable sort cannot be guaranteed, and what actually happens will vary in different browsers (some will leave them, some will move them)
Personally, I’ve never found a situation in which this is significant. But if you do, the way to prevent it is to make sure that no two sortable objects are ever exactly the same. For example, you could assign a numeric index property to each of the objects you’re sorting, reflecting their initial order in the array. Then in your comparison function, add a final condition for when all others are equal, that sorts by the value of those indices. Since they reflect the original order and are all unique, they’ll effectively maintain the order whenever no other sort occurs.
Sorted!
The fundamental thing to remember is that the sort comparison function is nothing special or unusual, it’s just another function doing stuff and then returning. You can load external data, create elements for test-rendering, or perform any number of complex operations. As long as the function returns correctly — less than zero, greater than zero, or zero — then there are no specific limitations on what you can do to get there!
Thumbnail credit: [Soren]