Counting with an Arbitrary Character Set

Something small and uncontroversial this week, as we look at a simple yet flexible technique for counting with an arbitrary character set. It’s not something you’re likely to need very often; but when you do, you’ll find that none of JavaScript’s built-in functions are quite designed to handle it.

JavaScript does have built-in functions for parsing and converting numbers between different numerical bases. For example, the parseInt function can work with any radix (numerical base) from 2 to 36, and is commonly used to convert number-strings written in non-decimal bases. The Number.toString method does the reverse, converting numbers back to non-decimal number-strings:

var character = "2F";
alert(parseInt(character, 16));    //alerts 47

var number = 47;
alert(number.toString(16));        //alerts "2f" (toString returns lower-case letters)
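
To see where the built-in functions stop, here is a minimal sketch of the radix limits. The variable names are only for illustration, and it uses plain variables rather than alert so that it can run outside a browser:

```javascript
// Radix 36 is the largest the built-in functions accept:
// the digits 0-9 followed by the letters a-z.
var largest = (35).toString(36);        // "z"
var parsed = parseInt("z", 36);         // 35

// Outside the range 2-36, parseInt returns NaN...
var outOfRange = parseInt("10", 37);    // NaN

// ...and Number.prototype.toString throws a RangeError.
var threw = false;
try {
    (35).toString(37);
} catch (err) {
    threw = err instanceof RangeError;
}
```

Neither function offers a way to choose which symbols stand for the digits, which is the gap the lexicon technique fills.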

But what if you wanted to count using Klingon numerals? Or more likely perhaps, using Greek or Cyrillic letters, hieroglyphics, or some kind of runes? The technique I’m going to demonstrate can do exactly that, in any numerical base; and to illustrate this fully, I’ll show you some examples of working with upper-case Greek letters in hexadecimal (base 16).

It’s All in the Lexicon

So the very first thing we need to do is define a lexicon, which is a dictionary of the characters we’ll be using, defined as a single string of Unicode escape sequences. In this case, we have 16 upper-case Greek letters, from Alpha to Pi. Each digit is represented by a letter (Alpha is 0, Pi is 15), and the length of the overall string determines the numerical base:

var lexicon = "\u0391\u0392\u0393\u0394\u0395\u0396\u0397\u0398\u0399\u039a\u039b\u039c\u039d\u039e\u039f\u03a0";

An Escape Sequence is One Character

It’s worth noting that, even though it takes six typed characters to define a Unicode escape sequence (a backslash, the letter u, and four hex digits), it still only shows up as one character in the string, and therefore the lexicon is 16 characters long.
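
A quick way to confirm this is to check the string’s length and its first and last characters. This is a minimal sketch; base, first and last are illustrative names:

```javascript
// The 16 upper-case Greek letters from Alpha to Pi,
// written as Unicode escape sequences.
var lexicon = "\u0391\u0392\u0393\u0394\u0395\u0396\u0397\u0398"
            + "\u0399\u039a\u039b\u039c\u039d\u039e\u039f\u03a0";

var base = lexicon.length;               // 16, one per escape sequence
var first = lexicon.charAt(0);           // "Α" (Alpha), the digit 0
var last = lexicon.charAt(base - 1);     // "Π" (Pi), the digit 15
```

One caveat: this only holds for characters in the Basic Multilingual Plane. Characters outside it, such as Egyptian hieroglyphs, take up two positions each in a JavaScript string, so length and charAt would no longer line up with the digits. An array of characters would probably be a safer lexicon in that case.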

Once we have the lexicon, we can refer to a character by numerical index using String.charAt, and conversely, get the numerical index of a character using String.indexOf:

var number = lexicon.indexOf("\u0398");    //the decimal equivalent of "Θ", which is 7

var character = lexicon.charAt(7);         //the character equivalent of 7

So any computations we do will be based on those two methods. For example, let’s define a for-loop that runs for "Κ" iterations (Kappa sits at index 9), and lists each character from Alpha up to, but not including, Kappa:

var str = "";
for(var i=0; i<lexicon.indexOf("\u039a"); i++)
{
    str += lexicon.charAt(i) + "\n";
}
alert(str);

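But what about larger numbers, say, displaying the character equivalent of 23? We simply have to extract the individual decimal digits, in this case 2 and 3, and then grab the character equivalent of each. Note that this maps the decimal digits one at a time; it is not a conversion of the value 23 into base 16, which would give 17 (digits 1 and 7):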

var target = 23;

var conversion = lexicon.charAt(Math.floor(target / 10))
               + lexicon.charAt(target % 10);

alert(conversion);    //alerts "ΓΔ"
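
For a true base conversion, where the value itself is rewritten in the base set by the lexicon’s length, one option is repeated division. This is a minimal sketch: toLexicon is a hypothetical helper name, and it assumes the input is a non-negative integer:

```javascript
var lexicon = "\u0391\u0392\u0393\u0394\u0395\u0396\u0397\u0398"
            + "\u0399\u039a\u039b\u039c\u039d\u039e\u039f\u03a0";

// Hypothetical helper: writes a non-negative integer in the base given
// by the lexicon's length, using the lexicon's characters as digits.
function toLexicon(value, lexicon) {
    var base = lexicon.length;
    var digits = "";
    do {
        // The remainder is the lowest digit; prepend its character.
        digits = lexicon.charAt(value % base) + digits;
        value = Math.floor(value / base);
    } while (value > 0);
    return digits;
}

var twentyThree = toLexicon(23, lexicon);   // "ΒΘ" (digits 1 and 7)
var fortySeven = toLexicon(47, lexicon);    // "ΓΠ" (digits 2 and 15)
```

Because the base is read from lexicon.length, the same function should work unchanged for a lexicon of any size.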

Just to make things really interesting, what if the number we want to convert contains letters as well as numerals, such as the hex number "2F"? In that case we have to convert each digit individually, because charAt expects a numerical index, not a hex digit (i.e. lexicon.charAt("F") has to become lexicon.charAt(15)):

var target = "2F";

var conversion = lexicon.charAt(parseInt(target.charAt(0), 16))
               + lexicon.charAt(parseInt(target.charAt(1), 16));

alert(conversion);    //alerts "ΓΠ"

Of course, the last two examples are fairly simplistic, because the number of digits is known; but it wouldn’t be difficult to adapt the process to iterate through as many digits as the number contains. All the components you need are here; it’s just a case of adapting them for your precise requirements.
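
As one possible adaptation, the sketch below loops over every digit of a hex string, and also reads a lexicon string back into a number. Both hexToLexicon and fromLexicon are hypothetical helper names, and neither validates its input: an unknown character would silently produce a wrong result.

```javascript
var lexicon = "\u0391\u0392\u0393\u0394\u0395\u0396\u0397\u0398"
            + "\u0399\u039a\u039b\u039c\u039d\u039e\u039f\u03a0";

// Hypothetical helper: swaps each hex digit for the lexicon character
// at the same index, however many digits there are.
function hexToLexicon(hex, lexicon) {
    var result = "";
    for (var i = 0; i < hex.length; i++) {
        result += lexicon.charAt(parseInt(hex.charAt(i), 16));
    }
    return result;
}

// Hypothetical helper: reads a lexicon string back into a number,
// treating the lexicon's length as the base.
function fromLexicon(text, lexicon) {
    var total = 0;
    for (var i = 0; i < text.length; i++) {
        total = total * lexicon.length + lexicon.indexOf(text.charAt(i));
    }
    return total;
}

var greek = hexToLexicon("2F", lexicon);    // "ΓΠ"
var value = fromLexicon(greek, lexicon);    // 47
```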

It’s the Data That Counts!

As it happens, you can use exactly the same approach to count using the ordinary digits 0 to 9 and Latin letters, should the need arise. And the extensible nature of the lexicon means you can use it to extend JavaScript’s native abilities into radixes greater than 36, with whatever symbols seem appropriate at the time.
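
As an illustration of going past radix 36, here is a minimal sketch of a base-62 lexicon. The ordering of the symbols (digits, then lower-case, then upper-case letters) is an assumption chosen for this example, and encode and decode are hypothetical helper names that expect non-negative integers and valid strings:

```javascript
// Assumed symbol order: 0-9, then a-z, then A-Z (62 symbols in total).
var base62 = "0123456789"
           + "abcdefghijklmnopqrstuvwxyz"
           + "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Hypothetical helper: number to string, by repeated division.
function encode(value, lexicon) {
    var digits = "";
    do {
        digits = lexicon.charAt(value % lexicon.length) + digits;
        value = Math.floor(value / lexicon.length);
    } while (value > 0);
    return digits;
}

// Hypothetical helper: string back to number, one digit at a time.
function decode(text, lexicon) {
    var total = 0;
    for (var i = 0; i < text.length; i++) {
        total = total * lexicon.length + lexicon.indexOf(text.charAt(i));
    }
    return total;
}

var sixtyTwo = encode(62, base62);      // "10"
var encoded = encode(3843, base62);     // "ZZ" (61 * 62 + 61)
var decoded = decode("ZZ", base62);     // 3843
```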

Or maybe just to develop some funky clocks!

  • Navanax

    Hate to add “controversy”, but decimal 23 is hexadecimal 17, not hex 35. Decimal 35 is hex 23. :)

    • http://www.brothercake.com/ James Edwards

      Nothing controversial about it — my bad, I apologise. I’ll change that part of the post to correct the error.

  • Colin Mitchell

    You should have produced actual working examples after each part of your discussion, to make it clear exactly what you are achieving.

  • Steve Clay

    The author might also enjoy one of Google’s code jam practice problems: Alien Numbers, where the program must convert numbers between systems with up to 94 digits!