Counting with an Arbitrary Character Set


Something small and uncontroversial this week, as we look at a simple yet flexible technique for counting with an arbitrary character set. It’s not something you’re likely to need very often, but when you do, you’ll find that none of JavaScript’s built-in functions are quite designed to handle it.

JavaScript does have built-in functions for parsing and converting numbers between different numerical bases. For example, the parseInt function can work with any radix (numerical base) from 2 to 36, and is commonly used to convert non-decimal number-strings into numbers. The Number.toString method can reciprocate, converting numbers back to non-decimal number-strings:

var character = "2F";
alert(parseInt(character, 16));    //alerts 47

var number = 47;
alert(number.toString(16));        //alerts "2f" (toString uses lower-case letters)
But what if you wanted to count using Klingon numerals? Or more likely perhaps, using Greek or Cyrillic letters, hieroglyphics, or some kind of runes? The technique I’m going to demonstrate can do exactly that, in any numerical base; and to illustrate this fully, I’ll show you some examples of working with upper-case Greek letters in hexadecimal (base 16).

It’s All in the Lexicon

So the very first thing we need to do is define a lexicon, which is a dictionary of the characters we’ll be using, defined as a single string of Unicode escape sequences. In this case, we have 16 upper-case Greek letters, from Alpha to Pi. Each digit is represented by a letter, and the length of the overall string determines the numerical base:
var lexicon = "\u0391\u0392\u0393\u0394\u0395\u0396\u0397\u0398\u0399\u039a\u039b\u039c\u039d\u039e\u039f\u03a0";

An Escape Sequence is One Character

It’s worth noting that, even though it takes six typed characters to define a Unicode escape sequence, it still counts as only one character in the string, and therefore the lexicon is 16 characters long.
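As a quick check of that point, the following minimal sketch builds the same 16-letter Greek lexicon and inspects its length. The variable names `lexiconLength` and `firstCharacter` are only illustrative.

```javascript
// Sixteen upper-case Greek letters, Alpha to Pi, written as escape sequences
var lexicon = "\u0391\u0392\u0393\u0394\u0395\u0396\u0397\u0398\u0399\u039a\u039b\u039c\u039d\u039e\u039f\u03a0";

// Each six-character escape sequence produces a single character in the string
var lexiconLength = lexicon.length;      // 16, which is also the numerical base

// The first character is Alpha, standing in for the digit 0
var firstCharacter = lexicon.charAt(0);  // "Α"
```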
Once we have the lexicon, we can refer to a character by numerical index using String.charAt, and conversely, get the numerical index of a character using String.indexOf:
var number = lexicon.indexOf("\u0398");    //the decimal equivalent of "Θ"

var character = lexicon.charAt(7);         //the character equivalent of 7
So any computations we do will be based on those two methods. For example, let’s define a for-loop that runs for "Κ" iterations (nine, since "Κ" sits at index 9), and lists each character along the way:
var str = "";
for(var i=0; i<lexicon.indexOf("\u039a"); i++)
{
    str += lexicon.charAt(i) + "\n";
}
alert(str);
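But what about larger numbers, say, displaying the character equivalent of 23? We simply have to extract the individual digits, and then grab the character equivalents, in this case 2 and 3 (this transliterates the number digit by digit, so the result still reads as 23 in whatever base those digits were written in):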
var target = 23;

var conversion = lexicon.charAt(Math.floor(target / 10))
               + lexicon.charAt(target % 10);

alert(conversion);
Just to make things really interesting, what if the number we want to convert contains letters as well as numbers, such as the hex number "2F"? In that case we’d have to convert each digit individually, because we can’t refer to a character by hexadecimal index (i.e. lexicon.charAt("F") would have to become lexicon.charAt(15)):
var target = "2F";

var conversion = lexicon.charAt(parseInt(target.charAt(0), 16))
               + lexicon.charAt(parseInt(target.charAt(1), 16));

alert(conversion);
Of course, the last two examples are fairly simplistic, because the number of digits is known; but it wouldn’t be difficult to adapt the process to iterate through as many digits as the number contains. All the components you need are here; it’s just a case of adapting them for your precise requirements.
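One way that adaptation might look is sketched below. It is only an illustration, not the article's own code: the function names `toLexicon` and `fromLexicon`, and the parameter name `chars`, are hypothetical. It assumes non-negative whole numbers and a lexicon in which every character is unique.

```javascript
var lexicon = "\u0391\u0392\u0393\u0394\u0395\u0396\u0397\u0398\u0399\u039a\u039b\u039c\u039d\u039e\u039f\u03a0";

// Convert a non-negative integer into a string of lexicon characters.
// The base is simply the number of characters in the lexicon.
function toLexicon(number, chars)
{
    var base = chars.length;
    var result = "";
    do
    {
        // the remainder is the right-most digit; look up its character
        result = chars.charAt(number % base) + result;
        number = Math.floor(number / base);
    }
    while(number > 0);
    return result;
}

// Convert a string of lexicon characters back into a number.
function fromLexicon(string, chars)
{
    var base = chars.length;
    var number = 0;
    for(var i=0; i<string.length; i++)
    {
        var digit = chars.indexOf(string.charAt(i));
        if(digit < 0)
        {
            throw new Error("Character not in lexicon: " + string.charAt(i));
        }
        number = (number * base) + digit;
    }
    return number;
}

var converted = toLexicon(47, lexicon);         // "ΓΠ", the equivalent of hex "2F"
var restored = fromLexicon(converted, lexicon); // 47
```

Because the base comes from the length of the lexicon, the same two functions should work unchanged with a lexicon of any size.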

It’s the Data That Counts!

As it happens, you can use exactly the same approach to count using normal Latin numerals and letters, should the need arise. And the extensible nature of the lexicon means you can use it to extend JavaScript’s native abilities into radixes greater than 36, with whatever symbols seem appropriate at the time. Or maybe just to develop some funky clocks!
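To illustrate the point about larger radixes, here is a minimal sketch of base 62 using digits plus upper-case and lower-case Latin letters. The choice and ordering of symbols is an assumption made for this example, and `toBase62` is a hypothetical name. The native `Number.toString` throws a `RangeError` for a radix above 36, so the lookup has to go through the lexicon instead.

```javascript
// 10 digits + 26 upper-case + 26 lower-case letters = 62 symbols
var base62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

// Convert a non-negative integer into a base-62 string
function toBase62(number)
{
    var base = base62.length;
    var result = "";
    do
    {
        result = base62.charAt(number % base) + result;
        number = Math.floor(number / base);
    }
    while(number > 0);
    return result;
}

var rolledOver = toBase62(62);   // "10", the first two-digit value
var largest = toBase62(3843);    // "zz", the largest two-digit value (61 * 62 + 61)
```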

Frequently Asked Questions (FAQs) about Arbitrary Character Sets

What is an arbitrary character set?

An arbitrary character set is a collection of characters that is not restricted by any specific rules or standards. It can include any character from any language, symbols, numbers, or special characters. This flexibility allows for a wide range of possibilities when it comes to creating unique strings of characters, such as passwords or codes.

How can I create a password with an arbitrary character set?

Creating a password with an arbitrary character set involves selecting characters from a wide range of options. You can include letters (both uppercase and lowercase), numbers, symbols, and special characters. The key is to create a combination that is unique and difficult to guess. Using a password generator can help ensure randomness and complexity.

What is the significance of counting with an arbitrary character set?

Counting with an arbitrary character set is a concept used in computer science and cryptography. It allows for a vast number of possible combinations, making it ideal for creating unique identifiers, encryption keys, and secure passwords. The larger the character set and the longer the string, the more potential combinations there are, increasing security.

How does counting with an arbitrary character set work?

Counting with an arbitrary character set works similarly to counting in any number system. Each character in the set is assigned a unique value. When counting, you start with the first character and proceed through the set. When you reach the end of the set, you roll over to the beginning and increment the next position, much like how you would roll over from 9 to 0 in decimal counting.
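The rollover described above can be sketched as a function that adds one to a value already written in lexicon characters. This is an illustrative sketch only: the name `increment` and the parameter name `chars` are hypothetical, and it assumes a non-empty input string whose characters all appear in the lexicon.

```javascript
var lexicon = "\u0391\u0392\u0393\u0394\u0395\u0396\u0397\u0398\u0399\u039a\u039b\u039c\u039d\u039e\u039f\u03a0";

// Add one to a value written in lexicon characters, carrying as needed
function increment(value, chars)
{
    var digits = value.split("");

    // work from the right-most position towards the left
    for(var position = digits.length - 1; position >= 0; position--)
    {
        var index = chars.indexOf(digits[position]);
        if(index < 0)
        {
            throw new Error("Character not in lexicon: " + digits[position]);
        }
        if(index < chars.length - 1)
        {
            // no rollover needed: step to the next character and stop
            digits[position] = chars.charAt(index + 1);
            return digits.join("");
        }
        // last character in the set: roll over to the first and carry left
        digits[position] = chars.charAt(0);
    }

    // every position rolled over, so a new leading digit worth one is needed
    return chars.charAt(1) + digits.join("");
}

var next = increment("\u0393", lexicon);      // "Δ"
var carried = increment("\u03a0", lexicon);   // "ΒΑ", like going from F to 10 in hex
```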

Can I use special characters in an arbitrary character set?

Yes, special characters can be included in an arbitrary character set. This includes characters such as punctuation marks, mathematical symbols, and other non-alphanumeric characters. Including special characters increases the number of possible combinations, enhancing the security of any strings created using the set.

What is the difference between a standard and an arbitrary character set?

A standard character set, such as ASCII or Unicode, is a predefined set of characters that follows specific rules and standards. An arbitrary character set, on the other hand, is whatever selection of characters you choose for the task. In JavaScript its members are still Unicode characters, but which ones you pick, and in what order, is entirely up to you.

How can I implement counting with an arbitrary character set in my code?

Implementing counting with an arbitrary character set in your code involves creating a function that can iterate through the character set and increment the count. This function should be able to handle rollovers when the end of the set is reached. Various programming languages offer different ways to implement this, so the exact code will depend on the language you are using.

What are the potential applications of arbitrary character sets?

Arbitrary character sets have a wide range of applications, particularly in fields like computer science and cryptography. They can be used to create unique identifiers, generate secure passwords, and develop encryption keys. They are also used in coding and programming for a variety of tasks.

Are there any limitations to using arbitrary character sets?

While arbitrary character sets offer a great deal of flexibility, they can also introduce complexity. For example, if you are using a character set that includes special characters, you need to ensure that your code can handle these characters correctly. Additionally, the larger the character set, the more memory and processing power may be required to handle operations involving the set.
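One concrete case of that complexity in JavaScript is a lexicon made of characters outside Unicode's Basic Multilingual Plane, which includes the Egyptian hieroglyphs block. Each such character occupies two UTF-16 code units, so `length` and `charAt` no longer line up with whole characters. The sketch below uses the mathematical double-struck digits (U+1D7D8 onward) purely as an example. The name `toSymbols` is hypothetical, and splitting the lexicon with `Array.from` is just one possible workaround.

```javascript
// Build a ten-symbol lexicon from characters above U+FFFF
var symbols = [];
for(var i=0; i<10; i++)
{
    symbols.push(String.fromCodePoint(0x1D7D8 + i));
}
var lexicon = symbols.join("");

// length counts UTF-16 code units, so it reports 20 rather than 10
var codeUnits = lexicon.length;

// Array.from splits by code point, giving one entry per whole character
var characters = Array.from(lexicon);

// Convert a non-negative integer using the array instead of charAt
function toSymbols(number)
{
    var base = characters.length;
    var result = "";
    do
    {
        result = characters[number % base] + result;
        number = Math.floor(number / base);
    }
    while(number > 0);
    return result;
}

var converted = toSymbols(23);   // the symbols at index 2 and index 3
```

Indexing into the array rather than the string keeps each digit whole, whereas `lexicon.charAt(0)` would return only half of the first symbol.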

How can I ensure the security of my arbitrary character set?

Ensuring the security of an arbitrary character set involves several factors. First, the set should be large enough to provide a sufficient number of possible combinations. Second, the characters should be selected randomly to prevent predictability. Finally, any strings created using the set, such as passwords or encryption keys, should be stored securely to prevent unauthorized access.

James Edwards

James is a freelance web developer based in the UK, specialising in JavaScript application development and building accessible websites. With more than a decade's professional experience, he is a published author, a frequent blogger and speaker, and an outspoken advocate of standards-based development.
