This is the editorial for the July 25th edition of the SitePoint PHP newsletter.
Why is a string called a string? Have you ever given this some thought? We never use such a word in contexts other than programming for a set of letters sticking together, and yet – in programming it’s as pervasive as the word “variable”. Why is that, and where does it come from?
To find out, we have to tackle some related terms first. History lesson time!
The word font is derived from the French fonte – something that has been melted; a casting. Given that letters for printing presses were literally made of metal and smelted at type foundries, that makes sense.
The terms uppercase and lowercase refer to the literal part of the case in which the font was transported. So the printer (person) had a heavy case he lugged around or had set up at a printing press, and in this case were two “levels” – an upper case, and a lower case. The upper case contained only – you guessed it – UPPERCASE letters, while the lower case only contained lowercase ones.
You’ll notice that there were more lowercase letters than uppercase ones. This was to be expected – a single piece of type could only be used once on a page and, after all, a written body of text will have many more lowercase letters than uppercase ones, as there was no such thing as YouTube comments and CAPS LOCK yet.
So how does all this relate to strings?
Well, as printing became more mainstream and printing presses began offering their services to individuals, not just newspapers and publishers, it is said they decided to charge based on the length of the printed material – length in feet. Granted, a lot of this is speculative, but if they strung together the produced, printed material, they could easily estimate the costs and bill customers. So we can conclude with a reasonable degree of certainty that they used the word string in this context as a sequence of characters.
Edit, July 26th 2017: As pointed out in the comments below, it seems that there was an actual string in use to tie the character blocks together as they were transported to the press after being assembled! A Twitter follower even sent me the following video, demonstrating the process!
Still, how does this relate to the programming field? I mean, you could say a string of anything in regard to anything at all and it would make a degree of sense in the non-programming world. It’s a word that can easily be applied to things in general, even though it usually isn’t.
What if we look across academia for first references?
In 1944’s Recursively enumerable sets of positive integers and their decision problems we have a mention that could vaguely resemble the modern definition:
For working purposes, we introduce the letter b, and consider “strings” of 1’s and b’s such as 11b1bb1.
In this paper, the term refers to a sequence of identical symbols, so a string of 1’s or a string of b’s. Not exactly our definition but it’s a start.
Then, a full 14 years later, in 1958’s A Programming Language for Mechanical Translation, the word is used thusly, and only once:
Each continuous string of letters between punctuation marks or spaces is looked up in the dictionary.
Okay, kiiiind of similar to our notion of strings, but it seems like he’s just describing, well, words. Obviously, that cannot apply – it’s too generic. For some reason, though, it seems to have stuck.
In 1958’s A command language for handling strings of symbols, the word string is used in exactly the same way we use it today, albeit not defined as such.
We find one more reference in 1959, The COMIT system for mechanical translation:
If we want to replace D SIN(F) by COS(F) D (F), where F is unrestricted and may be any arbitrary sequence of constituents, we use the notation $ to stand for this string.
Interesting! Here’s the dollar sign we all know from PHP, and which was (is?) actually the string symbol in BASIC.
Again in 1959, we have a more direct definition in The Share 709 System: Machine Implementation of Symbolic Programming:
The text is a linearly ordered string of bits representing the rest of the information required in the loading and listing processes.
In fact, it was through ALGOL in April of 1960 that the term seems to have taken its modern-day shorthand form, “string” (up until then, people said “string of [something]”). See this paper’s abstract.
Then finally, in May 1960, the Report on the Algorithmic Language Algol 60 mentions it in a form that hits home.
From there, it just takes off like a modern-day meme.
In 1963, METEOR: A LISP Interpreter for String Transformations goes with the rather unspecific “[…] but certain simple transformations of linear lists (strings) are awkward to define in this notation.”
In 1964, On declaring arbitrarily coded alphabets mentions “character strings”.
Searching ACM reveals a bunch of other resources from the 60s and later which all use the term regularly, so it seems the 60s were a catalyst in the term’s evolution and made it what it is today, slowly, through the needs of the systems it found itself in. Kind of funny that it ended up representing a concept similar to the one from the printing press days – a set of characters which has a meaning and carries with it some costs (only this time, in memory).
So now we know – or at least think we know – where string comes from. Computer science has always been a dark space of mysteries and slow evolution, and just like we now know that the human eye has had half-stages and semi-eyes in its past, so too have terms in computer science evolved past and around their original meaning, until they gave us what we have today. The 1960s gave birth to the same concept with the same name in various places all at once, and it then evolved into one unified term that we all understand and use today and, most importantly, can agree on.
When you think about it, was there a better word we could have used? While string hardly feels natural due to the complete detachment from a similar term in the “real world” (we don’t call words on a book’s page “strings”), I fail to think of any term which would fit this popular data type better. Can you? Let me know.