DATA SHADOWS
Matter and all else that is in the physical world have been reduced to a shadowy symbolism. … The scientific answer is relevant so far as concerns the sense-impressions interlocked with the stirring of the spirit, which indeed form an important part of the mental content. For the rest the human spirit must turn to the unseen world to which it itself belongs.
ARTHUR EDDINGTON, 1929
The value of a fact shrinks enormously without context.
HOWARD WAINER, 1997
On the evening of October 24, 1962, James Brown and the Famous Flames performed at the Apollo Theater in Harlem. It was a brief show by the standards of today's arena extravaganzas. They performed a dozen songs. None of them were new.
The concert recording was initially shelved. But, pressed by Brown's manager, King Records yielded and produced the album. Live at the Apollo was released the following year. It did incredible business. The album stayed on the Billboard Top Pop Albums chart for over a year, helping launch James Brown to R&B superstardom. Listen to that album today; his voice, the band's syncopated kicks, and the screams of the crowd still thrust you into the energy of that Harlem night. The context that swirls around Live at the Apollo elevates it to legend. It is a time capsule of pure American rhythm and blues, bottled 15 months before Beatlemania landed and swept the United States. Today we know how music, and James Brown, soared throughout the 1960s. That knowledge makes this early influential show at the Apollo even more special.
How do we think about the albums we love? A lonely microphone in a smoky recording studio? A needle's press into hot wax? A rotating can of magnetic tape? A button that clicks before the first note drops? No! The mechanical ephemera of music's recording, storage, and playback may cue nostalgia, but they are not where the magic lies. The magic is in the music. The magic is in the information that the apparatuses capture, preserve, and make accessible. It is the same with all information. When you envision data, do not get stuck in encoding and storage. Instead, try to see the music.
A Curious World
When we create a statistical chart, we intuit that there is something magical about arranging data into forms that can be seen. But this notion is incomplete. It misses that data originates in the physical world. A song recording did not materialize from the ether. The song was once sung by a real person, in a real room. Likewise, our craft does not just make the invisible seen. It makes a past reality real again.
Better data stories result when we recognize the material origins of data. Better data stories result when we appreciate how our mind interacts with the physical environment. When we acknowledge the life that produced data—the real life we see and feel—then we can better comprehend the abstract ecosystems of mathematics, statistics, and data.
Our perspective is anchored to our body and the things it encounters. Early words named objects in the physical world. As we took more notice of how physical things change over time, our language and our consciousness grew. Actions—relationships between people, objects, and environment—were named, too. Human perspective stretched outward to describe invisible social, political, and economic systems. Human perspective stretched inward to account for how these processes make us feel. Together, our experiences of physical and invisible phenomena evolved. Personal mental maps of reality and identity emerged.
…the truth is nothing other than the shadows of artificial things … Take a man who is released and suddenly compelled to stand up, to turn his neck around, to walk and look up toward the light; and who, moreover, in doing all this is in pain and, because he is dazzled, is unable to make out those things whose shadows he saw before. What do you suppose he'd say if someone were to tell him that before he saw silly nothings, while, now because he is somewhat nearer to what is and more turned toward beings, he sees more correctly…
SOCRATES TO GLAUCON, THE REPUBLIC OF PLATO
Our curiosity drives us to achieve better maps of the world because they give us a competitive advantage against nature and against one another. They expand our knowledge into new domains, those of ourselves as individuals and those of our collective society. In some instances, this drive is a motivated search for answers. In others, clarity emerges organically from the chaotic environment. Curiosity sharpens the resolution of our understanding.
Each one of us arrives at data stories with a slightly different map of reality. Nerdy expertise—the kind drilled into scientists, engineers, and designers—has serendipitously prepared some for the technical challenges of the information age. These disciplines gift valuable perspectives and skills. They are uncommon perspectives if one considers the rest of the population: Only six percent of United States workers are in science, technology, engineering, or mathematics occupations. The rare technical orientation of the nerd should not be confused with the attitude of the craft. Data storytelling does not arrive from peripheral obscurity. It is born out of the common everyday experiences that we all share. Data storytelling belongs to everyone.
Numbered
We each get 10 fingers, 10 digits. Because our minds are easily distracted, we use our digits to keep track when we count. Our fingers are a versatile tool for small quantities. They serve as an easy visual reference for what the count is. But too soon we run out of fingers (and toes) and need to externalize the count beyond our bodies. Externalizing the count also keeps hands available for other tasks. We can scratch quick marks in the dirt to help us keep track of our counting, and just like that, the history of numerals began.
Unary is the base-1 numeral system. The numbers 1, 2, 3, 4 are represented as: 1, 11, 111, 1111
The word for 20, a score, evolved from the Old Norse skor, a notch or cut mark, the kind one might scratch into a counting stick as a tally. Counting in scores, perhaps more meaningful when shoes were less common, was already archaic by the time Lincoln alluded to 1776 by beginning his 1863 Gettysburg Address with “Four score and seven years ago.” We still tally game scores on scoreboards.
The Ishango bone is a scarred baboon femur thought to be a 20,000-year-old tally stick.
The first tally marks were scrawled in dirt with a stick or drawn on rock with a piece of charcoal. It soon became useful to preserve these marks for record-keeping and communication. Ancient knotted counting ropes and slashed animal bones survive as examples of preserved counts. In the beginning, every item of the count was represented by a mark. Six is //////. These marks persist in East Asian numeral systems as the first three counting numbers: 一, 二, 三. These slash characters also match the first three numerals in the Brahmi numeral system, the direct graphic ancestor of the modern Hindu-Arabic numerals the world uses today: 1, 2, 3. To us, these familiar numerals are abstract symbols. Today, numerals are squiggly cultural conventions no longer connected to our physical surroundings. But a long time ago, they were.
Tallying numbers becomes cumbersome as one counts higher. Large numbers, often multiples of 10, were abstracted with a new idea: sign values to represent a particular group. These symbols cemented our 10-fingered bodies as the base of the number system. If the number ten is † and hundred is ‡, then 114 can be recorded as ‡†////. Sign-value notation was the basis of Ancient Egyptian and Roman numeral systems. These systems yielded to a variety of additive systems which give special names to the first 10 digits (…four, five, six…) and important multiples of 10 (ten, hundred, thousand). These special names are still how we pronounce numbers in both Chinese and English today: two hundred (and) four.
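Sign-value notation can be sketched in a few lines of Python. The symbols are the chapter's own illustrative ones ("/" for one, † for ten, ‡ for hundred), and the function names are hypothetical. Because each symbol carries a fixed value wherever it appears, reading the marks back is plain addition, and the order of the marks does not matter.

```python
# The chapter's made-up group symbols, largest first (illustrative only).
SYMBOLS = [(100, "‡"), (10, "†"), (1, "/")]

def to_sign_value(n: int) -> str:
    """Write a non-negative integer by repeating each group symbol as often as needed."""
    if n < 0:
        raise ValueError("sign-value tallies here only count upward from zero")
    parts = []
    for value, symbol in SYMBOLS:
        count, n = divmod(n, value)   # how many of this group fit, and what is left
        parts.append(symbol * count)
    return "".join(parts)

def from_sign_value(marks: str) -> int:
    """Read marks back by summing each symbol's value; position carries no meaning."""
    values = {symbol: value for value, symbol in SYMBOLS}
    return sum(values[mark] for mark in marks)

print(to_sign_value(114))          # ‡†////
print(to_sign_value(6))            # //////
print(from_sign_value("////†‡"))   # 114
```

Note that zero comes out as an empty string: this kind of notation has no mark for nothing.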
5, 6, 7, 8, 9
At one time, there was speculation that figures past 4 had come from either the forms of initial letters or syllables of number words of the third century BC Brahmi alphabet. But they may have come from older, untraceable numerical symbols.
JOSEPH MAZUR, 2014
A symbol is a visual shape used as a conventional representation, or proxy, of an object or idea.
Counting numbers struggled to account for expenses that take away more than is available (i.e., debts). Another problem was that they could not clearly represent the concept of nothing. Over a thousand years, negative numbers and zero were added to address these issues. Solutions were first formalized in India, synthesized with Greek mathematics in Persia, and then slingshot through North Africa and into Europe by Leonardo filius Bonacci (nicknamed Fibonacci hundreds of years later). The dominant mental picture of numbers shifted from a count of things to the more abstract number line. The positional notation of the Hindu-Arabic numerals made adding and subtracting easier. These new counting methods powered bank accounting innovations across Europe. Decimal fractions and ever-more abstract concepts, like imaginary numbers, would soon help power the scientific revolution and deliver us into the modern world.
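The advantage of positional notation can be sketched in Python. In this illustration, a digit's meaning depends on its column, zero holds an empty column open, and addition becomes a repeated column-by-column rule with a carry. Both function names are hypothetical.

```python
def place_values(n: int, base: int = 10) -> list[tuple[int, int]]:
    """Split a non-negative integer into (digit, place value) pairs, largest place first."""
    if n == 0:
        return [(0, 1)]
    pairs, place = [], 1
    while n > 0:
        n, digit = divmod(n, base)
        pairs.append((digit, place))
        place *= base
    return list(reversed(pairs))

def add_positional(a: str, b: str) -> str:
    """Add two decimal numerals column by column, carrying into the next place."""
    a, b = a.zfill(len(b)), b.zfill(len(a))   # pad the shorter numeral with leading zeros
    result, carry = [], 0
    for da, db in zip(reversed(a), reversed(b)):
        carry, digit = divmod(int(da) + int(db) + carry, 10)
        result.append(str(digit))
    if carry:
        result.append(str(carry))
    return "".join(reversed(result))

print(place_values(204))            # [(2, 100), (0, 10), (4, 1)]
print(add_positional("204", "96"))  # 300
```

The zero in 204 is what keeps the 2 in the hundreds column, something a pure tally never needed to express.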
John Tukey advanced a compact base-10 tally system that built a box of dots and dashes with each additional count.
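Tukey's box tally can be sketched as a function that reports what a box looks like at each count. This assumes the commonly described layout: counts 1 to 4 are dots at the corners of a square, 5 to 8 are lines joining the dots into the square's sides, and 9 and 10 are the two diagonals. The function names are hypothetical.

```python
def tukey_box(count: int) -> dict[str, int]:
    """Describe one Tukey tally box holding a count from 1 to 10."""
    if not 1 <= count <= 10:
        raise ValueError("one box holds counts from 1 to 10")
    return {
        "dots": min(count, 4),                # counts 1-4: corner dots
        "sides": min(max(count - 4, 0), 4),   # counts 5-8: lines joining the dots
        "diagonals": max(count - 8, 0),       # counts 9-10: the two diagonals
    }

def tukey_tally(total: int) -> list[dict[str, int]]:
    """Split a running count into full boxes of 10 plus one partial box."""
    full, rest = divmod(total, 10)
    boxes = [tukey_box(10) for _ in range(full)]
    if rest:
        boxes.append(tukey_box(rest))
    return boxes

print(tukey_box(7))             # {'dots': 4, 'sides': 3, 'diagonals': 0}
print(len(tukey_tally(23)))     # 3
```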
The emergence of numbers shows how the visual memory practice of counting became hyper-externalized and abstracted to even greater benefit. As numbers evolved, they drifted away from physical reality. Today's everyday experience of numbers is surreal. It requires you to leave your physicality behind and mentally step into an abstract world. But this has not always been so. All of mathematics began with simple vignettes, such as prehistoric shepherds looking across their flocks, counting sheep. It is all rooted in our lived experience.
Many codes … exist primarily to make life easier for machines and their designers without any consideration of the burden placed upon people.
DON NORMAN, 2013
Value types define how data is stored and impact the ways we turn numbers into information. Computer code often demands that certain aspects of value types be declared. Numbers and text can be quickly processed inside the computer's abstract world when they are labeled appropriately. Yet, we must appreciate even more than the computer if we are to build information. Observe the many value types expressed in this statement:
The roar of the crowd swells as Joe Louis, the 198-¾-pound heavyweight, enters the arena for the final fight of the night, hopeful to exit the ring as the champion.
In 1847 George Boole introduced the world to the truth values of Boolean logic and their main operations of AND, OR, and NOT with The Mathematical Analysis of Logic.
A Boolean will record the win-loss outcome. Zero is false. A nonzero, usually one, is true. The heavyweight category is stored as a string of text. This particular category is ordinal because it can be positioned in order. There is a non-arbitrary relationship between weight classes: Heavyweight is heavier than middleweight, which itself is heavier than lightweight. The fighter's name, Louis, is also stored as a qualitative string of text. But it is merely nominal, as there is no meaningful way of ordering fighter names.
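These distinctions can be sketched in Python. Python's `bool` follows the zero-is-false rule, and an `IntEnum` is one way, among several, to give an ordinal category a built-in order. The three-division `WeightClass` is a hypothetical, simplified subset, and the `won` value is illustrative because the sentence leaves the outcome open.

```python
from enum import IntEnum

# Boolean: zero is false; any nonzero value is true.
print(bool(0), bool(1), bool(7))   # False True True
won = True                         # illustrative; the sentence leaves the outcome open
print(int(won))                    # 1

class WeightClass(IntEnum):
    """Ordinal category (a simplified, hypothetical subset of divisions)."""
    LIGHTWEIGHT = 1
    MIDDLEWEIGHT = 2
    HEAVYWEIGHT = 3

# Ordinal values support meaningful comparison and sorting.
print(WeightClass.HEAVYWEIGHT > WeightClass.MIDDLEWEIGHT)   # True
divisions = [WeightClass.HEAVYWEIGHT, WeightClass.LIGHTWEIGHT, WeightClass.MIDDLEWEIGHT]
print([d.name for d in sorted(divisions)])
# ['LIGHTWEIGHT', 'MIDDLEWEIGHT', 'HEAVYWEIGHT']

# Nominal values: only equality is meaningful. Python will happily sort
# strings alphabetically, but that order says nothing about the fighters.
fighter = "Louis"
print(fighter == "Louis")          # True
```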
A floating-point number can hold values across an enormous span of the number line, from the vanishingly small to the astronomically large, though only to a limited number of significant digits. It is called floating point because the value is stored in a form of scientific notation, which moves, or floats, the decimal point.
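The floating point can be seen directly in Python, using the fighter's 198¾ pounds. Python's `float` is a binary (base-2) format, so `math.frexp` reports the significand and the power of two rather than a power of ten. The last two lines show the limited precision.

```python
import math

weight = 198.75   # 198-3/4 pounds

# Decimal scientific notation: the point "floats" two places to the left.
print(f"{weight:e}")              # 1.987500e+02

# Internally the value is a significand times a power of two.
significand, exponent = math.frexp(weight)
print(significand, exponent)      # 0.7763671875 8
assert significand * 2**exponent == weight

# Limited precision: many decimal fractions are only approximated.
print(0.1 + 0.2 == 0.3)           # False
print(math.isclose(0.1 + 0.2, 0.3))   # True
```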
An integer, a whole number with no fractional part, counts the fights of the night. It is discrete, with no in-between states. A floating point records the fighter's weight. Floats are associated with how we perceive—and measure—the real physical world: a continuous spectrum that can be zoomed in on. Time can be split into seconds, milliseconds, nanoseconds, and so on. Space can also be sliced into ever-smaller fractions of length or degree. Recognizing value types is one foundation for better information because it helps you see the inherent structure in the data's origin.
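The contrast between discrete counts and continuous measurements can be sketched in a few lines. The number of fights and the duration are hypothetical values chosen for illustration.

```python
fights_tonight = 5   # hypothetical count of bouts on the card

# An integer count is discrete: it steps 1, 2, 3 with nothing in between.
print(list(range(1, fights_tonight + 1)))   # [1, 2, 3, 4, 5]
print((2 + 3) / 2)                          # 2.5, yet there is no "fight 2.5"

# A float measurement is continuous: between any two weights lies another.
weight_lb = 198.75
print((weight_lb + 199.0) / 2)              # 198.875 is still a meaningful weight

# It can also be re-expressed in ever-finer units.
print(weight_lb * 16)                       # 3180.0 (ounces)
seconds = 2.5                               # hypothetical duration
print(seconds * 1_000)                      # 2500.0 (milliseconds)
print(seconds * 1_000_000_000)              # 2500000000.0 (nanoseconds)
```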
Enter Data
A datum is a value stored in a location. The value could be of a variety of types, but is often a float, integer, Boolean, or text string. More than one datum makes data. Data is traditionally expressed as the plural of datum. But today we also refer to it as a singular mass noun, like sand or rain. Whether we say data are or data is, each datum includes a value and a storage location.
Scalar data, such as temperature, has one value at each position. Vector data, such as air velocity's direction and magnitude, has two values at each position on a flat map. Because it has both a direction and a magnitude, it is often represented by an arrow. Tensor data has many values at each position. One example of tensor data is the stress-strain tensor, which characterizes how a material will behave along each of the three dimensions of space.
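The three kinds of fields can be sketched as dictionaries that map a position to whatever is stored there. All readings are hypothetical, and the chapter's stress-strain tensor is stood in for by a generic 3 × 3 table of nine numbers, an assumption made for illustration.

```python
import math

# One hypothetical position (x, y, z in metres); every reading is illustrative.
position = (0.0, 0.0, 10.0)

temperature = {position: 21.4}          # scalar: one value (degrees Celsius)
air_velocity = {position: (3.0, 4.0)}   # vector: east and north components (m/s)
stress = {position: (                   # tensor: 3 x 3 components (kilopascals)
    (5.0, 1.0, 0.0),
    (1.0, 2.0, 0.0),
    (0.0, 0.0, 1.0),
)}

def count_values(value) -> int:
    """Count how many numbers are stored at one position, however nested."""
    if isinstance(value, (int, float)):
        return 1
    return sum(count_values(v) for v in value)

for name, field in [("temperature", temperature),
                    ("air velocity", air_velocity),
                    ("stress", stress)]:
    print(name, count_values(field[position]))
# temperature 1
# air velocity 2
# stress 9

# The vector's two components convert to the magnitude and direction of its arrow.
east, north = air_velocity[position]
print(math.hypot(east, north))                           # 5.0
print(round(math.degrees(math.atan2(north, east)), 1))   # 53.1 (degrees from east)
```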
In some cases, the data value's location is associated with an actual location in the real world. The location might be global, like a map coordinate, or local, like the position of a stent in a heart's coronary artery. In other cases, the data value's location is defined by reference keys and attribute names that have no relation to a real physical place. Data can also be characterized by how many values are stored for each location.
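The idea that each datum pairs a value with a location can be sketched with two Python dictionaries. One is keyed by real-world coordinates, and the other is keyed by an abstract record key plus an attribute name. The coordinates are approximate, and the noise readings and the key `"bout-1"` are hypothetical.

```python
# A place in the world: hypothetical noise readings (decibels) keyed by
# approximate latitude/longitude. The numbers are illustrative only.
noise_db = {
    (40.81, -73.95): 96.0,
    (40.75, -73.99): 71.5,
}

# An abstract address: a record key plus an attribute name, with no physical place.
fight_card = {
    ("bout-1", "fighter"): "Joe Louis",
    ("bout-1", "weight_lb"): 198.75,
    ("bout-1", "weight_class"): "heavyweight",
}

# Either way, every datum is a (location, value) pair.
for (record, attribute), value in fight_card.items():
    print(f"{record}.{attribute} = {value!r}")
# bout-1.fighter = 'Joe Louis'
# bout-1.weight_lb = 198.75
# bout-1.weight_class = 'heavyweight'

# Retrieving a datum means knowing its location.
print(noise_db[(40.81, -73.95)])   # 96.0
```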
Just as the value types of data differ, data storage types vary as well. Two-dimensional tables of data position values into neat rows and columns. Hierarchical trees, such as your hard drive's nested folders, stack relationships. Databases manage a variety of data and programs in one unified environment; they create flexible systems sometimes explained with “object” metaphors.
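The three storage types can be sketched with the same small dataset: a table of rows and columns, a nested tree walked like folders, and an in-memory SQLite database (Python's standard `sqlite3` module) that manages the rows and answers a query. The second fighter and the `bout-*` keys are hypothetical, and `walk` is an illustrative helper.

```python
import sqlite3

# Table: values in neat rows and columns (second row is hypothetical).
columns = ("fighter", "weight_class", "weight_lb")
rows = [
    ("Joe Louis", "heavyweight", 198.75),
    ("Fighter B", "middleweight", 160.0),
]

# Tree: nested relationships, like folders inside folders.
tree = {"card": {"bout-1": dict(zip(columns, rows[0])),
                 "bout-2": dict(zip(columns, rows[1]))}}

def walk(node, path=()):
    """Yield (path, value) pairs, much like listing files in nested folders."""
    for key, child in node.items():
        if isinstance(child, dict):
            yield from walk(child, path + (key,))
        else:
            yield "/".join(path + (key,)), child

print(next(walk(tree)))   # ('card/bout-1/fighter', 'Joe Louis')

# Database: one environment that stores the rows and answers questions about them.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE bouts (fighter TEXT, weight_class TEXT, weight_lb REAL)")
con.executemany("INSERT INTO bouts VALUES (?, ?, ?)", rows)
heavy = con.execute(
    "SELECT fighter FROM bouts WHERE weight_class = ?", ("heavyweight",)
).fetchall()
print(heavy)              # [('Joe Louis',)]
con.close()
```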
Rote learning and drill is not enough. It leaves out understanding. … ideas and understanding are what [it] is centrally about.
LAKOFF AND NÚÑEZ, 2000
Data value types and data storage types combine to create our data, but they are often not enough. Modern data packages also contain metadata, such as summary values and data dictionaries. These metadata explain what the data values contain, how they relate to one another, and what it all means. Many datasets are complex, multilayered combinations. They may contain different structures and file formats. Nonetheless, simple mental models, like the relationship tree, table array, and spatial map, persist.
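A data package that carries its own metadata can be sketched as values bundled with a data dictionary and computed summary values. The layout of `package` is an assumption for illustration, not a standard format, and the second row is hypothetical.

```python
from statistics import mean

# The data values themselves (second row is hypothetical).
values = [
    {"fighter": "Joe Louis", "weight_class": "heavyweight", "weight_lb": 198.75},
    {"fighter": "Fighter B", "weight_class": "middleweight", "weight_lb": 160.0},
]

# Data dictionary: what each column means, including its value type.
data_dictionary = {
    "fighter": "Fighter's name (nominal text)",
    "weight_class": "Division (ordinal text: lightweight < middleweight < heavyweight)",
    "weight_lb": "Weigh-in weight in pounds (float)",
}

def summarize(rows: list[dict]) -> dict:
    """Compute a few summary values to travel alongside the data."""
    weights = [r["weight_lb"] for r in rows]
    return {"count": len(rows), "mean_weight_lb": mean(weights), "max_weight_lb": max(weights)}

package = {
    "data": values,
    "metadata": {"dictionary": data_dictionary, "summary": summarize(values)},
}

print(package["metadata"]["summary"])
# {'count': 2, 'mean_weight_lb': 179.375, 'max_weight_lb': 198.75}
```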
It's not the numbers that are interesting. It's what they tell us about the lives behind the numbers.
HANS ROSLING, 1995
How do we picture data? We might imagine imperceptible strings of zeros and ones that go on forever, written by tiny machines to solid-state drives. Data lives far away on chilly server racks, ready to serve you at a moment's notice; it is backed up elsewhere, just in case. Data can also be a precious portable thumb drive, pursued by the characters in a Hollywood action film. When we picture data only in terms of its storage medium, we block the creative work we need to do. These impenetrable images of data do not help.
A MacGuffin is something desired that helps advance a story's plot. The pursuit of the MacGuffin, not the MacGuffin itself, is what is important. The search for the Holy Grail motivated Arthurian legends, while the pursuit of the Maltese Falcon statue was at the heart of Dashiell Hammett's detective novel. Today, the search for a valuable cache of data, often made visible by its portable object of storage, propels action films forward. The most lovable MacGuffin might be Star Wars' R2-D2, the custodian of the stolen plans (i.e., data) that can save the people and restore freedom to the galaxy.
The first lesson for data storytellers from James Brown's album is an easy one: The magic is in hearing the music, not the nuance of its capture and storage. The second lesson is that Live at the Apollo is not a perfect time capsule. It cannot be. As a sensory event, the album only transports us audibly to that room, and even then, only partially. It is not a total rote recording of that 1962 concert. It is merely a simplification, an encoding that reduced the sensory reality of that evening to a tiny fraction of its original, rich salience.
But even a virtual reality experience that put us perfectly back at the Apollo would still not be the same as actually witnessing the show. This is because you would have a different frame of reference compared to any 1962 Harlem concert-goer. Furthermore, James Brown's recorded performance will not be motivated anew by audience cheers. Reality does not happen twice. Any recording is but a shadow of the performance. It is an incomplete artifact that lives on.
The album is a sliver of what that night was, but that does not make it inferior. It is a treasure. The album is a beautiful compression of what that concert was. It helped James Brown rocket to success and still moves our feet today. No one would want to watch a continuous stream of someone's life; there is too much monotonous noise. But compress a life story into a two-hour film and you can move the emotions (and wallets) of millions.
Storytellers of all stripes must regularly compress all of the possible information their stories could contain into a manageable number of relatable details.
MICHAEL AUSTIN, 2010
We often wish we could remember more. Russian neuropsychologist Aleksandr Luria treated a patient whose memory was too sharp for his own good. Referred to as S., he suffered from not having the “art of forgetting”—the automatic disposal of trivial detail as we push information from short-term memory to long-term storage. In When We Are No More, historian Abby Smith Rumsey relates the consequences of S.'s condition of being unable to forget triviality:
S. suffered from a disorder of distraction. He could not make things dull, and had a hard time maintaining focus on anything for extended periods. He was unable to sort his impressions for value and emotional salience. To him the world was far too vivid far too much of the time. …
He easily confused what he had remembered (because everything he encountered in his daily life triggered a chain of recollections) with what had actually transpired. Memories were so fresh in affect and spun out in his mind so rapidly that he mistook his recollections for reality. There were periods in his youth when he did not get up in the morning to go to school because even thinking about arising stimulated memories of having done so before. He thought that he had gone to school even as he lay still under the covers.
Having only a compression—an impression, a model, a shadow—is actually the best we could hope for. Too many stimuli would bore us, overwhelm us, or make understanding impossible. Distilling the performance of James Brown into an album made it possible for the performance to reach millions of people. And it makes it possible for us to keep traveling back to the 1962 Apollo.
To see why encoding is necessary, imagine trying to memorize an event without any simplification taking place; the result might be called a “total rote recording” or “perception without concepts” … In the real world we can't possibly take everything into account all the way down to its most microscopic details, and so we necessarily must ignore almost everything about every situation that we encounter, and that means we unconsciously make a highly selective encoding of it when we store it in memory. We have to strip everything we experience down to a caricature of itself.
HOFSTADTER AND SANDER, 2013
Go and get more information.
BOOK OF SAMUEL, 1:23
All data is a shadow of what has flowed before. Data is reality distilled with intention. We no longer have to picture data as an impenetrable monolith. When we think about data, we should consider the world that delivered it to us. Pause to reflect: What has been lost from the data's world? Why were some things selected to survive? How has it all been transmitted forward to us, today? Then, we can see data for what it is, whispers from a past world waiting for its music to be heard again.