If there’s one thing we know for sure about the Internet, it is that by its very nature it is a transient medium. What existed 10 years ago is generally nothing like what exists today, which in turn will be nothing like what exists 10 years from now. Blogger Robert Scoble posted today that the recently released Search 2001 archived search engine from Google (our coverage) nicely highlights the web’s transient nature. Many of the sites that appeared in Google’s search results in 2001 not only don’t appear in the results for those searches today, but don’t exist on the web at all.
Unlike printed information, which requires a physical act to destroy, information on the web is lost to the ether passively and immediately when you change something on a page or stop paying your hosting bill. In April, the Library of Congress in the US completed a project that restored the original 6,000 books from Thomas Jefferson’s personal library, which made up the LoC’s first collection. Most of the books were originally lost in a fire about 150 years ago.
It’s somewhat alarming that printed books from a century and a half ago can still be recovered and archived today, while much of the information created on the web in the past decade is already gone forever. The biggest reason for that is probably the sheer amount of information that we’re creating.
Last year, Google’s cache contained 100 exabytes of data — or almost three quarters of a million times the size of the information contained in the Library of Congress, one of the world’s largest libraries. That’s far more data than existed to be archived 150 years ago, and the Internet, where everyone is a publisher, is causing us to create data perhaps faster than we can hold on to it.
We’re living in an age where data is being created at an overwhelming rate (to the point where many of us are feeling overloaded). Billions of gigabytes of data are pushed out over the web every year — and with the growing popularity of microblogging and the realization of the ubiquitous Internet, the rate of information creation is only going to grow.
Dave Morin from Facebook and Nova Spivack from Radar Networks said last week that the “ephemerality of the web” was a huge problem that we have to figure out how to address. Before we can begin to figure out how to filter and make use of all the information we’re creating on the web, we’re going to have to figure out how to keep it from disappearing.
150 years from now, will this blog post still exist? Robert Scoble doesn’t think most of what we’re writing today will survive the century. How about you? And the corollary question to all this is, should we even bother archiving most of the stuff on the web? Who gets to decide what should be saved and how? I’d be interested to hear your thoughts in the comments.