The US National Archives and Records Administration (NARA) has apparently decided to end its policy of taking a “digital snapshot” of all public congressional and federal web sites after each congressional and presidential term. According to NARA, which is understandably drawing heat for the policy change, it shouldn’t need to archive those web sites because federal agencies and Congress should be doing their own archiving. I read about NARA after reading a very timely piece from Leland Rucker about the nature of information archiving in a totally digital world, and it got me wondering: what happens to all this content on the web 250 years in the future?
Last year Google’s archives touched 100 exabytes of data from the web. To put that in perspective, that’s about 107 billion gigabytes (or over half a billion 200 GB hard drives). The entire catalog of the Library of Congress is about 136 terabytes, which makes Google’s archive the data equivalent of roughly 771,000 Libraries of Congress.
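For anyone who wants to check those comparisons, here is a quick back-of-the-envelope sketch in Python. It assumes binary units (1 GB = 2^30 bytes, 1 TB = 2^40 bytes, 1 EB = 2^60 bytes), which is the convention that yields the 107 billion gigabyte figure. The function name and its parameters are made up for illustration.

```python
# Back-of-the-envelope check of the storage comparisons.
# Assumption: binary units (powers of 1024), not decimal SI units.
GB = 2**30
TB = 2**40
EB = 2**60


def archive_equivalents(archive_eb=100, drive_gb=200, loc_tb=136):
    """Express an archive of `archive_eb` exabytes in other units."""
    total_bytes = archive_eb * EB
    return {
        "gigabytes": total_bytes / GB,
        "200 GB hard drives": total_bytes / (drive_gb * GB),
        "Libraries of Congress": total_bytes / (loc_tb * TB),
    }


if __name__ == "__main__":
    for name, value in archive_equivalents().items():
        print(f"{name}: {value:,.0f}")
```

With decimal units (powers of 1000) the numbers shift slightly, but the orders of magnitude stay the same.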
So clearly, there is a lot of data out there to be stored. And the vast majority of that data isn’t printed — it is being stored digitally and created on computers via email, forums, social networks, blog posts, video sharing, bookmarking, chat, etc. A lot of that data isn’t necessarily something we need to save (who needs an archive of every email I send to my mom, for example?), but what of the data that we do want to keep for the future? The posts on this blog, or thoughtful debates taking place on forums, or breaking news videos published on YouTube, for example.
The Internet is very transient in nature, and things often move at a breakneck pace. The main page of a blog like ReadWriteWeb might change 10-15 times in a day. The main page of CNN.com might change far more than that. How do we archive information when the technology to read it, and indeed the information itself, changes so fast?
About 200 years ago, Thomas Jefferson sold his personal library of 6,000 books to the Library of Congress. About 150 years ago, more than half were destroyed in a fire. But today, all 6,000 of them have been recovered or recreated and will go on display at the LoC. Now we’re living in the so-called information age, where almost a gigabyte of new data is being created each year for every man, woman, and child on earth. But what’s going to happen to it all 250 years from now? “Is digital content too ephemeral to last?” wondered Leland Rucker. Will digital information have the same lifespan as printed books?
We’d love to hear your thoughts on the matter, so please let us know in the comments what you think the future holds for the massive flood of information we’re creating today.