The U.S. Library of Congress announced this morning via its official Twitter account that it will acquire the entire archive of Twitter messages, dating back to March 2006. In addition to its massive printed collection, the Library already holds an extensive collection of other digital assets. The Library of Congress is the biggest library in the world.
The Library does extensive work with data format standards, the semantic Web and other platforms for outside analysis. Adding Twitter to the organization's offerings could foster an enormous amount of academic research. From a new kind of historical record to an unprecedented opportunity for discovering patterns of social interaction, this is big.
When the Library of Congress was founded in 1800, publishing was very expensive and relatively few people did it. Today, thanks to blogs, YouTube, Facebook and certainly Twitter, it's a new world. Publishing is faster, easier and more accessible today than at any point in human history. That might seem obvious, but on a day like today it's worth thinking about some more.
For now there are more questions than answers regarding this Library of Congress Twitter news. Will the archive include friend/follower connection data? Will it be usable for commercial purposes? Will there be a Web interface for searching it, and will that change the face of Twitter search for good? Could the much larger archive of Facebook data be submitted to the same body for the same kind of analysis?
These kinds of large data sets are poised to become one of the most important resources the Internet creates. As Kenneth Cukier wrote in The Economist’s recent Special Report on Big Data, “Data are becoming the new raw material of business: an economic input almost on a par with capital and labour.”
The Library’s blogger Matt Raymond put it like this in the blog post about the announcement:
Expect to see an emphasis on the scholarly and research implications of the acquisition. I’m no Ph.D., but it boggles my mind to think what we might be able to learn about ourselves and the world around us from this wealth of data. And I’m certain we’ll learn things that none of us now can even possibly conceive.
Nate Anderson at Ars Technica offers this context:
There’s been a turn toward historicism in academic circles over the last few decades, a turn that emphasizes not just official histories and novels but the diaries of women who never wrote for publication, or the oral histories of soldiers from the Civil War, or the letters written by a sawmill owner. The idea is to better understand the context of a time and place, to understand the way that all kinds of people thought and lived, and to get away from an older scholarship that privileged the productions of (usually) elite males.
Twitter co-founder Biz Stone said today that the service has 105 million registered users. How will those users feel about their tweets being archived for posterity? Will non-U.S. users be included (Twitter is a U.S.-based company), and will they object? Lots of questions remain.
There's no word from Twitter itself about this news yet, but we expect details to become public during the Chirp developers conference, which starts in just a few minutes. Update: Twitter HQ just told us that a blog post about this news is forthcoming.
It’s hard to imagine a more significant milepost in social media’s early march toward becoming an essential component of our social experience.