We’ve written a lot this year about the boom in e-readers and the benefits that e-books have over print. And often, discussions surrounding the move to digital texts involve our enhanced ability to read and store our libraries, particularly via mobile devices.
But a new project available in Google Labs today – Books Ngram Viewer – highlights some of the other benefits of digitizing texts beyond better reading and storage. So let me invoke my former life as a literature PhD student here to say, “This is incredibly farking cool.”
Visualizing the History of the Usage of 500 Billion Words
Using Google’s Books Ngram Viewer, you can now visualize how language and literature have changed over time, by searching a subset of the more than 15 million books that Google has digitized since 2004. All told, today’s datasets contain more than 500 billion words from 5.2 million books in Chinese, English, French, German, Russian, and Spanish.
The datasets contain phrases of up to five words with counts of how often these occurred each year, providing a great deal of insight – for scholars and casual word hounds – into how language usage changes over time. The datasets were the basis of a research project led by Harvard University’s Jean-Baptiste Michel and Erez Lieberman Aiden and published today in Science that demonstrates how quantitative analysis of texts can offer new insights into areas including censorship, technology adoption, and cultural memory.
And now Google has put that visualization tool into everyone’s hands, along with the ability to download the raw data.
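To make the shape of that raw data a little more concrete, here is a minimal sketch of how phrases of up to five words might be tallied per year. This is not Google's actual pipeline: the function names, the toy corpus, and the simple lowercase-and-split tokenizing are all assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


def ngrams(words, n):
    """Yield every run of n consecutive words as a single string."""
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n])


def count_ngrams_by_year(corpus, max_n=5):
    """Map each year to a Counter of phrases that are 1..max_n words long."""
    counts = defaultdict(Counter)
    for year, text in corpus:
        # Naive tokenizer (an assumption): lowercase, split on whitespace.
        words = text.lower().split()
        for n in range(1, max_n + 1):
            counts[year].update(ngrams(words, n))
    return counts


# Hypothetical toy corpus of (publication year, text) pairs.
toy_corpus = [
    (1950, "the rocket left the launch pad"),
    (1970, "the spacecraft left orbit and the spacecraft returned"),
]

counts = count_ngrams_by_year(toy_corpus)
print(counts[1970]["spacecraft"])       # 2
print(counts[1950]["the rocket left"])  # 1
```

A real dataset would also need a per-year total so that counts can be compared across years with very different numbers of books, which this sketch leaves out.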
Language, Literature, Culture Over Time
Take the word “farking” that I used above. Usage of the word has risen and fallen over the years, skyrocketing, not surprisingly, once it became the curse-du-jour in the reprised Battlestar Galactica series. If that’s too pedestrian, compare the changing usage for spaceship, spacecraft, rocket, and UFO. Or the frequency of communism, anarchism, socialism, and capitalism over the course of the twentieth century. Or the decline of man.
New Quantitative Tools for Scholars
By mining this data, scholars are able to shed new light on many things we’ve long assumed about literature, language, and culture. According to Dan Cohen, the Director of the Center for History and New Media at George Mason University (whose work on the datasets, specifically the Victorian era, was featured in a recent article in The New York Times), the release of the Ngram Viewer is a “real win” as it provides “an easy-to-use research site and, even better, the raw data behind it.”
This is an incredible amount of data, a boon to researchers in both the humanities and social sciences, as well as a pretty fun tool for the more casual lit-geeks and word-lovers among us.