Amazon.com changed the retail world. In the process the company built up so much surplus computing power that it started a dirt-cheap “computing in the cloud” business that changed the computing world. This week the company’s newest project, Public Data Sets on Amazon Web Services, began offering more than 1 Terabyte (1000 GB) of fascinating public data for developers to access on the fly through Amazon’s cloud computing service.
We’re talking about an annotated collection of all publicly available DNA sequences, including the Human Genome, huge amounts of chemistry data, machine-readable encyclopedic entries about millions of different topics and an entire dump of Wikipedia, plus US Census data, data from the US Department of Transportation and more. It’s all accessible by web applications in no time at all. What do you think this is going to change?
In a blog post last night, the company announced the availability of four new public data sets.
The data now available includes:
- Data from the Bureau of Transportation Statistics.
- The DBpedia Knowledge Base, which “currently describes more than 2.6 million things including 213,000 people, 328,000 places, 57,000 music albums, 36,000 films, and 20,000 companies.” All in handy semantic markup.
- The Freebase Data Dump: the giant, collaboratively built semantic database covering a wide variety of topics, data that high-profile startup Metaweb has spent millions of dollars assembling.
- The entire English section of Wikipedia, dumped into a machine-readable format.
- A number of large genetic and scientific databases.
We added up all the databases, and the total passed 1 TB of available data. The company says that accessing this data is “trivial” for developers.
What are developers going to do with this data? We can’t wait to find out. The prospect of mashing up, cross-referencing and building user interfaces on this amount of data is nearly unfathomable. Really. This data will be leveraged by all kinds of different web applications, for a long time.
You’ve read about, or can imagine, the impact that the first public libraries had on human culture. Now imagine the opening up of not just this, but other libraries of data, so huge that economies of scale blast the project off beyond any analogy that could be drawn with our everyday experience or historical memories. It won’t just be Amazon that offers up this kind of data; we imagine it will be relatively commonplace soon.
It will be like a network of libraries – for robots. Robots that go to the library frequently, read very fast and make serious use of what they’ve learned.
Congratulations, Amazon, on making more than 1 TB of public data available. May all our robots of the future please live in peace.