The Amazon Public Data Sets, which make available databases of human genomic data, US census records and other data of public interest, are an incredible resource. They’re like a 21st-century public library for robots to patronize. In this emerging era of flourishing data-centric applications, though, the state of the art never stands still.
Forty-year-old British technology platform Talis announced this week that it now offers free, perpetual storage and keyless API access to semantically marked-up large data sets. The offering is called the Talis Connected Commons and it’s the kind of thing that anyone with a geekish imagination can get excited about.
The Setting
If the current web economy is being rocked by easy publishing systems that make the people formerly known as “consumers” capable of publishing and socializing around content of their own creation, then the next step of internet evolution may come in the form of automated systems able to extract meaning and patterns from large amounts of user-created and other information. When it is structured, free and available programmatically in bulk, that data is like a big pot of gold for developers.
While Amazon offers free access to data sets, users still pay to transport the data. The Talis Connected Commons also offers an API by default (a SPARQL endpoint, in particular) and is focused specifically on semantic data. The system is made for public sharing: two variations of Creative Commons licenses are supported for the data stored there. Talis is requesting that data set owners email a short description of their content to the company for approval and inclusion on the site.
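For readers who haven't worked with a SPARQL endpoint, here is a minimal sketch in Python of what keyless access could look like. The endpoint URL and the query are placeholders invented for illustration, not actual Talis addresses. The sketch assumes the endpoint follows the standard SPARQL Protocol (the query travels in a `query` URL parameter) and returns the standard SPARQL JSON results format.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint URL, used only for illustration (not a real Talis address).
ENDPOINT = "https://example.org/commons/sparql"

# Ask for up to ten resources and their human-readable labels.
QUERY = """
SELECT ?subject ?label
WHERE { ?subject <http://www.w3.org/2000/01/rdf-schema#label> ?label }
LIMIT 10
"""


def build_query_url(endpoint, query):
    """Encode a SPARQL query as a GET URL using the protocol's 'query' parameter.

    No API key is added anywhere, which is what "keyless" access implies.
    """
    return endpoint + "?" + urlencode({"query": query.strip()})


def parse_bindings(response_text):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(response_text)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]


if __name__ == "__main__":
    # Print the URL a client would fetch; no network request is made here.
    print(build_query_url(ENDPOINT, QUERY))
```

A client would fetch the printed URL with any HTTP library, ask for JSON results, and pass the response body to `parse_bindings`.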
In other words, there’s no gold in the pot yet. Talis is more than well established and this offering is aimed at such a sweet spot that the only way the Connected Commons won’t be filled with good data is if the company totally drops the ball. We don’t expect that to happen.
The Plot
This project is in the same vein as Nova Spivack’s forthcoming ontology authoring and hosting service, the vision of open source microblogging as the future of business intelligence and more.
There’s a chain of events that news like this helps fill out. First, massive bodies of data are created or gathered: books are scanned, census data is collected, and patients donate their anonymous aggregate medical data to science. Next, the data is semantically analyzed and marked up (through any number of different semantic processing engines). Then the data is stored and an API is made available (this is where the Talis Connected Commons comes in). Finally, developers build applications that leverage the smart data offered up through the platform, data visualizers find new stories to tell in images built from the marked-up data, and systematic analysis of previously unavailable structured data clears the mist away from relationships between people, organizations and concepts.
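To make "marked-up data" a little more concrete, here is a toy sketch in Python. It assumes the data is expressed as subject-predicate-object triples, the model that SPARQL queries operate on. Every identifier and fact in it is invented for illustration. A real application would query a hosted store rather than an in-memory list.

```python
# Toy triples; all identifiers and facts are invented for illustration.
TRIPLES = [
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:acme", "ex:locatedIn", "ex:london"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


def colleagues(triples, person):
    """Find people who share an employer with `person`, a relationship
    that is only implicit in the raw triples."""
    employers = {o for _, _, o in match(triples, subject=person, predicate="ex:worksFor")}
    return sorted(
        s for s, _, o in match(triples, predicate="ex:worksFor")
        if o in employers and s != person
    )


if __name__ == "__main__":
    print(colleagues(TRIPLES, "ex:alice"))
```

The point of the sketch is the last step in the chain: once facts are stored in a uniform structure, relationships that nobody wrote down explicitly can be derived with a simple pattern match.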
The Amazon Public Data Sets include things like human genomic data, US census data, and data parsed from Wikipedia. What will the Talis Connected Commons provide a home and API for? We look forward to finding out.