More than two years after President Obama's memorandum on his open government initiative, thousands of public authorities and organizations worldwide have embraced the main idea behind it. Opening up data and making them publicly available on the Web has been recognized as a key to fostering transparency and collaboration within public administrations and with citizens.

From census data to cadastral maps, a new data set pops up on the Web every day, as a quick glance at the #opendata hashtag on Twitter shows.

Discovering and consuming open data

Davide Palmisano is a Semantic Web software engineer based in the emerging Silicon Roundabout of London. An open data enthusiast, Davide's highest ambition is to speed up the rise of a new data economy. He is founder and CEO of Smartetics.
Since the open data movement shows no sign of slowing down, several hubs, or data markets, have been released. This was a direct consequence of the need for ways to search across all the different data sets. According to a broad definition, data markets are platforms where users can search, browse and discover new data sets to fulfill their needs. The added value they bring varies according to the functionalities they offer, making them something more than a simple vertical search engine.

For example, the Icelandic startup Datamarket.com provides a full-featured set of functionalities to visualize the data. Time series can be visualized with several different types of charts, and users can annotate them with dated events drawn from the Guardian archive. The result is a handy way to make explicit the correlations between trends and historical events. End users can then access the data through REST APIs or export them in CSV or XML. Links to diagrams can even be shared on Twitter or Facebook, making Datamarket a fancy and pragmatic tool.
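To make the idea of annotating a time series with dated events concrete, here is a minimal sketch in Python. It does not use Datamarket's actual API or export format: the CSV columns, the values, the event label and the helper names `load_series` and `annotate` are all assumptions made up for illustration.

```python
import csv
import io

# Hypothetical CSV export of a yearly time series (made-up values).
CSV_EXPORT = """year,value
2007,100
2008,80
2009,90
"""

# Hypothetical dated events, standing in for entries from a news archive.
EVENTS = {2008: "Financial crisis"}


def load_series(text):
    """Parse the CSV text into a list of (year, value) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(int(row["year"]), float(row["value"])) for row in reader]


def annotate(series, events):
    """Attach the matching event label (or None) to every data point."""
    return [(year, value, events.get(year)) for year, value in series]


annotated = annotate(load_series(CSV_EXPORT), EVENTS)
for year, value, event in annotated:
    print(year, value, event or "")
```

Keeping the events in a separate lookup, keyed by date, means the same series can be re-annotated with a different set of events without touching the data itself.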

Factual, another platform, which recently raised $25 million in Series A funding, mainly impresses with its ability to join data sets. Different data sets are represented as different tables, somewhat like a relational database, on which end users can perform projections, selections and joins over table fields. Applications can then be built on top of the aggregated data and the result embedded in a third-party website.
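The three relational operations mentioned above can be sketched in a few lines of Python over in-memory tables. This is not Factual's interface: the tables, field names and figures below are hypothetical, and `select`, `project` and `join` are illustrative helpers rather than calls from any real library.

```python
# Hypothetical tables, each a list of rows (dicts); figures are illustrative.
cities = [
    {"city_id": 1, "name": "London", "country": "UK"},
    {"city_id": 2, "name": "Reykjavik", "country": "IS"},
]
populations = [
    {"city_id": 1, "year": 2011, "population": 8_200_000},
    {"city_id": 2, "year": 2011, "population": 119_000},
]


def select(table, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return [row for row in table if predicate(row)]


def project(table, fields):
    """Projection: keep only the named fields of every row."""
    return [{field: row[field] for field in fields} for row in table]


def join(left, right, key):
    """Inner join: merge rows from both tables that share the same key value."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


joined = join(cities, populations, "city_id")
large = select(joined, lambda row: row["population"] > 1_000_000)
result = project(large, ["name", "population"])
print(result)
```

Because each operation takes a table and returns a table, they compose freely, which is what makes it possible to build an aggregated view out of data sets that were published separately.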

CKAN Data Hub is a remarkable initiative led by the Open Knowledge Foundation. It is probably the largest hub in terms of indexed data sets. Released as an open source project, it offers API access to search and browse the index, but it is not equipped with an explicit mechanism to directly manipulate the indexed data. What strongly differentiates CKAN from the others, however, is the emphasis it puts on data licensing: every data set can be published using any of a number of open licenses. Most of them are directly endorsed by the Open Knowledge Foundation, a group that is playing a leading role in this field.

The last goodie is Talis's Kasabi, which was demoed at the last Semantic Technology Conference in San Francisco. Kasabi offers interesting innovations powered by a pragmatic use of Semantic Web technologies. For any given data set, users can design their own REST APIs and republish them on the market, hinting at a forthcoming revenue model. What puts Kasabi one step ahead of the others is the powerful mechanism provided by SPARQL to slice, select and remix the data. Once a query has been defined, the user can completely customize the response using an XSLT transformation. From a certain perspective, Kasabi can be seen as an engineered showcase for the potential of the entire Semantic Web technology stack.


Following the 5 stars

Even if those platforms, and other well-known products such as Microsoft Azure or Infochimps, are concretely sustaining the tendency to open up data, there are still obstacles to harmonious and integrated data consumption. Data publication techniques, for example, vary from simple database exports as CSV files to sophisticated Semantic Web-powered platforms, such as data.gov.uk. This heterogeneity raises development costs and acts as a barrier to enterprise reuse of the data. Those costs come from all the implementation tasks needed to access the data, make them coherent with a specific application domain, curate them and, finally, generate business value from their usage.

Even if some markets are addressing the need to standardize their internal representations, there is still a lack of Web-wide integration among different data sets. It is nearly impossible to link and access different data sets published on different markets.

In addition, even when data are directly published on the Web, they do not really benefit from the web model. This gap was recently pointed out by Tim Berners-Lee, who proposed the 5 stars of Linked Open Data: a handy way to judge data quality with regard to the license and the syndication technology used to expose them. The Linked Data paradigm is seen as a key to tackling the main issues related to data integration. The "webby way to link data" relies on unique URI-referenced entities linked together, machine-readable representations and open licenses as the foundational ingredients for achieving web-scale open data consumption.
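As a rough illustration of how the 5-star scheme can be applied, the sketch below scores a data set in Python. It assumes the scheme as it is commonly summarized, with each star building on the previous ones: available on the Web under an open license, machine-readable structured data, a non-proprietary format, URIs to identify things, and links to other data. The `Dataset` class, its field names and the `stars` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of how a data set is published."""
    on_web_open_license: bool      # 1 star: on the Web under an open license
    machine_readable: bool         # 2 stars: structured, machine-readable data
    non_proprietary_format: bool   # 3 stars: open format such as CSV
    uses_uris: bool                # 4 stars: URIs identify the things described
    links_to_other_data: bool      # 5 stars: linked to other people's data


def stars(ds):
    """Count stars cumulatively: stop at the first criterion that is not met."""
    criteria = [
        ds.on_web_open_license,
        ds.machine_readable,
        ds.non_proprietary_format,
        ds.uses_uris,
        ds.links_to_other_data,
    ]
    count = 0
    for met in criteria:
        if not met:
            break
        count += 1
    return count


scanned_report = Dataset(True, False, False, False, False)
csv_export = Dataset(True, True, True, False, False)
linked_data = Dataset(True, True, True, True, True)

for name, ds in [("scan", scanned_report), ("csv", csv_export), ("linked", linked_data)]:
    print(name, stars(ds))
```

Under this reading, an openly licensed CSV export earns three stars, while reaching the fifth requires both URIs and links to other data sets.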


Datamarkets: quarterbacks in the emerging data economy

We can reasonably expect that data markets, or whatever we'd like to call them, will play a prominent role in the emerging data economy. Once the revenue models for data publishers are clearly defined and accepted, and once a critical mass of 5-star quality interlinked data sets becomes available, a new wealth of opportunities for developers will emerge. In some sense, a virtuous revenue model should encourage big owners to open their data sets, consumers to offer their APIs with flexible pay-per-use fees and the various markets to compete on the value-added services they will be able to provide. The mission is all about building an ecosystem, rather than merely developing vertical search engines.