Sticking to the deadline she announced last year, European Commission Vice President Neelie Kroes told a European interoperability standards forum yesterday that a public portal for access to government and public data from across the continent is on track to go online in spring 2012. The next stage on Comm. Kroes’ agenda is more ambitious still: a community-built, crowd-sourced public data platform for all of Europe.

Kroes told the OpenData Forum in Brussels that she expects a pan-European platform for public data mining to go live no later than 2013. “Will she really be able to pull off all that?” the commissioner asked rhetorically, referring to herself.

A Commission report last November (PDF), produced by a technical workshop of data standards experts convened in Luxembourg, including experts from the W3C, painted a very broad picture of the types of data the Commission is looking to federate. A new data portal would need to include a small group of very interesting datasets first, the report stated, to attract citizens’ interest early. These early datasets would then need to be stitched together with bigger datasets whose ownership may be indeterminate. Goal #2, the report said, would be to “deeply integrate a small set of very high quality datasets demonstrating immediate value and, in time, capable of acting as a scaffold for the integration of many other datasets. Candidates in this second role are geospatial, transportation, statistical and financial datasets.”

Yesterday, Comm. Kroes narrowed and focused the definition of these datasets somewhat: “Making good use of public data can make your life better. Whether it’s route planning using public geo-information or public transport data, a local community crowd-sourcing its maintenance priorities, decision-making built on statistics of all shapes and sizes, or data journalism that helps explain our world,” she told attendees.

“Research in genomics, pharmacology or the fight against cancer increasingly depends on the availability and sophisticated analysis of large data sets,” the commissioner continued. “Sharing such data means researchers can collaborate, compare, and creatively explore whole new realms. We cannot afford for access to scientific knowledge to become a luxury, and the results of publicly funded research in particular should be spread as widely as possible.”

But sharing such data may not sit well with some member countries that had intended to license that data commercially. For them, the commissioner implied, she would not be against imposing new rules compelling member countries to license their data at low or no cost: “In particular, we’ll be looking at the way data is disclosed – the formats and the way data licenses operate to make re-use straightforward in practice. We’ll also be looking at charging regimes because expensive data isn’t ‘open data.’ In short, getting out the data under reasonable conditions should be a routine part of the business of public administrations.”

Kroes envisions private citizens developing their own applications around this publicly available data. When the Commission issued a tender last April for freelance and open source programmers to build the data portal, initial reaction from developers – as told to CrowdSourcing.org – included statements implying that if governments were as open as Kroes envisioned, projects such as WikiLeaks would be rendered unnecessary.

“I think one of the biggest excitements is a really radical move to open up all spending and to be fair,” said Cambridge, U.K.-based Open Knowledge Foundation founder Rufus Pollock. “The U.K. Government isn’t too corrupt, but imagine places where corruption is a problem. Imagine that all over the world!”

The OKF is one of the leading organizations behind the Data Hub, an open effort to build a data portal around public data sets. The Data Hub’s initial store consists of 83 datasets provided by the W3C Linking Open Data Interest Group, including the Allen Mouse Brain Atlas – a record of gene expressions recorded from genome images of mouse brains; the CIA World Factbook; and the Freebase RDF Store, which includes topics gleaned from Wikipedia. The challenge for developers now is to demonstrate how mashups can link all this data together in some meaningful fashion.
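The core idea of such a mashup – stitching independent datasets together via a shared identifier to derive something neither holds alone – can be sketched in a few lines. The dataset excerpts and figures below are entirely hypothetical and are not drawn from the Data Hub’s actual catalog; they simply illustrate the linking pattern.

```python
# Hypothetical excerpt of a statistical dataset: ISO country code -> population.
population = {
    "DE": 81_700_000,
    "FR": 65_000_000,
}

# Hypothetical excerpt of a financial dataset: ISO country code -> GDP in billions EUR.
gdp_billions = {
    "DE": 2_600,
    "FR": 2_000,
}

def mashup(pop, gdp):
    """Link two datasets on their shared country-code key and derive a
    new figure (GDP per capita) that neither dataset contains alone."""
    linked = {}
    for code in pop.keys() & gdp.keys():  # only codes present in both
        linked[code] = gdp[code] * 1_000_000_000 / pop[code]
    return linked

per_capita = mashup(population, gdp_billions)
print(sorted(per_capita))
```

In the Linking Open Data approach the shared keys are URIs rather than country codes, but the principle is the same: agreement on common identifiers is what lets independently published datasets be joined.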

Meanwhile, the challenge for the E.C. will be to find an inexpensive way to open up more meaningful data.