The US Office of Management and Budget issued new reporting guidelines this week for recipients of the $787 billion American Recovery and Reinvestment Act of 2009 and the normally polite geek watchdog organization the Sunlight Foundation has come out swinging.
"...[A]bsent from the new instruction is a requirement to make raw data public," Sunlight's co-founder and Executive Director, Ellen Miller, wrote this morning. "By not including raw data at Recovery.gov, transparency is dramatically reduced. Sunlight has argued strongly for raw data in machine readable formats as the starting point for Recovery.gov. This is a significant failure by the Administration to live up to its promise for full and complete disclosure. Significant failure."
The Recovery.gov site might surprise us and end up offering the data it collects in raw bulk formats, but unless recipients are required to prepare their reports with that in mind, it seems unlikely to be done well, if at all.
Why would the Obama Administration not offer raw bulk data as part of its much-celebrated transparency? One argument against raw data came out of the woodwork during the successful push to get the US Senate to offer mashup-friendly XML (extensible markup language) feeds for Senate voting history. "The secretary of the Senate has cited a general standing policy," John Wonderlich, policy director at Sunlight, told Politico's Victoria McGrane, "that they're not supposed to present votes in a comparative format, that senators have the right to present their votes however they want to."
The Recovery.gov website is beautifully designed, but when the data being collected from federal recovery fund recipients is made available this October it will be hard to call it transparent if presentation of that data is done entirely by the hand of the government program being scrutinized. Raw data, freely available to the public, would allow for open-ended analysis by the community at large.
Sunlight's critique of the lack of raw data forthcoming from Recovery.gov follows questions about the effectiveness of the Administration's new Data.gov site, a would-be repository for government data that anyone can extract and analyze. We called that site disappointing when it launched in May and subsequent updates to the data offerings there have been uninspiring.
Meanwhile, the UK government has taken the question of raw data so seriously that it has employed Sir Tim Berners-Lee, the man who invented the World Wide Web and one of the world's most prominent advocates for releasing raw data to the public.
Public discussion of these kinds of moves often focuses on "making information available online," but that's old news, folks. It's an increasingly data-centric world and we need that information as open as possible for a growing corps of citizen and non-governmental analysts, computer-assisted reporters and others to work their magic on. The difference between the government reporting its own data on its own websites on one hand, and opening up access to the bulk data for other people to analyze on the other, is like the difference between watching a puppet show and being able to shine a light behind the stage to check for yourself for injustices, improprieties and other insights we can't foresee before getting a chance to look. So far the October reporting on Recovery.gov appears set to be a puppet show.