Journalism has always been about reporting facts and assertions and making sense of world affairs. No news there. But as we move further into the 21st century, we will increasingly have to rely on "data" to feed our stories, to the point that "data-driven reporting" becomes second nature to journalists.
The shift from facts to data is subtle and makes perfect sense. You could say that data are facts, with the difference that they can be computed, analyzed, and put to use in a more abstract way, especially by a computer.
With this mindset, finding mainstream data-driven stories doesn't take long at all. A quick scan of the Guardian's home page tells us that swine flu cases are up by 50%, according to "fresh figures...[that] will be released this afternoon." The story here is that we're in danger because swine flu is on the rise. Reporting the current swine flu figures on their own wouldn't be all that interesting. The news comes from comparing the current figures to last week's, which is a very simple form of data analysis. By making use of published data and running one's own analysis (and building on the analysis of others), we get something very newsworthy indeed. It shifts the definition of journalism ever so slightly, from "saying and asserting" to "analyzing and publishing." But it obviously works only for data that is accessible.
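The week-over-week comparison behind a headline like "up by 50%" is just a percentage change. The short Python sketch below illustrates that arithmetic; the case counts are hypothetical placeholders, not the actual figures the Guardian reported.

```python
def percent_change(previous: float, current: float) -> float:
    """Return the change from previous to current as a percentage of previous."""
    if previous == 0:
        # A percentage change is undefined when the baseline is zero.
        raise ValueError("previous count must be non-zero")
    return (current - previous) / previous * 100


# Hypothetical weekly case counts, used only to illustrate the calculation.
last_week = 100_000
this_week = 150_000
print(f"Cases changed by {percent_change(last_week, this_week):.0f}%")
```

Any reporter with access to two consecutive weekly releases can run the same comparison, which is why the availability of the underlying data matters more than the sophistication of the analysis.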
There is nothing new about pointing out the importance of making public data available. Sir Tim Berners-Lee has discussed at length the importance of governments and institutions putting their data online and making it accessible and useful. His TED talk and interviews with ReadWriteWeb and Talis (disclosure: I am a blogger at Talis) all explain his belief that by publishing linked data we can begin to solve many of the problems the world faces. Innovations in medicine, science, and development could all be achieved if only currently hidden data were made available. Data-driven journalism could be the first step in realizing this dream. The best stories would then come from innovators who read about trends reported in the news media and use them to draw new conclusions and solve bigger problems. In his recent discussion with the BBC, Berners-Lee said that the next step is to go for the low-hanging fruit by simply getting the data out there.
Thus far, this has made a lot of sense to me, and I have been tracking the publication of linked data and increasing access to public knowledge as emerging trends over at Talis. But my perspective has shifted a bit in the past few weeks.
First, there was data.gov and President Obama's call for more access to government data. A sitting head of state (and one of some significance) was clearly calling for public access to government data: this was news! But the idea has been discussed, praised, and debated for a while since then and may have lost some of its luster.
Then about a month ago, UK Prime Minister Gordon Brown made it part of his digital strategy to prioritize the publication of government information. He asked Sir Tim personally "to help us drive the opening up of access to Government data in the web over the coming months" and appointed Berners-Lee an official government adviser. By now, neither of these stories is news, and comparisons between the two initiatives have already been made.
The Guardian newspaper recently launched its own Data Blog, with the intention of letting readers access, mash up, and reuse much of its information in the form of data, which could in turn drive stories.
What is perhaps not as explicitly recognized is the voracious appetite for data that has been apparent for months. It is less about turning good ideas into stories and more about seeing how data informs our understanding of events happening right now. Each new initiative is another piece of low-hanging fruit picked.
Access to data is important: it drives innovation and even social change. Governments that publish their data have to become more transparent. Humanitarian organizations that make their findings known could spark bigger projects and source innovative solutions from their communities. Scientific findings and raw information could be used to solve bigger problems than the result of a single experiment or trial could ever manage. Even the simple comparison of two or more facts can lead to new insight, and all of these things happen only when the walls around an institution become porous.
2009 could become known as the year of data, the year of open access, or the year of the semantic Web (see links above for how this relates), and it may also be the first year in which it becomes news that a story failed to publish data it should have included. That a government body isn't being transparent, or is blocking access by publishing its findings in PDF or other non-linking formats, would make a very interesting story indeed. We can expect to see more and more organizations and public bodies remove their own barriers through initiatives and legislation. Examples have been set, and it is not far-fetched to expect the excuses to fall along with the barriers.
Do you know of other data-driven stories? We'd love to hear about any insights that were made through publicly accessible data or where this data might come from next.