It's hard to pay attention to the business of journalism without hearing about data journalism or data-driven journalism. But despite all the discussion of the topic, there's precious little documentation to guide practicing and future journalists in becoming proficient in it. The Data Journalism Handbook aims to fix that, albeit at a high level.

The Data Journalism Handbook effort started at a workshop at the London MozFest 2011 last November. The handbook now represents the work of "an international, collaborative effort involving dozens of data journalism's leading advocates and best practitioners." That effort includes folks from ProPublica, The Washington Post, the BBC, The New York Times and many others.

The result, so far, is an online book that's just now in beta. Eventually it will also be published in dead tree and e-book form by O'Reilly. However, given the nature of the tome, it's most useful online. As you'd expect from a title that was born at a Mozilla conference, the text is full of links to online resources. I suspect trying to read the title as an e-book - or especially on paper - would be a little frustrating.

Inside the Handbook

The handbook offers a glimpse into the practice of data journalism, with some guidance on how to get started. You'll find a slew of case studies, along with sections on getting data, understanding data and delivering data to the public.

The handbook covers topics like open data, data use rights, scraping and crowd-sourcing data, and community engagement. You'll also find some high-level discussion of tools to work with open data, and how to get that data. 

Most importantly, the book offers a resounding case for data-driven journalism. The case studies demonstrate the utility of data-driven journalism and the service that it offers the public. For instance, the OpenSpending.org example should inspire any journalist who covers politics and public funds. The Price of Water case study shows not only the service to the public, but also the service the public itself can provide in gathering data.

The handbook is not a comprehensive guide to all of the concepts and skills that a journalist needs to practice data journalism. It doesn't teach the skills necessary for data literacy, though it does provide some links to resources. It also, of course, explains the importance of data literacy. But it certainly doesn't try to teach journalists how to program and make use of APIs, or how to use tools to create data visualizations.

In short, it's not Big Data for Journalists or even Programming 101 for Journalists, and more's the pity. Programming and working with data sets are skills that many journalists would do well to have, but most don't. To be fair, the handbook doesn't necessarily advocate that journalists be programmers. It does emphasize being able to work well with programmers, but it would probably be a very good idea to have at least a fair grasp of basic programming.

Tips and Ideas

If you read just part of the handbook, I'd recommend skipping the case studies and going straight to the meat of the book: the sections on getting data, understanding data and delivering data.

Take, for example, "Become Data Literate in 3 Simple Steps." This piece advises journalists, at a high level, on how to approach data. Ask yourself how the data was collected and whether it can be tested. Don't assume that data handed to you by a source is valid. (And if the data is not valid, that may be a story in itself, or it may defeat the premise of the story.) Question the data, how it was gathered and whether it's a reliable sample. You see, for instance, many "trend" stories about technology based on a single data set, which may not offer a large enough sample to rely on.
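To make the sample-size worry concrete, here is a minimal Python sketch of my own, not something taken from the handbook. It uses the standard textbook approximation for the margin of error of a proportion at roughly 95 percent confidence, and it assumes a simple random sample, which many real-world data sets are not. The function name margin_of_error and its parameters are hypothetical names chosen for illustration.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    Assumptions (mine, not the handbook's): the sample is random, and
    z=1.96 corresponds to roughly 95 percent confidence. A proportion of
    0.5 is the worst case, so it is used as the default.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    return z * math.sqrt(proportion * (1.0 - proportion) / sample_size)


if __name__ == "__main__":
    # Print the worst-case margin, in percentage points, for a few sample sizes.
    for n in (50, 400, 1000):
        points = margin_of_error(n) * 100
        print(f"n={n}: about +/- {points:.1f} percentage points")
```

Under those assumptions, a survey of 50 people carries a worst-case margin of nearly 14 percentage points, while a sample of 400 brings it down to about 5. That is a rough guide to how much weight a "trend" can bear, not a substitute for asking how the data was gathered.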

The section on visualizing data is also useful. The handbook recommends that reporters working with data find a way to visualize it, even if that's just pulling numbers into a spreadsheet. Visualizing data allows you to find patterns that you might otherwise miss.
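Even a crude chart can make an outlier jump out. The sketch below is my own illustration, not a tool the handbook recommends. It draws a text bar chart using nothing but Python's standard library. The function name text_bar_chart is hypothetical, and the monthly counts are made-up placeholder values, not real data.

```python
def text_bar_chart(values, width=40):
    """Return one text line per label, with a bar scaled to the largest value.

    `values` maps string labels to non-negative numbers.
    """
    if not values:
        return []
    largest = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    for label, value in values.items():
        # Scale each bar relative to the largest value; avoid dividing by zero.
        bar_length = round(width * value / largest) if largest > 0 else 0
        lines.append(f"{label.ljust(label_width)} | {'#' * bar_length} {value}")
    return lines


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not real data.
    monthly_counts = {"Jan": 12, "Feb": 15, "Mar": 14, "Apr": 41, "May": 13}
    for line in text_bar_chart(monthly_counts):
        print(line)
```

With the placeholder numbers above, the April bar is roughly three times as long as the others, which is the kind of pattern that is easy to miss in a column of figures.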

In the enthusiasm for working with data, scraping websites or gathering data in other ways, there's also the small matter of legal restrictions. Whose data is it, and do you have the right to distribute it? The "Using and Sharing Data" section advises reporters to consider the ownership and licensing of data, and when "database rights" might mean that you can't distribute a data set in its entirety. It also covers various open-data licenses and recommends that news organizations apply those when distributing homegrown data sets.

An Unevenly Distributed Future

What the handbook also does, sadly, is provide a tantalizing picture of what is, and what should be. As William Gibson said, "the future is already here - it's just not very evenly distributed." The same can be said for data journalism. We have marvelous tools for doing data journalism, and they're getting better all the time. In some newsrooms, journalists are producing solid work with in-house or open-source tools, examining everything from public data sets to data curated in-house.

In most newsrooms, however, reporting has not yet been significantly affected by data journalism. In an era of continual layoffs and cutbacks, there's no budget for the training or tools that would help reporters get up to speed on the necessary practices. Most of the case studies describe projects that take weeks or months, a depressing concept for journalists tasked with writing several stories per day.

There's a deep need for the handbook, and for a sequel or two that dive into the actual practice of data-driven journalism. (To my friends at O'Reilly, a "programming for journalists" book would be a nifty title.) The handbook is inspiring and educational material, if less focused on "how-to" than one might like.

Data-driven journalism is in its infancy right now, despite the amount of discussion it's generating. I suspect that it's going to be five to 10 years before we'll see the practices in the handbook becoming mainstream.

Image from the Data Journalism Handbook, which is available in its entirety under the Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license.