The Data-Driven Future Of Journalism

Everyone seems to assume that Jeff Bezos, the founder of Amazon.com, spent $250 million on the Washington Post as some kind of hobby or charitable move. And some question whether he even has a plan for how he’ll run the newspaper.

Tech investor Keith Rabois, whom I hosted at our ReadWriteMix event in August, has me rethinking those assumptions. When I pinned Rabois down on where value was going to be created in the future, he answered, “Data science.”

In other words, not just the accumulation of data enabled by the ever-growing amounts of computing power, bandwidth, and storage we have available to us, but the smart application of it to reshape products, businesses, and industries in a continuous cycle of evolution and improvement.

In the tech world—the world where Bezos made his fortune—it’s taken for granted that one should use data about how people use a product to make that product better and introduce new features.

What if we actually did that in the media world—without sneering, without gritting our teeth, and without oversimplifying the enormity of the task?

“Data Journalism” Needs A Redefinition

The term “data journalism” has come to have a narrow definition of “reporting using publicly available data sets.” That seems woefully insufficient on two counts.

First of all, any good reporter ought to use all sources of accurate information and all available tools to vet and analyze that information. That includes databases and tools to manage and extract insights from them. We need not call this “data journalism”: It’s just journalism.

Second, reporting is necessary but not sufficient to commit acts of journalism. A reporter needs coconspirators: editors, photographers, and designers. Since we’re online, let’s add to that list product managers, engineers, and data scientists. Data ought to inform the entire operation that creates the product, not just the newsgathering.

In particular, the notion of data journalism seems to have devolved into a very narrow concept of hyperlocal reporting driven by municipal databases: a laudable effort, but such a small and often inconsequential application of a powerful idea.

Don’t get me wrong: I’m delighted that someone is putting restaurant health-inspection reports on an interactive map. It’s just that we as journalists have so much more to do with our own data.

A Conversation With Readers, In Bits And Bytes

What data is that? Why, it’s chiefly the interactions our readers have with us—reading, commenting on, and sharing our stories. Every story we publish creates a massive trail of data exhaust. But we let much of it dissipate like the San Francisco fog on a sunny afternoon.

Here’s an example: The other day, I checked Chartbeat, one of several analytics tools, and noticed a spike of traffic to an old story we’d run about downgrading from Apple’s beta version of iOS 7, its mobile operating system, back to iOS 6. Since everyone expects iOS 7 to be out later this month, I couldn’t see a reason why people were suddenly flocking to the story—unless something had unexpectedly gone wrong with the beta test.

ReadWrite reporters Selena Larson and Adriana Lee went to work, finding ordinary users as well as developers who’d been affected and locked out of their phones. We rapidly debunked conspiracy theories that Apple was locking out nondevelopers from the beta. (Technically, only registered developers should have had access to iOS 7, but where there’s a will, there’s a digital way.)

Instead, we determined that the combination of an expiring older version of the beta, some unexplained failure of automatic updates, and an out-of-service activation server forced users to downgrade.

Mobile editor Dan Rowinski followed up with analysis of how this beta release, the first since the ouster of Apple mobile-software executive Scott Forstall last year, seemed particularly troubled.

Listening To What Your Readers Say—And Do

Listening to your readers is as old as publishing letters to the editor. What’s new is that Web analytics create an implicit conversation that is as interesting as the explicit one we’ve long been able to have.

Why did people read a story? How did they find it? What did they think of it? These are fundamentally human questions that have dogged storytellers since the dawn of literacy. There is nothing new about them.

We simply have better tools to get answers these days, if only we’d use them.

Analytics have a bad rap in the publishing business because of their use—misuse, rather—at entities like Demand Media and the Huffington Post. Remember the infamous headlines posing the question, “What time does the Super Bowl start?”

Even The Onion’s satirists have blamed analytics-chasing for CNN featuring Miley Cyrus’s twerking performance at MTV’s Video Music Awards as its top story, a parody so cutting that CNN Digital’s real managing editor felt obliged to deny writing the piece.

Data Power Should Be In Editors’ Hands

I’d argue that tail-chasing search-engine optimization and short-term pageview-chasing are the result of leaving data in the wrong hands—engineers more interested in algorithms than humans, Internet opportunists chasing dollars, and overworked editors too far down the masthead tasked with delivering quantifiable results.

What we need as an industry—at the very least, what I’m trying to create at ReadWrite—is not a slavish adherence to data, but an interest in it as a proxy for the real human beings who make up our audience. Buried in the metrics is a telegraphic signal from those people—an attempt to communicate, to reach out and connect and guide us toward better stories and better ways to tell them.

We need better metrics, to be sure. A pageview is the crudest possible approximation of the interaction between a writer and a reader. How much time do they spend reading? At what pace do they scroll down the page? Do they read continuously or jump back and forth between paragraphs? Do they follow related links and come back to the page, or wander off?
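To make those questions concrete, here is a minimal sketch of how one reader's scroll activity could be boiled down to something richer than a pageview. It is an illustration under stated assumptions, not a description of how Chartbeat or any other analytics tool works: the ScrollEvent record, its fields, and the 30-second idle cutoff are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScrollEvent:
    # Hypothetical event record; real analytics tools log different fields.
    t: float      # seconds since the page loaded
    depth: float  # fraction of the article scrolled past, 0.0 to 1.0


def summarize_session(events, idle_gap=30.0):
    """Reduce one reader's scroll events to a few reading metrics.

    idle_gap is an assumed cutoff: a pause longer than this between
    two events is treated as the reader having stepped away.
    """
    events = sorted(events, key=lambda e: e.t)
    engaged_seconds = 0.0
    back_jumps = 0
    for prev, cur in zip(events, events[1:]):
        gap = cur.t - prev.t
        if gap <= idle_gap:
            engaged_seconds += gap   # count only time spent actively reading
        if cur.depth < prev.depth:
            back_jumps += 1          # reader scrolled back up the page
    max_depth = max((e.depth for e in events), default=0.0)
    return {
        "engaged_seconds": engaged_seconds,
        "max_depth": max_depth,
        "back_jumps": back_jumps,
    }


session = [
    ScrollEvent(0, 0.0),
    ScrollEvent(10, 0.2),
    ScrollEvent(20, 0.5),
    ScrollEvent(100, 0.4),  # long pause, then a jump back up the page
    ScrollEvent(110, 0.9),
]
print(summarize_session(session))
```

Even a summary this crude separates a reader who skimmed the first screen from one who read to the end and doubled back, which a raw pageview count cannot do.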

How do they find stories? Search terms are worth looking at, but they form the beginnings of questions, not answers. What do they reveal about our readers’ interests and passions? How do they frame questions about a topic? What do they already know, and what background do they need to enter into an ongoing story?

Suggestions, Not Commands

When I speak in public, I watch the audience for feedback. Are they leaning forward or back? How rapt are their gazes? That guides me to do a better job.

It’s no different online—or at least it shouldn’t be. Analytics are tools for listening. Data about our audiences is feedback—it doesn’t provide marching orders.

But should it sway us when we’re thinking about areas to explore? Can it suggest what a good follow-up might be? Does it nudge us to go deeper on a topic than we’d initially planned to? I think those are all valid ways data can influence journalism.

Ultimately, you need to have an idea of what your publication stands for and who you are as a journalist. Minus those lodestones, data can provide no guidance. But if you know who you want to reach and what you hope to do for them, there’s no question in my mind that data can help you fill in the map as you travel to your destination.