Nate Silver Gets Real About Big Data

While it has become de rigueur to ascribe all sorts of supernatural powers to Big Data, one of the world's most celebrated statisticians, Nate Silver, is far more circumspect about it. If anything, according to Silver in his book The Signal and the Noise, Big Data carries the potential to cloud our decisions by introducing far more noise than it does signal. It's an interesting position for someone who makes a living predicting the future, and one that directly counters other expert opinion.

Take, for example, the new book from data experts Viktor Mayer-Schonberger (University of Oxford) and Kenneth Cukier (The Economist), Big Data: A Revolution That Will Transform How We Live, Work and Think. Mayer-Schonberger and Cukier urge us to trust the data: to stop worrying about why correlations exist and simply accept them. As Cukier tells Wired, "Big Data enables us not to test [a] hypothesis, but to let the data speak and tell us what hypothesis is best. And in that way it completely reshapes what we call the scientific method we understand and make sense of the world."

One big problem with this view is that it assumes we know how to query the data to come up with even a "what," much less a "why." Data does not simply present itself to us to be read objectively.

Quoting Silver at length:

"[Big Data] is sometimes seen as a cure-all, as computers were in the 1970s. Chris Anderson…wrote in 2008 that the sheer volume of data would obviate the need for theory, and even the scientific method….

"[T]hese views are badly mistaken. The numbers have no way of speaking for themselves. We speak for them. We imbue them with meaning….[W]e may construe them in self-serving ways that are detached from their objective reality.

"Data-driven predictions can succeed--and they can fail. It is when we deny our role in the process that the odds of failure rise. Before we demand more of our data, we need to demand more of ourselves….Unless we work actively to become aware of the biases we introduce, the returns to additional information may be minimal--or diminishing."

So, for example, more data has not narrowed the political divide, as Silver points out. It has only hardened positions on either side of the aisle. The same holds true for global warming science. The more data we have, the less we seem to agree.

Why? Because data is never neutral. Or, rather, our perception of it is not neutral.

This is as true for individual enterprises grappling with product or personnel decisions as it is for countries debating policy issues. Big Data can contribute to solving these issues...even as it contributes to making them more difficult. Again quoting Silver:

"If the quantity of information is increasing by 2.5 quintillion bytes per day, the amount of useful information almost certainly isn't. Most of it is just noise, and the noise is increasing faster than the signal. There are so many hypotheses to test, so many data sets to mine--but a relatively constant amount of objective truth."
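Silver's point about many hypotheses chasing a roughly fixed amount of truth can be illustrated with a small simulation. The sketch below is not from Silver's book; the function names, sample size and threshold are all assumptions chosen for illustration. It compares one series of pure random noise against a growing pile of other pure-noise series and counts how many happen to look correlated with it.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def count_spurious(num_series, num_points=20, threshold=0.5, seed=0):
    """Count noise series whose |correlation| with a noise target
    reaches the (hypothetical) threshold. Every series is random,
    so every "hit" is a coincidence, not a real relationship."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(num_points)]
    hits = 0
    for _ in range(num_series):
        candidate = [rng.gauss(0, 1) for _ in range(num_points)]
        if abs(pearson(target, candidate)) >= threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    for size in (100, 1_000, 10_000):
        print(size, "series tested ->", count_spurious(size), "spurious matches")
```

There is no signal anywhere in this data, yet the number of "interesting" correlations keeps climbing as more series are tested. The amount of truth stays constant at zero; only the noise grows.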

This jibes with the view of Gartner's Svetlana Sicular, who suggests that "Formulating a right question is always hard, but with big data, it is an order of magnitude harder," due in part to the difficulty of figuring out meaningful correlations in our data.

Again, while it may seem convenient to wish for the "data to speak for itself," it simply doesn't. It can't. It is always mediated by imperfect individuals, with all of our biases, strengths and self-interest.

Which is not to say that data can't help us with our answers. Silver certainly turns to data to help him forecast elections, baseball games and Oscar winners. The trick, as he argues, is to take a Bayesian approach to data analytics, getting comfortable with probabilities, working hard to recognize and account for our biases, and not trying to predict certainties. When we predict certainties, we are almost always wrong.
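To make the Bayesian idea concrete, here is a minimal sketch of a single application of Bayes' rule to a yes/no hypothesis. It is not Silver's forecasting model; the function name and all the probabilities are hypothetical. The point is only to show a prior belief being revised by evidence into a new probability rather than a certainty.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return the posterior probability of a hypothesis after seeing
    one piece of evidence, using Bayes' rule.

    prior: probability we gave the hypothesis before the evidence
    p_evidence_if_true: chance of seeing the evidence if it is true
    p_evidence_if_false: chance of seeing the evidence if it is false
    """
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1 - prior) * p_evidence_if_false
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical numbers: we start undecided (0.5), and each new poll
    # is twice as likely to appear if our candidate really is ahead.
    belief = 0.5
    for poll in range(1, 6):
        belief = bayes_update(belief, 0.8, 0.4)
        print(f"after poll {poll}: {belief:.3f}")
```

Each favorable poll raises the estimate, but as long as the evidence could also have appeared if the hypothesis were false, the estimate never reaches 1. The choice of prior is also explicit, which is where our own biases enter and where they can be examined.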

In short, Big Data has tended to come with its share of Big Hype. So long as we're realistic about its potential, and recognize that our data is only as useful as the human intelligence we bring to it, minus the human biases with which we burden it, Big Data should, indeed, pay significant dividends.