One of the mantras of the Big Data revolution is that causation no longer matters. It’s enough, the theory goes, to seek correlations in our copious data, deciphering “what” is happening and not bothering with “why.” But this approach is problematic for a business looking for optimal retail pricing strategies, and dramatically more so for those charged with crafting public policy.
For governments and other public institutions, it turns out that understanding causation matters a great deal.
Causation Loses Its Sex Appeal
The “forget-causation-seek-correlation” Big Data crowd has been around for years, and its most sophisticated proponents are Kenneth Cukier (The Economist) and Viktor Mayer-Schönberger (Oxford University). In their excellent Big Data, the authors argue: “In a big-data world … we won’t have to be fixated on causality; instead we can discover patterns and correlations in the data that offer us novel and invaluable insights. Big data is about what, not why.”
The idea is that, given enough data, algorithms can detect correlations between seemingly disparate data sets without anyone needing to understand those correlations. It is enough to see that a rise in the purchase of Pop-Tarts at Wal-Mart correlates strongly with hurricane warnings. Wal-Mart needn’t understand why: it just needs to stock Pop-Tarts in a visible area of the store whenever hurricane warnings are issued.
When Understanding Causation Really Matters
But for governments, it’s not enough to casually skim correlations and ignore the reasons people behave as they do. Governments aren’t in the business of satisfying purchasing whims; they’re in the business of protecting citizens and setting the foundations for societal happiness.
Bruce Schneier points out that Big Data can help us to “avoid occasional jolts and disturbances and, perhaps, even stop the bad guys. But it can also blind us to the fact that the problem at hand requires a more radical approach.” That is, ignoring causation can lead us to affix a Band-Aid to problems that actually demand serious surgery. So long as we’re content to skim correlations in our data, we’re more likely to miss the underlying causes.
This can lead to bad policy, as Evgeny Morozov argues:
It’s one thing for policy makers to attack the problem knowing that people who walk tend to be more fit. It’s quite another to investigate why so few people walk. A policy maker satisfied with correlations might tackle obesity by giving everyone a pedometer or a smartphone with an app to help them track physical activity—never mind that there is nowhere to walk, except for the mall and the highway. A policy maker concerned with causality might invest in pavements and public spaces that would make walking possible. Substituting the “why” with the “what” doesn’t just give us the same solutions faster—often it gives us different, potentially inferior solutions.
This might not be a big deal if the issue is more or less revenue for Wal-Mart. But for governments seeking social welfare, the stakes are much higher.
The False Promise Of More Data
Besides, it’s not as if Big Data sits there, just waiting to surface correlations. We decide which data to store and query. In so doing, we apply all sorts of biases to that data. By pretending we’re neutral in the process, seeking value-free correlations in our value-free data, we’re more likely to devise solutions that are biased and wrong.
Big Data can facilitate the search for causation, but it turns out that adding data often creates more confusion than clarity. As Nate Silver highlights, “If the quantity of information is increasing by 2.5 quintillion bytes per day, the amount of useful information almost certainly isn’t. Most of it is just noise, and the noise is increasing faster than the signal.” And just because we see a correlation today doesn’t mean it will persist.
Even worse, “De-emotionalizing inherently charged, social issues could worsen the effect of disproportionate arrests based on demographic probabilities,” as Madison Andrews notes. However much we may want to reduce complex policy issues to simple, data-driven “X-correlates-with-Y-therefore-do-Z” formulas, the reality is always more complex when people are involved.
Ultimately, that’s the problem with correlation-driven Big Data approaches to public policy issues: they try to de-humanize human decisions. That’s always bad policy.