Greg Borenstein takes on what he sees as the dominant view among the elite geeks at FooCamp in a recent blog post. According to Borenstein, the theme embraced at FooCamp was "big data will save us."
Borenstein raises some excellent points about how we think about big data and where the whole concept may be going. Just because we have massive amounts of data doesn't mean we know how to use it or that it will ever be helpful.
Straw Man or Reality Check?
Borenstein writes, "Overall, there seemed to be a pervasive worldview that, if stated reductively, might be expressed thusly: Now, with so much of human behavior taking place over the web, mobile devices, and through other information-producing systems, we are collecting so much data that the only rational way of approaching most decision-making is through rigorous data analysis. And through the kind of thorough data analysis made possible by our new massive cloud computing resources we can finally break through the inherent irrationalities and subjectivities built into our individual observations, mental models, worldviews, and ideologies and into a new more objective data-driven representation of the world that can improve and rationalize our decision making."
Borenstein writes, "I'm not trying to create a straw man," but kind of does anyway. I've followed the business use of big data more than the social or scientific uses, but find it difficult to believe the outlook is as extreme as Borenstein paints it to be. Then again, I wasn't at the conference. But Borenstein himself writes that "These are incredibly smart people who live in the midst of the subtle distinctions and limitations that come up in practice when working on these kinds of problems in real life."
That said, I don't think it's unfair to say that big data is at times over-hyped. In business, "big data will save us all" isn't a common refrain, but "big data will save your company" is. There are problems that having mountains of data won't solve. There are also problems that might be helped with big data, but for which there are no guarantees. So it's well worth a reality check from time to time.
Is Big Data Just Cybernetics All Over Again?
Part of Borenstein's critique of the big data movement is based on the Adam Curtis documentary All Watched Over by Machines of Loving Grace, part of which Matt "Black Belt" Jones screened at FooCamp. Jones showed the second installment, covering the cybernetics movement, which eventually gave way to systems theory and systems thinking.
Cybernetics, the study of feedback and self-regulating systems, was deeply influential in the development of the personal computer in the 60s, as documented by Fred Turner (see this interview with Turner for more). Borenstein notes that showing All Watched Over by Machines of Loving Grace at FooCamp was an act of "epic trolling" on Jones' part because "Cybernetics was the dominant philosophy of the 60s and 70s techno-counterculture within which O'Reilly arose."
I haven't seen the documentary yet, so I can't comment on its accuracy or quality (but I have been wanting to see a good critique of the quasi-religious elements of systems theory and thinking, so maybe this will be it). The film looks at the work Jay Forrester did on World3, a project that, according to Wikipedia, was meant to model and predict the "interactions between population, industrial growth, food production and limits in the ecosystems of the Earth." World3 predicted economic and societal collapse.
As a polemic, Curtis's film does more than present this history in a neutral manner. He constructs a critique of cybernetics. He argues that this emphasis on building ever-more accurate models of the world -- and, especially, automating their results through the supposedly objective computer -- represses any idea of individual agency to change the system while simultaneously causing us to project a false agency onto the system itself. In other words, Curtis focuses on cybernetics' conservative political repercussions. In his account, this faith in the technologically augmented system model becomes a reason to defend the status quo.
(Another interesting case study of this sort of applied social cybernetics is described here.)
Borenstein ties cybernetics and World3 back to the promise today of big data. From IBM's promise of a "smarter planet" to Heritage Provider Network's attempts to improve health care through a data contest, we're seeing some grand plans to improve the world through data crunching. The comparison to cybernetics is apt.
From Polygraphs to Predictive Analytics
The big question is whether things will be different this time. The tools for collecting and processing data have improved immensely since Forrester's time. And so far it seems that most implementations of big data are focusing on solving one particular problem, rather than trying to model the entire world.
Still, I suspect there will be many failures and that many of these failures will not be acknowledged. One of my predictions for 2011 was that predictive analytics (which, as Revolution Analytics CEO and SPSS co-creator Norman H. Nie says, is actually just a new name for statistical modeling) would be applied to more and more areas, even if it doesn't work.
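Nie's point, that predictive analytics is statistical modeling under a newer name, can be made concrete with a minimal sketch: fit a model to historical data, then extrapolate. The numbers below are invented purely for illustration.

```python
# Predictive analytics, reduced to its core: fit a statistical model
# (here, ordinary least squares on one variable), then "predict."
def fit_ols(xs, ys):
    """Fit y = a + b*x by ordinary least squares; return (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b

# Hypothetical historical data: e.g., ad spend vs. sales.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_ols(xs, ys)
prediction = a + b * 6.0  # extrapolate to a spend of 6
```

Everything that makes this "predictive analytics" rather than "statistics" is branding; the math, and all of its assumptions about the future resembling the past, is the same.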
For example, earlier this year Nature reported that the U.S. Department of Homeland Security is testing a system for detecting terrorists at airports called Future Attribute Screening Technology (FAST). The journal reported "Like a lie detector, FAST measures a variety of physiological indicators, ranging from heart rate to the steadiness of a person's gaze, to judge a subject's state of mind."
The downside is that polygraphs have never been proven accurate, and I'm doubtful that this technology will be able to accurately predict individual humans' behavior. I consider myself a determinist, but I think the actual ability to predict every possible outcome of human behavior is far beyond our abilities.
Last year Jonah Lehrer wrote about a phenomenon called the "decline effect" - the tendency for support for scientific claims to decrease as experiments are repeated. In particular, behavioral science seems to be greatly affected by the decline effect. Part of this is due to publication bias, but according to Lehrer a big part of the problem is sheer randomness (see also this follow-up).
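The interplay of publication bias and randomness can be illustrated with a toy simulation (every parameter here is invented): if only strong initial results get published, replications will regress back toward the true effect, and the published effect appears to "decline" even though nothing about the phenomenon changed.

```python
import random

random.seed(42)

TRUE_EFFECT = 0.1   # the (small) real effect every study is measuring
NOISE = 1.0         # measurement noise in each study
THRESHOLD = 1.0     # publication bar for the initial study

published_initial = []
replications = []
for _ in range(10_000):
    initial = random.gauss(TRUE_EFFECT, NOISE)
    if initial > THRESHOLD:  # publication bias: only big results get in
        published_initial.append(initial)
        # The replication faces no such filter.
        replications.append(random.gauss(TRUE_EFFECT, NOISE))

mean_initial = sum(published_initial) / len(published_initial)
mean_replication = sum(replications) / len(replications)
# mean_initial is inflated well above TRUE_EFFECT;
# mean_replication lands back near TRUE_EFFECT.
```

No one has to behave badly for this to happen; a selection filter plus noise is enough to manufacture a decline.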
Human behavior is extremely messy and hard to model. In some cases, that might be fine. If an "other products you might like" widget or a targeted advertising system mostly shows users stuff they don't want, that's okay as long as it gets things right often enough to boost sales. Trying to predict and prevent terrorism, without unduly targeting innocents, is much more difficult.
This Ad's For You
Companies like Hunch and OK Cupid are learning a lot of interesting stuff by analyzing data - but how useful is it? Even if we could assume beyond a shadow of a doubt that 60% of people who like beer are willing to have sex on a first date, you're always going to have exceptions.
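Some back-of-the-envelope arithmetic, using that (hypothetical) 60% figure, shows why an aggregate probability makes a weak individual prediction:

```python
# If 60% of a group has some trait, a predictor that assumes every
# member has it is still wrong 40% of the time - barely better than
# a coin flip at the individual level.
p_trait = 0.60
n = 1000  # hypothetical group size

expected_hits = p_trait * n
expected_misses = (1 - p_trait) * n
error_rate = expected_misses / n
# Out of 1,000 individual predictions, expect roughly 400 misses.
```

That kind of hit rate can be valuable in aggregate (advertising, recommendations) while being nearly useless for judging any one person.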
And even in the case of advertising, I'm still not sure we're going to get to a point that all the money being spent on finding ways to turn big data into advertising dollars is going to turn out to be a good investment. It's worked out well for Google, but it hasn't for virtually any other company. For all the data Facebook supposedly has on us, Facebook ads are less effective than banner ads.
Then there's the question of what to actually do with all that data. Last year we asked what you would do with the massive data sets generated by persistent surveillance. We mostly got crickets.
So no, not only will big data not "save us all" - it won't even save all our businesses. That doesn't mean it's not relevant - it doesn't even mean that the ability to cope with massive data sets isn't the most significant technological development of the past decade. But it would be unwise to put too much faith in our ability to crunch numbers.