The Big Data boom has largely been fueled by a simple calculation: Data + Technology = Actionable Insights, Magic Ponies, and Superpowers. The reality, of course, is far more pedestrian. While Big Data technology has indeed increased our ability to store and process lots of disparate data in real time, the technology is only as useful as the people managing it. As Bill Wise, CEO of Mediaocean, highlights, the costs of getting it wrong increase as our reliance on data grows.
To be clear, we've long been able to query so-called "Big Data." We've had expensive data warehousing and Business Intelligence tools for many years. The great innovation of tools like Hadoop is that they've made such capabilities available as free, open-source tools that run on commodity hardware, essentially paving the way for anyone and everyone to become a data scientist.
Therein lies the problem.
Wise takes as background an influential economics paper and the intelligence efforts around the Boston Marathon bombing suspects, where the culprits were a few missing rows in Excel and a misspelling of suspect Tamerlan Tsarnaev's name. He points out that "data management tools (i.e., the FBI’s systems and Excel) were undone by fairly simple errors," with terrible results. In other words, as much as we may believe Big Data is as simple as "Input data into Hadoop, out come insights!", the reality depends heavily on the people querying that data.
And the bigger the data, the bigger the likelihood we'll read it wrong, as Wise posits:
[M]ore human/data interaction means a lot more room for error (and inefficiency) around increasingly critical data sets - which... can have very serious results... If Big Data can’t fit hand-in-glove with usability and workflow, a lot of the promise of big data will be empty data crunching. That’s not just a problem for getting where we want to be in the evolution of computing. It’s a situation that can lead to bad data management - which translates into bad economics and, sometimes, far worse.
This echoes statistician Nate Silver's argument that data doesn't speak for itself, but is instead corrupted by our biases. Worse, the bigger the data set, the more noise there is to sift through: "the noise is increasing faster than the signal. There are so many hypotheses to test, so many data sets to mine - but a relatively constant amount of objective truth."
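To make the "more hypotheses, more noise" point concrete, here is a minimal Python sketch. It is my own illustration, not something taken from Silver or Wise. It generates a target column of pure random noise, then tests a growing number of equally random candidate columns against it and counts how many look "correlated." The function names, the 30-row sample size, and the 0.36 cutoff (roughly the conventional 5% significance level for that sample size) are all assumptions chosen for the example.

```python
import random
import statistics


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


def spurious_hits(num_hypotheses, num_rows=30, threshold=0.36, seed=0):
    """Count random columns that appear correlated with a random target.

    Every column is independent noise, so each "hit" is a false discovery.
    The threshold and row count are illustrative assumptions.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(num_rows)]
    hits = 0
    for _ in range(num_hypotheses):
        candidate = [rng.gauss(0, 1) for _ in range(num_rows)]
        if abs(pearson(candidate, target)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    for count in (20, 200, 2000):
        print(count, "hypotheses tested ->", spurious_hits(count), "spurious 'insights'")
```

The amount of real signal in this toy data never changes (it is zero), yet the number of impressive-looking correlations keeps climbing as more hypotheses are tested. That is the trap: the more columns we can mine, the easier it is to find something that merely looks like an insight.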
Often, misunderstanding our data simply means our businesses will run more inefficiently or, at least, no more efficiently than before. But if Wise is correct, getting our data wrong can have disastrous consequences.
Which means, as I've argued before, that we really need to look inside our organizations for "data scientists," because context is critical to effectively querying our data, as well as knowing which data to collect in the first place. It also means, as Kate Crawford argues in Harvard Business Review, "data scientists should take a page from social scientists, who have a long history of asking where the data they're working with comes from, what methods were used to gather and analyze it, and what cognitive biases they might bring to its interpretation."
In other words, the more data has the potential to impact our organizations, the more humble and circumspect we should become in using it. The consequences of reading our data wrong scale with the volume and velocity of that data.