We’re getting Big Data all wrong, and it’s holding us back. By making a fetish of the volume of data we’re collecting, we’ve overlooked the most important thing we can do with that data: analyze it.
Such analysis is often assumed to be the province of data scientists, those magical unicorns who take one look at a company’s data and declare, “Buy low, sell high!”
Because data scientists can make the difference between success and failure in a company’s use of its data, finding the right one is critical. As it turns out, finding a data scientist is a lot like analyzing data: you need to know what kind you’re looking for, and you need to ask the right questions.
Data Is Not The Point
As Alistair Croll writes of the Internet of Things mess, many of our Big Data projects so far have mainly involved finding ways to acquire and store ever-increasing quantities of data, which isn’t really the point.
In fact, though the “Big” in Big Data gets the headlines, most companies don’t have petabyte-scale data problems. Their real problem is understanding the data they already have. Croll points to the solution:
When the [Internet of Things] sprawl finally triggers a mass extinction, only a few companies will survive. Many of the survivors will be the ones that can discover more information by inference, and that means teams that have a data science background.
Big Data only becomes interesting, in other words, when it’s deciphered to reveal insight. As Croll suggests, this likely requires a data scientist, though finding someone who can actually interpret your data is no small task.
Two Kinds Of Data Scientist
The first step in finding a good data scientist, as Michael Li highlights, is to determine what kind of analysis you need. Analytics can be created for and consumed by machines or by humans, but usually not both. “Unfortunately,” Li says, “most hiring managers conflate the types of talent and temperament necessary for these roles.”
Li breaks down the two types of data science in this way:
Data Science for Machines
- The ultimate decision maker and consumer of the analysis is a computer (e.g., ad targeting, product recommendations);
- The process involves complex digital models that ingest large amounts of data, extract insights using machine learning algorithms, and then act autonomously to display certain ads or make stock trades in real time;
- As a result, such data scientists require “exceptionally strong mathematical, statistical, and computational fluency to build models that can quickly make good predictions.”
Data Science for Humans
- The ultimate decision maker and consumer of the analysis is a person (e.g., understanding user growth and retention);
- This process may use the same data sets as the “machine” data scientist, but it must package the analysis in a human-understandable format, with an emphasis on storytelling and on articulating the “how” and “why” of achieving results.
For this latter category in particular, the data scientist should be able to “tell stories” about the data, making it understandable to mere mortals. Following Croll’s logic, I’d rank this as perhaps the most important talent a data scientist can have.
Data Scientist Needles And Haystacks
That’s why I really like the questions Chris Pearson suggests asking when interviewing data scientist candidates. Given the money to be made in Big Data, it’s not surprising that so many people want to pass themselves off as data geeks. However, as Pearson notes, this has only made hiring for Big Data doubly difficult:
[There] are highly talented geniuses in our population who can change the landscape of an entire organisation, through the development of an algorithm and the implementation of some code. [But] there are lots of people out there who want you to think that they are one of those geniuses too. The question is, can you tell the difference?
For most people, most of the time, the answer is likely “No.” And if the questions we ask in interviews revolve around the technical nuances of data science, the answer will likely remain “No.”
But Pearson’s questions home in on the business outcomes of data science, which is exactly the right approach.
So while it’s critical to ask questions like “Can you give me an example of when you’ve developed an algorithm from a framework/research paper?” to tease out whether a candidate has the blend of math/statistical modeling and programming skills a good data scientist needs, it’s equally important to ask things like, “Tell me about a time when you’ve improved a business process?”
Data scientists need, in other words, to understand how the data affects the business. Find that sort of data scientist and you’re golden.