Big Data And The Landfills Of The Digital Enterprise

You do realize that you work for the only company on the planet that isn't leveraging Big Data? That everyone else is gaining competitive advantage by aggregating cash register receipts, the weather in Miami, your sister's Facebook posts and the average shelf-life of Lindt chocolate? That you, in fact, are the world's biggest Big Data failure?

Executives rarely put it this way, but in my conversations with IT and line-of-business leaders, that sentiment comes through in the subtext of what they say. The media (over)hypes Big Data, and IT executives then come to think they must be doing something wrong, as Gartner analyst Svetlana Sicular has found in her conversations with clients. As the current thinking goes, if enterprises don't have warehouses overflowing with data, and data scientists madly crunching it into "actionable insights," they're doing it wrong.

One Big Landfill

As just one example, I recently heard one executive at a Fortune 100 company say, "Hadoop is our unsupervised landfill." Spoken like a man who knows his data is important, but isn't quite sure why or how. So his company just stores everything in the hopes that all that data will one day make sense.

This is a reasonable response, given the pressures, but it's actually okay to not have The Big Data Answer. Odds are, your enterprise needs to figure things out over time, even without the mythical (and expensive) data scientists we keep reading about. Sicular argues that "Organizations already have people who know their own data better than mystical data scientists." Give these in-house experts the top tools for Big Data, described in a recent Dice.com job trends report, all of which happen to be open source, and let them iterate toward understanding the data. 

Indeed, open source is the key here, not how big your data is.  

Exploration, Not Exploitation

Alex Popescu nails it when he posits, "Hadoop is so successful despite its complexity [because i]t allows experimenting and trying out new ideas, while continuing to accumulate and storing your data." Unlike with proprietary technology, with open-source Big Data technology you don't have to sign any contracts, fork over any money, or do any of the things typically expected by enterprise software vendors. You just download and explore.

This fact was underlined for me at a Big Data panel in Chicago this week, which featured Dr. Philip Shelley, CTO at Sears Holdings. Sears is arguably one of the industry's top pioneers when it comes to Big Data, and Shelley insisted that open-source tools like Hadoop were critical to the company iterating its way to Big Data success. Things have gone so well that Sears has decommissioned millions of dollars in IBM Netezza and other proprietary technology to focus on Hadoop as its data hub. As he said, "We no longer have to budget for capital expenditures" for Big Data initiatives.

That's impressive.

Yes, data volume is growing. But as Sears' example shows, that's cause for exploration and iteration, not frustration and despair. You're not alone if you don't yet know what to do with all your data, or if you're wondering whether you have enough to bother. As Brian Proffitt has pointed out, small companies with less than gargantuan data troves can also benefit from Big Data technologies, because "big" isn't only about size. It's also about variety and velocity of data, among other things.

Or, as Edd Dumbill ably notes, "'Big data' really means 'smart use of data'."

That "smart use" will almost always involve open source, as explained above. But it should also involve the understanding that you're not in a race to amass data and to recruit data scientists to decipher it. Big Data is an iterative process of using (mostly) open-source technologies to store and analyze data in different ways, learning from peers and from your own experience. It needn't be a landfill of buzzwords.
