The dirty little secret of Hadoop has been just how dull many of its tasks have been. By far the biggest use for Hadoop to date has been as a “poor person’s ETL”—that is, a form of data integration, at the risk of oversimplifying—rather than all the big, sexy data science we see constantly hyped.
But that’s changing. As a new Sand Hill Group survey reveals, a significant percentage of enterprises are moving beyond Hadoop’s mundane past to leverage it for advanced analytics.
This shouldn’t be too surprising. Hadoop is still new to most companies: 47% of respondents in Sand Hill Group’s survey cited a lack of Hadoop skills, and another 21% cited a shortage of talent to hire, as top challenges inhibiting their Hadoop ambitions. It’s impossible to move from beginner to expert in the few seconds it takes to download Hadoop.
Compounding this problem, Hadoop has not traditionally been the most approachable system to use. Enterprises have been willing to muddle through its complexities, however, because it so dramatically lowers the cost profile of a Big Data project, given that it’s powerful open-source software running on commodity servers. While Hadoop is becoming easier to use, it still imposes a steep learning curve that requires time and experience to master.
So we’re seeing enterprises that started with Hadoop as their “unsupervised landfill” now moving to more complex, and important, workloads, as 451 Research analyst Matt Aslett points out.
Until companies feel confident in a given technology, they’re not going to start using it for mission critical applications. At one time it was anathema to use Linux in the data center, so it was used at the edge of the network for file and print servers and more pedestrian workloads. Now it would be anathema to not use Linux in mission-critical data center applications.
Taking The Hadoop Training Wheels Off
We’re seeing the same thing with Hadoop. The majority of enterprises are still kicking Hadoop’s tires, running relatively small clusters of just five to nine nodes. But as this tire-kicking phase ends, the real work will begin.
While nearly every other workload is expected to either decline or stay roughly constant going forward, Hadoop use is expected to boom in advanced analytics workloads, as the Sand Hill Group survey shows:
Even so, Hadoop remains somewhat mired in log data: 61% of respondents are using Hadoop to store log data, followed by operational data from CRM and ERP systems (53%). Very few companies are using it today for streaming, real-time data. Presumably this will change as enterprises look to put real-time data to use in advanced analytics projects. (More likely, they’ll do what Criteo and others are doing by marrying Hadoop’s advanced analytics capabilities with a NoSQL database for real-time data capture and response.)
Enterprises Are Betting Big On Hadoop
Regardless, what we’re seeing is a significant maturing of enterprise adoption of Hadoop. As the report notes:
This almost-threefold increase over the 8.9% currently developing advanced analytics is emblematic of the larger shift towards initiatives that can have a transformational impact on the organization. It also conveys both the aggressive expectations for skill and experience development and the urgent need to mine the available data to improve business decisions and results.
There will be hiccups along the way, of course, but as Hadoop enthusiast Floyd Strimling posits, “Hadoop bent the storage and compute cost curves to allow everyone to analyze data. There is no going back.”
Image courtesy of Shutterstock