Big Data’s Poster Child Has Issues—But They’re Not Slowing Hadoop Down


Pity poor Hadoop. The open-source software framework is virtually synonymous with the Big Data movement. Yet one of its earliest, biggest users has joined a chorus of critics, charging Hadoop with being “unpredictable” and “risky.” Others, like Gartner’s Merv Adrian, worry about its weak security provisions.


Despite these (mostly) valid concerns, people and organizations are still lining up to adopt Hadoop, which makes it possible to store and process huge amounts of data on clusters of commodity hardware. Let’s assume for the sake of argument that the entire planet hasn’t just been hoodwinked into the Hadoop embrace. Why does it remain so successful?

Loopholes In Hadoop

Because Hadoop is the poster child for the Big Data movement, it's not surprising that it often gets a free pass on many of its weaknesses. Still, there are an awful lot of them.

As one of the earliest users of Hadoop at Yahoo!, Sean Suchter seems qualified to point out Hadoop’s weak operational capabilities. Among the concerns he highlights:

Hadoop can usually ensure that a data job completes, but it is unable to guarantee when the job will be completed. Hadoop jobs often take longer to run than anticipated, making it risky to depend on the job output in production applications. When a critical production job is running, other, lower-priority jobs can sometimes swallow up the cluster’s hardware resources, like disk and network, creating serious resource contentions that ultimately can result in critical production jobs failing to complete safely and on time.

And then there's security. Gartner analyst Merv Adrian polled enterprises about their biggest barriers to Hadoop adoption. Among unsurprising results like "undefined value proposition," Adrian was particularly struck by how few respondents seemed to care about Hadoop's security.


In response, he says, “Can it be that people believe Hadoop is secure? Because it certainly is not. At every layer of the stack, vulnerabilities exist, and at the level of the data itself there [are] numerous concerns.” 

Given the kinds of data commonly processed with Hadoop, such as credit card transactions and health records, it's striking that so few seem to be thinking about security. It's equally striking that these and other concerns don't seem to be holding back Hadoop adoption.

The Hadoop Train Has Left The Station

And let's be clear: none of these concerns has slowed Hadoop's rise. According to IDC, over half of enterprises have either deployed Hadoop or plan to deploy it within the next year. Meanwhile, over 100,000 people list Hadoop as part of their talent profile on LinkedIn.


In part this broad adoption reflects a characteristic of Hadoop: It’s open source and encourages data exploration in a way that traditional technologies like enterprise data warehouses cannot. As Alex Popescu notes, Hadoop “allows experimenting and trying out new ideas, while continuing to accumulate and storing your data.” 

Developers and other users know it's complex and understand its other limitations, but the upside of quickly downloading the technology and using it to store and analyze large quantities of data is too tempting to pass up.

There also seems to be a growing belief that the Hadoop community is innovating so quickly that today's challenges will likely be resolved tomorrow. Forrester analyst Mike Gualtieri, for one, declares that "[t]he Hadoop open source community and commercial vendors are innovating like gangbusters to make Hadoop an enterprise staple" to the point that it will "become must-have infrastructure for large enterprises."

And, Not Or

One other reason Hadoop has proved so successful is that it's not really growing at anyone's expense. Hadoop doesn't displace existing data infrastructure; it just adds to it.

As Cloudera's Christophe Bisciglia notes:

Rather than replace existing systems, Hadoop augments them by offloading the particularly difficult problem of simultaneously ingesting, processing and delivering/exporting large volumes of data so existing systems can focus on what they were designed to do.

Still, while Hadoop isn't likely to replace an enterprise data warehouse (EDW) today, interest in Hadoop is booming relative to interest in its EDW peers.


Hadoop isn’t perfect. It’s not manna from heaven that will feed billions or foster world peace. But it’s promising enough that enterprises are willing to overlook its problems today to benefit from its power tomorrow. 

Lead image by Arpit Gupta
