Big Data’s Poster Child Has Issues—But They’re Not Slowing Hadoop Down

Pity poor Hadoop. The open-source software framework is virtually synonymous with the Big Data movement. Yet one of its earliest, biggest users has joined a chorus of critics, charging Hadoop with being “unpredictable” and “risky.” Others, like Gartner’s Merv Adrian, worry about its weak security provisions.

Despite these (mostly) valid concerns, people and organizations are still lining up to adopt Hadoop, which makes it possible to store and process huge amounts of data on clusters of commodity hardware. Let’s assume for the sake of argument that the entire planet hasn’t just been hoodwinked into the Hadoop embrace. Why does it remain so successful?

Loopholes In Hadoop

As the poster child for the Big Data movement, Hadoop is often given a free pass on many of its weaknesses. Still, it has an awful lot of them.

As one of the earliest users of Hadoop at Yahoo!, Sean Suchter is well placed to point out Hadoop's weak operational capabilities. Among the concerns he highlights:

Hadoop can usually ensure that a data job completes, but it is unable to guarantee when the job will be completed. Hadoop jobs often take longer to run than anticipated, making it risky to depend on the job output in production applications. When a critical production job is running, other, lower-priority jobs can sometimes swallow up the cluster’s hardware resources, like disk and network, creating serious resource contentions that ultimately can result in critical production jobs failing to complete safely and on time.

And then there's security. Gartner analyst Merv Adrian polled enterprises about their biggest barriers to Hadoop adoption. Alongside unsurprising results like "undefined value proposition," Adrian was particularly struck by how few respondents seemed to care about Hadoop's security.

In response, he says, “Can it be that people believe Hadoop is secure? Because it certainly is not. At every layer of the stack, vulnerabilities exist, and at the level of the data itself there [are] numerous concerns.” 

Given the kinds of data commonly processed with Hadoop, such as credit card transactions and health records, it's surprising that so few seem to be thinking about security. But it's also surprising that these and other concerns don't seem to be holding back Hadoop adoption.

The Hadoop Train Has Left The Station

And let's be clear: none of these concerns has slowed Hadoop's rise. According to IDC, over half of enterprises have either deployed Hadoop or plan to deploy it within the next year, and over 100,000 people list Hadoop as part of their talent profile on LinkedIn.

In part this broad adoption reflects a characteristic of Hadoop: It’s open source and encourages data exploration in a way that traditional technologies like enterprise data warehouses cannot. As Alex Popescu notes, Hadoop “allows experimenting and trying out new ideas, while continuing to accumulate and storing your data.” 

Developers and other users know it’s complex and understand its other limitations, but the upside of quickly downloading the technology and using it to store and analyze large quantities of data is too tempting.

There also seems to be a growing awareness that the Hadoop community innovates so quickly that today's challenges will likely be resolved tomorrow. Forrester analyst Mike Gualtieri, for one, declares that "[t]he Hadoop open source community and commercial vendors are innovating like gangbusters to make Hadoop an enterprise staple" to the point that it will "become must-have infrastructure for large enterprises."

And, Not Or

One other reason Hadoop has proved so successful is that it's not really growing at anyone's expense. Hadoop doesn't displace existing data infrastructure; it just adds to it.

As Cloudera's Christophe Bisciglia notes:

Rather than replace existing systems, Hadoop augments them by offloading the particularly difficult problem of simultaneously ingesting, processing and delivering/exporting large volumes of data so existing systems can focus on what they were designed to do.

Still, while Hadoop isn't likely to replace an enterprise data warehouse (EDW) today, interest in Hadoop is booming compared with interest in its EDW peers.

Hadoop isn’t perfect. It’s not manna from heaven that will feed billions or foster world peace. But it’s promising enough that enterprises are willing to overlook its problems today to benefit from its power tomorrow. 

Lead image by Arpit Gupta
