The Big Data Tool Hadoop Is Growing More Powerful, But Also Harder To Use

These days, it’s getting hard to pinpoint what, exactly, Hadoop is. Or, for that matter, what it isn’t. 

See also: Hadoop—What It Is And How It Works

A few years back, Hadoop was essentially MapReduce, a batch-oriented system for processing large amounts of data, leading people to mistakenly conflate Hadoop with Big Data and Big Data with “lots and lots of data.” But if the market was confused then, it’s far worse today, because Hadoop is taking on all sorts of capabilities that just two years ago were considered impossible.

While this is obviously good for the Hadoop platform, it may actually make it harder for the Hadoop user.
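For readers who have never seen the batch model that Hadoop started with, here is a minimal sketch of the map, shuffle and reduce steps as a word count. It is plain single-process Python, not Hadoop's actual API, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for illustration. A real cluster would spread each step across many machines.

```python
from collections import defaultdict


def map_phase(documents):
    """Emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group emitted values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run the three phases in order over an in-memory list of strings."""
    return reduce_phase(shuffle_phase(map_phase(documents)))


if __name__ == "__main__":
    docs = ["big data is big", "hadoop processes big data"]
    print(word_count(docs))
```

Nothing comes out until the whole input has passed through all three phases, which is what "batch-oriented" means here and why interactive or event-driven workloads were a poor fit for early Hadoop.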

Hadoop: And Miles To Go Before It Sleeps

Hadoop has been around since 2007, yet adoption today remains small. This despite widespread interest in putting it to use, as Gartner analyst Svetlana Sicular highlights:

And yet despite this strong interest, actual adoption remains limited, as 451 Research finds:

Source: 451 Research

Some of the dissonance between interest in using and actual usage comes down to Hadoop’s complexity, as Pepperdata’s CEO details. Some of it stems from vendors overselling current capabilities, confusing enterprises as to how to derive value from Hadoop and other big data technologies today.

But some of it derives from one of Hadoop’s greatest strengths: its flexibility. 

The Hadoop “Thneed”

In Dr. Seuss’ The Lorax, an industry is built up to manufacture thneeds, “a-fine-something-that-all-people need.” Thneeds can be pretty much anything (“It’s a shirt. It’s a sock. It’s a glove. It’s a hat. But it has OTHER uses. Yes, far beyond that. You can use it for carpets. For pillows! For sheets! Or curtains! Or covers for bicycle seats!”).

Hadoop is much the same. 

Back in 2012, Hadoop’s creator, Doug Cutting, told me that “Hadoop is the [operating system] for big data,” surrounded by a “suite of tools on the Hadoop platform [that] keeps growing.”

Today he is even more bullish on this original vision, declaring that Hadoop “will be good at most things but not the best at everything.”

In part, this is due to the birth of YARN, and nothing has had as big an impact on Hadoop. Often described as the second generation of MapReduce, YARN is a resource/cluster management tool that helps to extend Hadoop’s utility in profound ways.

Gartner analyst Merv Adrian describes it this way, suggesting that in the early days of Hadoop …

… the story was simpler. Hadoop was HDFS, MapReduce and some utilities. As those utilities got formalized and became projects themselves and were supported by commercial distributors, the list grew: Pig, Hive, HBase, and Zookeeper were Hadoop too. And a few months ago, as I noticed, Accumulo, Avro, Cascading, Flume, Mahout, Oozie, Spark, Sqoop, and YARN had joined the list.

YARN is the one that really matters here because it doesn’t just mean the list of components will change, but because in its wake the list of components will change Hadoop’s meaning. YARN enables Hadoop to be more than a brute force, batch blunt instrument for analytics and ETL jobs. It can be an interactive analytic tool, an event processor, a transactional system, a governed, secure system for complex, mixed workloads.

YARN, then, is a force multiplier for the Hadoop community.

The Blessing And Curse Of Community

This is both good and bad. As MapR CEO John Schroeder states, “No one will ever have more than 15 or 20 percent of the [Hadoop] committers, so you can’t dominate the community.” Can’t dominate, and also can’t direct. 

With so many cooks in the Hadoop kitchen, it’s no wonder that it has taken on so many different forms and functions. As Adrian notes, “like everyone else, I’m redefining Hadoop to suit my own purposes.”

This is one key reason that Hadoop remains complex despite Hortonworks CEO Rob Bearden’s contention that the goal of the communal Hadoop development is to make it “mind-numbingly simple and reliable.”

It’s anything but. And it turns out that it’s hard to use something that is all things to all people. 

Hope Is On The Way

As complicated as Hadoop can be—definitionally and in terms of implementation—it may actually become easier to understand. While the underlying technology remains complex, a host of companies have jumped into the fray to hide that complexity from users, as Adrian posits:

https://twitter.com/merv/status/521672454943420416

Cloudera co-founder Mike Olson articulated this well back in 2012, insisting that most enterprises would tap into Hadoop’s value through cloud application providers. In other words, neither the complexity of Hadoop’s community nor the resulting technological complexity is likely to improve anytime soon.

But this also won’t matter, as the gearheads at companies as varied as Facebook and Zoomdata will leverage Hadoop to deliver services to consumers and enterprises that are easy to use. That’s the promise of Hadoop: not only does its community include vendors like Cloudera and Hortonworks that continuously improve and expand its technology, but it also includes those that can abstract away the complexity and make it usable by mainstream enterprises.

Lead image of a Cubieboard Hadoop cluster via Wikimedia Commons
