When Should You Use Hadoop?

RedMonk analyst Stephen O’Grady tackles the question “What Factors Justify the Use of Apache Hadoop?” He cites two of the most common criticisms of Hadoop: 1) most users don’t actually need to analyze big data, and 2) MapReduce is more complex than SQL. O’Grady concedes both points, but finds Hadoop useful anyway.

O’Grady acknowledges that volume isn’t the only factor in the complexity of a dataset. “Larger dataset sizes present unique computational challenges,” writes O’Grady. “But the structure, workload, accessibility and even location of the data may prove equally challenging.”

RedMonk uses Hadoop to analyze both structured and unstructured datasets. The firm could analyze that data with a number of other tools, so why Hadoop? O’Grady’s answer is that the datasets companies use aren’t big data yet, but they are growing rapidly.

O’Grady says that RedMonk uses BigSheets and Hive to work with Hadoop, which lets the firm avoid writing its queries in Java.
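Tools like Hive and Pig exist because a query that takes one line of SQL, such as `SELECT page_id, COUNT(*) FROM views GROUP BY page_id`, becomes a map step, a shuffle and a reduce step in MapReduce. The sketch below is a minimal, single-machine illustration of that model in plain Python. It is not Hadoop’s API, and the function names and the tab-separated `page_id`/`user_id` record format are hypothetical. In a real Hadoop job these phases run distributed across a cluster and are typically written in Java, which is the hand-written code Hive and Pig spare their users.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


# Hypothetical input format: one "page_id<TAB>user_id" record per line.
def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit a (page_id, 1) pair for every record."""
    for line in lines:
        page_id, _user_id = line.split("\t")
        yield page_id, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: sum the counts for each key (sorted by key for stable output)."""
    return {key: sum(values) for key, values in sorted(groups.items())}


def count_views(lines: Iterable[str]) -> Dict[str, int]:
    """Equivalent of: SELECT page_id, COUNT(*) FROM views GROUP BY page_id."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    records = ["home\tu1", "about\tu2", "home\tu3"]
    print(count_views(records))  # {'about': 1, 'home': 2}
```

Even in this toy form, the three-phase structure is more code than the single SQL statement, which is the complexity gap the second criticism points at.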

Cloudera recently published an announcement about how the company Tynt is using Cloudera’s Hadoop distribution. Tynt is a web analytics company that processes over 20 billion viewer events per month – over 20,000 events per second. Prior to adopting Hadoop, Tynt was adding multiple MySQL databases per week to deal with the data.
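The two Tynt figures can be checked against each other with a little arithmetic. The sketch below assumes a 30-day month and traffic spread evenly over it; neither assumption comes from the announcement. Under those assumptions, 20 billion events a month averages roughly 7,700 events per second, while a sustained 20,000 events per second would add up to about 52 billion a month. The per-second figure may therefore describe peak rather than average load, though the announcement, as summarized here, doesn’t say.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(events_per_month: float, days_per_month: int = 30) -> float:
    """Average events per second, ASSUMING a 30-day month and evenly spread load."""
    return events_per_month / (days_per_month * SECONDS_PER_DAY)


def monthly_total(events_per_second: float, days_per_month: int = 30) -> float:
    """Events per month implied by holding a per-second rate for a whole month."""
    return events_per_second * days_per_month * SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"{average_rate(20e9):,.0f} events/s")            # 7,716 events/s
    print(f"{monthly_total(20_000):,.0f} events/month")     # 51,840,000,000 events/month
```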

Twitter is another company using Hadoop, and we have covered its use of the technology before. Twitter needs a cluster for its data: the amount it stores every day is too great to be written reliably to a single traditional hard drive. Twitter has also found that SQL isn’t efficient enough for analytics at the scale the company needs.

Like RedMonk, Twitter avoids writing its queries in Java, but it uses Pig rather than Hive.

Twitter is working with 12 terabytes of new data per day, far more than RedMonk handles. Nonetheless, both companies are making good use of the technology.
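A back-of-the-envelope calculation shows why 12 terabytes a day strains a single hard drive. The sketch below uses decimal units (1 TB = 10^12 bytes) and an assumed sustained write speed of 80 MB/s for one disk; that speed is an illustrative assumption, not a figure from Twitter. Keeping up with 12 TB/day requires about 139 MB/s around the clock, and a disk writing at the assumed speed would need roughly 42 hours to store a single day’s data.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def required_throughput_mb_s(bytes_per_day: float) -> float:
    """Sustained write rate in MB/s (decimal) needed to keep up with a daily volume."""
    return bytes_per_day / SECONDS_PER_DAY / 1e6


def hours_to_write(bytes_per_day: float, disk_mb_s: float) -> float:
    """Hours a single disk writing at disk_mb_s would need to store one day's data."""
    return bytes_per_day / (disk_mb_s * 1e6) / 3600


if __name__ == "__main__":
    daily_bytes = 12e12  # 12 TB/day, decimal terabytes
    disk_mb_s = 80.0     # ASSUMPTION: illustrative sustained write speed of one disk
    needed = required_throughput_mb_s(daily_bytes)
    print(f"needed: {needed:.1f} MB/s")                                # needed: 138.9 MB/s
    print(f"one disk: {hours_to_write(daily_bytes, disk_mb_s):.1f} h")  # one disk: 41.7 h
    # Minimum disk count for raw throughput alone, ignoring replication and failures.
    print("disks needed:", math.ceil(needed / disk_mb_s))               # disks needed: 2
```

The minimum disk count ignores redundancy; spreading writes across many machines, with copies of each block, is what a Hadoop cluster provides.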

How have you used Hadoop? Have you ever found that it was too big for a project that you tackled? If so, what did you end up using instead?

See also: Getting Started with Hadoop and Map Reduce

About ReadWrite’s Editorial Process

The ReadWrite Editorial policy involves closely monitoring the tech industry for major developments, new product launches, AI breakthroughs, video game releases and other newsworthy events. Editors assign relevant stories to staff writers or freelance contributors with expertise in each particular topic area. Before publication, articles go through a rigorous round of editing for accuracy, clarity, and to ensure adherence to ReadWrite's style guidelines.
