Latest MapR 1.2 Distribution Prepares for a New Hadoop

The astonishing speed at which the “big data” processing industry is evolving dwarfs anything we’ve seen elsewhere in software. Problems that stymied the best engineers just 18 short months ago are now commonplace tasks for modern data centers. Already, the systems envisioned by Google’s and Yahoo’s engineers are being prepared for the history books, as 2012 should bring forth the second generation of open source, scalable, big data processing.

Which is why MapR Technologies’ release this week of the next distribution of MapR – which includes a commercial implementation of Hadoop called M5 – is important: not for what it does today, but for what it will enable tomorrow. Hadoop is gearing up for a kind of liver transplant, if you will, in the upcoming version 0.23.

The division of resources and nodes into two tiers in Hadoop 0.23. [Chart by Apache Software Foundation.]

The Hadoop system splits huge data-processing tasks across multiple nodes in a cluster. In this system, it’s MapReduce’s job to process data in parallel batches and return results that are, in effect, mathematical “reductions” of those batches. In the old Hadoop architecture (where “old” in this instance means weeks and weeks ago), the process that monitors and marshals the progress of these parallel batch jobs was called JobTracker. Since its implementation, engineers have realized that it would be better to think of jobs not as “lumps” of processes, but instead more like an operating system does: with a ResourceManager that manages the addressable resources in a cluster, and a per-job ApplicationMaster that marshals the use of those resources.
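The map-then-reduce division of labor can be caricatured in a few lines. The following Python toy is not Hadoop’s code – the real thing runs distributed across nodes – but it simulates the model in one process: a map function emits key–value pairs, the framework groups them by key, and a reduce function collapses each group into a single result.

```python
from collections import defaultdict

def map_reduce(records, map_fn, reduce_fn):
    """In-memory sketch of the MapReduce model: map each record to
    (key, value) pairs, group by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# Classic word-count example: the "reduction" is a sum of counts.
lines = ["big data", "big clusters"]
counts = map_reduce(
    lines,
    map_fn=lambda line: [(word, 1) for word in line.split()],
    reduce_fn=lambda word, ones: sum(ones),
)
# counts == {"big": 2, "data": 1, "clusters": 1}
```

In real Hadoop, the grouping step (the “shuffle”) happens over the network between map and reduce nodes, which is exactly the coordination work JobTracker has been responsible for.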

It’s a fundamental rethinking of Hadoop architecture; and because the tools that implement Hadoop today are still (relative to a real-world clock) new, rolling out this change is liable to upset the balance of things in some data centers. Imagine inserting pecans into the middle of a mold of Jell-O that’s already half-set, and you’ll get the idea.

One of MapR’s key innovations is a high-availability (HA) implementation of JobTracker. It accounts for the fact that processes fail, so that when a JobTracker does crash, MapR can spin up a new one and attach the existing TaskTrackers to it before they’ve noticed anything’s happened at all. Since it’s JobTracker that’s getting the axe in 0.23, the new MapR 1.2 prepares data centers for the time when that HA capability will need to spin up a replacement ApplicationMaster instead.
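MapR’s actual HA mechanism is proprietary, but the failover idea can be sketched in miniature: when the coordinating process dies, a supervisor spins up a replacement and re-attaches the surviving workers to it before their own timeouts expire. Every name below is illustrative, not MapR’s API.

```python
class Tracker:
    """Toy stand-in for a coordinating process (e.g. a JobTracker)."""
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.workers = []   # workers currently attached to this tracker

def failover(tracker, spawn_replacement):
    """If the tracker has died, spin up a replacement and re-attach
    the existing workers so they never observe the outage."""
    if tracker.alive:
        return tracker
    replacement = spawn_replacement()
    replacement.workers = tracker.workers  # workers keep their running tasks
    return replacement

jt = Tracker("jobtracker-1")
jt.workers = ["tasktracker-a", "tasktracker-b"]
jt.alive = False  # simulate a crash
new_jt = failover(jt, lambda: Tracker("jobtracker-2"))
# new_jt.workers == ["tasktracker-a", "tasktracker-b"]
```

The hard part in production is everything this sketch omits: detecting the crash quickly, recovering the dead tracker’s job state, and doing the re-attachment faster than the workers’ heartbeat timeouts.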

Also, MapR has historically replaced Hadoop’s crude, write-once distributed file system (HDFS) with a direct-access, lockless storage system based on Sun’s long-established Network File System (NFS) protocol. Another key feature of the coming Hadoop release is HDFS federation, which effectively partitions the file system namespace across multiple, distributed name nodes. This won’t actually help MapR much, if at all, although it could conceivably introduce a compatibility problem that MapR 1.2 also addresses.

As MapR Chief Application Architect Ted Dunning explained to a customer who questioned the benefits of HDFS federation, “The problem of reliability just gets worse with federation because the previous single point of failure is multiplied. Looking only at hardware reliability, if you have ten name nodes, the mean time between hardware failures for your cluster is likely to decrease to a few months down from the current level of a few years. Of course, human fallibility normally increases these failure rates significantly. MapR addresses both hardware and human failure modes and also provides much higher scalability than Apache Hadoop.”
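Dunning’s arithmetic follows from a standard reliability rule: with N independent components, each with a given mean time between failures (MTBF), the combined failure rate is roughly N times higher, so the cluster-wide MTBF shrinks to about MTBF/N. A quick Python check, using illustrative numbers (these figures are ours, not Dunning’s):

```python
def cluster_mtbf(node_mtbf_years, num_nodes):
    # N independent nodes each failing at rate 1/MTBF gives a
    # combined failure rate of N/MTBF, hence cluster MTBF = MTBF / N.
    return node_mtbf_years / num_nodes

years = cluster_mtbf(3.0, 10)   # one name node lasting ~3 years on average
months = years * 12             # about 3.6 months between cluster-wide failures
```

That is exactly the “few years down to a few months” drop Dunning describes.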

MapR’s Hadoop distribution comes in two editions: M3, which is both free and 100% compatible with Apache Hadoop; and M5, which adds all of MapR Technologies’ commercial innovations.

