Yesterday Yahoo announced that it will discontinue the Yahoo Distribution of Hadoop and refocus its development efforts on the main Apache Hadoop distribution. In a blog post, Yahoo VP of Software Engineering Eric Baldeschwieler writes that Yahoo will work more closely with Apache in the future. This leaves only two major distributions of Hadoop: Apache Hadoop and Cloudera’s enterprise-focused Hadoop.
“Unfortunately, Apache is no longer the obvious place to go for Hadoop releases,” writes Baldeschwieler. “The Yahoo! team wants to return to a world where anyone can download and directly use releases of Hadoop from Apache.”
The announcement begins a long process of contributing Yahoo’s Hadoop code back into the main project. Baldeschwieler writes that Yahoo is contributing much of its work to the branch hadoop/common/branches/branch-0.20-security. Yahoo has proposed that this be released as Apache Hadoop 20.100.
Yahoo has created another branch called Hadoop-future, a place for its proposed new features, including:
- HDFS-1052 – Federation, the ability to support much more storage per Hadoop cluster.
- HADOOP-6728 – A new metrics framework
- MAPREDUCE-1220 – Optimizations for small jobs
Cloudera has remained an active contributor to Apache Hadoop. We haven’t heard back from Cloudera yet, but we don’t expect Yahoo’s announcement to change anything there.
Also of interest is Baldeschwieler’s history of Yahoo’s involvement in Hadoop.