Hortonworks Responds: Counting Hadoop Code and Giving Credit Where Due

Things are getting lively in the Hadoop community, especially between Hortonworks and Cloudera. The issue? Which companies are contributing the most to Hadoop, and how contributions ought to be tallied up.

It started with Owen O’Malley of Hortonworks, who tallied contributions to Hadoop by lines of code. The problem is that O’Malley credited each contributor's work to that person's initial employer, rather than to the employer at the time of the contribution.

Patches or Lines of Code?

Mike Olson of Cloudera took another whack at the numbers, which I looked at last week. Olson broke out the numbers by looking at the patches contributed to Hadoop and its ecosystem (projects like HBase, ZooKeeper, Pig, Mahout and Oozie).

O’Malley has come back with a counter-post that tallies contributions by lines of code but sticks to Cloudera’s method of counting current employer. The result shows Hortonworks far ahead of Cloudera, Facebook, IBM and even Yahoo. For 2011, according to O’Malley, Hortonworks has contributed more than 42% of the lines of code to Hadoop, Yahoo nearly 26%, and Cloudera a bit more than 15%. Lines of code are a better measure, says O’Malley, because “patches differ in their investment of time and effort.” (Of course, the same thing can be said about a line of code, too.)

Finally, O’Malley does provide a comparison that looks at patches and lines of code since 2006 and another comparison for 2011 alone. This puts Cloudera in a much better light, with nearly 30% of patches in 2011 so far, compared to 25% for Hortonworks and about 23% for Yahoo.
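To see why the choice of metric and attribution rule can move the rankings so much, here is a minimal sketch of the tallying. All of the data is hypothetical (the contributors, companies, dates and line counts are invented for illustration), and "lines" is assumed to mean lines changed per patch, which may not match how either O’Malley or Olson counted.

```python
from collections import Counter
from datetime import date

# Hypothetical employment history: (contributor, employer, start date).
EMPLOYMENT = [
    ("alice", "CompanyA", date(2006, 1, 1)),
    ("alice", "CompanyB", date(2011, 7, 1)),
    ("bob", "CompanyA", date(2008, 1, 1)),
]

# Hypothetical patches: (contributor, commit date, lines changed).
PATCHES = [
    ("alice", date(2010, 5, 1), 1200),
    ("alice", date(2011, 8, 1), 50),
    ("alice", date(2011, 9, 1), 40),
    ("bob", date(2011, 3, 1), 900),
]


def employer_at(contributor, when):
    """Employer whose start date most recently precedes the contribution."""
    jobs = [(start, emp) for who, emp, start in EMPLOYMENT
            if who == contributor and start <= when]
    return max(jobs)[1]


def initial_employer(contributor, when):
    """First recorded employer, ignoring when the contribution was made."""
    jobs = [(start, emp) for who, emp, start in EMPLOYMENT
            if who == contributor]
    return min(jobs)[1]


def shares(patches, attribute, weight):
    """Percentage share per company under an attribution rule and a weight."""
    totals = Counter()
    for who, when, lines in patches:
        totals[attribute(who, when)] += weight(lines)
    grand_total = sum(totals.values())
    return {company: round(100 * value / grand_total, 1)
            for company, value in totals.items()}


# One unit per patch versus one unit per line changed.
by_patches = shares(PATCHES, employer_at, lambda lines: 1)
by_lines = shares(PATCHES, employer_at, lambda lines: lines)
by_initial = shares(PATCHES, initial_employer, lambda lines: lines)

print("patches, employer at time of contribution:", by_patches)
print("lines, employer at time of contribution:", by_lines)
print("lines, initial employer:", by_initial)
```

With this made-up data the two companies tie on patch count, one dominates on lines, and the initial-employer rule hands all of the credit to a single company. The same commit history supports three different headlines.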

Lively Competition

If you’re going to compare contributions, I think the best way is to sum up patches and lines of code. There’s really no objective way to say “company Y absolutely contributed the most” to a project just by counting code. A company’s contribution might be a small code drop that adds a killer feature. Or it may be a series of patches that removes thousands of lines of code and still improves the project by leaving better code behind.

I think it’s safe to say that Cloudera and Hortonworks are both making a good showing when it comes to Hadoop contributions, regardless of which company is actually contributing the most. And the results show that Hadoop is getting contributions from a healthy group of companies.
