Amazon And IBM vs. Open Source Hadoop: Bigness May Not Beat Quality

At the heart of the Big Data movement is Hadoop, an open-source framework used for storing and processing large quantities of data. For years open source startups Cloudera and Hortonworks have had the Hadoop market largely to themselves. Sure, proprietary software vendors like Oracle, Microsoft and others have circled the market, but their participation has largely come through partnerships with these dedicated Hadoop startups.

Now, according to a new report from Forrester, the legacy tech vendors may demonstrate the most compelling strategies, even if their current Hadoop offerings still leave much to be desired. What’s unclear is how these proprietary vendors can hope to deliver a robust product based on an open-source project to which they contribute little and therefore influence even less.

Hadoop: No Longer Optional

Getting Hadoop right is a big deal. As Forrester notes, much is at stake in the shifting data infrastructure landscape, and Hadoop “form[s] the cornerstone of any flexible future data management platform.” In other words, if you’re a tech vendor hoping to remain relevant in the modern enterprise, you need a Hadoop story.

Hadoop is particularly interesting because it enables enterprises to store and analyze massive quantities of data at comparatively little cost. As Forrester finds, enterprises currently analyze a mere 12% of their data. To some degree, this is because it’s often unclear what to do with one’s data. 

Hadoop eases the transition into Big Data because it lets enterprises store their data cheaply now and process it later, once they figure out how best to analyze it.

It’s true, as I’ve written before, that this can lead some enterprises to treat Hadoop as their “unsupervised digital landfill.” But enterprises are becoming increasingly savvy about Hadoop and Big Data. Generally, they marry Hadoop’s analytical capabilities with a real-time data processing engine like a NoSQL database to glean intelligence from their data and act on it in real time.

Given this maturing view of Hadoop, who are the vendors to watch?

The Elephants In The Room

Oddly, Forrester suggests we look beyond vendors that actually invest heavily in Hadoop’s development.

In open source, being the source of code is even more important than owning the source code. So in the same way a proprietary software vendor can charge for software licenses because it keeps its intellectual property under wraps, the market power of an open source vendor directly correlates to that vendor’s influence over an open-source project—that is, to how much it gives away.

This is why Forrester’s analysis strikes me as slightly askew. Rather than focusing exclusively on the current state of vendors’ Hadoop offerings—where the Hadoop startups shine precisely because they contribute most to Hadoop’s development—Forrester suggests the real winners going forward are the big tech companies like IBM, Amazon Web Services (AWS) and Pivotal.

Credit: Forrester (Used with permission)

Surprisingly, while “strategy” includes licensing and pricing, ability to execute, product road map and customer support, Forrester doesn’t comment at all on the companies’ community outreach. As ReadWrite has reported before, by contributing little to Hadoop’s development, vendors like IBM and AWS are poorly positioned to shape its direction, a point Hadoop founder (and Cloudera employee) Doug Cutting has also made.

Similarly, Hortonworks CEO Rob Bearden suggested that community is crucial to ensuring Hadoop’s ongoing relevance:

“Hadoop is becoming the cornerstone of the modern data architecture, and we remain committed to working with and contributing back to the community to ensure that the new Hadoop core reaches its full potential as the next-generation data platform.”

Can others simply ride the bandwagon without paying for its maintenance?

Does It Matter?

Sometimes life—and open source—isn’t fair. Amazon has made far more money on MySQL, for example, than MySQL or Oracle (which acquired it through its Sun acquisition) have. Similarly, IBM has made far more money with Linux than Red Hat, the Linux leader, has.

But Forrester’s view may not be completely accurate, either. For example, in the area of “Customer Support,” Hortonworks got a 5 out of 5, which makes sense: Hortonworks contributes a lot of code to Hadoop. But Pivotal and IBM also scored 5s, which doesn’t make much sense.

Perhaps Forrester felt those companies could provide great customer support for their proprietary products built around Hadoop? Fair enough. But it’s impossible for any vendor that isn’t deeply invested in the development of an open-source project to match the support capability of a vendor that has made such an investment. 

Amazon and others will almost certainly build great services around Hadoop, but ultimately they’re in a poor position to support customers on Hadoop because they don’t contribute to its development. As such, they’re always forced to be followers, not leaders, on the project.

It’s no wonder, then, that Shaun Connolly, Hortonworks vice president of corporate strategy, wrote to me to say, “While all four areas of Strategy are important, we are especially proud that we received 5 out of 5 for the Product Road Map and Customer Support areas of the Strategy section.”

He should be proud. This is how companies like Hortonworks (and Cloudera) guarantee excellent service for their customers.

More Than The Elephant

Ultimately, it seems that Forrester’s ranking of Hadoop vendors has less to do with Hadoop and more to do with how Hadoop fits into the vendors’ larger product strategies. For instance, of IBM, Forrester writes, “IBM’s road map includes continuing to integrate the BigInsights Hadoop solution with related IBM assets like SPSS advanced analytics, workload management for high-performance computing, BI tools, and data management and modeling tools.”

Hadoop, in other words, is a nice complement for these vendors—not the main event.

This may be fine. No doubt IBM, AWS and others will make a lot of money enriching their products with Hadoop. But companies that want to get value from Hadoop itself may be better served by looking to the vendors that contribute most to its development: Cloudera and Hortonworks. Only these vendors are in a credible position to influence Hadoop’s roadmap and, as a result, to support it best.
