Hadoop isn't fair. However we choose to define the Hadoop market—as a service, as just software, or including hardware and services—it's destined to be a massive market. Ironically, however, those that contribute the most to its development are not those that stand to make the most money from its success.
"Hadoop Is There For The Taking"
No, I'm not talking about end-users of the technology, who stand to bank the most value from Hadoop and other Big Data technologies. Rather, I'm talking about technology vendors like Amazon and IBM. As of last year, Amazon's Elastic MapReduce (EMR) service had already topped one million clusters, and it almost certainly accounts for a healthy share of Amazon Web Services' estimated $2.4 billion in annual revenue.
IBM, for its part, may take home as much as 9% of the total Hadoop (and NoSQL) market, according to Wikibon Research. Indeed, IBM represents the gold standard for monetizing open-source software. Back in 2001 IBM announced it was putting $1 billion into Linux. Just two years later it was reporting more than $2 billion in Linux-related revenue.
IBM's software chief, Steve Mills, intends to put this same strategy to work for Hadoop, selling proprietary value on top of open-source Hadoop. When asked if he needed to work with Cloudera, the top contributor to Hadoop, Mills dismissed the notion: "I don't need Cloudera because I've been delivering Hadoop for 4 years. It's there for the taking."
A Fair Return On Anemic Investment?
Not everyone agrees with this sentiment. Cloudera chief strategy officer Mike Olson, for example, takes aim at IBM for skimming value from Hadoop without contributing much to it:
IBM's Steve Mills says "Hadoop is there for the taking." Shame that IBM hasn't been contributing. http://t.co/4MFv4LWYjZ (Mike Olson, @mikeolson, November 13, 2013)
He has a point. According to a 2012 study of Hadoop contributions, IBM comes in 17th, well behind Cloudera and Hortonworks, not to mention industry titans Pentech and WibiData (formerly known as Odiago). IBM's Hadoop contributions are almost a rounding error.
Amazon's contributions, however, are even worse. Amazon doesn't even show up in the list of the top-30 contributing companies.
A Different Kind Of Contribution?
Which is not to say that IBM and Amazon contribute nothing to Hadoop. Amazon, for its part, is quick to point out its contributions, noting that "Amazon EMR is active with the open source community and contributes many fixes back to the Hadoop source."
It's also true that companies like IBM make it easier for stodgy, conservative enterprises to embrace newfangled technology like Hadoop. I remember working for an embedded Linux company from 2000 to 2002. IBM's 2001 announcement made a huge difference in the market: suddenly, prospective buyers lost their skittishness about using Linux and deals became much easier to close. So there's real value simply in a company like IBM standing behind Hadoop.
There's also value in a company like Amazon making Hadoop easy to run for new-school developers. Hadoop can be very complex to run. Amazon makes it easier.
Over time, those who contribute most to open source get enough and to spare in return. Is it "fair" that IBM and Amazon will make far more in Hadoop-related revenue than Cloudera and Hortonworks? That's the wrong question. Open source is not a question of fairness, but rather one of equal opportunity to derive value. No one is complaining that the biggest beneficiary of Linux may well be Google, not Red Hat. That's just how it goes. Both are doing fine, just in different ways.