Proprietary Hadoop Is A Losing Strategy

Hadoop, nearly synonymous with Big Data, has many failings. But open source is not one of them. In fact, Hadoop's open-source license remains one of its biggest draws, giving enterprises plenty of reasons to persevere in using it despite its shortcomings. It's therefore hard to see how EMC's new Pivotal HD, essentially a proprietary distribution of Hadoop, can hope to succeed.

Not that everyone agrees with this statement.

Dan Woods, CTO and editor of CITO Research and a contributor to Forbes, argues that embedding Hadoop into EMC Greenplum's massively parallel processing (MPP) database (HAWQ) offers CIOs and CTOs the simplicity they need to be successful with Hadoop. He has a point: Hadoop is complex and somewhat hard to use, which is why Cloudera CEO Mike Olson has argued that most of the world will experience the power of Hadoop through applications. Nearly all of those applications will be proprietary, I might add.

But Olson's argument differs from Woods' argument in at least one major way: Pivotal HD is enterprise infrastructure, not an application, and enterprise infrastructure is increasingly open source.

There are plenty of reasons for this, but RethinkDB's Alex Popescu nails one critical factor:

Hadoop is so successful despite its complexity [because i]t allows experimenting and trying out new ideas, while continuing to accumulate and storing your data. It removes the pressure from the developers. That’s agility. It’s highly appreciated.

In other words, a big reason for Hadoop's success is its open-source license, which permits a hefty amount of experimentation without having to get an enterprise license from EMC, Oracle, or any of the other incumbent infrastructure vendors.  

EMC's Scott Yara tries to deflect criticism of its proprietary foray into Hadoop by declaring "We're all in on Hadoop, period," but as 451 Research analyst Matt Aslett counters, "I have no doubt that EMC Greenplum is 'all in' on Pivotal HD, but that’s not the same thing at all."

Take that freedom to experiment away by building a proprietary Hadoop distribution, and EMC has basically erased the very thing that made Hadoop workloads proliferate in the first place. EMC also cuts itself out of the standard adoption cycle for Hadoop, as RedMonk analyst Stephen O'Grady suggests: "Certainly there will be customers whose needs will dictate the adoption of a unique solution like Pivotal HD, but how many will that be relative to the segment whose adoption cycle begins with the download of one of the free Hadoop distributions?"

Today, Hadoop is one of the industry's hottest job trends. Even in absolute numbers, Hadoop job posts are about to pass EMC-related job posts.

Enterprises aren't hiring for EMC's brand of Hadoop. They're hiring for open-source Hadoop. This matters.

Perhaps EMC feels that Hadoop's brand is big enough now that enterprises essentially understand it and are ready to move on from experimentation to full-scale adoption. In this EMC is likely to be disappointed. According to recent IBM survey data, only 6% of enterprises have two or more Big Data projects underway (likely, though not explicitly, involving Hadoop in some way), and a mere 22% are running pilots to test the efficacy of their Big Data strategies. Everyone else is in full-on planning mode.

By creating a proprietary Hadoop distribution, EMC just dramatically limited its access to the 94% that are still in Big Data education and trial mode. Yes, it has a gargantuan sales force. No, they're simply not going to be able to reach would-be customers as efficiently as an open-source distribution model does.

But maybe EMC hasn't gone proprietary to more effectively monetize Hadoop interest, and instead sincerely believes, like Woods ("open source development has its limits"), that complex infrastructure problems are a poor match for open source. History has not been kind to such thinking, as Aslett sarcastically implies:

EMC has seemingly bottomless resources to throw at Hadoop, and every incentive to do so. It's a smart, highly successful company and no doubt will prove successful with Pivotal HD. However, I can't see it ever dominating an open-source infrastructure market with a proprietary distribution. Open source is the foundation for today's most interesting markets, from Big Data to mobile to cloud computing. It's unlikely that EMC will somehow stem this tide with a proprietary product, no matter its short-term performance or functionality advantages.