Even as the hype around Hadoop has exploded, its use as a “poor man’s ETL” or “unsupervised landfill” has remained somewhat pedestrian. But that’s about to change. At least, if Cloudera has its way.
Cloudera is moving beyond Hadoop to add services like search, yet keeping these add-on services firmly grounded in HDFS.
It’s an ambitious strategy, one that views Hadoop as the operating system for Big Data and HDFS as the common repository shared by a number of different tools and services: MapReduce, Pig and Hive as the batch computing engines, Impala as the SQL engine, and Lucene/Solr for full-text indexing and search, among others.
It also implicitly assumes that Big Data is the data market worth chasing, an assumption I’ve challenged before. Regardless, the data warehouse is certain to undergo an important evolution, and Hadoop will play a central role in it.
The question is whether Cloudera can do it alone.
As 451 Research analyst Matthew Aslett highlights, while Cloudera has hitherto played nicely with data platform vendors like Oracle, recently:
Cloudera… announced its intention to challenge the incumbent data management providers by positioning Hadoop as the focal point of next-generation data management platforms and calling on enterprises to ‘unaccept the status quo.’
Questionable grammar aside, this positioning is the inevitable consequence of Cloudera expanding its purview beyond simply being seen as a distributor of Hadoop for batch-based data processing. With Cloudera Enterprise, it has assembled what could now best be described as a multi-purpose data-processing and analytics platform.
Aslett goes on to note that Hortonworks, another Hadoop vendor, has a much more partner-friendly approach to the Hadoop market, pointing out that “Hortonworks’ vision is not based on building a stack of data management and processing capabilities around Hadoop but on improving the flexibility of Hadoop itself for handling multiple application workloads via Apache YARN.” In other words, while Hortonworks also sees Hadoop as the “OS” for Big Data, it’s not building an all-encompassing Big Data suite.
Cloudera’s ambition would be at risk were the company limping along unprofitably with minimal revenue, but it’s not. Sales are on track to exceed $100 million this year, according to several sources close to the company. As an independent company, Cloudera evidently feels it can risk upsetting the incumbent data management vendors.
Does it really have a choice? At a certain point, its erstwhile partners will grow wary of its increasing size, even as Hadoop takes on capabilities that encroach on their turf, whether in business intelligence, databases or other adjacent markets. Cloudera perhaps could have waited longer to declare its intentions, but that’s a matter of near-term tactics, not long-term strategy. Any Hadoop vendor will necessarily compete with yesterday’s data warehousing and analytics tools.
Cloudera is simply being honest about that fact.
But it’s still an open question whether a more partner-friendly approach like Hortonworks’ will win out over time. In a parallel market, Red Hat came to dominate Linux by building the most expansive partner program, one that made a host of database, application and other vendors its allies against proprietary UNIX. Only in the past few years has Red Hat expanded beyond its operating system beachhead to take on middleware and other markets.
Cloudera is striking out on its own much sooner. It remains to be seen whether this is genius or mere brashness.