Why Hadoop Isn’t Killing The Data Warehouse

Hadoop is big, and it’s only getting bigger. But at whose expense is it growing?

There are two realities here. One is that Hadoop, the open source software platform managed by the Apache Software Foundation, doesn’t threaten yesterday’s data warehouse workloads, and it won’t anytime soon, or ever, even though a zero-sum data infrastructure market makes for good reading. The other is that, for reasons of cost and flexibility, more and more enterprises are moving to Hadoop.

The question is: Will vendors like Oracle, Teradata and IBM be able to embrace and extend Hadoop in time?

Hadoop: This Is Not The Thneed You’re Looking For

Enterprise CIOs seem to be growing smarter about Hadoop and its value. While double-digit percentages of CIOs used to dream of replacing their enterprise data warehouses with Hadoop, that share has fallen steadily year after year.

These companies haven’t lost faith in Hadoop. On the contrary, a separate Gartner survey shows robust Hadoop penetration in the enterprise.

The question is not whether enterprises will use Hadoop, but rather how, and for what particular use cases.

Hadoop marketing used to be roughly analogous to Dr. Seuss’ description of the Thneed in The Lorax (“A Fine-Something-That-All-People-Need! It’s a shirt. It’s a sock. It’s a glove. It’s a hat. But it has OTHER uses. Yes, far beyond that”). Lately, however, Hadoop’s proponents have gotten real, focusing on real-world use cases.

All Your Future Data Are Belong To Us

If Hadoop isn’t replacing the traditional data warehouse, what is it replacing? The answer is, “not much.”

Hadoop (and its kissing cousin, the NoSQL database) isn’t replacing legacy technology so much as it’s usurping its place in modern workloads. This means enterprises will end up supporting both legacy technology and Hadoop/NoSQL to manage both existing and new workloads, as HSBC’s Global Head of HSS IT Architecture, Alisdair Anderson, declared at Hadoop Summit recently:

There’s no relationship between the EDW and Hadoop right now — they are going to be complementary. It’s NOT about rip and replace: we’re not going to get rid of RDBMS or MPP, but instead use the right tool for right job — and that will very much be driven by price.

Of course, given that “the effective price of core Hadoop distribution software and support services is nearly zero” at this point, as Jeff Kelly highlights, more and more workloads will gravitate to Hadoop. So while data warehouse vendors aren’t dead (they’re not even gasping for breath), they risk being left behind on modern data workloads if they don’t quickly embrace Hadoop and other 21st-century data infrastructure.

‘Extinguish’ Is Not An Option

One thing we learned from the rise of Firefox, Linux and other open-source software is that Microsoft’s strategy of “Embrace, Extend, Extinguish” has lost its potency.

Legacy data warehousing and database vendors have opted to embrace and extend Hadoop and NoSQL, with no apparent thought of extinguishing either. Given a robust enough community, open source is somewhat impervious to such “extinguish” strategies.

All of this means the legacy vendors will have to do more than find partners to remain relevant in the modern data landscape. Ultimately, open source influence is bought with contributions of open source code. We’re entering a whole new era of data infrastructure to support operational and analytical workloads for today, but also for the next 50 years. Those who want to participate are going to have to contribute code.
