IBM's Watson Fails To Compute In A World Of Open-Source Hadoop

IBM’s supercomputer Watson knows almost everything there is to know—except how to generate a lot of revenue, that is.

Watson has accounted for just $100 million in IBM’s revenue over the last three years, according to The Wall Street Journal, even though the company hopes Watson will bring in $1 billion annually by 2018. There are several reasons for such stunted growth, but the biggest may simply be that IBM is a luxury in a world of commoditized, open-source Big Data analytics.
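To put that gap in rough numbers, the figures above can be run through a quick back-of-the-envelope calculation. This is only a sketch: it assumes the $100 million was spread evenly across the three years, which the reporting does not say, and the variable names are purely illustrative.

```python
# Figures quoted above; everything else is derived from them.
REVENUE_LAST_THREE_YEARS = 100_000_000  # USD, total over three years
TARGET_ANNUAL_REVENUE = 1_000_000_000   # USD per year, IBM's hope for 2018

# Assumption (not from the reporting): revenue was spread evenly over the three years.
average_annual_revenue = REVENUE_LAST_THREE_YEARS / 3

# How many times larger the annual target is than the average year so far.
gap_multiple = TARGET_ANNUAL_REVENUE / average_annual_revenue

print(f"Average annual revenue so far: ${average_annual_revenue:,.0f}")
print(f"Annual target is about {gap_multiple:.0f}x that figure")
```

Under that even-spread assumption, Watson would need to bring in roughly 30 times its average annual revenue to date in order to hit IBM's target.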

Why pay millions for Watson when you can run Hadoop for free?

Teaching Watson To Fish

It’s not that simple, of course.

Most companies wouldn’t need to put up the roughly $3 million that the Jeopardy-winning Watson reportedly cost to build, with much of that cost stemming from IBM’s expensive Power 750 servers. The price of these servers has dropped since 2011, when they retailed for $34,500, but they’re still considerably more costly for enterprises than a Hadoop cluster, for instance.

IBM salespeople may be able to find a way to pitch the expense as worth the price. As Steve Watt, chief architect within Red Hat’s CTO office, reminds us, Watson involves “a complex sale,” requiring “a field that properly understands it in order to sell it.”

Given enough time and money, IBM should be able to hire a sufficient field and find success with it. As the WSJ reported, IBM is in the process of setting up a Watson division of roughly 2,000 workers, the majority of whom will be field salespeople and consultants attempting to do precisely what Watt suggests is needed.

Can Watson Iterate?

Watson feels like the kind of big-ticket, complex product that enterprises increasingly eschew. Such purchases leave little room to iterate, yet iteration is critical, particularly in the area of Big Data. Any Big Data project that starts with a $1 million check is almost certainly doomed to fail, because the very nature of Big Data is about iterating toward the right queries for one’s data.

And given how virtually all essential Big Data infrastructure is open source, there’s no need to start a Big Data project with a Big Check to any vendor, no matter how “surprisingly affordable” you proclaim a technology to be. Nor should it start with a classified ad as you search for Big Data talent. Indeed, following Gartner’s advice, it’s far easier to train an existing employee on Big Data technologies than it is to teach a newly hired Big Data specialist your business.

Big Data is all about asking the right questions, which requires business context, and then iterating on your project as you learn which data sources are valuable, and which questions yield real insights.

This isn’t how Watson works.

While fine for answering structured Jeopardy questions, Watson apparently struggles with real-world, messy data, as uncovered by the WSJ, which found that “Watson’s basic learning process requires IBM engineers to master the technicalities of a customer’s business—and translate those requirements into usable software. The process has been arduous.”

In other words, Watson is like hiring an expensive data scientist, except not nearly as thoughtful. Far better for the customers in question to learn Hadoop or other Big Data technologies and ask questions of the data themselves than to pay both for IBM’s expensive consultants and its Big Data technology, which happens to be Hadoop under the covers, anyway.

Where Are The Watson Developers?

Beyond price and business model, however, is Watson’s most glaring omission: developers.

As mentioned, most of the Big Data technology, from NoSQL databases to Hadoop and everything in between, is open source. Developers, tasked by their business with figuring out how to put data to work, download these technologies and start iterating toward a solution. But Watson, for all its technical marvels, largely ignores the very developers that could make it popular and useful.

As IDC analyst Matt Eastwood argues, “To succeed [IBM] need[s] to build a different kind of ecosystem.” In other words, IBM needs to embrace developers.

RedMonk analyst James Governor takes this even further. While arguing that no one should “write off” IBM’s Watson fortunes, given that IBM plays the long game with technology, Governor points out that IBM’s “top down” approach is hard to sell, because the technology was “initially designed explicitly to not run on a cloud” such that “developers can’t easily play [with it].”

IBM is working on this now, but still has a ways to go.

As IBM moves forward, it needs to make Watson permeable to developers. It goes to great lengths to detail how developers could build a “Watson Jr.” of their own, using off-the-shelf Hadoop and other software. That’s a start. But it would be far more interesting to give developers access to “Watson Sr.” and let them build upon it, tweak it and extend it.

Otherwise developers are just going to keep rolling their own Hadoop clusters, with no need for IBM or Watson at all.

Image courtesy of Wikimedia Commons under the Creative Commons Attribution-Share Alike 3.0 Unported license.