MapR CEO: Hadoop Will Be Less About NoSQL, More About Parity

Last month, veteran IDC analyst Dan Vesset predicted that while Hadoop will become a standard component of the modern data center, the market around it will mature so quickly that by 2015 the major players we recognize today will probably no longer exist. MapR – a commercial Hadoop provider whose name was inspired by Hadoop's MapReduce programming model – was one of the companies on Vesset's list of likely acquisition targets, and perhaps destined to become a ceremonial asterisk in history once Wikipedia emerges from its blackout.

So you might expect that MapR CEO John Schroeder's predictions for 2012 would not include obscurity for his own company. But Schroeder makes at least an arguable case: the difference between the database market of 2012 and that of 1992, he says, lies in customers' preference to avoid vendor lock-in, and in their newfound ability to guard against it.

The portability play

“Multiple vendors competing in the marketplace brings out the best,” Schroeder tells ReadWriteWeb. “If you look at the early ’90s, with Oracle, Sybase, and Informix slugging it out for building a world-class relational database engine, it was all based on ANSI-standard SQL. I’d argue that Hadoop interfaces are even more standard and portable than the interfaces were across those relational databases, because those vendors had [their own proprietary] extensions. There’s more to the platform than just the programming language of SQL.”

Through a strategic partnership with EMC, MapR has quickly evolved into a first-order player in this new market. That partnership, Schroeder implies, could serve as MapR’s insurance policy against oblivion.

More important, he believes, Hadoop’s APIs are strictly standardized, so more components of the platform are portable than is the case with an RDBMS. “Customers could move between distributions fairly easily with fairly low switching costs,” he tells us. And future innovation in the emerging big data market, he believes, will happen only so long as the other players in MapR’s category – most prominently Cloudera and Hortonworks – cooperate with MapR to maintain that platform portability and ensure that multiple vendors continue to coexist.

“I think having multiple vendors in the space advances the technology,” the CEO remarks. This way, whether developers write an application against HBase, Hive, Pig, or the basic MapReduce API, the application itself remains portable among the various distributions.

The beta test phase is over

Schroeder sees enterprise Hadoop implementations moving past the experimental, embryonic phase and finally entering the mission-critical stage. But wasn’t Hadoop sparked in the first place by mission-critical applications whose data sets had grown too large for SQL relational engines?

“In cases where you’ve got very large, unstructured data sets that are not feasible for being processed using traditional data warehouses, companies will move forward with these implementations,” MapR’s Schroeder acknowledges. “They have applications that they wouldn’t have been able to implement before, so they could be critical to their business. But the state of the Hadoop distributions a couple of years ago really wasn’t a reliable compute and data store. Just eighteen months ago, if you put data in Hadoop, it was subject to data loss; and if you were running production applications, you would encounter cluster crashes. The distributions hadn’t matured enough to be reliable compute and data stores. That limited the applications to being more experimental, and less business critical.”

That’s changing, he continues, as the commercial Hadoop providers implement the same class of features customers expect from their SQL engines, such as business continuity and data protection.

Is SaaS a threat or a blessing?

As cloud service providers find new and cleverer ways to deliver database services (Amazon’s Elastic MapReduce and DynamoDB, the latter announced just today, are two examples), some believe that small and medium businesses will turn to cloud providers for remote big data storage and management rather than deploy their own systems on-premise. Could this threaten the status of new commercial on-premise brands like MapR?

No, not so long as MapR gets the chance to be the engine inside those services. One example John Schroeder provided was a defense contractor that resells its own implementation of MapR as a turnkey app for companies doing business with, or at, the Pentagon. Maybe those customers don’t recognize Hadoop as the engine, but who cares? Perhaps IDC’s Vesset was partly right that the brands could fade into obscurity, but the companies behind the technology those brands share have at least one formula for continued survival.

To enhance, not replace

Early on, the success of the so-called “NoSQL” movement was thought to depend on how soon unstructured data models could take over the enterprise. Now MapR CEO John Schroeder believes that success for Hadoop and big data systems depends on how soon software developers like his own take full advantage of the new class of applications that lie beyond the reach of SQL scalability.

“From working in this market for over two-and-a-half years, there isn’t much evangelism required. There’s a pretty strong market pull right now, and the integrators see that market pull, so they have to integrate that in. That said, I don’t see customers initially unplugging their data warehouses and replacing them with Hadoop. They augment.”

One example Schroeder provided was a credit card company working to implement fraud detection. A traditional SQL data warehouse is more than likely already in place, and it may work well enough, but without the granularity an analysis system needs to accurately capture or isolate the sequence of events that may lead up to a fraud incident. So one smart strategy he suggested was for the company to supplement that warehouse by storing a stream of raw transactional data, perhaps several years’ worth, in Hadoop. That way, when a potential fraud incident is isolated using SQL, rapid analytics over billions of transactions may become available through Hadoop. From those analytics, the company can build a model for predicting future fraud events that benefits both the SQL and Hadoop engines.

“I think [enterprises] are introducing the Hadoop framework as a way to augment their data warehouses; and I think in the future, there’ll be much greater growth in the unstructured world than in the structured world. Why would you flatten and summarize data if you could keep the raw, transactional, log data online? You’re limiting the types of analytics you can do when you summarize, structure, and flatten the data.”