Big Data challenges all of our assumptions about how data should be stored, processed and analyzed. But that doesn’t mean relational databases and other incumbent technologies are slouching toward obsolescence anytime soon.
That’s the view of Cloudera co-founder Mike Olson, who recently sat down with Bosch’s Dirk Slama to discuss the interplay between the Internet of Things and new data technologies like the distributed-processing framework Hadoop. Slama, who’s writing a book on the IoT boom, authors white papers and speaks regularly on the topic. As such, he was the perfect person to ask thoughtful questions of Olson and draw out some pretty insightful responses.
Thankfully, I got to listen in. Here are some of the highlights.
Big And Getting Bigger
While “Big Data” is often a misnomer—most enterprises struggle far more with kaleidoscope-esque data variety than mountainous data volumes—it’s absolutely the case that data volumes are increasing. Ninety percent of the world’s data was created in the last two years, according to IBM research.
Olson concurs:
“[W]e are only seeing the very early days of IoT data flows, and already those data flows are almost overwhelming. Take the amount of information streaming up the smart grid, from taking readings once a month to 10 times a minute: That’s 150,000x more observations we are now getting per meter per month. Those data volumes are guaranteed to accelerate. We are going to collect more data at finer grain, and we are going to do it from a lot more devices in the future.”
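As a rough check on the arithmetic in that quote, the short Python sketch below counts readings per meter per month at both rates. The 30-day month and the function name are assumptions made for illustration, not figures from Olson. Under those assumptions, ten readings a minute comes to several hundred thousand readings a month, so the quoted 150,000x is best read as an order-of-magnitude estimate.

```python
# Rough check of the smart-meter arithmetic in the quote above.
# Assumption: a 30-day month; the function name is illustrative only.

MINUTES_PER_DAY = 24 * 60


def readings_per_month(readings_per_minute: int, days: int = 30) -> int:
    """Number of readings one meter produces in a month at a fixed rate."""
    return readings_per_minute * MINUTES_PER_DAY * days


old_rate = 1                        # one reading per month, as in the quote
new_rate = readings_per_month(10)   # ten readings per minute

increase = new_rate // old_rate
print(f"Readings per meter per month: {new_rate:,}")
print(f"Increase over monthly readings: {increase:,}x")
```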
As Olson hints in that last response, the machines are to blame. He argues that “[t]he emergence of machine generated data has forced us to rethink how we capture, store and process data, and building very large-scale, highly parallel compute farms is now absolutely common.”
That “rethinking” is increasingly being done by a new generation of developers. While today there are just 300,000 developers contributing to IoT, a recent report from VisionMobile projects a whopping 4.5 million developers by 2020, reflecting a 57% compound annual growth rate and a massive market opportunity.
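For readers who want to check that projection, the Python sketch below works out how many years of 57% compound growth it takes to get from 300,000 to 4.5 million developers. The function names are illustrative, and the only inputs are the figures cited above.

```python
import math


def years_to_reach(start: float, target: float, annual_growth: float) -> float:
    """Years of compound growth needed to go from start to target."""
    return math.log(target / start) / math.log(1 + annual_growth)


def project(start: float, annual_growth: float, years: int) -> float:
    """Value after compounding annual_growth for a whole number of years."""
    return start * (1 + annual_growth) ** years


years = years_to_reach(300_000, 4_500_000, 0.57)
print(f"Years of 57% growth needed: {years:.1f}")
print(f"Developers after 6 years: {project(300_000, 0.57, 6):,.0f}")
```

The result is roughly six years of compounding, so the three figures in the report are consistent with one another.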
The Role Of Relational Databases
Will those developers still be using traditional relational databases to capture and process all that data? Yes and no.
Olson is quick to point out the ongoing relevance of relational databases:
“If there was going to be a thousand times more data in the world than there is today (and that’s an easy number to believe), it stands to reason that relational databases are going to continue to play a vibrant role in the market, by capturing and delivering business applications on a subset of that data.”
But he’s equally quick to showcase an even bigger opportunity for modern data infrastructure like Hadoop:
“The big opportunity for a new generation of database technology is not to go disrupt the existing OLTP or OLAP markets. It’s to unlock analytic power against new data flows, data that was never before available, to understand things about the world that we could never know before, because we did not have the information. So I don’t think this is doom and gloom for traditional databases. I think that a new market and a new opportunity in Big Data, driven substantially by IoT, creates huge opportunities for a new class of technologies.”
Much of the data that enterprises consume as part of their Big Data projects is transactional in nature, and so very much the province of traditional databases. But that will continue to change as new types of data require new analytics.
No One-Size-Fits-All Solutions
All of which means that we’re in for a polyglot future, with enterprise data warehouses sitting side by side with Hadoop, even as NoSQL databases and their relational cousins coexist.
After all, Big Data is, well, big. By its very definition, it’s too vast and diverse for any one technology to completely master it all.
Still, Olson and others offering new data technologies argue that Hadoop’s data-handling volume and analytic flexibility mean that “you can just do stuff that wasn’t possible before,” thus unlocking new opportunities from all that data. It’s that new opportunity that has driven multibillion-dollar valuations for Cloudera and other startups, and has attracted serious product investments from Bosch and others.
Lead image of a Cubieboard Hadoop cluster courtesy of Wikimedia Commons