Apple has quietly—and not so quietly—been buying up Big Data companies over the past few years, most recently acquiring FoundationDB but in 2013 also purchasing Acunu, maker of a real-time analytics platform. The intent seems to be to purchase data infrastructure talent—and very particular talent at that.
Basically, Apple needed to get into NoSQL database technology in a bad way. These alternatives to traditional relational databases (long known as SQL systems) offer speed and flexibility that older-style databases can only dream of.
As former Wall Street analyst and NoSQL (MongoDB and now Aerospike) executive Peter Goldmacher declares, Apple’s interest in NoSQL translates into a need to handle “massive workloads in a cost-effective way.”
In a far-ranging interview, Goldmacher points to the need to rethink enterprise data and calls out Hadoop and NoSQL technologies as the foundational bedrock of any Big Data strategy.
ReadWrite: Apple bought FoundationDB, but uses quite a bit of Cassandra, MongoDB, HBase, and Couchbase already. At least as measured by job postings, it’s not using FoundationDB (the product). Why do you think they opted to purchase FoundationDB, the company?
Goldmacher: Apple is first and foremost an extremely innovative company in everything they do. They have created both transitional (iPod) and transformational (iPad) technologies and this desire to always innovate permeates the fabric of its corporate culture.
If you look at the software products the company provides, like iTunes, iMessage, iAd, etc., all of these products operate at massive scale. If they were written on traditional relational database technologies, it’s not clear if a) they would work or b) they wouldn’t bankrupt the company given the scale at which these products operate and the cost of a traditional RDBMS license.
So Apple innovated and was a very early adopter of NoSQL. It is reasonable to wonder if Apple’s software products would have even been possible without NoSQL technologies.
And here we are almost a decade after these products were launched, and Apple is yet again taking advantage of new technology. While the existing NoSQL technology was up to the task, it was expensive because of the massive server farms required to support the scale and the people required to support the massive server farms.
FoundationDB offers a key-value store database akin to what Apple was using with Cassandra, but it runs in memory, which means you can reduce your hardware by a factor of about 8 to 10. Said another way, if the company was using 75,000 servers to support the workload, as I’ve seen speculated in the press [and on the Cassandra project page], FoundationDB would enable them to get that down to about 7,500 servers.
To your question why purchase FoundationDB, I think they loved the technology and figured that if they just bought the company, they’d have the talent in house to continue to innovate and enhance the product and thus their ability to continue to innovate on the product front.
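The server arithmetic in this answer can be checked with a short sketch. The function name `servers_needed` is hypothetical, and the 75,000-server figure is the press speculation Goldmacher cites, not a confirmed number. The sketch simply divides the fleet size by the claimed reduction factor at both ends of the 8 to 10 range.

```python
import math


def servers_needed(current_servers: int, reduction_factor: float) -> int:
    """Return the fleet size after dividing by a hardware reduction factor.

    Both inputs are illustrative assumptions, not measured values.
    """
    if reduction_factor <= 0:
        raise ValueError("reduction_factor must be positive")
    # Round up: a fractional server still means one more machine.
    return math.ceil(current_servers / reduction_factor)


# Speculated fleet size from the interview, at both ends of the claimed range.
best_case = servers_needed(75_000, 10)
worst_case = servers_needed(75_000, 8)
print(f"10x reduction: {best_case} servers")
print(f" 8x reduction: {worst_case} servers")
```

At the low end of the claimed range the fleet would be closer to 9,400 servers than 7,500, so the quoted figure corresponds to the optimistic 10x case.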
[Asay note: It’s worth pointing out that not everyone agrees on the value of FoundationDB’s actual product today. As MongoDB executive Kelly Stirman highlights:
@crcsmnky @mjasay that SQL layer was a mess and not ready for prime time. People not tech, methinks.
— Kelly Stirman (@kstirman) March 26, 2015
But we’ll let Goldmacher and Stirman duke this one out in another post.]
RW: You say that the initial wave of NoSQL players can’t handle “massive workloads in a cost-effective way.” What is it about multi-model databases like Aerospike and FoundationDB that gives them this ability?
PG: FoundationDB and Aerospike are key-value store databases akin to Cassandra, but the secret sauce is that the data resides in flash and not on spinning disk. This creates significant performance advantages, with the knock-on effect of needing less hardware.
RW: You do realize, of course, that DataStax, MongoDB, and others have customers running at “massive scale,” right? DataStax has Netflix and other marquee customers at significant scale, as does MongoDB….
PG: Absolutely, but there’s massive scale and then there’s the cost of massive scale. If I can get similar performance at 1/10th of the cost and massive scale means I am spending $50M, why wouldn’t I take that cost down to $5M?
RW: Do you think Apple’s acquisition is a sign of things to come for NoSQL, generally? Are we about to enter a consolidation phase?
PG: I think Apple is one of a special class of companies like Google, LinkedIn and Facebook that are so cutting edge and so heavily reliant on data as an asset, they absolutely must own and innovate on the technology that supports the business.
So we may or may not be entering a phase of consolidation in the NoSQL world, but the buying rationale won’t be anything like Apple’s rationale for buying FoundationDB.
I can clearly see a world where traditional enterprise IT companies that don’t have a dog in the database fight buy NoSQL vendors to go after Oracle. In fact, EMC is already pretty far down this path.
At some point the Ciscos and Dells of the world have to step up and become players in the database space because we are seeing the database players getting into the hardware space. The stage was set a long time ago for consolidation and I believe this trend will continue.
RW: Let’s pick winners. If an enterprise were forced to use only two Big Data technologies, what should they be and why?
PG: Well, it feels like everything is Big Data technology these days…. Still, if I were running IT at a large company, I would be investing in Hadoop and NoSQL.
With Hadoop, you have the ability to dramatically and cost-effectively expand the contents, and thus the value, of your data warehouse, which is extremely important. The more you can measure, the more you can improve.
And in the NoSQL world, you have two opportunities.
First, use MongoDB/DataStax/CouchDB to replace workloads that have historically run in Oracle even though they weren’t a great fit, because of either cost or functionality limitations. For example, MongoDB enjoys a number of consistent use cases like content management systems, web catalogs and web sites. Oracle is overkill for that.
So those NoSQL players help you do old things better.
But if you want to do new and truly innovative things, you need enormous speed and scalability. This is the second opportunity.
One of the most common use cases for Aerospike is in the AdTech world. The AdTech players load an Aerospike database every morning with relatively static data created in Hadoop. This data is essentially a person’s profile based on their cookies as they click around the internet every day.
In a gross oversimplification, Peter is a 45-year-old male who lives in the Bay Area and shops on all the bargain web sites. This data gets loaded into Aerospike and then Aerospike collects data all day about what Peter is clicking on that day.
Well, if Peter is clicking a bunch of web sites looking for a watch, Timex or the local watch store would bid aggressively for the opportunity to put an advertisement in front of Peter because he is exhibiting characteristics of a likely buyer. That is a great example of deriving tremendous value from your data warehouse by making the data actionable when it matters.
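The AdTech flow Goldmacher describes (a morning batch load of profiles, click updates through the day, and a bid that reacts to them) can be sketched roughly as follows. This is a toy illustration under stated assumptions: it uses a plain Python dictionary rather than Aerospike or its client API, and every name here (`Profile`, `ProfileStore`, `bid_for`, the bid amounts) is hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Relatively static profile, as would be produced by a batch job."""
    age: int
    gender: str
    region: str
    interests: set = field(default_factory=set)
    # Per-category click counts collected during the current day.
    clicks_today: Counter = field(default_factory=Counter)


class ProfileStore:
    """Toy in-memory key-value store mapping a user id to a Profile."""

    def __init__(self):
        self._data = {}

    def load_batch(self, profiles):
        # Morning load: replace the store contents with the fresh batch.
        self._data = dict(profiles)

    def record_click(self, user_id, category):
        # Intraday update: count one click in the given category.
        profile = self._data.get(user_id)
        if profile is not None:
            profile.clicks_today[category] += 1

    def get(self, user_id):
        return self._data.get(user_id)


def bid_for(profile, category, base_bid=1.0, boost_per_click=0.5):
    """Bid more for users who clicked in this category today (made-up pricing)."""
    if profile is None:
        return 0.0
    return base_bid + boost_per_click * profile.clicks_today[category]


store = ProfileStore()
store.load_batch({"peter": Profile(45, "male", "Bay Area", {"bargains"})})
for _ in range(3):
    store.record_click("peter", "watches")

watch_bid = bid_for(store.get("peter"), "watches")
shoe_bid = bid_for(store.get("peter"), "shoes")
print(watch_bid, shoe_bid)
```

The point of the sketch is the access pattern: every click is a small keyed write and every ad opportunity is a keyed read that must return before the auction closes, which is the kind of workload the interview ties to speed and scalability.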
Photo by Ivan Bandura