What’s the next big opportunity for developers after mobile? Distributed systems may be the answer, according to Eric Frenkiel, the cofounder and CEO of MemSQL.
When Apple introduced its first version of iOS nearly eight years ago, it did more than introduce a smartphone operating system. That launch sparked a new platform for developers and a novel category of businesses built around mobile apps.
Today, distributed systems visionaries see yet another platform emerging, but this time it is happening at datacenter scale. Rather than building businesses around mobile apps, leading developers are building them around distributed systems.
Frenkiel’s MemSQL, a San Francisco-based startup, is one of the pioneers aiming to help enterprises manage big data quickly and accurately.
A former Facebook engineer, Frenkiel believes the future of databases is distributed—and that SQL, a specialized programming language familiar to many database engineers but discarded by many big-data initiatives, has a big role to play. I recently caught up with Eric to find out why.
ReadWrite: MemSQL’s founders came from the Facebook engineering team, ground zero for thinking about scale and performance. What problem did you see in the market that you wanted to address with a new distributed data store?
Frenkiel: I was an engineer working with Facebook partners to help them make use of the Facebook social graph. It was such a firehose of data that many partners couldn’t handle the incoming data stream.
I remember wondering, “If these early big data adopters can’t handle it, what about traditional companies?” One of the reasons we founded MemSQL was to help every company that wants to operationalize big data and become a real-time enterprise do so, without hiring armies of PhDs and writing their own software.
What was very controversial at the time we started the company, back in 2011, was our idea to marry the Facebook scale-out concept to SQL. The conventional wisdom then was to go with NoSQL for more performance and scale, and SQL was perceived as a holdup.
RW: What problems does MemSQL solve better than NoSQL databases like MongoDB or Cassandra, traditional datastores, or other distributed datastores?
EF: A lot of companies with big data needs are struggling with Oracle, SQL Server, and legacy solutions.
NoSQL emerged as a new way to handle big data. MongoDB, for example, is great if you want to build an app. But it’s challenging when you want to scale. It’s even more challenging to do analytics. Cassandra has its advantages and disadvantages, too. It’s a wonderful database if you only want to store keys and values, but again, it proves challenging if you want to run analytics.
Both MongoDB and Cassandra jettisoned SQL for their own reasons. But the world knows SQL. Now if you want to use Cassandra, you have to learn Cassandra’s proprietary query language. So in practice, people who use MongoDB or Cassandra probably don’t care that much about analytics.
At MemSQL, we developed a scalable solution that captures data down to the last click or transaction, and simultaneously runs SQL analytics on that data. Everyone in an enterprise is familiar with SQL.
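To make the idea of capturing events and running SQL analytics on them at the same moment concrete, here is a minimal sketch. It is an illustration only: it uses Python's built-in sqlite3 module as a single-node stand-in, not MemSQL, and the `clicks` table, its columns, and the helper functions are hypothetical.

```python
import sqlite3

# Stand-in only: sqlite3 is a single-node, in-process database, not MemSQL.
# The "clicks" table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (user_id INTEGER, page TEXT, ts INTEGER)")


def record_click(user_id, page, ts):
    """Capture one event as soon as it happens."""
    conn.execute("INSERT INTO clicks VALUES (?, ?, ?)", (user_id, page, ts))


def top_pages(limit=3):
    """Plain SQL analytics over everything captured so far."""
    rows = conn.execute(
        "SELECT page, COUNT(*) AS hits FROM clicks "
        "GROUP BY page ORDER BY hits DESC, page ASC LIMIT ?",
        (limit,),
    )
    return rows.fetchall()


record_click(1, "/home", 100)
record_click(2, "/pricing", 101)
record_click(3, "/home", 102)
first = top_pages()   # reflects the first three clicks

record_click(4, "/pricing", 103)
record_click(5, "/pricing", 104)
second = top_pages()  # the same query now includes the latest click
```

The point of the sketch is that the query is ordinary SQL and needs no separate load step between writing an event and analyzing it. Doing that at scale across many machines is the part a distributed database has to solve, and this example does not attempt it.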
RW: I see that you are partnering with Mesosphere and have MemSQL running on their datacenter operating system (DCOS). What does that mean?
EF: We share the same thinking about distributed systems as Mesosphere. We are ardent advocates of the idea that distributed systems are the next wave of computing.
Seven years ago you were considered a bit loopy if you were building applications for the iPhone. Back then, early developers who embraced iOS were placing big bets on a nascent platform and technology ethos. Now, there are thousands of companies building mobile apps and participation in the mobile app economy is considered mainstream.
Today we see Mesosphere’s DCOS as the modern platform that will spawn a whole generation of mainstream distributed systems. The distributed systems that run natively on DCOS, projects like Apache Spark, HDFS, Kafka, and more, are very relevant to MemSQL.
We want to be the database of choice in a world of distributed apps and we see Mesos and Mesosphere as a way to make that happen.
RW: When I talk to you or the folks over at Mesosphere, I hear an almost religious zeal for distributed systems. Is there a big picture here?
EF: Yes, the big picture is the belief in distributed systems.
A lot of people in the tech industry talk about distributed as a means to scale. An equal or more powerful message is how to use distributed systems as a way to reduce complexity, increase resiliency, and contain costs more effectively.
When people say they want to build a data pipeline with distributed systems, they should deploy Mesosphere to manage those systems, Kafka for messaging, Spark for transformation, and MemSQL for the datastore tier. I call this the new real-time trinity stack.
Mesosphere is a way for organizations to build distributed systems easily. By partnering with Mesosphere, we get a leg up.
RW: But won’t Oracle just tell me they can do it, too?
EF: You can’t do this on Oracle profitably. You need ultracheap hardware that scales out to meet profit objectives as well as the data analysis requirements of the business.
RW: How does running MemSQL on DCOS translate to actual use?
EF: The power comes from the combination that makes up the real-time data pipeline: Mesos plus Kafka plus Spark plus MemSQL.
For example, at Pinterest, how do they monitor repins as they happen in real time around the world? They have a lot of customers in retail, fashion, and media communications where relevance is determined by being appropriate to the moment.
Say there is a hailstorm in Chicago and some people are pinning photos of that. Well, outdoor clothing brands like REI might want to participate in real time during that hailstorm, not an hour or a day later.
Another use case is in financial services. People are tracking market trades and running queries and analytics up to the last trade. Conventional systems take in streaming data, require ETL to capture and harmonize the stream, and only then run analytics. By then, the insights may already be out of date. Hedge funds need insights in real time.
A final example is media. Look at a giant data-streaming company like Comcast. They measure their network health absolutely in terms of video streams. If latency is increasing in different areas, they need to add capacity and rebalance to keep delivering a great viewing experience to their customers and keep complaints down.
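The pipeline Frenkiel describes (messaging, then transformation, then a queryable store) can be sketched in a few lines, using the latency-by-area case as the workload. This is a toy under stated assumptions: a standard-library queue stands in for the messaging tier, a plain function for the transformation tier, and an in-memory dictionary for the datastore. None of the names below are real Kafka, Spark, or MemSQL APIs, and the event format is invented for illustration.

```python
import queue
from collections import defaultdict

# Hypothetical stand-ins, not real Kafka, Spark, or MemSQL APIs.
events = queue.Queue()       # messaging tier (the role Kafka plays)
store = defaultdict(list)    # datastore tier (the role MemSQL plays)


def transform(raw):
    """Transformation tier (the role Spark plays): parse 'region, latency_ms'."""
    region, latency = raw.split(",")
    return region.strip().lower(), float(latency)


def drain():
    """Move every queued event through transform and into the store."""
    while not events.empty():
        region, latency = transform(events.get())
        store[region].append(latency)


def slow_regions(threshold_ms):
    """Query the store: regions whose mean latency exceeds the threshold."""
    return sorted(
        region
        for region, samples in store.items()
        if sum(samples) / len(samples) > threshold_ms
    )


for raw in ["Chicago, 120", "Denver, 40", "Chicago, 180", "Denver, 60"]:
    events.put(raw)

drain()
flagged = slow_regions(100.0)  # regions that may need added capacity
```

Each stage only knows about the one next to it, which is what allows the real components to be scaled or replaced independently. The sketch runs in a single process and leaves out everything that makes the real systems hard, such as partitioning, fault tolerance, and ordering.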
Lead photo by Bob Mical