One of the biggest mysteries in computing is how Google achieves such massive scale in its operations so efficiently. Many would argue that Google’s leading competitive advantage today is how well it runs its datacenters, allowing it to create and iterate innovative services at a pace other vendors can’t match.
Google’s secret sauce is a software layer originally code-named BORG that orchestrates running applications across the company’s global datacenters. Rather than assembling separate clusters of servers for each service—one for Google Search, one for Gmail, one for Google Maps—Google shares the workload of all these services across all of its datacenter resources.
This work is divided into tiny tasks, and BORG sends these tasks wherever it can find free computing resources, such as processing power or computer memory or storage space, much like an operating system for the datacenter.
Mesosphere, a startup that today announced a $10 million Series A financing round led by Andreessen Horowitz, thinks it has the answer for companies that want to scale like Google but don’t have armies of internal engineers to develop their own BORGs. Mesosphere’s solution is based on open source Apache Mesos. Architecturally it shares Google’s view that the most efficiently run datacenters (and clouds) pool distributed datacenter resources.
I spoke with Mesosphere CEO and co-founder Florian Leibert (@flo) about their plans for world domination, or at least giving the rest of us the chance to keep up with the Joneses (and the Googles).
Enabling A Generation Of Googles
ReadWrite: So what is the basic problem you are trying to solve at Mesosphere?
Florian Leibert: We believe that scale is broken in datacenters today, and even in the cloud. We fix that. We fix the scale problem in the same manner as Google, with what it called BORG.
Google’s secret sauce software pools all of its global datacenter resources so the company gets the most out of its hardware. That makes it easy to create and roll out new services to customers, and developers never have to worry about scaling issues. We do the same thing using an open-source technology we helped develop called Apache Mesos. It’s what Twitter used to finally rid itself of the fail whale.
We can show you Twitter scale in 30 minutes!
Nor is Twitter alone. HubSpot runs Mesos on Amazon Web Services. We help Airbnb, eBay, Netflix, OpenTable, banks and others scale with a new and simpler approach to running their datacenters and clouds. We let you treat your datacenter like a single machine.
Virtualization Is Not The Answer
RW: I thought virtualization was supposed to solve this problem?
FL: A decade ago virtualization really helped accelerate server consolidation in the datacenter and saved companies a lot of money in their capex [capital expenditure] budgets.
But we think times, and the nature of applications, have changed profoundly since the advent of VMs. Servers have followed Moore’s Law, doubling in power roughly every 18 months. Virtual machines allowed you to place—manually—multiple small applications on these increasingly bigger servers.
Today’s applications are being written from the start as distributed systems where VMs don’t make as much sense. Rather than splitting up the applications onto multiple machines, we aggregate all the machines and present them to the application as a single pool of resources. It changes the way new applications are written, deployed and scaled and how existing applications are run, versus running multiple VMs. Our system aggregates your hardware into one pool. It doesn’t just try to scale existing hardware.
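To make the pooling idea concrete, here is a minimal sketch in Python. It is not Mesos code: the Machine, Task and Pool names are hypothetical, and the first-fit placement rule is an assumption chosen for brevity. Mesos itself offers resources to frameworks, which then decide what to launch. The sketch only shows what it means for an application to see one pool instead of individual servers.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    """One server, tracked only by the resources it still has free."""
    name: str
    free_cpus: float
    free_mem_mb: int


@dataclass
class Task:
    """A unit of work with fixed CPU and memory requirements."""
    name: str
    cpus: float
    mem_mb: int


@dataclass
class Pool:
    """Hypothetical wrapper that presents many machines as a single pool."""
    machines: list
    placements: dict = field(default_factory=dict)

    def total_free_cpus(self):
        return sum(m.free_cpus for m in self.machines)

    def total_free_mem_mb(self):
        return sum(m.free_mem_mb for m in self.machines)

    def launch(self, task):
        """First-fit placement: use the first machine with enough free resources.

        Returns the chosen machine's name, or None if no machine can fit the task.
        """
        for m in self.machines:
            if m.free_cpus >= task.cpus and m.free_mem_mb >= task.mem_mb:
                m.free_cpus -= task.cpus
                m.free_mem_mb -= task.mem_mb
                self.placements[task.name] = m.name
                return m.name
        return None


if __name__ == "__main__":
    pool = Pool([Machine("node-1", 4, 8192), Machine("node-2", 8, 16384)])
    print("pool before:", pool.total_free_cpus(), "CPUs,", pool.total_free_mem_mb(), "MB")
    print("web ->", pool.launch(Task("web", 2, 4096)))
    print("batch ->", pool.launch(Task("batch", 6, 8192)))
    print("pool after:", pool.total_free_cpus(), "CPUs,", pool.total_free_mem_mb(), "MB")
```

The caller never names a server. It asks the pool for resources and the pool decides where the task lands, which is the shift Leibert describes.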
RW: What are the main advantages of this approach?
FL: There are important operations and business benefits. First of all, even if you are running a more traditional VM-based datacenter operation, adding Mesos can help you increase hardware utilization by a factor of three to five. Google will tell you privately that squeezing more out of its existing datacenters saved it from having to build an additional giant datacenter.
If you are running your business in a public cloud, we can help, too. Our customer HubSpot runs its infrastructure on AWS, and after switching to Mesos it cut its Amazon bill in half.
Google Efficiency Without Rewriting Your Code
RW: But wait, if a company moves its workloads over to Mesos, doesn’t it have to rewrite all of its applications?
FL: No, we built an open source orchestration layer called Marathon. Marathon provides a REST API for starting, stopping, and scaling applications. The state of running tasks gets stored in the Mesos state abstraction. That means as a developer, if you’re running Linux applications on Mesos, you don’t need to make a single change in your applications. Think of running your Ruby on Rails application or an enterprise app server such as JBoss and having it recover and scale automatically.
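As a rough illustration of what starting, scaling and stopping an application over REST might look like, here is a short Python sketch. The host name, app id and command are hypothetical, and the paths and JSON fields reflect Marathon's v2 API as commonly documented (POST /v2/apps, PUT and DELETE on /v2/apps/{id}); check them against the Marathon documentation for your version. The sketch only builds the requests and does not send them.

```python
import json
import urllib.request

# Assumption: a Marathon instance is reachable here (hypothetical host).
MARATHON = "http://marathon.example.com:8080"


def build_request(method, path, payload=None):
    """Build, but do not send, an HTTP request with an optional JSON body."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        MARATHON + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def start_app(app_id, cmd, cpus, mem, instances):
    """Start an app: an unmodified command plus the resources each instance needs."""
    return build_request("POST", "/v2/apps", {
        "id": app_id,
        "cmd": cmd,
        "cpus": cpus,
        "mem": mem,
        "instances": instances,
    })


def scale_app(app_id, instances):
    """Scale an app by changing its desired instance count."""
    return build_request("PUT", "/v2/apps/" + app_id, {"instances": instances})


def stop_app(app_id):
    """Stop an app by deleting its definition."""
    return build_request("DELETE", "/v2/apps/" + app_id)


if __name__ == "__main__":
    requests_to_send = [
        start_app("rails-app", "bundle exec rails server", 0.5, 512, 2),
        scale_app("rails-app", 5),
        stop_app("rails-app"),
    ]
    for req in requests_to_send:
        print(req.get_method(), req.full_url, req.data)
```

To send a request against a real cluster, pass it to urllib.request.urlopen. The application itself is untouched: only the command line and the resource numbers are described to the orchestration layer.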
The Future Running In Containers
RW: Where is this technology headed? How is it changing the nature of applications?
FL: I think as an industry we’re slowly moving from extremely statically defined things to pools of resources where things are still statically defined (by the PaaS and the container registry). But the next evolution is letting someone programmatically decide how to run their containers.
Think about Web applications and other apps that have been built in the past, where people want to spin up more threads or allocate more memory, and then later remove those threads or free that memory. The app has a lifecycle, and that’s what Mesos can provide you.
Lead image by Flickr user Robert Scoble, CC 2.0