The WordPress.com network went down last week. In the wake of the outage, we started looking at what infrastructure WordPress.com uses to serve its 10 million blogs.
WordPress.com is run from data centers in Chicago and San Antonio. Layered Technologies (LayeredTech) manages most of the WordPress.com infrastructure. According to LayeredTech materials, in 2005 WordPress.com had five servers. Today, LayeredTech manages about 1,000 servers for WordPress.com.
WordPress.com founder Matt Mullenweg said in a blog post that a routing issue in the data center caused the outage.
The cloud gets blamed for almost any online outage these days. It used to be that we’d just say the service went down and there was a failure at the host or the data center.
In this case, though, the WordPress.com outage was not a cloud disaster. It was what happens when failover does not work in a data center. According to Mullenweg, that is exactly what happened at WordPress.com:
“There was a latent misconfiguration, specifically a cable plugged someplace it shouldn’t have been, from a few months ago. Something called the spanning tree protocol kicked in and started trying to route all of our private network traffic to a public network over a link that was much too small and slow to handle even 10% of our traffic which caused high packet loss. This ‘sort of working’ state was much worse than if it had just gone down and confused our systems team and our failsafe systems. It is not clear yet why the misconfiguration bit us yesterday and not earlier.”
We took a long look at the issue, and it still bugged us how vague the line is between a data center network and a cloud computing environment.
Mullenweg, however, is clear on the point. We asked whether WordPress.com is hosted in a traditional data center or runs on a grid, which would in some respects qualify it as a cloud computing environment. He shot the question right back at us, and in many ways we think he is right:
“That’s a silly question, like asking whether Facebook is a cloud computing environment,” Mullenweg said. “Most ‘clouds’ besides Amazon’s are just marketing BS. WordPress.com is a collection of many physical servers across multiple datacenters to create a scalable, resilient environment for our customers. You could call it a grid, or cloud, we just call it service.”
Cloud computing is overhyped. No doubt. It is interesting to hear this from someone like Mullenweg, who knows firsthand the challenges of scaling. Who cares whether it is cloud or not? At some point, it is simply cheaper to do it yourself.
Here’s how Mullenweg answered the rest of the questions we posed:
RWW: WordPress.com uses traditional hosting. Why not use the cloud? Or does part of your service rely on a public cloud service?
MM: We use cloud services where appropriate but always have a fall-back to local services. The best we’ve used is Amazon S3.
RWW: I saw you present at Microsoft PDC about Windows Azure. Are you using Windows Azure? How?
MM: We aren’t.
RWW: Are you planning to move more of your service to a public cloud environment? Why? Why not?
MM: No, in fact we’re going the other direction so we can have more control and lower our costs.
RWW: But if it is a cloud, how could one router take it down? I saw your explanation but a clarification would be most helpful.
MM: We have dozens of hardware and networking issues each week, and our system adapts and works around them so they’re invisible to users. This particular failure broke all of that. For an explanation of why, please check out my blog post on en.blog.wordpress.com.
So what, really, is the difference? There may be very little. The main distinction may simply be that cloud computing remains difficult to define and is sometimes just marketing speak.