Amazon’s Web hosting services suffered another outage this past weekend, this time in its European zone, thanks to power issues at a data center in Dublin, Ireland.
The outage, at first thought to have been caused by a lightning strike, may actually have been caused by a transformer failure, according to ESB Networks. Whatever happened, the data center’s primary and secondary power sources were both knocked out, resulting in downtime for several websites using Amazon’s EC2 infrastructure. Losing both the main power supply and the backup generator at the same time is rare.
The power outage at the Dublin data center was followed the next day by yet another service outage in the eastern U.S., which affected several high-traffic websites for up to an hour.
To make matters worse, a bug in Amazon’s EBS (Elastic Block Store) software appears to have accidentally started deleting snapshots of customer data, the company confirmed on Monday. Ouch.
These recent issues haven’t been as severe as the AWS outage in April, which affected hundreds of websites, some of them quite large, for days. Even so, events like these tend to spark debates over whether the cloud is stable enough for so many companies to trust it for their hosting needs.
Companies generally gravitate toward cloud computing because of the money and human resources it saves them compared to maintaining their own infrastructure onsite. But when mission-critical Web applications and sites go down or, worse, data is lost, some may call into question whether the cloud is worth it.
However, companies using the cloud for hosting do have a few options for protecting themselves from the inevitable, occasional outage. In an article about what lessons may be learned from the recent Dublin power outage, InformationWeek explains:
“It’s still possible that having the ability to fail-over to a second availability zone within the data center would have saved a customer’s system. Availability zones within an Amazon data center typically have different sources of power and telecommunications, allowing one to fail and others to pick up parts of its load. But not everyone has signed up for fail-over service to a second zone, and Amazon spokesman Drew Herdener declined to say whether secondary zones remained available in Dublin after the primary zone outage.”
Of course, in the April AWS outage, multiple zones went down, which would have kept such a precaution from working. A few prominent Amazon customers, including Netflix, managed to stay up during the April outage, thanks to the way their architecture is engineered. Smaller organizations may not have the resources and technical know-how to pull off the level of resiliency boasted by Netflix during that outage, but there are other options.
In reality, these incidents call into question not so much the validity of using the cloud as the tendency of some businesses to rely exclusively on a single cloud provider without a thorough disaster recovery plan in place.
Should recent outages in cloud hosting services cause people to think twice before migrating to the cloud? Or is it just a matter of having a foolproof backup plan in place?
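For readers wondering what even a very basic fail-over might look like, here is a minimal Python sketch. It is an illustration only, not how Amazon or Netflix handle it. It assumes an application is deployed in more than one zone (or with more than one provider) and that each deployment exposes a health-check URL; the function names and the example.com addresses are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional


def http_health_check(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, DNS failure, HTTP error, or bad URL:
        # treat all of them as "this deployment is down".
        return False


def pick_endpoint(
    endpoints: Iterable[str],
    is_healthy: Callable[[str], bool] = http_health_check,
) -> Optional[str]:
    """Return the first healthy endpoint in priority order, or None.

    `endpoints` is an ordered list of hypothetical health-check URLs,
    e.g. primary zone first, then a second zone, then another provider.
    """
    for url in endpoints:
        if is_healthy(url):
            return url
    return None
```

Called with a list such as a primary zone, a second zone, and a deployment at a different provider, `pick_endpoint` returns the first one that responds, or `None` if all of them are down. A real disaster recovery plan also has to cover data replication and DNS or load-balancer changes, which this sketch leaves out.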