For all the optimism surrounding the potential of computing in the cloud – lower costs, better performance, easier scaling – it isn’t a perfect system. No matter how distributed and redundant the architecture or how rigorous the backup system, when it comes right down to it, there’s a complex series of hoops through which the data has to jump to travel between the user and where it actually resides on a piece of physical hardware. And when a segment of that process fails, the benefits of the cloud suddenly seem far less magical.
Take the unfortunate situation of Ylastic, a company that provides a single front-end to manage Amazon Web Services, which was recently an unwilling participant in one of these cloud bursts.
Ylastic noticed something strange occurring with one of its Amazon Elastic Compute Cloud (EC2) Elastic Block Store (EBS) volumes. EBS is a service that is “particularly suited for applications that require a database, file system, or access to raw block level storage.”
But something wasn’t quite right. Over the course of a few hours, the story played out on Twitter as Ylastic reported problems with an EBS volume.
When the problem was finally identified, Ylastic discovered that the data could not be recovered. The company was forced to restore from an earlier snapshot that contained only a subset of the data.
Finally, after recovering what data they could, Ylastic had to go to its customers with the unfortunate message:
“AWS has finally terminated the frozen instances. But the EBS volume is still detaching and has been for hours. It doesn’t seem like we will be able to get into it at this point. Some time in the last month or so, our EBS snapshotting of this stuck volume seems to have stopped working correctly…. We have gone back and run through all the snapshots, and the last good snapshot that we have is from October 1.”
Who was at fault? Amazon? Ylastic? Truly, no one. It was simply a combination of issues. A perfect storm in the cloud, as it were. And that perfect storm resulted in data loss for Ylastic and its customer base.
Does this mean we should run screaming from the potential the cloud holds? No, absolutely not. But it’s an unfortunate reminder that the system is far from perfect and that those who rely on the cloud to serve critical aspects of their business should be ever diligent in ensuring that their data is actually being backed up.
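One basic safeguard suggested by Ylastic’s account is to check regularly that every volume has a recent snapshot that actually completed. The Python sketch below is only an illustration of that idea, not anything Ylastic or Amazon used. The function name `stale_volumes`, the one-day threshold, and the sample data are all hypothetical, and the record fields (`VolumeId`, `State`, `StartTime`) are assumed to follow the shape of the EC2 DescribeSnapshots response.

```python
from datetime import datetime, timedelta, timezone


def stale_volumes(volume_ids, snapshots, now, max_age=timedelta(days=1)):
    """Return {volume_id: last_good_time_or_None} for every volume whose
    newest completed snapshot is missing or older than max_age.

    Hypothetical helper: `snapshots` is a list of dicts with the assumed
    keys "VolumeId", "State", and "StartTime" (a timezone-aware datetime).
    """
    latest = {}
    for snap in snapshots:
        if snap["State"] != "completed":
            continue  # pending or failed snapshots are not usable backups
        vol = snap["VolumeId"]
        if vol not in latest or snap["StartTime"] > latest[vol]:
            latest[vol] = snap["StartTime"]

    stale = {}
    for vol in volume_ids:
        last_good = latest.get(vol)
        if last_good is None or now - last_good > max_age:
            stale[vol] = last_good
    return stale


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    now = datetime(2024, 11, 1, tzinfo=timezone.utc)
    sample = [
        {"VolumeId": "vol-a", "State": "completed",
         "StartTime": datetime(2024, 10, 1, tzinfo=timezone.utc)},
        {"VolumeId": "vol-a", "State": "error",
         "StartTime": datetime(2024, 10, 31, tzinfo=timezone.utc)},
        {"VolumeId": "vol-b", "State": "completed",
         "StartTime": datetime(2024, 10, 31, 18, tzinfo=timezone.utc)},
    ]
    for vol, last_good in stale_volumes(["vol-a", "vol-b"], sample, now).items():
        print(f"{vol}: last good snapshot at {last_good}")
```

A check like this, run on a schedule and wired to an alert, flags a snapshot job that has quietly stopped producing good snapshots. The snapshot job itself reporting no problems is not enough, which is why the check looks at what actually completed.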
For all the technical magic of the cloud, it’s still the basics of data management that matter most.