Why Netflix's Christmas Eve Crash Was Its Own Fault

After an ill-timed outage on Christmas Eve zapped popular video provider Netflix, the common refrain has been clear: Blame the cloud. But when there's a car crash, do we blame the highway or the humans driving the vehicles? Is Netflix really the victim here, or did it drive off the road all on its own?

Reports of Netflix crashing again on Christmas Eve day started trickling in about an hour after my youngest daughter, stuck inside on a bitter cold Minnesota day, complained that the service wasn't working on my iPad. That problem was alleviated by slapping a password on the device and sending her into the kitchen to help the rest of the family prep for dinner like she should have been doing in the first place. But the inconvenient timing of the outage was enough to cause a bump of coverage on the national news.

As the postmortems came through, it appeared that - once again - Netflix's problem lay within the cloud on which the service is hosted: Amazon Web Services' Ashburn, VA, data center.

Virginia? Again?

Neither AWS nor Netflix has released a detailed report on what actually happened, but reports indicated that it was the elastic load balancers at the Virginia data center that somehow failed and led to significant traffic loss for Netflix viewers trying to watch their favorite Christmas movies. The service was back up by Christmas Day, but dropping the ball on Christmas Eve didn't make Netflix many friends.

Meanwhile, as many people noted during the Netflix outage, Amazon's own Instant Video service had no reports of problems. That raised a few eyebrows for customers wondering how Amazon managed to keep its own service going while its competitor was kaput.

No one is accusing Amazon's business units of collaborating to bring down Netflix. But the very fact that Netflix relies on a competitor's infrastructure to deliver its services seems to generate a conflict of interest.

A lot of those same industry observers are also calling for Netflix to get the hell off of Amazon's cloud. This is not the first time, after all, that AWS problems have smacked around Netflix and other popular Web services, and that Virginia data center specifically seems to be cursed.

I think a service like Netflix (of which I am obviously a customer) should keep its destiny in its own hands. But if you think that moving to its own cloud will be the sure-fire cure-all for Netflix's reliability issues, think again.

The fault for the Netflix outage, the company would like us to believe, lies solely with AWS. But does it really?

Or does the problem lie with misuse of AWS tools? If the elastic load balancers were indeed the reason for the Christmas Eve outage, who was ultimately responsible for configuring those balancers?

Winning The Blame Game?

The highway analogy applies here, too. AWS is the highway, a shimmering ribbon of concrete, on-ramps and bridges that enable cars to get from point A to point B. Most of the time, the highway's operations run smoothly. But when someone misuses the highway, chaos will most certainly ensue - no matter how good the infrastructure is. 

If you don't like the highway example, pick another brand of infrastructure, like a building or a ship or a bridge. It's all the same: Use the infrastructure the wrong way, and bad things happen.

Netflix would (and can) argue that sometimes, no matter how well you're operating within the infrastructure, that infrastructure can break. That's true. Tragically, things fall apart and people and businesses can get caught in the wreckage. Such is life in an entropic universe.

But even if AWS has a faulty infrastructure, doesn't Netflix still have ultimate responsibility to create the solution? After all, customers are "renting" their movies from Netflix, not Amazon. And as pointed out, this is not the first time there have been problems at this particular data center. Why, after getting slapped off the Internet this summer, didn't Netflix make sure such an occurrence wouldn't happen again?

Netflix shares were down slightly on Thursday (about 1% as trading drew to a close). Maybe some shareholders are asking themselves why Netflix hasn't done more to shore up the reliability of its service. I know this customer is.

Image courtesy of Shutterstock.