We knew that last week’s major Skype outage was caused by a problem with “supernodes,” and now Skype CIO Lars Rabbe has provided more details about why the popular VoIP service suffered such a widespread failure.
The culprit: a bug in a Windows version of the Skype client (version 5.0.0152). On December 22, a cluster of support servers went offline, and as a result, the buggy Windows clients crashed.
The bug didn’t affect other clients, or older or newer versions of the Windows client. But according to Skype, about 50% of all users globally were running the affected version of Skype for Windows, and about 40% of those clients crashed. Because the crashed clients included about 25-30% of the publicly available supernodes, the effect of the crashes was amplified. Then, as the peer-to-peer network tried to cope with the number of supernodes offline, failover mechanisms were triggered, which according to Rabbe “led to the near complete failures that occurred a few hours after the triggering event.”
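To put those percentages together, here is a rough back-of-the-envelope sketch in Python. It uses only the approximate figures Skype reported; the variable and function names are illustrative, and it assumes the 40% crash rate applied evenly across everyone running the affected version.

```python
# Approximate figures from Skype's account of the outage.
AFFECTED_SHARE = 0.50               # share of all users on Windows client 5.0.0152
CRASH_RATE = 0.40                   # share of those affected clients that crashed
SUPERNODE_LOSS_RANGE = (0.25, 0.30) # share of public supernodes that went down


def crashed_share_of_all_clients(affected_share: float, crash_rate: float) -> float:
    """Share of ALL Skype clients that crashed (assumes an even crash rate)."""
    return affected_share * crash_rate


def surviving_supernode_range(loss_range: tuple) -> tuple:
    """Share of public supernodes still online, as a (low, high) pair."""
    low_loss, high_loss = loss_range
    # The higher loss figure gives the lower survival figure.
    return (1 - high_loss, 1 - low_loss)


overall_crashed = crashed_share_of_all_clients(AFFECTED_SHARE, CRASH_RATE)
surviving_low, surviving_high = surviving_supernode_range(SUPERNODE_LOSS_RANGE)

print(f"Clients crashed overall: about {overall_crashed:.0%}")
print(f"Public supernodes still online: about {surviving_low:.0%} to {surviving_high:.0%}")
```

In other words, roughly one in five of all Skype clients went down at once, and the network was left with something like 70-75% of its public supernodes to carry the load.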
Skype engineers were able to restore the network by introducing thousands of new, dedicated supernodes to the network. The supernodes were stabilized by Friday, says Skype, with service slowly returning to normal.
Skype says that it will work to prevent this sort of thing from happening again, particularly by making sure that buggy versions of its software are updated automatically. Skype has also announced plans to provide paying users with vouchers to compensate for the outage.