IT Disasters Rarely Involve Fire Or Hail. But Beware Disk Crashes

Disaster is a word with strong connotations, conjuring images of fire, flood, storms, earthquakes and... spilled cups of coffee.

That last category might not strike you as the kind of disaster that will bring the 24-hour news vans to your door, but to a business that depends on the uptime of its systems, a jostled cup of joe or a faulty hard drive can be just as much of a disaster as the Biblical stuff.

According to a new survey of small to medium-sized businesses, it isn't flooding, tornadoes or hurricanes that are mostly responsible for IT downtime, but rather hardware failures - a full 55% of downtime incidents, in fact. IT recovery vendor Quorum derived the data by tracking the trouble tickets of customers who used its service in the first quarter of 2013.

Disk failures were the number one hardware failure, says Quorum CEO Larry Lang. "More often than you think, it's a SAN failure," he adds, referring to "storage area networks" that are designed to keep data available even when a disk - or several disks - crashes.

Power supply problems are another big hardware issue. At times, they're complicated by cooling system malfunctions that in turn overheat power supplies and bring them down in a giant cascade of fail.

Or you can face something completely unexpected. A Quorum customer once had a problem when a neighbor's renovation work spewed gypsum dust onto server heat sinks, causing them to lose efficiency and overheat the system.

Delete All? OK...

Next on the list was human error, which made up 22% of incidents that caused downtime. But Lang suspects that this figure is actually on the low side. "Human nature being what it is, the actual human mistakes tend to be under-reported," he said. Accidental deletions, for example, are common mistakes that often go unreported.

Software failures ranked next, coming in at 18% of downtime causes. These include updates that don't go well, many of which were probably untested before deployment. (That could put these back in the human error column, too.)

The last category is the flashier stuff. But natural disasters only accounted for 5% of IT downtime incidents.

The Cost Of Goofs

Estimating the damage caused by IT downtime isn't always easy. Ball-parking the financial cost is straightforward - just take your company's annual revenue, divide that by the number of business hours in a year (2080 in the U.S.) to get revenue per business hour, then multiply that figure by the number of hours your systems are down.
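For readers who want to plug in their own numbers, that back-of-the-envelope arithmetic can be sketched in a few lines of Python. The function name and the sample revenue figure are made up for illustration; only the 2080-hour year comes from the formula above, and the result is a rough estimate, not an accounting of actual losses.

```python
# Rough downtime-cost estimate: revenue per business hour x hours down.
# 2080 is the U.S. business-hours figure quoted in the article.
US_BUSINESS_HOURS_PER_YEAR = 2080


def estimate_downtime_cost(annual_revenue, hours_down,
                           business_hours_per_year=US_BUSINESS_HOURS_PER_YEAR):
    """Return the ballpark revenue lost during `hours_down` hours of downtime."""
    if business_hours_per_year <= 0:
        raise ValueError("business_hours_per_year must be positive")
    revenue_per_hour = annual_revenue / business_hours_per_year
    return revenue_per_hour * hours_down


if __name__ == "__main__":
    # Hypothetical company: $2.08 million in annual revenue, down for 4 hours.
    cost = estimate_downtime_cost(2_080_000, 4)
    print(f"Estimated downtime cost: ${cost:,.2f}")
```

With those made-up inputs, each business hour is worth $1,000, so a four-hour outage pencils out to about $4,000.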

But some times in the year are worse for downtime than others. If your accounting firm's servers go down in mid-June, it's likely not as stressful or painful to the bottom line as a similar failure the week before the April 15 tax deadline in the U.S.

Then there's the reputational effect. When an actual natural disaster strikes and you are offline, customers are more likely to cut you some slack until you get things up and running. But if they tune in on any given day and your computers are dark for what to them seems no apparent reason, they might be... unsatisfied. Some might take to Twitter, Yelp and other social outlets to broadcast their frustration, too.

Back Up And Prepare

IT managers need to prepare for the worst, but also must understand that the worst might not happen when Mother Nature drops by.

To be prepared for the disasters that occur in the chaos of our daily lives, Lang recommends that, at the very least, businesses back up as much as they can. It may take longer to restore than expected, but when the worst happens, your data will still be there and you can start the recovery process.

Lang also recommends testing and retesting backup and restore processes as often as possible. Software and hardware configurations can change often, so you need to make sure your recovery operations won't fail.
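What a restore test looks like depends entirely on the backup product in use, and Lang doesn't prescribe one. As one possible sanity check, the sketch below compares a test restore against the original files by SHA-256 checksum. The function names and directory layout are hypothetical, and the approach assumes the source files haven't changed since the backup was taken.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir, restored_dir):
    """Return relative paths that are missing or different in the restored copy."""
    source_dir = Path(source_dir)
    restored_dir = Path(restored_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue  # only compare regular files, not directories
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file() or sha256_of(src) != sha256_of(restored):
            problems.append(rel.as_posix())
    return problems
```

An empty list from verify_restore() means every source file came back byte-for-byte identical. Anything else is a file to investigate before a real outage forces the question.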

"In business, the longest distance in IT is the distance between ought to work and known to work," Lang said.
