4 Lessons from the Biggest Internet Service Outages of 2011

The recent three-day outage of Research In Motion’s BlackBerry email service sent a chill across the world. And I’m not just talking about the affected customers. The chill was also felt by practically every IT network service professional watching the headlines in mid-October. They know that if this could happen to a company with as many resources as RIM, it can happen in their department too.

As we close out 2011, we can reflect on (and learn from) the numerous high-profile outages that occurred: Bank of America in March; Amazon EC2, Verizon LTE, Yahoo! Mail and Microsoft in April; and then Apple and Microsoft in August. In analyzing these disasters, I’ve come up with four lessons to be learned. They’ll help protect your company’s reputation, technical integrity and customer satisfaction during technical crises.

Kevin Conklin is an executive at Prelert, which reduces the cost, frequency and duration of business critical application disruptions by as much as 90% by adding a layer of self-learning predictive IT analytics software to traditional monitoring solutions such as Microsoft SCOM and Wily Introscope. Prelert customers gain instant, often predictive identification and root cause analysis of problems while eliminating much of the need to define and maintain thresholds, rules, management templates and dashboards.

Lesson #1: Your company’s brand is on the line

IT systems are not just internal systems anymore. Most companies learned their first painful lessons with the advent of web sites and ecommerce. But today, it seems that every company has growing exposure to potential service outages that leave many customers unhappy. That said, it’s critical that IT and line-of-business executives become better aligned.

We must also realize that systems have a tendency to crash at inopportune times. Look at the Verizon LTE network outage in April. The company’s fastest network, the LTE network, was unavailable to customers, and LTE devices could not be activated. The crash happened just 24 hours before the latest 4G-LTE smartphone, the Samsung Droid Charge, was scheduled to launch. The outage delayed the launch by two weeks and no doubt had a significant impact on the phone’s sales and reputation.

Lesson #2: Be proactive

Given the potential losses from network service outages, one might think that IT execs are totally focused on preventing major outages. But in my experience, they’re not. The key issue that “prevents preventing” outages is the infrastructure and application monitoring systems in use today. Many were architected when a company’s IT environment could still be visualized on a couple of PowerPoint slides. Their designs were based on the idea that IT experts would define the performance thresholds, rules and exceptions necessary to identify unacceptable behavior. But today, the typical enterprise application infrastructure is so complex that it defies an IT organization’s ability to fully understand it. The result: unforeseen outages that often take days to resolve.

These monitoring systems are still great for generating the data required to understand the systems’ behavior. Just ask the operations center that receives tens of thousands of alerts a day. But the real challenge lies in making sense of the alerts and taking the right action to resolve the inevitable issues quickly.
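To make the contrast concrete, here is a minimal sketch of the difference between a hand-set threshold and a baseline learned from recent data. It is an illustration only, not any vendor’s actual method: the function names, the latency figures, the five-sample window and the three-sigma cutoff are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def static_alerts(samples, threshold):
    """Return indices where a hand-set threshold is exceeded."""
    return [i for i, value in enumerate(samples) if value > threshold]


def baseline_alerts(samples, window=5, sigmas=3.0):
    """Return indices where a sample deviates from the mean of the
    preceding `window` samples by more than `sigmas` standard deviations.
    """
    alerts = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu, sd = mean(history), stdev(history)
        # Skip perfectly flat history, where the deviation is zero.
        if sd > 0 and abs(samples[i] - mu) > sigmas * sd:
            alerts.append(i)
    return alerts


# Hypothetical response times in milliseconds (made up for illustration).
latency_ms = [100, 102, 98, 101, 99, 100, 180, 101, 99, 100]

# An expert-defined limit of 200 ms never fires on this data.
print(static_alerts(latency_ms, threshold=200))

# The learned baseline flags the jump to 180 ms at index 6.
print(baseline_alerts(latency_ms))
```

The static rule stays silent because nobody anticipated that 180 ms would be abnormal for this service, while the baseline check notices the spike without anyone defining a limit in advance. A real deployment would need far more than this, including seasonality and correlation across many metrics.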

Lesson #3: When crisis strikes, communicate early and often

Face it – we live in a 24/7 world and your customers know when there is a problem. It’s best not to ignore it and hope they don’t notice.

When the Microsoft cloud crashed in September, they kept customers in the loop, promising updates at precise time increments. The official Windows Live Status site read, “We’re aware of a problem with Hotmail that’s affecting some people. We’re investigating and will provide an update by Sept. 9 11:30 p.m…”

RIM failed twice over, responding several hours into the crisis and providing few details to ease customer angst. “We understand the frustrations our customers are experiencing through the delays with the messaging and browsing…I’d like to take this opportunity to apologize unreservedly to all those people affected by this situation. We’re taking this situation extremely seriously and we’re doing everything we can to restore normal operation to our service,” said CTO David Yach.

Although you can’t promise answers, providing scheduled updates will go a long way with customers. And if you don’t acknowledge a problem, you can be sure your customers will be tweeting about it.

Lesson #4: Make amends

RIM eventually offered users $100 in premium applications and, in some cases, free technical support for a month. While the cost of supporting these offers is likely high, it is probably worth it… It’s also important to remember that if you don’t provide compensation for customer inconvenience, your competitors will. The day that Yahoo! Mail crashed in April 2011, Microsoft wasted no time offering annoyed customers something to make them feel better. The official Hotmail account tweeted, “First 1k #ymail users to [email protected] and send feedback today get HM+ free for 1yr. SwitchToHotmail.com.”

Don’t let your competitors be your customers’ solution to your outage crisis.

With major companies like RIM, Amazon, Microsoft, Yahoo!, Bank of America and many others all experiencing major network outages this past year, it’s time to realize that it isn’t a matter of if your IT department will face a network crisis, but when. Be prepared, and have a plan for how your company will react and respond from both a technical and a public relations perspective to minimize the aftermath.

As we look forward to 2012, we wish you smooth-running networks and fast resolutions to the challenges coming your way.

Photo from Nasa.gov
