Yesterday’s Gmail outage was not just another pain. Out-of-control code also brought down some Chrome browsers, hurting productivity for those affected. As the boundary between cloud services and native clients blurs, enterprises cannot afford this kind of instability any longer.
Monday was not a fun time to be a Google user. For about 40 minutes yesterday afternoon (on the East Coast), Gmail service, as well as Google Drive, experienced scattered outages. Reports indicate that personal Gmail accounts may have been more affected than Google Apps accounts.
Curiously, at nearly the same time, many Google Chrome users started reporting that their browsers were crashing outright. That was very much a problem, since their browser of choice was now killing all of their Internet-based tasks. It was also weird, because Chrome, like most current browsers, tries to sandbox individual tabs so that if a faulty page loads a bad script, only that tab hangs and not the whole browser. To have the entire client crash was trouble.
The Whys And Wherefores
The two failures may have been unrelated, though it’s not clear. The Chrome browser issue was caused by a problem with the Google Sync service, the feature that enables Chrome users to sync their preferences and bookmarks across machines.
According to Chrome developer Tim Steele, the back-end Chrome Sync servers experienced problems because of a load-balancing configuration change in their quota management system, a change that turned out to be wrong and ended up forcing the Sync service to throttle itself back.
Steele’s comments seem to imply that it was not the Gmail outage that caused the Chrome browsers to die, but perhaps the Sync problem that spread out to other services, like Gmail.
“That change was to a core piece of infrastructure that many services at Google depend on. This means other services may have been affected at the same time, leading to the confounding original title of this bug [‘When Gmail is down, Chrome Sync crashes Chrome’],” Steele wrote. “Because of the quota service failure, Chrome Sync Servers reacted too conservatively by telling clients to throttle ‘all’ data types, without accounting for the fact that not all client versions support all data types.”
Hence, crashing Chrome browsers and perhaps some downed Google services, too.
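To make that failure mode concrete, here is a minimal sketch of how a client might react when a server tells it to throttle data types it has never heard of. This is not Chrome's code: the `SyncClient` class, its method names and the list of data types are all hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical set of data types the server knows about (not Chrome's real list).
ALL_SERVER_TYPES = {"bookmarks", "preferences", "passwords", "tabs"}


@dataclass
class SyncClient:
    """Toy client that only understands the data types its version shipped with."""

    supported_types: set
    throttled: set = field(default_factory=set)

    def apply_throttle_strict(self, types):
        # Fragile: assumes every type the server names is known locally,
        # so an unfamiliar type becomes an unhandled error.
        for data_type in types:
            if data_type not in self.supported_types:
                raise KeyError(f"unknown data type: {data_type}")
            self.throttled.add(data_type)

    def apply_throttle_tolerant(self, types):
        # Defensive: throttle the types we know and report the rest
        # back to the caller instead of failing.
        requested = set(types)
        known = requested & self.supported_types
        self.throttled |= known
        return sorted(requested - known)


# An older client that predates "passwords" and "tabs" syncing.
old_client = SyncClient(supported_types={"bookmarks", "preferences"})
ignored = old_client.apply_throttle_tolerant(ALL_SERVER_TYPES)
# ignored is ["passwords", "tabs"]; the two known types are now throttled.
```

In this toy model, the strict handler is the one that falls over when the server says "all," while the tolerant one simply skips what it does not recognize. Whether Chrome's actual bug looked anything like this is an assumption; Steele's comments describe only the server-side behavior.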
The Enterprise Impact
In the grand scheme, this was not a huge problem. Depending on where you were, the issue cleared in about half an hour, and not every Chrome browser tanked, since not every Chrome user has Sync activated.
But the implications of this event are a little alarming. Here we have a situation where a change was made to complex software in the cloud and it immediately rippled right out to users. That change appeared to touch other services. Even if Gmail were unaffected by Sync’s problem, there is still the disturbing issue of browsers going down.
It’s reasonable, if irksome, to expect a cloud service to go down once in a while. It’s part of working in the cloud. But to have a rich client installed natively on your machine crash too, thus preventing other work while waiting for the original service to restart?
To an enterprise IT shop, which should (and usually does) test the heck out of any software before releasing anything to production, this is the opposite of what should happen. Enterprise procurement and configuration are often months behind the consumer market precisely because IT shops don’t want cutting-edge software in the building to torque their employees’ machines.
Cloud-based apps like Google Apps and Office 365 can change that sense of security in a heartbeat.
It’s not feasible to advocate the cessation of cloud-based tech. The benefits of cloud computing are too big to ignore. But if Google wants to be a serious enterprise player, it must find a way to properly test its configuration changes before they go live.
Now that the company is charging for Google Apps for Business, that becomes even more important. When money’s involved, the stakes get a lot higher.
Image courtesy of Shutterstock.