Saturday, August 16, 2008

Do You Trust the Cloud?

Quoting a Lifehacker post with the same name:
"While web-based applications promise gigabytes of storage, anywhere-access, easy backup, and no software requirements beyond your browser to use them, becoming dependent on webapps can leave you high and dry when those services go out."
The post's references to the recent downtime of Gmail, MobileMe, and Amazon S3 got me thinking... what does it take to actually "trust the cloud"? What would give users confidence in choosing these services, and sticking with them through the inevitable issues? The simple answer is transparency!

How good are these specific services at providing transparency into their downtime? Let's review:

Gmail downtime (2 hours) on 8/11/08
The Gmail team kept users updated during the downtime using a Google Group thread, with surprisingly frequent and detailed updates over the 2-hour downtime period. After the event was over and they were able to get their thoughts in order, they posted a message on their Gmail blog. A big red flag, however, shows up in the spike in Twitter posts and the huge spike in searches for "gmail down". We can tell that users are unsure where to go to get the official word on what's going on, which means that all of the work the Gmail team is putting into keeping users up to date falls on deaf ears. If you post an update and no one sees it, does your update exist?
Conclusion: Very good transparency, but they need to do some work on publicizing the forum they are using to spread information. Kudos for the rarity of downtime this service has experienced in its history (too easy to overlook).

MobileMe downtime (2 hours) on 8/11/08
With their handy-dandy System Status Recent History page, the MobileMe team documented the downtime. However, the only way users were able to know anything was wrong DURING the event was a big fail when attempting to use the service. As CNET reports, "the same thing happened in mid July with enough blowback to cause Apple to offer a 30-day extension to both free trial and paying users." In a valiant yet fruitless effort to keep users in the know, Apple created the MobileMe Status blog, which as of now still has no news of the recent downtime. On the plus side, they have created a MobileMe Mail Chat page for users to get personal support when issues arise. On the downside, according to one comment, "even the support guy didn't know that the service outage was going on".
Conclusion: Unacceptable job keeping users in the loop, passable job documenting the events after the fact, and far too many random glitches to make this OK. Let's hope they get their act together soon and open up about their issues (won't be holding my breath... this is Apple after all).

Amazon S3 downtime (6 hours) on 7/20/08
S3 is by far the most critical of these online services, which means it should be held to a higher standard. During the event, the AWS team posted outage messages and their Service Health Dashboard clearly showed they were having issues. After the event, a detailed explanation went up on their site.
Conclusion: There's a reason I have Amazon AWS in the "Transparency Hall of Fame" (top right of this blog). They've been at this a while, and their users have forced them to make this process as transparent as possible. They could get better at giving specific details during the actual event, and 6 hours of downtime is no laughing matter, but they did a good job and they continue to set the bar for transparency in online services.

Yesterday CNET posted the "10 Worst Web glitches of 2008 (so far)", which includes the above events, among others. What does this tell us? Clearly downtime across the broad spectrum of online services, from Amazon to Netflix to Google, is not going away. We need to learn to live with unreliable online services. The long-term success of these services will be determined by how users perceive their reliability, weighed against the advantages of building in the cloud. That perception of reliability requires complete and utter transparency into the goings-on of the service, especially during downtime events. We still have a long way to go until we can really "trust the cloud".
