Transparent Uptime: Amazon.com goes down, good case study of consumer-facing transparency (or lack thereof)

One of the questions I received from the audience after my talk last week was about how B2C companies should handle downtime and transparency. Today we have a great case study, as Amazon.com was down/degraded for about three hours:

You often hear about Amazon Web Services having some downtime issues, but it’s rare to see Amazon.com itself have major issues. In fact, I can’t ever remember it happening the past couple of years. But that’s very much the case today as for the past couple of hours the service has been switching back and forth between being totally down and being up, but showing no products. (source)

The telling quote, and the impression that appears to be prevalent across Twitter and the other blogs that have picked up the story, is this:

Obviously, Twitter is abuzz about this — though there’s no word from Amazon on Twitter yet about the downtime. Amazon Web Services, meanwhile, all seem to be a go, according to their dashboard. The mobile apps on the iPhone, iPad and Android devices are sort of working, but it doesn’t appear you can go to actual product pages.

Let's think about this from the perspective of the customer. They visit Amazon.com and see this:

They wonder what's going on. They question whether something is wrong with their computer. If they are technical enough, they may visit Amazon's Twitter account to see if there is anything going on (a whole lot of nothing):

Maybe the visitor is even more technical, and knows about the public health dashboard that Amazon offers for their AWS clients. Well, that again gives us the wrong impression (all green lights):

At this point the user is frustrated. She may hop on Twitter and search for something like "amazon down", which would show her that a lot of other people are also having the same problem. This would at least make her feel better. Otherwise she would be stuck, wondering what is going on, how long it'll last, and whether to try shopping someplace else.

It turns out that Amazon did in fact put out an update about what was going on...in the well-hidden Amazon Services seller forum:

Realistically, Amazon doesn't go down very often, and for most people this is more of an annoyance than anything. I don't see Amazon customers losing trust in Amazon as a result of this incident. As Jesse Robbins put it:

The key here is that Amazon now has a lot less room for error. One more major downtime like this, especially within the year, will begin to eat away at the trust that customers have built for the service. To be proactive in avoiding that problem, and to give themselves more room for error, I would strongly advise Amazon to do the following:

Put some sort of communication out within 24 hours acknowledging the issues.
Put out a detailed postmortem, explaining what happened, and what they are doing to improve for the future.
Improve your process around updating the public about amazon.com downtime. The Twitter account is a good start, and it's very promising that you put out a communication to the public. The problem is that in the places your users looked for updates they saw nothing, and the forum you posted to is one that very few users would ever think to check. I would launch a new public health dashboard focused on overall Amazon.com health (and make sure to host this outside of your infrastructure!), which would include the AWS health as a subset (or simply a link), along with other increasingly important elements of your company: Kindle download health, shipping health, etc.
Implement the improvements discussed in the postmortem.
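
To make the dashboard suggestion a bit more concrete, here is a minimal sketch of how an overall Amazon.com status could be rolled up from per-component health. Everything in it is an assumption made for illustration: the `Status` levels, the function names, the "Retail site" component, and the sample values are hypothetical and do not describe anything Amazon actually runs. The AWS, Kindle download, and shipping components are the ones suggested above.

```python
from enum import IntEnum


class Status(IntEnum):
    # Hypothetical severity levels; a higher value means a worse state.
    OK = 0
    DEGRADED = 1
    DOWN = 2


def overall_status(components):
    """Return the worst status among all components (OK if there are none)."""
    return max(components.values(), default=Status.OK)


def render_dashboard(components):
    """Build a plain-text summary: the overall line first, then one line per component."""
    lines = [f"Amazon.com overall: {overall_status(components).name}"]
    for name, status in components.items():
        lines.append(f"  {name}: {status.name}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative values only, loosely modeled on the outage described in this post.
    example = {
        "Retail site": Status.DEGRADED,
        "AWS": Status.OK,
        "Kindle downloads": Status.OK,
        "Shipping": Status.OK,
    }
    print(render_dashboard(example))
```

The point of taking the worst component status as the headline is that a visitor who only glances at the top line never sees all green while the storefront is broken, which is exactly the wrong impression the AWS dashboard gave during this outage.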

Other takeaways

My feeling is that transparency in the B2C world is rarely as critical as in B2B relationships. There are certainly cases where consumers are just as inconvenienced and frustrated when their services are down, but in terms of impact and revenue loss, the bar has to be much higher for B2B businesses. I also believe that consumers are much more forgiving of downtime, and won't require as much from a company when it goes down. This will change, however, as consumers become more dependent on the cloud for their everyday lives.
Amazon set the bar high for their AWS transparency. Users of those services automatically checked the existing communication channels, which is what you would want. Unfortunately, Amazon did not set up a process to connect the AWS and Amazon.com sides of the company.
This also exposed the problem with having different processes and tools for different parts of your organization. Ideally there would be a central place for status across the entire amazon.com property. It's understandable that AWS is doing things a bit differently, but the consequence, as we saw, was that users wasted time looking in the wrong place. This is something Rackspace has trouble with as well.

Tuesday, June 29, 2010
