Wednesday, June 30, 2010

Quote in WSJ

Lenny Rachitsky, the head of research and development for the website monitoring company, said companies can take advantage of unexpected outages by communicating with customers about what is going on—something Amazon didn't do during the outage, beyond its note to sellers. "Customers don't expect you to be perfect, as long as they feel that they can trust you," he said. "All it takes is to give your users some sense of control."
A similar sentiment was posited by Eric Savitz over at Barrons:
So, here’s the thing: it seems to me that Amazon actually made a bad situation worse by failing to communicate the details of the situation with its customers. My little post Tuesday afternoon on the technical troubles triggered 149 comments, and counting. The company’s customers did not like having the site go down, and even more, they did not like being left in the dark. And so far, the company still has not come clean on what went wrong. Some of the people who commented on my previous post were worried that their personal data might have been compromised. I have no real reason to think that was the case, but it certainly seems odd to me that Amazon has taken what appear to be a defensive and closed-mouth stance on an issue so basic to its customers: the ability to simply use the site. Jeff Bezos, your customers deserve better.

Tuesday, June 29, 2010 goes down, good case study of consumer-facing transparency (or lack thereof)

One of the questions I received from the audience after my talk last week was about how B2C companies should handle downtime and transparency. Today we have a great case study, as was down/degraded for about three hours:
You often hear about Amazon Web Services having some downtime issues, but it’s rare to see itself have major issues. In fact, I can’t ever remember it happening the past couple of years. But that’s very much the case today as for the past couple of hours the service has been switching back and forth between being totally down and being up, but showing no products. (source)

The telling quote, and impression that appears to be prevalent across Twitter and other blogs that have picked up the story is this:
Obviously, Twitter is abuzz about this — though there’s no word from Amazon on Twitter yet about the downtime. Amazon Web Services, meanwhile, all seem to be a go, according to their dashboard. The mobile apps on the iPhone, iPad and Android devices are sort of working, but it doesn’t appear you can go to actual product pages.
Let's think about this from the perspective of the customer. They visit and see this:

They wonder what's going on. They question whether something is wrong with their computer. If they are technical enough they may visit the Amazon's Twitter account to see if there is anything going on (a whole lot of nothing):

Maybe the visitor is even more technical, and knows about the public health dashboard that Amazon offers for their AWS clients. Well, that again gives us the wrong impression (all green lights):

At this point the user is frustrated. She may hop on Twitter and search for something like "amazon down", which would show her that a lot of other people are also having the same problem. This would at least make her feel better. Otherwise she would be stuck, wondering what is going on, how long it'll last, and whether to try shopping someplace else.

It turns out that Amazon did in fact put out an update about what was going the well hidden Amazon services seller forum:

Realistically, Amazon doesn't go down very often, and for most people this is more of an annoyance than anything. I don't see Amazon customers losing trust in Amazon as a result of his incident. As Jesse Robbins put it:

They key here is that now Amazon has a lot less room for error. One more major downtime like this, especially within the year, will begin to eat away at the trust that customers have built for the service. To be proactive in avoiding that problem, and to give themselves more room for error, I would strongly advise Amazon to do the following:
  1. Put some sort of communication out within 24 hours acknowledging the issues.
  2. Put out a detailed postmortem, explaining what happened, and what they are doing to improve for the future.
  3. Improve your process around updating the public about downtime. The Twitter account is a good start, and it's very promising that you put out a communication to the public. The problem is that the places your users looked for updates they saw nothing, and the forum you posted to very few users would ever think to check. I would launch a new public health dashboard focused on overall health (and make sure to host this outside of your infrastructure!), which would include the AWS health as a subset (or a simply link), along with other increasingly important elements of your company: Kindle download health, shipping health, etc.
  4. Implement the improvements discussed in the postmortem.

Other takeaways
  1. I'm feeling that transparency in the B2C world is rarely as critical as in B2B relationships. There are certainly cases where consumers are just as inconvenienced and frustrated when their services are down, but in terms of impact and revenue loss, the bar has to be much higher for B2B businesses. I also believe that consumers are much more forgiving of downtime, and won't require as much from a company when they go down. This will change however as consumers become more dependent on the cloud for their everyday lives.
  2. Amazon set the bar high for their AWS transparency. Users of those services automatically checked the existing communication channels, which is what you would want. Unfortunately Amazon did not set up a process to connect those two parts of the company.
  3. This also exposed the problem with having different processes and tools for different parts of your organization. Ideally there would be a central place for status across the entire property. It's understandable that AWS is doing things a bit differently, but the consequence as we saw was that users waste time looking at the wrong place. This is something Rackspace has trouble with as well.

Monday, June 28, 2010

Video of my talk (Upside of Downtime) at Velocity 2010

Video of my talk has been posted (below), though watching it and listening to myself feels pretty damn weird. I've been blown away by response I've gotten to this talk. I know of at handful of companies circulating these slides/notes internally and working to make their companies more transparent. I've personally heard from a number of people at the conference that were discussing the ideas with their coworkers thinking about the best approach to take action. Even Facebook (the example I used of how not to handle downtime) has found resonance with the talk, and pointed me to a little known status page.

I'm hoping to start a conversation around the framework and continue to evolve it. I'm going to expand on the ideas in this blog, so if there is anything specific you would like me to explore (e.g. hard ROI, B2C examples, cultural differences, etc), please let me know.

Enjoy the video:

The slides can be found here:

Wednesday, June 23, 2010

The Upside of Downtime (Velocity 2010)

Here is the full deck from my talk at Velocity, including two bonus sections at the end:
The Upside of Downtime (Velocity 2010)

Also, here is the "Upside of Downtime Framework" cheat-sheet (click through to download):

Monday, June 21, 2010

See you at Velocity 2010!

Tonight I leave for the (sold out) O'Reilly Velocity conference in Santa Clara, CA. I'll be presenting "The Upside of Downtime: How to Turn a Disaster into an Opportunity" on Wednesday at 4:35pm. If you're a reader of this blog and are at the conference, I'd love to meet up! Tweet me @lennysan or simply leave a comment here.

As soon as my talk ends, I will be posting the full slide-deck right here on this blog. Stay tuned!

P.S. If you're reading this post during my talk, here are some of the links I may or may not be referencing:

Thursday, June 10, 2010

Quick update (and Velocity preview)

Alas this blog has been quite for too long. My pathetic excuse is that I'm channeling the efforts that would normally go to this blog into my upcoming talk at Velocity. To make up for my negligence, here is a sneak peek at the talk:

I will post the entire slide deck here on the blog immediately following the talk. Stay tuned!