Thursday, July 1, 2010

Benefits of Transparency

I thought it would be helpful to consolidate a list of the primary benefits of web sites/services being transparent online. If there are any I missed, please leave a comment and I'll update the list:

Benefits of Transparency (for online websites and services)
1. Build trust with your users
2. Increase loyalty, reduce churn
3. Improve perception of your reliability
4. Reduce support costs
5. Control the message
6. Gain a competitive advantage
7. More time to focus on the actual problem
8. Reduce stress
9. Learn

See below for more detail...

1. Build trust with your users
Your users have a pretty low bar for how they expect to be treated. They basically expect you to screw them, hide information from them, and do the bare minimum to take their money. If you do something good for them, something unexpected like proactively admitting that you have problems and showing your humanity, your users will develop a sense of trust for your service and your company. I believe that trust may be the most important asset you can earn on the web, especially if you deal with things that are really important to your customers (e.g. money, email, photos, etc.).

Example: If a car company does a recall as soon as there is a hint of a problem, you trust them a lot more than if they are forced to do a recall after a number of deaths.

The more times you are proactive and admit to problems before you are caught, the stronger the sense of trust gets. If you are instead forced to admit your problems, or your customers complain before you tell them that you are aware of the problem, it gets harder to convince them that you know what you are doing and that you care about the quality of the service.

2. Increase loyalty, reduce churn
Your users don't expect you to be perfect. They will forgive you when you have a problem, but only if they feel that they can trust you, that you know what you are doing, and that things are improving. Your users will stick with you if they feel like you know what you are doing, that you feel their pain, and that you are taking these issues seriously. Apologizing and explaining after the fact is much more difficult. It is hard to convince your customers that you know what you are doing and that you care about their issues if you avoid the problem or, worse, pretend that it doesn't exist.

Example: Atlassian's security breach a few months ago...they could have lost a lot of concerned customers questioning their trustworthiness. Instead they increased loyalty and trust by being up front about the situation, explaining what they were doing about it, and improving for the future. If the issue had instead been exposed independently, they would have seen a mass exodus.

A major downtime event is innately going to lead to unhappy customers. You may as well try to turn it around into something worthwhile, and try to keep as many customers as you can. A nice side benefit is that the more your users learn to trust you, the more loyal and forgiving they become. It's a powerful loop that you want to get on the right side of.

3. Improve perception of your reliability
When users run into a problem with your service, whether it's their fault or yours, they'll often assume the fault is on your end. If you instead show them exactly when you are actually having problems, and if you do this reliably and consistently, they'll know when you really have problems, and end up seeing that you aren't down as often as they thought. It's ironic that the more open you are about how often you have a problem, the less often your users will think you really are down.

Example: Consider a complex web application made up of many components, say Google App Engine, the Foursquare API, and Google ads. You get alerted about a timeout issue...will you assume that Google is at fault, or one of the other components? A quick visit to Google's public dashboard would show you that they are perfectly fine, and that the problem lies with one of the other services (which need their own public dashboards).
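
To make the triage in the example above concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `likely_culprits` helper, the `"ok"` status string, and the snapshot of reported statuses are hypothetical, and no real dashboard API is being called. A dependency with no public dashboard reports `None`, so it can never be ruled out.

```python
from typing import Dict, List, Optional


def likely_culprits(reported: Dict[str, Optional[str]]) -> List[str]:
    """Return the dependencies that cannot be ruled out as the cause.

    A dependency is ruled out only if its public dashboard explicitly
    reports "ok". A value of None means there is no public dashboard
    to check, so the dependency stays on the suspect list.
    """
    return [name for name, status in reported.items() if status != "ok"]


# Hypothetical snapshot taken when the timeout alert fires.
reported = {
    "Google App Engine": "ok",  # public dashboard shows all clear
    "Foursquare API": None,     # assumed: nothing public to check
    "Google ads": None,         # assumed: nothing public to check
}

print(likely_culprits(reported))
```

The point of the sketch is the asymmetry: the one service with a public dashboard drops off the suspect list immediately, while the others stay on it simply because there is nothing to look at.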

4. Reduce support costs
During a downtime incident your support department gets flooded with the same type of question..."I'm seeing a problem, what's going on?" and "Is the site down or is it just me?". If you can allow your customers to serve themselves, or make it easy for your support department to point complaints to a single succinct explanation, they can operate much more efficiently, and focus on higher-level issues.

Also, a lot of times support doesn't even know what's going on during a downtime event, and having something to check themselves gives them more insight into the health of the system.

Example: Amazon Web Services barely has support. They have a paid support service, and their forums, but otherwise there is very little real-time support. They can do this because they have a real-time public health dashboard that addresses 90% of the questions users are going to have in their day-to-day use of the service.

5. Control the message
If you don't tell your users what's going on during an event, they are going to speculate and assume the worst. They'll assume you aren't aware of the problem, that it'll last a long time, and that you're not taking it seriously. Even a simple update telling users that you are aware of the problem and are working on it gives them confidence that this isn't going to be the end of the company, and that you feel their pain.

Example: Users of Twitter experience on-and-off issues, but they can always tell how healthy the service is as a whole by visiting their public dashboard and status blog. They don't have to wonder how far-reaching the downtime is, or how long it'll last.

6. Gain a competitive advantage
All else being equal, when prospects are comparing your service to a competitor, especially when your service is critical to their own life/business, being able to tell a story about being transparent and open is a powerful differentiator. It gives your prospect a feeling of control, that they won't be left in the dark when the sh** hits the fan and their boss is breathing down their neck.

7. More time to focus on the actual problem
Especially for a small company, you can spend more time dealing with resolving the issue and less time fielding calls/emails. The better your process, the less you have to worry about beyond fixing the actual problem.

8. Reduce stress
With a defined process, ideally one that is procedural, you keep people from freaking out and having to scramble at the worst possible time. The last thing you want to be doing during a downtime event is figuring out who can say what, and how to actually contact your entire customer base about a potential problem.

9. Learn
As noted in a comment by Heather Leson on the original post, disasters are an opportunity for both customers and company staff to share in the learning process. The more open you are about your issues, the more opportunity you'll have to learn from customers who may have had similar experiences, and the more your customers will learn from your experience. You aren't alone. Your customers have a vested interest in helping you succeed. You may be surprised by how forthcoming they are with advice and recommendations for your situation. Google App Engine ended up adding new features after a major downtime event, no doubt based on customer feedback. Amazon added their public health dashboard after one too many outages. As Heather put it, "Mutual success is one of the cornerstones of open source/open web organizations."

Wednesday, June 30, 2010

Quote in WSJ

Lenny Rachitsky, the head of research and development for the website monitoring company Webmetrics.com, said companies can take advantage of unexpected outages by communicating with customers about what is going on—something Amazon didn't do during the outage, beyond its note to sellers. "Customers don't expect you to be perfect, as long as they feel that they can trust you," he said. "All it takes is to give your users some sense of control."
A similar sentiment was posited by Eric Savitz over at Barrons:
So, here’s the thing: it seems to me that Amazon actually made a bad situation worse by failing to communicate the details of the situation with its customers. My little post Tuesday afternoon on the technical troubles triggered 149 comments, and counting. The company’s customers did not like having the site go down, and even more, they did not like being left in the dark. And so far, the company still has not come clean on what went wrong. Some of the people who commented on my previous post were worried that their personal data might have been compromised. I have no real reason to think that was the case, but it certainly seems odd to me that Amazon has taken what appear to be a defensive and closed-mouth stance on an issue so basic to its customers: the ability to simply use the site. Jeff Bezos, your customers deserve better.

Tuesday, June 29, 2010

Amazon.com goes down, good case study of consumer-facing transparency (or lack thereof)

One of the questions I received from the audience after my talk last week was about how B2C companies should handle downtime and transparency. Today we have a great case study, as Amazon.com was down/degraded for about three hours:
You often hear about Amazon Web Services having some downtime issues, but it’s rare to see Amazon.com itself have major issues. In fact, I can’t ever remember it happening the past couple of years. But that’s very much the case today as for the past couple of hours the service has been switching back and forth between being totally down and being up, but showing no products. (source)



The telling quote, and the impression that appears to be prevalent across Twitter and other blogs that have picked up the story, is this:
Obviously, Twitter is abuzz about this — though there’s no word from Amazon on Twitter yet about the downtime. Amazon Web Services, meanwhile, all seem to be a go, according to their dashboard. The mobile apps on the iPhone, iPad and Android devices are sort of working, but it doesn’t appear you can go to actual product pages.
Let's think about this from the perspective of the customer. They visit Amazon.com and see this:


They wonder what's going on. They question whether something is wrong with their computer. If they are technical enough they may visit Amazon's Twitter account to see if there is anything going on (a whole lot of nothing):



Maybe the visitor is even more technical, and knows about the public health dashboard that Amazon offers for their AWS clients. Well, that again gives us the wrong impression (all green lights):



At this point the user is frustrated. She may hop on Twitter and search for something like "amazon down", which would show her that a lot of other people are also having the same problem. This would at least make her feel better. Otherwise she would be stuck, wondering what is going on, how long it'll last, and whether to try shopping someplace else.


It turns out that Amazon did in fact put out an update about what was going on...in the well-hidden Amazon services seller forum:




Realistically, Amazon doesn't go down very often, and for most people this is more of an annoyance than anything. I don't see Amazon customers losing trust in Amazon as a result of this incident. As Jesse Robbins put it:

The key here is that now Amazon has a lot less room for error. One more major downtime like this, especially within the year, will begin to eat away at the trust that customers have built for the service. To be proactive in avoiding that problem, and to give themselves more room for error, I would strongly advise Amazon to do the following:
  1. Put some sort of communication out within 24 hours acknowledging the issues.
  2. Put out a detailed postmortem, explaining what happened, and what they are doing to improve for the future.
  3. Improve your process around updating the public about amazon.com downtime. The Twitter account is a good start, and it's very promising that you put out a communication to the public. The problem is that users saw nothing in the places they looked for updates, and very few users would ever think to check the forum you posted to. I would launch a new public health dashboard focused on overall Amazon.com health (and make sure to host this outside of your infrastructure!), which would include the AWS health as a subset (or simply a link), along with other increasingly important elements of your company: Kindle download health, shipping health, etc.
  4. Implement the improvements discussed in the postmortem.
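
As a rough illustration of the dashboard idea in item 3, here is a minimal Python sketch of rolling several component statuses up into one headline status. The `Health` levels, the function names, the component names, and the sample snapshot are all assumptions made up for this sketch; nothing here reflects how Amazon actually models or reports health. The rule used is simply that the worst component state becomes the overall state.

```python
from enum import IntEnum
from typing import Dict


class Health(IntEnum):
    """Hypothetical health levels, ordered from best to worst."""
    OK = 0
    DEGRADED = 1
    DOWN = 2


def overall_health(components: Dict[str, Health]) -> Health:
    """Roll component states up into one headline state: the worst wins."""
    return max(components.values(), default=Health.OK)


def render_dashboard(components: Dict[str, Health]) -> str:
    """Render a plain-text dashboard: headline first, then each component."""
    lines = [f"Overall: {overall_health(components).name}"]
    for name, state in components.items():
        lines.append(f"  {name}: {state.name}")
    return "\n".join(lines)


# Hypothetical snapshot: the storefront is struggling while AWS is fine,
# which is the situation an AWS-only dashboard would not reveal.
snapshot = {
    "Storefront": Health.DEGRADED,
    "AWS": Health.OK,
    "Kindle downloads": Health.OK,
    "Shipping": Health.OK,
}

print(render_dashboard(snapshot))
```

With a single rolled-up view like this, a visitor checking the AWS line would still see green, but the headline would tell them the property as a whole is having trouble. Hosting it outside the main infrastructure, as suggested above, is a deployment decision the sketch does not cover.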

Other takeaways
  1. I'm feeling that transparency in the B2C world is rarely as critical as in B2B relationships. There are certainly cases where consumers are just as inconvenienced and frustrated when their services are down, but in terms of impact and revenue loss, the bar has to be much higher for B2B businesses. I also believe that consumers are much more forgiving of downtime, and won't require as much from a company when they go down. This will change however as consumers become more dependent on the cloud for their everyday lives.
  2. Amazon set the bar high for their AWS transparency. Users of those services automatically checked the existing communication channels, which is what you would want. Unfortunately Amazon did not set up a process to connect those two parts of the company.
  3. This also exposed the problem with having different processes and tools for different parts of your organization. Ideally there would be a central place for status across the entire amazon.com property. It's understandable that AWS is doing things a bit differently, but the consequence, as we saw, was that users wasted time looking in the wrong place. This is something Rackspace has trouble with as well.

Monday, June 28, 2010

Video of my talk (Upside of Downtime) at Velocity 2010

Video of my talk has been posted (below), though watching it and listening to myself feels pretty damn weird. I've been blown away by the response I've gotten to this talk. I know of a handful of companies circulating these slides/notes internally and working to make their companies more transparent. I've personally heard from a number of people at the conference who were discussing the ideas with their coworkers and thinking about the best approach to take action. Even Facebook (the example I used of how not to handle downtime) has found resonance with the talk, and pointed me to a little-known status page.

I'm hoping to start a conversation around the framework and continue to evolve it. I'm going to expand on the ideas in this blog, so if there is anything specific you would like me to explore (e.g. hard ROI, B2C examples, cultural differences, etc.), please let me know.

Enjoy the video:


The slides can be found here: http://www.slideshare.net/lennysan/the-upside-of-downtime-velocity-2010-4564992