Wednesday, February 11, 2009

Transparency User Story #1: Your service seems to be down. I'd like to know if it's down for anyone else or if it's just me.

Note: This post is the first in a series of at least a dozen posts where I attempt to drill into the transparency user stories described in an earlier post.

Let's assume that you've decided you want to make your service or organization more transparent, specifically when it comes to its uptime and performance. You've convinced your management team, you've got the engineering and marketing resources, and you're raring to go. You want to get something done and can't wait to make things happen. Stop right there. Do not pass Go, do not collect $200. You first need to figure out what it is you're solving for. What problems (and opportunities) do you want to tackle in your drive for transparency?

Glad you asked. In a previous post, I listed about a dozen user stories describing the most common problems transparency can solve. In this post, I will dive into the first user story:
As an end user or customer, it looks to me like your service is down. I'd like to know if it's down for everyone or if it's just me.
Very straightforward, and very common. So common, in fact, that there are a couple of simple, free web services out there that help people figure this out. For this exercise, let's assume that your site is up the entire time.

Examples of the problem in action
  1. Your customer's Internet connection is down. He loads up www.yoursite.com. It cannot load. He thinks you are down, and calls your support department demanding service.
  2. Your customer's DNS servers are acting up, and are unable to resolve www.yoursite.com inside his network. He finds that google.com is loading fine, and is sure your site is down. He sends you an irate email.
  3. Your customer's network routes are unstable, causing inconsistent connectivity to your site. He loads www.yourcompetitor.com fine, while www.yoursite.com fails. He Twitters all about it.
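The three scenarios above differ in where the failure actually sits. A rough Python sketch of how a customer-side check could tell them apart is below: it compares the target against a well-known reference site, much as the customer in the second example does with google.com. The port (443), the timeout, and the one-word labels are assumptions for illustration, and the check only opens a TCP connection rather than fetching a page. Crucially, a check run from a single machine can only describe that machine's view; it cannot prove the site is down for everyone.

```python
import socket


def check_host(host: str, port: int = 443, timeout: float = 3.0) -> dict:
    """Check one host from *this* machine: does its name resolve, and will it
    accept a TCP connection on `port`? (Port 443 is an assumption.)"""
    result = {"host": host, "dns_ok": False, "connect_ok": False}
    try:
        socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except OSError:  # socket.gaierror is a subclass of OSError
        return result
    result["dns_ok"] = True
    try:
        # Opens and immediately closes a TCP connection; no HTTP request is sent.
        with socket.create_connection((host, port), timeout=timeout):
            result["connect_ok"] = True
    except OSError:  # refused, unreachable, or timed out
        pass
    return result


def diagnose(target: dict, reference: dict) -> str:
    """Map local check results onto the failure modes described above.
    `reference` should be a well-known site that is almost always up."""
    if not reference["connect_ok"]:
        # Even the reference site fails: the local connection (or DNS) is broken.
        return "local"
    if not target["dns_ok"]:
        # Other names resolve but the target's does not: suspect the local
        # resolver, though the site's own DNS could also be at fault.
        return "dns"
    if not target["connect_ok"]:
        # The name resolves but the site is unreachable from here: this could be
        # a routing problem on this network or a real outage. Only a view from
        # outside this network can tell the two apart.
        return "unreachable"
    return "ok"
```

For example, `diagnose(check_host("www.yoursite.com"), check_host("google.com"))` would return `"local"` when the customer's own connection is down, as in the first scenario.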
Why this hurts your business
  1. Unnecessary support calls.
  2. Unnecessary support emails.
  3. Negative word of mouth that is completely unfounded.
How to solve this problem
  1. An offsite public health dashboard
  2. A known third party, such as the free checking services mentioned above
  3. A constant presence across social media (Twitter especially) watching for false reports
  4. A running blog noting any downtime events, which tells your users that unless something is posted, nothing is wrong. You must, however, be diligent about posting whenever something actually is wrong.
  5. Real-time performance data shared with your large customers. Some of them may even want to be alerted when you go down.
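An offsite dashboard can answer the "is it just me?" question because its checks run outside your own network, ideally from several places at once. Below is a minimal Python sketch of the aggregation step only, assuming each probe location reports a simple pass/fail for www.yoursite.com. The `ProbeResult` type, the location names, and the 50% threshold are hypothetical choices, not a standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProbeResult:
    location: str  # where the probe ran (hypothetical labels, e.g. "us-east")
    ok: bool       # did the check of the site succeed from that location?


def overall_status(results: List[ProbeResult], quorum: float = 0.5) -> str:
    """Summarize per-location probe results into one public status.
    `quorum` is the fraction of failing locations above which the site is
    reported as down for everyone (0.5 is an assumed default)."""
    if not results:
        return "unknown"
    failures = sum(1 for r in results if not r.ok)
    if failures == 0:
        return "up"
    if failures / len(results) > quorum:
        return "down"
    # Some, but not most, locations fail: likely a regional or network issue.
    return "degraded"


# One of three hypothetical locations failing is not an outage for everyone.
print(overall_status([
    ProbeResult("us-east", True),
    ProbeResult("us-west", False),
    ProbeResult("europe", True),
]))  # prints "degraded"
```

Reporting the site as down only when most locations fail keeps a single broken route, which is exactly the "just me" case from one user's point of view, from appearing as a full outage on the public dashboard.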
Example solutions in the real world
  1. Many public health dashboards
  2. The QuickBooks team notifying users that their service was back up
  3. Sharing your monitoring data in real time with your serious customers
  4. Searching Twitter for outage discussion
