Showing posts with label saas. Show all posts

Saturday, January 10, 2009

How transparency can help your business

When looking to gain the benefits of transparency (into your downtime and performance issues), you first need to understand the use cases (or more accurately, the user stories) that describe the problems that transparency can solve. It's easy to put something out there looking for the press and marketing benefits. It's a lot more challenging (and beneficial) to understand what transparency can do for your business, and then actually solve those problems.

Transparency user stories

As an end user/customer:
  1. Your service seems to be down. I'd like to know if it's down for anyone else or if it's just me.
  2. I know your service is down, and I want to know when it'll be back up.
  3. I want some kind of explanation of why you went down.
As a business customer using your service as part of my own service offering:
  1. Before betting my business on your service/platform, I need to know how reliable it has been.
  2. My own customers are reporting that my service is down, but everything looks fine on my end. I need to know if your service is down, and if so I need information to keep my customers up to date.
  3. I want to find which link in my ecosystem of external services is broken or slow right away.
  4. One of my customers reported a problem in the past, and I'd like to correlate it with hiccups your service may have had in the past.
  5. I need to know well in advance of any upcoming maintenance windows.
  6. I need to know well in advance if you plan to change any features that are critical to me, or if the performance of the service will change.
As a SaaS provider:
  1. I want my customers (and my prospects) to trust my service. I don't want my customers to lose that trust if I ever go down.
  2. My support department gets flooded with calls and emails during a downtime event, and I want to cut that load.
  3. I want to understand what the uptime and performance of my services are at all times from around the world. Both for internal reasons, and to help my customers diagnose issues they are reporting.
  4. I want to differentiate from my competition based on reliability and customer support.
In the next post, I will dive into ways to attack each of these user stories. Stay tuned.

Monday, December 22, 2008

Comprehensive review of SaaS SLAs - A sad state of affairs

A recent story about the holes in Google's SLA got me wondering about the state of service level agreements in the SaaS space. The importance of SLAs in the enterprise online world is obvious. I'm sad to report that the state of the union is not good. Of the handful of major SaaS players, most have no SLAs at all. Of those that do, the coverage is extremely loose, and the penalty for missing the SLAs is weak. To make my point, I've put together an exhaustive (yet pointedly short) list of the SLAs that do exist. I've extracted the key elements and removed the legal mumbo-jumbo (for easy consumption). Enjoy!

Comparing the SLAs of the major SaaS players

Google Apps:
  • What: "web interface will be operational and available for GMail, Google Calendar, Google Talk, Google Docs, and Google Sites"
  • Uptime guarantee: 99.9%
  • Time period: any calendar month
  • Penalty: 3, 7, or 15 days of service at no charge, depending on the monthly uptime percentage
  • Important caveats:
  1. "Downtime" means, for a domain, if there is more than a five percent user error rate. Downtime is measured based on server side error rate.
  2. "Downtime Period" means, for a domain, a period of ten consecutive minutes of Downtime. Intermittent Downtime for a period of less than ten minutes will not be counted towards any Downtime Periods.
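
To see how much the ten-minute rule filters out, here is a minimal Python sketch of the two caveats above. It assumes one error-rate sample per minute (the SLA text quoted here does not say how often the rate is sampled), and the function name and sample data are hypothetical.

```python
from itertools import groupby


def counted_downtime_minutes(error_rates, threshold=0.05, min_run=10):
    """Sum the minutes that fall inside qualifying Downtime Periods.

    error_rates: one error rate per minute (assumed granularity), as a
    fraction between 0.0 and 1.0.
    """
    counted = 0
    # Group the minutes into consecutive runs of "down" and "not down".
    for is_down, run in groupby(error_rates, key=lambda rate: rate > threshold):
        length = sum(1 for _ in run)
        # Only runs of at least ten consecutive minutes count.
        if is_down and length >= min_run:
            counted += length
    return counted


# Hypothetical data: a 9-minute outage, one healthy minute, a 12-minute outage.
samples = [1.0] * 9 + [0.0] + [1.0] * 12
print(counted_downtime_minutes(samples))  # 12
```

In this made-up example the service was failing for 21 minutes, but only 12 of them would count toward the SLA.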
Amazon S3:
  • What: Amazon Simple Storage Service
  • Uptime guarantee: 99.9%
  • Time period: "any monthly billing cycle"
  • Penalty: 10-25% of total charges paid by customer for a billing cycle, based on the monthly uptime percentage
  • Important caveats:
  1. “Error Rate” means: (i) the total number of internal server errors returned by Amazon S3 as error status “InternalError” or “ServiceUnavailable” divided by (ii) the total number of requests during that five minute period. We will calculate the Error Rate for each Amazon S3 account as a percentage for each five minute period in the monthly billing cycle. The calculation of the number of internal server errors will not include errors that arise directly or indirectly as a result of any of the Amazon S3 SLA Exclusions (as defined below).
  2. “Monthly Uptime Percentage” is calculated by subtracting from 100% the average of the Error Rates from each five minute period in the monthly billing cycle.
  3. "We will apply any Service Credits only against future Amazon S3 payments otherwise due from you"
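
The Amazon S3 definitions above reduce to a short calculation. The following Python sketch is one reading of them, not Amazon's implementation: the function name and the traffic figures are made up, and treating a period with no requests as a 0% error rate is an assumption.

```python
def monthly_uptime_percentage(periods):
    """Return 100% minus the average five-minute Error Rate.

    periods: list of (error_count, request_count) tuples, one per
    five-minute period in the billing cycle.
    """
    if not periods:
        raise ValueError("need at least one five-minute period")
    rates = []
    for errors, requests in periods:
        # Assumption: a period with no requests counts as a 0% error rate.
        rates.append(errors / requests if requests else 0.0)
    return 100.0 - 100.0 * sum(rates) / len(rates)


# Hypothetical 30-day cycle: 8640 five-minute periods, of which 12
# (one hour in total) fail every request.
periods = [(0, 1000)] * 8628 + [(1000, 1000)] * 12
print(round(monthly_uptime_percentage(periods), 3))  # 99.861
```

Because the formula averages error rates, an hour in which every request fails weighs the same as two hours in which half of them fail. Either way, this hypothetical month would fall below the 99.9% guarantee.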
Amazon EC2:
  • What: Amazon Elastic Compute Cloud service
  • Uptime guarantee: 99.95%
  • Time period: "the preceding 365 days from the date of an SLA claim"
  • Penalty: "a Service Credit equal to 10% of their bill for the Eligible Credit Period"
  • Important caveats:
  1. “Annual Uptime Percentage” is calculated by subtracting from 100% the percentage of 5 minute periods during the Service Year in which Amazon EC2 was in the state of “Region Unavailable.” If you have been using Amazon EC2 for less than 365 days, your Service Year is still the preceding 365 days but any days prior to your use of the service will be deemed to have had 100% Region Availability. Any downtime occurring prior to a successful Service Credit claim cannot be used for future claims. Annual Uptime Percentage measurements exclude downtime resulting directly or indirectly from any Amazon EC2 SLA Exclusion (defined below).
  2. “Unavailable” means that all of your running instances have no external connectivity during a five minute period and you are unable to launch replacement instances.
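
For a sense of scale, the EC2 definition can be turned into a small calculation. This is a rough Python sketch under stated assumptions: the function names are hypothetical, SLA exclusions are ignored, and the caller is assumed to already know how many five-minute periods were spent in the "Region Unavailable" state.

```python
# Five-minute periods in a 365-day Service Year.
PERIODS_PER_YEAR = 365 * 24 * 12


def annual_uptime_percentage(unavailable_periods):
    """100% minus the share of five-minute periods marked Region Unavailable."""
    return 100.0 - 100.0 * unavailable_periods / PERIODS_PER_YEAR


def allowed_unavailable_minutes(guarantee_pct=99.95):
    """Minutes of Region Unavailable time a given guarantee tolerates per year."""
    return (100.0 - guarantee_pct) / 100.0 * PERIODS_PER_YEAR * 5


print(round(allowed_unavailable_minutes(), 1))  # 262.8
print(annual_uptime_percentage(52) >= 99.95)    # True
print(annual_uptime_percentage(53) >= 99.95)    # False
```

Under this reading, 99.95% leaves roughly 262.8 minutes (about 4.4 hours) of "Region Unavailable" time per year before a credit is due, and only periods in which all of your instances are unreachable and no replacements can be launched count toward it.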
...that's it!

Notable Exceptions (a.k.a. lack of an SLA)
  • Salesforce.com (are you serious??)
  • Google App Engine (youth will only be an excuse for so long)
  • Zoho
  • Quickbase
  • OpenDNS
  • OpenSRS
Conclusions
There's no question that for the enterprise market to get on board with SaaS in any meaningful way, accountability is key. Public health dashboards are one piece of the puzzle. SLAs are the other. The longer we delay in demanding these from our key service providers (I'm looking at you, Salesforce), the longer and more difficult the move into the cloud will end up being. The incentive in the short term for a not-so-major SaaS player should be to take the initiative and focus on building a strong sense of accountability and trust. As it begins to take business away from the more established (and less trustworthy) services, the bar will rise and customers will begin to demand these vital services from all of their providers. The days of weak or nonexistent SLAs for SaaS providers are numbered.

Disclaimer: If I've misrepresented anything above, or if your SaaS service has a strong SLA, please let us know in the comments. I really hope someone out there is working to raise the bar on this sad state.

Sunday, November 23, 2008

Transparency case study, courtesy of ylastic

ylastic (a company that provides tools to help manage AWS services) kept their users in the loop during an outage by communicating status updates over Twitter.


You can find the entire set of updates at ylastic's Twitter page.

I keep coming back to the same question. Do your users know where to go during a downtime event? ylastic has their web site, their blog, their forums, and their Twitter feed. As a user, how do I know where to look when I'm having a problem and want to know what's going on with the service (which is generally an emergency)? As the company, how do I keep users from clogging my support email box in spite of my efforts to get status updates out to the world? In this case it looks like the only place that had any information was the Twitter feed. If users weren't aware it existed, both sides would be out of luck.

What every SaaS service needs is a clear central place that its users can easily find and that provides real-time updates on downtime or performance events. It's great that you're willing to communicate during the event, but if no one can find those updates, what's the point? Don't get me started on falling trees.

On another note, kudos to ylastic for their transparency on the following fronts:
  • Providing insight into their product roadmap. Very much what SaaS providers must do to build the trust relationship with their users (which is critical to the success of any online hosted application).
  • Their upcoming iPhone app that among other things gives you the AWS Service Health status on the go.
  • Simply giving status updates on Twitter.