Saturday, November 15, 2008

Comparing Amazon Web Services, Salesforce, and Zoho's online health dashboards

Now that there are three major SaaS players offering online service health dashboards, and one from Google on its way, I thought it would be a useful exercise to compare the offerings from Amazon Web Services, Salesforce, and Zoho. This will hopefully be helpful for anyone planning to launch their own health dashboard, and to the general online community in making sense of what is important to understand about these dashboards.

Disclaimer: If I have mistakenly misrepresented anything, or if I missed any information, PLEASE let me know in the comments below.

What providers are we looking at today?
What is the URL of each status page (and are they easy to remember in times of need)?
What are these status pages called?
  • Amazon Web Services: "AWS Service Health Dashboard"
  • Salesforce: "Trust.salesforce.com - System Status" (Note: salesforce.com goes beyond simply providing system status by also providing security notices, both under their "Trust.salesforce.com" brand)
  • Zoho: "Zoho Service Health Status"
What services' health are reported on?
  • Amazon Web Services: All four core services (EC2, S3, SQS, SimpleDB), plus Mechanical Turk and FlexPay. They also break out the two S3 datacenter locations (EU and US), the two ends of a Mechanical Turk transaction (Requester and Worker), plus the EC2 API.
  • Salesforce: Only the core salesforce.com services across 12 individual systems (based on geographic location and purpose).
  • Zoho: All 23 Zoho services are covered, plus their mobile site and their single sign-on system.
What health information is provided?
  • Amazon Web Services: Current status, plus about 30 days of historical status. Status is determined to be one of "Service is operating normally", "Performance Issues", or "Service disruption". "Information messages" are occasionally provided.
  • Salesforce: Current status, plus exactly 30 days of historical status. Status is determined to be either "Instances available", "Performance Issues", "Service disruption", or "Status not available". "Informational messages" are also provided on occasion.
  • Zoho: Current status and the response time for the past hour, in addition to historical uptime for the past week. Also provided are two graphs representing uptime and response time for the past seven days. If that wasn't enough, current uptime and response time from six geographical locations are also given.
Where does the uptime and performance data come from?
  • Amazon Web Services: No clue.
  • Salesforce: No clue.
  • Zoho: Their own "Site 24x7" monitoring service.
What is considered downtime and what is considered a performance issue?
  • Amazon Web Services: No clue.
  • Salesforce: No clue.
  • Zoho: No clue.
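Since none of the three providers publishes its definitions, here is a purely illustrative sketch of what an explicit definition could look like. The thresholds (`SLOW_MS`, `MAX_FAILURE_RATE`), the `Check` record, and the `classify` function are all my own assumptions, not anything these providers have documented; only the three status labels are borrowed from the AWS dashboard wording above.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    NORMAL = "Service is operating normally"
    PERFORMANCE = "Performance Issues"
    DISRUPTION = "Service disruption"


@dataclass
class Check:
    ok: bool            # did the request succeed at all?
    response_ms: float  # how long the request took


# Hypothetical thresholds; a real provider would publish its own.
SLOW_MS = 2000.0        # a successful check slower than this is "slow"
MAX_FAILURE_RATE = 0.5  # more than half of checks failing means "down"


def classify(checks: list[Check]) -> Status:
    """Map a window of recent checks to one of three published states."""
    if not checks:
        raise ValueError("need at least one check")
    failures = sum(1 for c in checks if not c.ok)
    if failures / len(checks) > MAX_FAILURE_RATE:
        return Status.DISRUPTION
    slow = sum(1 for c in checks if c.ok and c.response_ms > SLOW_MS)
    if failures > 0 or slow > 0:
        return Status.PERFORMANCE
    return Status.NORMAL
```

The point is less the specific numbers than the fact that they are written down: a customer looking at a yellow light could then tell whether it means "one slow request" or "half of all requests failing".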
Are real time updates provided during downtime events? Is it easy to find?
  • Amazon Web Services: Yes, but it is unclear how consistently updates are posted and how easy they are to find.
  • Salesforce: Yes, right underneath the current status.
  • Zoho: Does not appear so, but if the issue is big enough they may update customers through their blog.
Is information provided on past downtime events?
  • Amazon Web Services: Yes. Mousing over a past performance or downtime event brings up a chronological log of events that took place, from detection to resolution. In addition, major downtime events are explained.
  • Salesforce: Yes. Clicking on any past event brings up a window giving the time of the event, a detailed description of the problem, and a root cause analysis.
  • Zoho: No. Unless they are described in the blog.
Is there a way to easily report problems users are having?
  • Amazon Web Services: Yes, clicking the "Report an Issue" link.
  • Salesforce: No, other than using the standard support channels.
  • Zoho: No, other than using the standard support channels.
How can you get notified of problems (without watching this page 24/7)?
  • Amazon Web Services: Ability to subscribe to RSS feeds for change in status of each service.
  • Salesforce: No.
  • Zoho: No.
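For anyone who has not consumed a status feed programmatically, a minimal sketch of the idea follows. It assumes a plain RSS 2.0 feed with `<item>`, `<title>`, and `<guid>` elements; the sample feed, its contents, and the `new_items` helper are made up for illustration and do not reproduce any provider's actual feed.

```python
import xml.etree.ElementTree as ET

# Made-up feed standing in for a real status feed, so the sketch runs offline.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Service Status</title>
    <item>
      <title>Service disruption: elevated error rates</title>
      <guid>evt-2</guid>
    </item>
    <item>
      <title>Service is operating normally</title>
      <guid>evt-1</guid>
    </item>
  </channel>
</rss>"""


def new_items(feed_xml: str, seen: set[str]) -> list[tuple[str, str]]:
    """Return (guid, title) for items not yet in `seen`, and record them."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        title = item.findtext("title") or ""
        # Fall back to the title when a feed omits <guid>.
        guid = item.findtext("guid") or title
        if guid not in seen:
            seen.add(guid)
            fresh.append((guid, title))
    return fresh


seen_guids: set[str] = set()
for guid, title in new_items(SAMPLE_FEED, seen_guids):
    print(f"status change {guid}: {title}")
```

In practice the feed text would be downloaded on a schedule (for example with `urllib.request.urlopen`) and each new item forwarded to email or a pager, which is exactly what spares you from watching the page 24/7.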
Conclusions: The best practices for online service health dashboards are still being formed, and it's clear that each service provider has approached the need for transparency differently. Amazon Web Services provides a simple, easy-to-understand overview of the health of each service, but provides little insight into who is impacted and what specific functionality is down. Salesforce provides clear insight into which customers may be affected by an event, but offers little insight into specific functionality that may be down or slow. Zoho provides the most data by far for each service they provide, but does not have a system in place to communicate details about specific downtime events beyond the company blog. Amazon and Salesforce give no insight at all into how they collect the health information, and all three give no information on what is meant by downtime or performance problems.

Closing questions for each provider:
  • Amazon Web Services: What does "EC2 API" actually mean? Which API is this referring to, and why not cover the APIs for the other services?
  • Salesforce: Does each server status cover every application level and API on that server? Can you offer more insight into specific services?
  • Zoho: Do you expect to add details about current and past downtime events to the health dashboard? What do you expect your customers to do when they see a red light? If you answer "Email Support", you don't get the power of this status page.
  • To all: How is the health actually monitored (especially for the GUI-focused Salesforce and Zoho services)? Working at a (the best) web monitoring company, I know how hard it is to monitor complex web applications.
Notable mentions: The following services also offer health dashboard pages, but to keep the comparison from getting overly complex I decided to leave them out. If anyone would like me to review these, or any other service that I missed, I'd be more than happy to. Just leave a note in the comments.

22 comments:

  1. Thanks for a very detailed analysis. At Zoho, we are committed to providing as much information as we can, and your suggestions are very welcome!

    Sridhar

  2. Glad to hear it. My current goal is to get a better idea of what your customers (plus those of AWS and Salesforce) think about these health dashboards. Specifically, what do they want to see, what's most important to them, and would they expect every SaaS provider to offer something like this in the future?

  3. How is the health actually monitored (especially for the GUI focused Salesforce and Zoho services)?

    For certain Zoho services such as Zoho Show & Wiki, we log in to a test user account and check if everything goes through. This involves recording a sequence of URLs/steps of the service and playing it back at regular intervals.

    For other services, we monitor the availability of sample public URLs, which in turn fetch content from the database/fileserver. These services do a login whenever we invoke their URLs, which in turn ensures the login check is working. This method of monitoring also provides data from multiple monitoring locations. Zoho Writer, Sheet, etc. are monitored through this method.

    Arun
    Site24x7

  4. Thanks for the insight Arun! The whole concept of monitoring complex web applications is a topic in itself, and I don't think it's worth getting into right now, except to say that the only way to accurately monitor and track the performance of web applications is to use a real browser (e.g. IE7, Firefox) and run through the entire transaction at a regular interval. Otherwise you risk missing problems in the JavaScript (which I presume Zoho uses extensively), or missing broken dynamic links since you are hitting hard-coded URLs. It's a tricky business. But in the end, your solution covers the vast majority of the potential downtime and performance issues, and so it's a great start.
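To make the playback approach Arun describes a bit more concrete, here is a minimal sketch of replaying a recorded sequence of URLs and checking each response for expected text. This is not Site24x7's implementation; the `play_back` function, the `StepResult` record, the example URLs, and the stubbed `fake_fetch` are all assumptions for illustration.

```python
import time
from typing import Callable, NamedTuple


class StepResult(NamedTuple):
    url: str
    ok: bool
    elapsed_s: float


def play_back(steps: list[tuple[str, str]],
              fetch: Callable[[str], str]) -> list[StepResult]:
    """Visit each recorded URL in order and look for the expected text.

    Stops at the first failing step, since later steps in a transaction
    usually depend on the earlier ones having succeeded.
    """
    results = []
    for url, expected_text in steps:
        start = time.monotonic()
        try:
            ok = expected_text in fetch(url)
        except Exception:
            ok = False  # any fetch error counts as a failed step
        results.append(StepResult(url, ok, time.monotonic() - start))
        if not ok:
            break
    return results


# Stand-in for a real HTTP client so the sketch runs offline.
FAKE_SITE = {
    "https://example.com/login": "Welcome back",
    "https://example.com/docs": "Your documents",
}


def fake_fetch(url: str) -> str:
    return FAKE_SITE[url]  # a KeyError simulates an unreachable page


steps = [
    ("https://example.com/login", "Welcome back"),
    ("https://example.com/docs", "Your documents"),
]
results = play_back(steps, fake_fetch)
print(all(r.ok for r in results))
```

Note that a script like this only sees the raw response text. It never executes JavaScript or follows dynamically generated links, which is the gap a real-browser monitor is meant to close.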

  5. One more great status page I didn't mention in the original post:

    http://status.opensrs.com/

    I love how they have a link to this page from every part of the site (right along the top right). Also impressive that you can subscribe to alerts when status changes, and I like how the archive links to an event log detailing each change in status.

  6. Couple more status pages:

    http://status.mosso.com/
    http://heartbeat.skype.com/ (beating hearts are hilarious. Though the red gives the wrong impression)

  7. http://www.mogulus.com/support/servicehealth

  8. http://system.opendns.com/
    http://service.quickbase.com/updates.aspx

  9. As basic as they get:
    http://github.wordpress.com/

  10. http://status.netsuite.com/status.html

  11. Lenny,
    Great post.

    It looks like BlueTie is using your company's tool (WebMetrics). If so, I'd like to comment that the report on their site is hard for me (and perhaps other partially color blind people) to read. I see virtually no distinction between the yellow and green. Also, it would be nice to be able to 'cursor over' and get a textual indicator.

    Again, not sure if that is your report or theirs - just FYI.
    Jeff

  12. Great point Jeff, I hadn't once thought about that problem of color blind people having trouble with status lights. Feels like the way http://status.opensrs.com/ handles this might just be ideal.

    Note that BlueTie does use the monitoring data from the company I work for (Webmetrics), but they built that dashboard completely on their own. I'll pass along that feedback to them though if I have a chance to speak with them.

  13. Lenny,

    We've made a few enhancements to Zoho status page. These include:

    1) The ability to subscribe to RSS feeds to know the change in status of the services.
    2) Historical data for website availability.
    3) Option for users to 'report an issue' with any of the Zoho services.

    You might want to check this out.

    Arun
    Site24x7

  14. Looking good Arun! The RSS will certainly be appreciated by your customers. It would also be really interesting to see how many reported issues come from this page vs. your standard support page.

    One important thing I would still suggest you strongly consider is having a clear link to the status page from somewhere on your home page or wherever you think your customers first go when they have a problem. I'd be worried that many customers will have no idea this status page exists, defeating the purpose of even having one.
