Wednesday, December 17, 2008
Tuesday, December 16, 2008
Rule #1: Must show the current status for each "service" you offer
- Considering this is meant to cover only the App Engine service, and not any other Google service, I would say they accomplished their goal. Every API they offer appears to be covered, in addition to the "Serving" metric which appears to test the overall service externally.
- I appreciate the alphabetic sorting of services, but I would suggest making the "Serving" status a bit more prominent as that would seem to be by far the most important metric.
- Conclusion: Met!
- Hard to say until an event occurs or we hear feedback about this from users.
- The announcement does claim the data is an "up-to-the-minute overview of our system status with real-time, unedited data." If this is true, this is excellent news.
- The fact that an "Investigating" status is an option tells me that the status may not always be real-time or unedited. Or I may just be a bit too paranoid :)
- In addition, the fact that "No issues" and "Minor performance issues" are both considered healthy tells us that issues Google considers "minor" will be ignored or at least not transparent. That's bad news, though it does fit with the SLA questions that came up recently.
- Conclusion: Time will tell (but promising)
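To make the concern about "minor" issues concrete, here is a minimal sketch of how rolling two statuses into a single "healthy" bucket can hide problems from a summary view. The three status labels come from the dashboard as described above; the `Status` enum, the `HEALTHY` set, and the `summarize` function are hypothetical names invented for illustration, not anything Google has published about how the dashboard works.

```python
from enum import Enum


class Status(Enum):
    # Labels observed on the status page; the enum itself is illustrative.
    NO_ISSUES = "No issues"
    MINOR_PERFORMANCE_ISSUES = "Minor performance issues"
    INVESTIGATING = "Investigating"


# Assumption: both of these are displayed as "healthy", as noted above.
HEALTHY = {Status.NO_ISSUES, Status.MINOR_PERFORMANCE_ISSUES}


def summarize(daily_statuses):
    """Count healthy days and how many of those actually had minor issues."""
    healthy = sum(1 for s in daily_statuses if s in HEALTHY)
    hidden_minor = sum(
        1 for s in daily_statuses if s is Status.MINOR_PERFORMANCE_ISSUES
    )
    return {
        "total": len(daily_statuses),
        "healthy": healthy,
        "hidden_minor": hidden_minor,
    }


week = [
    Status.NO_ISSUES,
    Status.MINOR_PERFORMANCE_ISSUES,
    Status.MINOR_PERFORMANCE_ISSUES,
    Status.INVESTIGATING,
]
print(summarize(week))
```

In this made-up four-day sample, three days count as healthy even though two of them had performance issues, which is exactly the kind of detail a user would want surfaced.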
- If I were experiencing a problem with App Engine, I would first go to the homepage here. Unfortunately I don't see any link to the system status page. A user would either have to stumble upon the blog post announcing this page, or work through the forum...defeating the purpose of the system status page!
- The URL of the system status page (http://code.google.com/status/appengine/) is not easy to remember. Since Google doesn't seem to own appengine.com, this may not be easy to fix, but that doesn't matter to a user who's in the middle of an emergency and needs to figure out what's going on. The good news is that at the time of this writing, a Google search for "google app engine status" has the status page as the third result, and I would think that it will rise to #1 very soon.
- Conclusion: Not met (but easy to fix by adding a link from the App Engine homepage).
- Again, hard to say until we see an issue occur.
- What I'm most interested in is how much detail they provide when an event does occur, and whether they send users over to the forums or to the blog, or simply provide the information on the status page.
- Conclusion: Time will tell.
- Great job with this. I dare say they've jumped ahead of every other cloud service in the amount and detail of performance data they provide.
- Still unclear how much historical data will be maintained, but even 7 days is enough to satisfy me.
- Conclusion: Met!
- Nada here, beyond pointing people to the Downtime Notify Google Group.
- Conclusion: Not met.
- Beyond the mention that they are "using some of the same raw monitoring data that our engineering team uses internally", no real information on how this data is collected, how often it is updated, or where the monitoring happens from.
- Conclusion: Not met.
From the announcement:
"The new System Status Site provides a detailed view into the performance of various App Engine components using some of the same raw monitoring data that our engineering team uses internally. This includes:
- up-to-the-minute overview of our system status with real-time, unedited data
- daily overall serving status for each of our APIs, including any outages or downtime
- detailed historical latency and error-rate graphs for the App Engine Datastore, Images, Mail, Memcache, Serving, URL Fetch, and Users components
In addition to the Downtime Notify Google Group, we'll use this dashboard to announce scheduled downtime and explain any issues that affect App Engine applications. You'll be able to see real data behind any issues that we experience along with explanations from our team.
We'll continue to tune this dashboard to make sure we're providing useful and accurate information about App Engine's uptime."
My 10-second first impression is that overall they did a great job, especially with the details you can get when drilling down on a specific service and day (clicking on a checkmark). Time will tell how many of the rules of successful dashboards they meet. I plan to dive a little deeper in the next day or two, but for now...kudos to Google for making this a reality!