Wednesday, December 17, 2008

Steve Souders - The State of Performance 2008 (and a look ahead to 2009)

Mr. Web Site Performance, Steve Souders, put together a really nice "review of what happened in 2008 with regard to web performance, and [his] predictions and hopes for what we’ll see in 2009." Check it out at Steve Souders' blog. He references a lot of tools and services you may have missed over the course of the year, and gives some good advice for web developers (and online businesses).

Tuesday, December 16, 2008

Google App Engine System Status - A Review

Building off of the rules for a successful public health dashboard, let's see what Google did well, what they can improve, and what questions remain:

Rule #1: Must show the current status for each "service" you offer
  • Considering this is meant to cover only the App Engine service, and not any other Google service, I would say they accomplished their goal. Every API they offer appears to be covered, in addition to the "Serving" metric which appears to test the overall service externally.
  • I appreciate the alphabetic sorting of services, but I would suggest making the "Serving" status a bit more prominent as that would seem to be by far the most important metric.
  • Conclusion: Met!
Rule #2: Data must be accurate and timely
  • Hard to say until an event occurs or we hear feedback about this from users.
  • The announcement does claim the data is an "up-to-the-minute overview of our system status with real-time, unedited data." If this is true, this is excellent news.
  • The fact that an "Investigating" status is an option tells me that the status may not always be real-time or unedited. Or I may just be a bit too paranoid :)
  • In addition the fact that "No issues" and "Minor performance issues" are both considered healthy tells us that issues Google considers "minor" will be ignored or non-transparent. That's bad news. Though it does fit with their SLA questions that came up recently.
  • Conclusion: Time will tell (but promising)
Rule #3: Must be easy to find
  • If I were experiencing a problem with App Engine, I would first go to the homepage here. Unfortunately I don't see any link to the system status page. A user would either have to stumble upon the blog post announcing this page, or work through the forum...defeating the purpose of the system status page!
  • The URL to the system status ( page is not easy to remember. Since Google doesn't seem to own, this is may not be easy to fix, but that doesn't matter to a user that's in the middle of an emergency and needs to figure out what's going on. The good news is that at the time of this writing, a Google search for "google app engine status" has the status page as the third result, and I would think that it will raise to #1 very soon.
  • Conclusion: Not met (but easy to fix by adding a link from the App Engine homepage).
Rule #4: Must provide details for events in real time
  • Again, hard to say until we see an issue occur.
  • What I'm most interested in is how much detail they provide when an event does occur, and whether they send users over to the forums or to the blog, or simply provide the information on the status page.
  • Conclusion: Time will tell.
Rule #5: Provide historical uptime and performance data
  • Great job with this. I dare say they've jumped head of every other cloud service in the amount and detail on performance data they provide.
  • Still unclear how much historical data will be maintained, but even 7 days is enough to satisfy me.
  • Conclusion: Met!
Rule #6: Provide a way to be notified of status changes
Rule #7: Provide details on how the data is gathered
  • Beyond the mention that they are "using some of the same raw monitoring data that our engineering team uses internally", no real information on how this data is collected, how often it is updated, or where the monitoring happens from.
  • Conclusion: Not met.
Overall, in spite of more rules being missed than met, the more difficult requirements are looking great, and the pieces are in place to create a very complete and extremely useful central place for their customers to come in time of need. I'm excited to see where Google takes this dashboard from here, and how other cloud services respond to this ever growing need.

Google launches System Status Dashboard for AppEngine

Google has finally launched a health dashboard for their AppEngine service!

From the announcement:
"The new System Status Site provides a detailed view into the performance of various App Engine components using some of the same raw monitoring data that our engineering team uses internally. This includes:
  • up-to-the-minute overview of our system status with real-time, unedited data
  • daily overall serving status for each of our APIs, including any outages or downtime
  • detailed historical latency and error-rate graphs for the App Engine Datastore, Images, Mail, Memcache, Serving, URL Fetch, and Users components

In addition to the Downtime Notify Google Group, we'll use this dashboard to announce scheduled downtime and explain any issues that affect App Engine applications. You'll be able to see real data behind any issues that we experience along with explanations from our team.

We'll continue to tune this dashboard to make sure we're providing useful and accurate information about App Engine's uptime."

My 10 second first impression is that overall they did a great job, especially the details you can get when drilling down on a specific service and day (clicking on a checkmark). Time will tell how many of the rules of successful dashboard's they meet. I plan to dive a little deeper in the next day or two, but for now...kudos to Google for making this a reality!

Sunday, December 14, 2008


The 100 oldest registered .com domains: