Monday, February 16, 2009

Media Temple goes down, provides a nice case study for downtime transparency

Earlier today we saw Media Temple experience intermittent downtime over the course of an hour. The first tweet noting the downtime showed up around 8am PST. At 9:06am Media Temple provided a short message confirming the outage:

At ~8:30AM Pacific Time we started experiencing networking issues at our El Segundo Data Center. We are working closely with them to determine the cause of these issues and will report any findings as they become available.

At this time we appear to be back fully. The tardiness of this update is a direct result of these networking issues.

So far, not too bad. Note the broken rule, though: the status page is hosted in the same location as the service, which is why the update was late. Lesson #1: Host your status page offsite. Let's keep moving with the timeline...

About the same time the blog post went up, a Twitter message by @mt_monitor pointed to the official status update. Great to see that they actually use Twitter to communicate with their users, and judging by the 360 followers, I think this was a smart way to spread the news. On the other hand, this was the only Twitter update from Media Temple throughout the entire incident, which is strange, and it looks like some users were still in the dark for a bit too long. I was also surprised that the @mediatemple feed made no mention of this. Maybe they have a good reason to keep these separate? Looking at the conversation on Twitter, it seems most people use the @mediatemple label by default. Lesson #2: Don't confuse your users by splitting your Twitter identity.

From this point till about 9:40am PST, users were stuck wondering what was going on:

A few select tweets show us what users were thinking. The conversation on Twitter goes on for about 30 pages, or over 450 tweets from users wondering what the heck was going on.

Finally at 9:40am, Media Temple released their findings:

Our engineers have spoken with the engineers at our El Segundo Data Center (EL-IDC3). Here are their findings:

ASN number 47868 was broadcasting invalid BGP data that caused our routers, and a lot of other routers on the internet, to reboot. This invalid BGP data exploited a software bug in our routers. We have applied filters to prevent us from receiving this invalid data.

At this time they are in contact with their vendors to see if there is a firmware update that will address this. You can expect to see network delays and small outages across the internet as other providers try to address this same issue.

Now that everything is back up and users are "happy", what else can we learn from this experience?

  1. Host your status page offsite. (covered above)
  2. Don't confuse your users by splitting your Twitter identity. (covered above)
  3. Some transparency is better than no transparency. The basic status message helped calm people down and reduce support calls.
  4. There was a huge opportunity here for Media Temple to use the tools of social media (e.g. Twitter/Blogging) as a two-way communication channel. Instead, Media Temple used both their blog and Twitter as a broadcast mechanism. I guarantee that had there been just a few more updates throughout the downtime period, the tone of the conversation on Twitter would have been much more positive. Moreover, trust in the service would have been damaged less severely had users not been left in the dark for so long.
  5. A health status dashboard would have been very effective in providing information to the public beyond the basic "we are looking into it" status update. Without any work on the part of Media Temple during the event, its users would have been able to better understand the scope of the event, and know instantly whether or not it was still a problem. It would have been extremely powerful when combined with lesson 4, if a representative on Twitter simply pointed users complaining of the downtime to the status page.
  6. The power of Twitter as a mechanism for determining whether a service is down (or whether it is just you), and in spreading news across the world in a matter of minutes, again proves itself.
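To make lesson 5 a bit more concrete, here is a minimal sketch of the logic behind a health status dashboard. It is not how Media Temple (or anyone else) implements theirs; the names `CheckResult`, `run_checks`, `overall_status`, and `render_status_page` are hypothetical, and the probes are placeholders for whatever real checks a service would run. Per lesson 1, something like this would need to run from outside the data center it is reporting on.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    component: str        # hypothetical component name, e.g. "web" or "mail"
    ok: bool              # True if the probe succeeded
    checked_at: datetime  # when the probe ran (UTC)


def run_checks(checks: Dict[str, Callable[[], bool]]) -> List[CheckResult]:
    """Run each probe; a probe that raises is recorded as a failure."""
    results = []
    for name, probe in checks.items():
        try:
            ok = bool(probe())
        except Exception:
            ok = False
        results.append(CheckResult(name, ok, datetime.now(timezone.utc)))
    return results


def overall_status(results: List[CheckResult]) -> str:
    """Collapse per-component results into one headline status."""
    if not results:
        return "unknown"
    failed = sum(1 for r in results if not r.ok)
    if failed == 0:
        return "all systems operational"
    if failed == len(results):
        return "major outage"
    return "partial outage"


def render_status_page(results: List[CheckResult]) -> str:
    """Render a plain-text status page: headline first, then one line per component."""
    lines = [f"Overall: {overall_status(results)}"]
    for r in results:
        state = "up" if r.ok else "DOWN"
        lines.append(f"{r.component}: {state} (checked {r.checked_at:%H:%M} UTC)")
    return "\n".join(lines)
```

Run on a schedule and published somewhere offsite, a page like this answers the two questions users were asking on Twitter (is it just me, and is it still a problem) without anyone at the company having to write an update mid-incident.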

