11 hours later, the #caseofthemondays was over and they were back online. Throughout those 11 hours, users had one of the following experiences:
1. When visiting foursquare.com, they saw:
2. When using the iPhone/Android/Blackberry app, they saw an error telling them the service is down and to try again later.
3. When checking Twitter (the now default source of downtime information), they saw a lot of people complaining and the following tweets from the official @foursquare account (if they thought of checking the @foursquare account):
Those were the only options available to a user of Foursquare for those 11 hours. An important question we need to answer is whether anyone seriously cared. Are users of consumer services like Foursquare legitimately concerned with Foursquare's downtime? Are they going to leave for competing services or just quit the whole check-in game? I'd like to believe that 11 hours of downtime matters, but honestly it's too early to tell. This will be a great test of the stickiness and Whuffie that Foursquare has built up.
The way I see it, this is one strike against Foursquare (which includes the continued instability they've seen since Monday). They probably won't see a significant impact to their user base. However, if this happens again, and again, and again, the story changes. And as I've argued, downtime is inevitable. Foursquare will certainly go down again. The key is not reducing downtime to zero, but handling that downtime so that you avoid giving your competition an opening and, even more importantly, use it to build trust and loyalty with your users. How do you accomplish this? Transparency.
We've talked about the benefits of transparency, why transparency works, and how to implement it. We saw above how Foursquare handled the pre-downtime and intra-downtime steps (not well), so let's take a look at how they did in the post-downtime phase by reviewing the public postmortems (both of them) they published. As always, let's run them through the gauntlet.
- Admit failure - Excellent. The entire first paragraph describes the downtime, and how painful it was to users.
- Sound like a human - Very much. This has never been a problem for Foursquare. The tone is very trustworthy.
- Have a communication channel - Prior to the event, all they had were their Twitter accounts and their API developer forums. As a result of this incident, they have since launched http://status.foursquare.com/, and have promised to update @4sqsupport on a regular basis throughout future incidents.
- Above all else, be authentic - This may be the biggest thing going for them.
- Start time and end time of the incident - Missing. All we know is that they were down for 11 hours. I don't see this as being critical in this case, but it would have been nice to have.
- Who/what was impacted - A bit vague, but the impression was that everyone was impacted.
- What went wrong - Extremely well done. I feel very informed, and can sympathize with the situation.
- Lessons learned - Again, extremely well done. I love the structure they used: What happened, What we’ll be doing differently – technically speaking, What we’re doing differently – in terms of process. Very effective.
- Details on the technologies involved - Yes!
- Answers to the Five Why's - No :(
- Human elements - heroic efforts, unfortunate coincidences, effective teamwork, etc - Yes!
- What others can learn from this experience - Yes!
- Foursquare launched a public health status feed! Check it out at http://status.foursquare.com/.
- I really like the structure used in this postmortem. It has inspired me to want to create a basic template for postmortems. Stay tuned...
- Could this be Foursquare's Friendster moment? I hope not. My personal project relies completely on Foursquare.
- I've come to realize that in most cases, downtime is less impactful to the long-term success of a business than site performance. Users understand downtime and just try again later. Slowness eats away at you: you start to hate using the service and jump on an opportunity to use something more fun/fast/pleasant.
Going forward, the big question will be whether Foursquare maintains their new processes, keeps the status blog up to date, and can fix their scalability issues. I for one am rooting for them.