Using the "Upside of Downtime" framework (above) as a guide:
- Prepare: Much room for improvement. The health status feed is hard to find for the average user or developer, and the information it provides is limited. On the plus side, it exists. Twitter was also used to communicate updates, but again the information was limited.
- Communicate: Without a strong foundation created by the Prepare step, you don't have much opportunity to excel at the Communicate step. There was an opportunity to use the basic communication channels already in place (status feed, Twitter) more effectively by communicating throughout the incident with more actionable information, but alas that was not the case. Instead, there was mass speculation about the root cause and the severity, which is exactly what you should strive to avoid.
- Explain: Let's find out by running the postmortem through our guidelines for postmortem communication...
- Admit failure - Excellent; an almost textbook admission, without hedging or blaming.
- Sound like a human - Well done. Posted from Facebook Director of Engineering Robert Johnson's personal account, the tone and style were personal and effective.
- Have a communication channel - Can be improved greatly. Making the existing health status page easier to find, more public, and more useful would help in all future incidents. I've covered how Facebook can improve this page in a previous post.
- Above all else, be authentic - No issues here.
- Start time and end time of the incident - Missing.
- Who/what was impacted - Partial. I can understand this being difficult in Facebook's case, but I would have liked to see more specifics around how many users were affected. On one hand, this is a global consumer service that may not be critical to people's lives. On the other hand, if you treat your users with respect, they'll reward you for it.
- What went wrong - Well done, maybe the best part of the postmortem.
- Lessons learned - Partial. It sounds like many lessons were certainly learned, but they weren't directly shared. I'd love to know what the "design patterns of other systems at Facebook that deal more gracefully with feedback loops and transient spikes" look like.
- Details on the technologies involved - No.
- Answers to the Five Whys - No.
- Human elements (heroic efforts, unfortunate coincidences, effective teamwork, etc.) - No.
- What others can learn from this experience - Marginal.
Biggest lesson for us to take away: preparation is key to successfully managing outages and to using them to build trust with your users.