Prerequisites:
- Admit failure - Hiding downtime is no longer an option
- Sound like a human - Do not use a standard template, and do not apologize for "inconveniencing" us
- Have a communication channel - Set up a process to handle incidents prior to the event (e.g. public health dashboard, status blog, Twitter account, etc.)
- Above all else, be authentic - You must be believed to be heard
Requirements:
- Start time and end time of the incident
- Who/what was impacted - Should I be worried about this incident?
- What went wrong - What broke and how you fixed it (with insight into the root cause analysis process)
- Lessons learned - What's being done to improve the situation for the future, in technology, process, and communication
Bonus:
- Details on the technologies involved
- Answers to the Five Whys
- Human elements - heroic efforts, unfortunate coincidences, effective teamwork, etc.
- What others can learn from this experience
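
One way to keep a draft honest against the Requirements and Bonus lists above is to treat them as fields of a record and check the required ones before publishing. The following is only a minimal sketch: the `Postmortem` class, its field names, and the `missing_requirements` helper are hypothetical names chosen for illustration, not part of any existing tool.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Postmortem:
    """Hypothetical record mirroring the Requirements and Bonus lists."""

    # Requirements
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None
    impact: str = ""            # who/what was impacted
    what_went_wrong: str = ""   # what broke and how it was fixed
    lessons_learned: str = ""   # technology, process, and communication changes

    # Bonus (optional, never reported as missing)
    technologies: List[str] = field(default_factory=list)
    five_whys: List[str] = field(default_factory=list)
    human_elements: str = ""
    takeaways_for_others: str = ""


def missing_requirements(pm: Postmortem) -> List[str]:
    """Return the names of required fields that are still empty."""
    missing = []
    if pm.start_time is None:
        missing.append("start_time")
    if pm.end_time is None:
        missing.append("end_time")
    for name in ("impact", "what_went_wrong", "lessons_learned"):
        if not getattr(pm, name).strip():
            missing.append(name)
    return missing


# Example draft with made-up values: only the start time and impact are filled in.
draft = Postmortem(
    start_time=datetime(2024, 1, 1, 9, 0),
    impact="Example: a subset of API requests failed",
)
print(missing_requirements(draft))
```

A check like this only confirms that each section exists. Whether the post sounds human and authentic still has to be judged by a person.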
