A good report should be based on a clear and consistent framework. Effective teams base every postmortem on a template, in which participants answer a set of questions or prompts.
This ensures key details aren’t forgotten. It also builds consistency across incidents, and helps the team identify patterns, trends, and opportunities for improvement. The framework can be iterated and improved on over time, but any changes should be intentional.
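As a rough illustration of how a template keeps key details from being forgotten, the sketch below models a report as a fixed set of prompts and lists the ones still unanswered. The field names (`summary`, `impact`, and so on) are hypothetical placeholders, not fields from any particular postmortem tool; a team would substitute the prompts from its own template.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class PostmortemReport:
    """Hypothetical template: each field is one prompt the team must answer."""
    summary: str = ""
    impact: str = ""
    timeline: str = ""
    root_cause: str = ""
    follow_up_actions: str = ""


def missing_fields(report: PostmortemReport) -> List[str]:
    """Return the names of prompts that are empty or contain only whitespace."""
    return [
        f.name
        for f in fields(report)
        if not getattr(report, f.name).strip()
    ]


if __name__ == "__main__":
    draft = PostmortemReport(summary="Checkout API returned errors", impact="   ")
    # Lists every prompt the draft has not answered yet.
    print(missing_fields(draft))
```

Because the prompts live in one definition, adding or removing a field is a single, visible change, which fits the idea that changes to the framework should be intentional.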
Rich details and data
Postmortem fields aren’t places to skimp on details or gloss over events. This is where you want to get very granular and specific. Don’t say you saw a traffic spike; say precisely how large it was and which metric changed. Don’t say the team was confused; pull in an exact quote from the chat history where someone expressed confusion.
Inclusive, blameless language
Like many teams, we practice blameless postmortems here at Atlassian. It’s important to keep finger-pointing out of the meeting and the analysis of the incident. But be sure the same care is taken with the words written on the report. Avoid language that dishes out blame or singles people out.
Important questions to ask during a postmortem report
Screenshots: Attach relevant screenshots, especially ones the response team took during the outage. What did you see change in the product? What product behavior didn’t happen as expected?
Tickets: Link to any tickets related to the incident.
Customer feedback: Did any customer feedback come in about the incident? It might have arrived through a help desk, over email, or on social media. Don’t worry about including all of it.
Charts and graphs: What data visualizations help show the impact of the incident?
Data: Are there any other key data points about the incident or its impact?
Chat exchanges: If the team uses a chat tool like Slack during the response effort, consider including any key messages or exchanges from the chat history.
Timelines: A clear timeline of the incident is an excellent aid for incident analysis. What were the key events during the incident, and what were their timestamps?
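A timeline of key events and timestamps can be assembled from notes gathered in any order. The sketch below is one minimal way to do that, assuming each event is recorded as an ISO 8601 timestamp plus a short description; the `build_timeline` helper and the sample events are invented for illustration only.

```python
from datetime import datetime
from typing import List, Tuple


def build_timeline(events: List[Tuple[str, str]]) -> List[str]:
    """Sort (iso_timestamp, description) pairs and label each with minutes elapsed."""
    if not events:
        return []
    # Parse timestamps, then sort chronologically.
    parsed = sorted((datetime.fromisoformat(ts), desc) for ts, desc in events)
    start = parsed[0][0]
    lines = []
    for when, desc in parsed:
        # Whole minutes since the first recorded event.
        minutes = int((when - start).total_seconds() // 60)
        lines.append(f"{when:%H:%M} (+{minutes} min) {desc}")
    return lines


if __name__ == "__main__":
    # Illustrative events only, deliberately out of order.
    sample = [
        ("2024-01-01T10:12:00", "Rollback started"),
        ("2024-01-01T10:00:00", "Alert fired"),
        ("2024-01-01T10:30:00", "Service recovered"),
    ]
    for line in build_timeline(sample):
        print(line)
```

Showing the elapsed time next to each entry makes gaps between detection, response, and recovery easier to spot during analysis.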
Internal vs. external postmortem reports
While it’s less common, some organizations choose to publish a public version of a postmortem after an incident. This practice is seen most often among large-scale consumer services whose outages affect a lot of users. They might publish the full postmortem report or, more likely, a trimmed-down version of the internal report. Either way, it’s usually necessary to remove sensitive or private information first.