Resources
Service management for IT Ops, development and business teams

Deliver high velocity service management at scale.

Get it free

Learn more

How to manage the end-to-end delivery of IT services

Check out tips to improve your service management practices.

Learn more

Everything you need to know to get setup on JSM

These guides cover everything from the basics to in-depth best practices.

View guide

Jira Service Management resource library

Browse through our whitepapers, case studies, reports, and more to get all the information you need.

View library

Incident management for high-velocity teams

The importance of an incident postmortem process

Incidents happen.

They just do. As our systems grow in scale and complexity, failures are inevitable.

Incidents are also a learning opportunity.

A chance to uncover vulnerabilities in your system. An opportunity to mitigate repeat incidents and decrease time to resolution. A time to bring your teams together and plan for how they can be even better next time.

The best way to work through what happened during an incident and capture any lessons learned is by conducting an incident postmortem, also known as a post-incident review.

An incident postmortem brings people together to discuss the details of an incident: why it happened, its impact, what actions were taken to mitigate it and resolve it, and what should be done to prevent it from happening again.

Thanks to tools like version control, feature flags, and continuous delivery, a lot of incidents can be quickly “undone.” Many incidents are caused by a bug in a change pushed to production, and rolling back that change can get the app up and running again. This is really beneficial for everyone because it gets the service working again quickly. But it often doesn’t help you understand what failed and why. This is where postmortems come in.

An incident postmortem is a framework for learning from incidents and turning problems into progress. It also builds trust with customers, colleagues, and end users (basically the folks affected by the incident) and lets them know your team is working to minimize future incidents and impact.

A postmortem is an important step in the lifecycle of an always-on service. The findings from your postmortem should feed right back into your planning process. This ensures that the critical remediation work identified in the postmortem finds a place in upcoming work and is balanced against other upcoming work and priorities.

The benefits of an incident postmortem

You may be tempted to skip a formal incident postmortem meeting and write-up, especially if you are certain of what caused the incident, and you’re pretty sure you’ve fixed the issue.

That may be true for you. But others on your team may not have internalized what caused the incident. They could benefit from your clear understanding, which in turn helps them better serve the team and your customers.

Bringing people together to engage in a structured, collaborative process allows everyone to contribute what they learned and can build trust and resiliency within your team. And documenting the incident and how the team remedied it can inform how future incidents are handled.

You may also decide to share takeaways from your incident postmortem with customers or the rest of your organization. This can go a long way toward rebuilding confidence among people who were not closely involved as the incident was happening. Other teams in your organization, especially leadership, may need to see the details of the problem and what steps were taken to resolve it to head off any second-guessing of your team in the future.

Partners, customers, and end-users may also want to know what happened and what steps you have taken to improve their experience. Making your incident postmortem available on your public-facing website may not be appropriate in all cases, but your marketing or public relations team can help you craft the language so people get the information in a way that is informative and builds trust in your services.

Best practices for an incident postmortem

How you approach your incident postmortem is just as important as the checklist of steps you take. Tensions can run high in the wake of an incident. The key to getting people to come to the process engaged and ready to tackle a difficult problem is to give them a sense of psychological safety.

Establish a blameless culture

Former Etsy CTO John Allspaw wrote a seminal piece on “blameless postmortems.” This approach to the investigation of an incident allows the people involved in an incident to account for all their actions, their impact, and what they knew and when, without fear of punishment or retribution.

This approach is key to making sure your teams openly share information and get to the root cause of an incident. If anyone fears rebuke, they may hold back information or try to redirect blame. When this happens, people lose trust in each other, and the organization loses the opportunity to build resiliency in its teams and systems. Many teams, including here at Atlassian and at Google, have adopted the tenets of the blameless postmortem in order to avoid those pitfalls.

Avoid pointing fingers, keep critiques constructive

In your postmortem meeting—and in the subsequent write-up of the findings—avoid language that singles out individuals as personally responsible for the incident. Instead, focus on actions, results, and impact.

While it’s important to keep the conversation safe and objective, getting to the root cause of the incident is critical to resolving it. You can use a technique in your meeting called “The 5 Whys.” Start by making sure everyone agrees on what the problem is. Then ask why it happened, and ask “why” again of each answer. Repeat this at least five times to make sure you uncover all the deep factors contributing to the problem. Make sure the room doesn't try to steer away from an uncomfortable truth or reach for an easy consensus. You can learn more about “The 5 Whys” approach in our Playbook play.

Review every single postmortem, and ingrain this into your process

An unreviewed incident postmortem report might as well never have been written. Once an incident postmortem report is drafted, it’s important to review it to close out any unresolved issues, capture ideas to consider in the future, and finalize the report. You may even say that the incident isn’t truly resolved until this review has taken place.

How do you make this happen? Schedule a recurring meeting with engineering (and anyone else who may have an interest, like customer support or account managers), at least monthly, to review incident postmortem reports. You can choose to review recent reports or perhaps review older reports and share lessons that are still relevant today.

An effective incident postmortem plan

For postmortems to be effective, and to help you build a culture of continuous improvement, you want to implement a simple, repeatable process that everyone can participate in. How you do this will depend on your culture and your team. At Atlassian, we’ve developed a method that works for us, and you can read about it in depth in our incident handbook.

Here are some tips to get started:

Tip 1: Set a threshold

Incidents in your organization should have clear and measurable severity levels. These severity levels can be used to trigger the postmortem process. For example, any incident Sev-1 or higher triggers the postmortem process, while the postmortem can be optional for less severe incidents. Consider allowing team leads or management the opportunity to request a postmortem for any incident that doesn’t meet the threshold.
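As a rough illustration, the threshold rule above could be encoded as a small policy check. This is a minimal sketch: the `Severity` scale, the `POSTMORTEM_THRESHOLD` constant, and the `postmortem_required` function are hypothetical names, and it assumes that a lower number means a more severe incident.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical scale: a lower number means a more severe incident.
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3


# Assumed policy: incidents at this severity or worse require a postmortem.
POSTMORTEM_THRESHOLD = Severity.SEV1


def postmortem_required(severity: Severity, requested_by_lead: bool = False) -> bool:
    """Return True if the severity meets the threshold or a lead asked for a review."""
    meets_threshold = severity <= POSTMORTEM_THRESHOLD
    return meets_threshold or requested_by_lead


print(postmortem_required(Severity.SEV1))
print(postmortem_required(Severity.SEV3, requested_by_lead=True))
```

Keeping the threshold in one named constant makes the policy easy to find and change, and the `requested_by_lead` flag covers the case where a team lead asks for a postmortem on a less severe incident.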

Tip 2: Don’t procrastinate

It’s important to take a break and get some rest after an incident. But don’t delay writing the incident postmortem. Wait too long and important details might be lost or forgotten. Ideally, hold the post-incident review meeting within 24-48 hours of the incident being resolved, and no more than five business days later, then draft the postmortem immediately after that meeting.
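The timing guidance above can be turned into concrete dates. The sketch below is an assumption-laden illustration: `add_business_days` and `review_window` are hypothetical helpers, business days are taken to be Monday through Friday, and public holidays are ignored.

```python
from datetime import datetime, timedelta


def add_business_days(start: datetime, days: int) -> datetime:
    """Advance by the given number of weekdays (Mon-Fri); holidays are ignored."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            remaining -= 1
    return current


def review_window(resolved_at: datetime) -> dict:
    """Return the target (48 hours) and latest (five business days) review times."""
    return {
        "target": resolved_at + timedelta(hours=48),
        "latest": add_business_days(resolved_at, 5),
    }


# Example: an incident resolved on a Friday afternoon (illustrative date).
window = review_window(datetime(2024, 1, 12, 15, 0))
print(window["target"], window["latest"])
```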

Tip 3: Assign roles and owners

A post-incident review meeting is where you’ll hash out the details that will be recorded into the incident postmortem. It’s good to delegate the postmortem draft to a specific person, ideally someone familiar with the incident, and who has the required level of technical and organizational knowledge to understand the causes and mitigations.

Tip 4: Work from a template

A template can keep you from leaving out key details. And it’s a great way to build consistency throughout your postmortems.

Tip 5: Include a timeline

A timeline is a very helpful aid in incident documentation. Often it’s the first place your readers’ eyes jump to when trying to quickly size up what happened. Try to be as clear and specific as possible. For example, “11:14 am Pacific Standard Time,” not “around 11.” Being specific with timestamps allows you to map out a high-fidelity chain of events, which is useful to identify areas of improvement. For example, you might identify that the interval between when impact started and when customers were notified was too long.

Important times to include:

First alert or ticket
First comms announcement (internal and/or external)
Times of status page updates
Time of any remediation attempts (code rollbacks, etc.)
Time of resolution
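To show why precise timestamps matter, here is a minimal sketch that stores a timeline and measures the gap between two events, such as first alert and first customer communication. All event labels and times are invented for illustration, and `minutes_between` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset zone for Pacific Standard Time (UTC-8); illustrative only.
PST = timezone(timedelta(hours=-8), "PST")

# Hypothetical incident timeline: (timestamp, event label).
timeline = [
    (datetime(2024, 1, 15, 11, 14, tzinfo=PST), "first_alert"),
    (datetime(2024, 1, 15, 11, 41, tzinfo=PST), "first_comms"),
    (datetime(2024, 1, 15, 11, 52, tzinfo=PST), "rollback_attempt"),
    (datetime(2024, 1, 15, 12, 30, tzinfo=PST), "resolved"),
]


def minutes_between(events, start_label, end_label):
    """Return the number of minutes elapsed between two labeled events."""
    times = {label: when for when, label in events}
    return (times[end_label] - times[start_label]).total_seconds() / 60


notify_delay = minutes_between(timeline, "first_alert", "first_comms")
print(f"Alert to first comms: {notify_delay:.0f} min")
```

With vague entries like “around 11,” an interval like this cannot be computed at all, which is one practical reason to record exact times with a time zone.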

Tip 6: Details, details, details

Skimping on details is a quick path to writing postmortems that are unhelpful and unclear. Add as many details as possible about what happened and what was done during the incident. Instead of “then public comms went out,” say “We sent the initial public comms announcing the incident on our public status page and Twitter account.”

Wherever possible, include names and links: links to tickets and status updates, and links to incident state documents and monitoring charts. Don’t be afraid to add screenshots of relevant graphs or dashboards, too. A graph from your monitoring system that clearly shows the incident's start and end times (for example, a drop in request rate followed by a return to normal) is very valuable because it's unambiguous. It becomes even more powerful when combined with graphs that show what was happening behind the scenes during that time, for example database connections, network link state, or CPU, memory, I/O, and bandwidth consumption over the same timeframe.

Tip 7: Capture incident metrics

When you capture metrics in your incident postmortem you apply hard data to the issues and their impact. Having these data points helps you determine if your team is headed in the right direction and reducing the number of incidents, their severity, and downtime. With consistent metrics being measured, you can take a step back and look at incident trends over time.

Some metrics to consider in your incident postmortem tracking:

The number of minutes of downtime, so you can track if this number is going up or down.
The severity of the incident, so you can determine the relative reliability of your systems.
Mean Time to Resolution (MTTR), which measures the average time it takes to resolve an incident, from when it was initially reported.
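As a small worked example of the metrics above, the sketch below computes total downtime and MTTR from a list of incidents. The incident records are made up for illustration, the function names are hypothetical, and downtime is approximated here as the time from report to resolution.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incidents: (reported_at, resolved_at).
incidents = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 9, 30)),
    (datetime(2024, 3, 8, 14, 0), datetime(2024, 3, 8, 15, 30)),
    (datetime(2024, 3, 20, 22, 0), datetime(2024, 3, 20, 23, 0)),
]


def resolution_minutes(reported_at, resolved_at):
    """Minutes from the initial report to resolution for one incident."""
    return (resolved_at - reported_at).total_seconds() / 60


def mttr_minutes(records):
    """Mean Time to Resolution: the average of per-incident resolution times."""
    return mean(resolution_minutes(start, end) for start, end in records)


def total_downtime_minutes(records):
    """Sum of resolution times, used here as a simple proxy for downtime."""
    return sum(resolution_minutes(start, end) for start, end in records)


print(f"MTTR: {mttr_minutes(incidents):.0f} min")
print(f"Total downtime: {total_downtime_minutes(incidents):.0f} min")
```

Computing the same figures for each month or quarter gives the trend line that shows whether incidents are getting shorter over time.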

The most important tip? Don’t skip any steps. The key to conducting incident postmortems that help you improve your team and systems is to have a process and stick to it.

Use an incident postmortem template to streamline the process

In order to ensure that your team develops a culture around incident postmortem reviews, make it easy to capture information, schedule meetings, and publish the final report with reusable checklists and templates. A repeatable process provides consistency and helps people know what to expect, and then come to the process with a productive mindset.

Typical checklist items for an incident postmortem process:

Meetings that need to be held:

Information gathering meeting
Review of report
Presentation of report

Information that needs to be gathered ahead of time:

Standard agendas for each meeting
Participants, stakeholders, reviewers
Standardize incident postmortem report writing with a template
