Resources
Service management for IT Ops, development and business teams

Deliver high velocity service management at scale.

Get it free

Learn more

How to manage the end-to-end delivery of IT services

Check out tips to improve your service management practices.

Learn more

Everything you need to know to get set up on JSM

These guides cover everything from the basics to in-depth best practices.

View guide

Jira Service Management resource library

Browse through our whitepapers, case studies, reports, and more to get all the information you need.

View library

Incident management for high-velocity teams

Get to know the incident response lifecycle

Hang around security and incident management pros long enough, and you’ll notice a pattern. The smartest people in these industries think in cycles, not straight lines.

Why is that? It means that no incident or outage is an isolated event with a beginning and an end point, even though it may seem that way. Every incident is a learning opportunity.

Just because a service is “operational” again doesn’t mean your team’s work is over. Post-incident activities should have you putting plans on future roadmaps, changing the way you prepare for future incidents, and discovering new things to build that will prevent more incidents in the future. It’s a never-ending cycle of improvement, and there are a few different ways to think about the various stages, depending on which school of thought you subscribe to.

Atlassian’s incident response lifecycle

Atlassian's incident response lifecycle chart

1. Detect the incident

Our incident detection typically starts with monitoring and alerting tools, though sometimes we first learn about an incident from customers or team members.

Since incident alerts can originate from different sources, having a solution that integrates a variety of alerting and reporting tools can be the difference between a disjointed, cumbersome response and a cohesive, collaborative one. A solution like Jira Service Management allows teams to customize and filter alerts across all monitoring, logging, and CI/CD tools to ensure teams swarm incidents quickly while avoiding alert fatigue.
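To make the idea of filtering alerts from several tools concrete, here is a minimal sketch in Python. It is not Jira Service Management's API: the `Alert` fields, the 1-to-5 priority scale, and the `filter_alerts` helper are all assumptions made for illustration. It drops low-priority alerts and collapses duplicates that different tools raise about the same problem, which is one simple way to reduce alert fatigue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    source: str    # hypothetical tool category, e.g. "monitoring", "logging", "ci"
    service: str   # the affected service
    message: str   # short description of the problem
    priority: int  # assumed scale: 1 = most urgent, 5 = least urgent


def filter_alerts(alerts, max_priority=3):
    """Keep urgent alerts only, and keep one alert per (service, message) pair."""
    seen = set()
    kept = []
    for alert in alerts:
        if alert.priority > max_priority:
            continue  # too minor to notify responders
        key = (alert.service, alert.message)
        if key in seen:
            continue  # the same problem was already reported by another tool
        seen.add(key)
        kept.append(alert)
    return kept


alerts = [
    Alert("monitoring", "checkout", "high error rate", 1),
    Alert("logging", "checkout", "high error rate", 2),
    Alert("ci", "docs", "build warning", 5),
]
actionable = filter_alerts(alerts)
print([a.source for a in actionable])  # ['monitoring']
```

Of the three alerts, only the first reaches responders: the second repeats the same problem from another tool, and the third falls below the assumed priority cutoff.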

2. Set up team communication channels

An important first step is to set up the incident team's communication channels. The goal at this point is to focus team communications in well-known places, such as a dedicated Slack channel and video conference bridge.

Within Jira Service Management, coordinating incident responses can be a smooth process. Teams can communicate in the ways that work best for them, such as Slack and video conferencing, and communicating with customers becomes easier with automation and customization, too. We'll cover external communication in Step 4.

3. Assess the impact and apply a severity level

Now it’s time to assess the impact of the incident so the team can decide who else to contact and what to communicate to customers and stakeholders. Assigning a severity level not only identifies the impact of the incident, but also lays the groundwork for resolution plans and external communications. In Jira Service Management, escalating an incident and assigning a severity triggers automated actions and sends notifications to responders so they can stay on top of resolution progress.
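As a rough illustration of how an impact assessment can map to a severity level and a notification list, consider the sketch below. The three-level scale, the thresholds, and the `NOTIFY` groups are hypothetical values chosen for the example; they are not taken from Atlassian's process or from Jira Service Management, and each team would define its own.

```python
def assess_severity(customer_facing: bool, percent_users_affected: float) -> int:
    """Return a severity from 1 (critical) to 3 (minor). Thresholds are assumed."""
    if customer_facing and percent_users_affected >= 50:
        return 1
    if customer_facing or percent_users_affected >= 10:
        return 2
    return 3


# Hypothetical mapping from severity to the groups that should be notified.
NOTIFY = {
    1: ["on-call engineers", "incident manager", "executives", "customers"],
    2: ["on-call engineers", "incident manager", "customers"],
    3: ["on-call engineers"],
}

severity = assess_severity(customer_facing=True, percent_users_affected=80)
print(severity, NOTIFY[severity])
```

Writing the rules down ahead of time, even in a form this simple, means responders do not have to debate severity in the middle of an incident.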

4. Communicate with customers

We aim to communicate with stakeholders internally and externally as soon as possible. Communicating quickly and accurately helps build trust with customers and the rest of the organization. As mentioned earlier, the ability to customize the way you communicate empowers your team to work the way they want, facilitating a faster resolution. It also lets your team control what message they want to project and when. Moreover, you can save your team time in the midst of an incident with automated replies sent from within a ticket directly to the customer.

5. Escalate to the right responders

Initial responders often need to bring other teams into the incident by paging them using the alerting features in Jira Service Management. Bring responders directly to the incident ticket by grouping related tickets and tagging relevant responders on the ticket. This way, notifications are coordinated and everyone has the full context.

6. Delegate incident response roles

As additional team members join the response, the incident manager delegates a role to each of them. This is where it's helpful to have a proper incident response playbook, developed beforehand, that outlines clear roles and responsibilities. With a playbook in place, individuals on the incident response team are familiar with each role and know what they’re responsible for during an incident.
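A playbook's role list can be as simple as a lookup table. The sketch below assumes three hypothetical roles and a hypothetical `delegate_roles` helper that hands out roles in the order responders join; a real playbook would define its own roles and assignment rules.

```python
# Hypothetical playbook: role name -> responsibility during an incident.
PLAYBOOK_ROLES = {
    "incident manager": "coordinates the response and delegates roles",
    "tech lead": "leads the technical investigation and fix",
    "communications manager": "keeps customers and stakeholders informed",
}


def delegate_roles(responders, roles=PLAYBOOK_ROLES):
    """Assign playbook roles to responders in the order they joined."""
    # zip stops at the shorter sequence, so extra responders get no fixed role
    # and extra roles stay unfilled.
    return {role: person for person, role in zip(responders, roles)}


assignments = delegate_roles(["Ana", "Ben"])
unfilled = [role for role in PLAYBOOK_ROLES if role not in assignments]
print(assignments)
print(unfilled)
```

Listing the unfilled roles gives the incident manager a quick view of which responsibilities still need an owner as more people join.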

7. Resolve the incident

An incident is resolved when the current or imminent business impact has ended. At that point, the emergency response process ends and the team transitions to any cleanup tasks and the postmortem.

Ideally, your incident management solution keeps a robust incident timeline, which is the case when using Jira Service Management. Responders can access crucial incident data afterwards and develop a report that helps teams find the root cause and avoid similar incidents in the future. Postmortems can also act as a resource on the off chance that something similar happens again.
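To show what an incident timeline might record, here is a small sketch. The `IncidentTimeline` class and its methods are invented for this example and do not reflect how Jira Service Management stores timelines. It keeps timestamped events in order and reports how long the incident lasted, which is the kind of data a postmortem draws on.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class IncidentTimeline:
    events: list = field(default_factory=list)  # list of (timestamp, description)

    def record(self, when: datetime, what: str) -> None:
        """Add an event and keep the timeline in chronological order."""
        self.events.append((when, what))
        self.events.sort(key=lambda event: event[0])

    def duration(self) -> timedelta:
        """Time between the first and the last recorded event."""
        return self.events[-1][0] - self.events[0][0]


start = datetime(2024, 1, 1, 9, 0)  # arbitrary example date
timeline = IncidentTimeline()
timeline.record(start, "alert fired")
timeline.record(start + timedelta(minutes=42), "impact ended, incident resolved")
timeline.record(start + timedelta(minutes=5), "incident channel opened")

for when, what in timeline.events:
    print(when.isoformat(), what)
print(timeline.duration())  # 0:42:00
```

Events can be recorded out of order, as in the example, because the timeline is sorted on every insert.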

The NIST incident response lifecycle

Another industry-standard incident response lifecycle comes from the National Institute of Standards and Technology, or NIST. NIST is a U.S. government agency that describes itself as “one of the nation’s oldest physical science laboratories”. It sets standards and practices across many areas of technology, including cybersecurity, where its incident response steps have become one of the two industry-standard go-tos for incident response.

Like Atlassian, NIST believes that not every incident can be prevented. So it’s best to be prepared:

“Preventive activities based on the results of risk assessments can lower the number of incidents, but not all incidents can be prevented. An incident response capability is therefore necessary for rapidly detecting incidents, minimizing loss and destruction, mitigating the weaknesses that were exploited, and restoring IT services.” — NIST

The NIST incident response lifecycle breaks incident response down into four main phases: Preparation; Detection and Analysis; Containment, Eradication, and Recovery; and Post-Incident Activity.

Phase 1: Preparation

The Preparation phase covers the work an organization does to get ready for incident response, including establishing the right tools and resources and training the team. This phase includes work done to prevent incidents from happening.

Phase 2: Detection and Analysis

Accurately detecting and assessing incidents is often the most difficult part of incident response for many organizations, according to NIST.

Phase 3: Containment, Eradication, and Recovery

This phase focuses on keeping the incident impact as small as possible and mitigating service disruptions.

Phase 4: Post-Incident Activity

Learning and improving after an incident is one of the most important parts of incident response, and the one most often ignored. In this phase, the incident and the incident response efforts are analyzed. The goals here are to limit the chances of the incident happening again and to identify ways of improving future incident response activity.

Incident response for modern DevOps teams

Over the past decade, the DevOps movement has helped teams reshape how they build, deploy, and operate software. Along with that have come innovations in how these teams respond to incidents.

The DevOps approach to managing incidents isn’t dramatically different from the traditional steps to effective incident management. What DevOps incident management adds is an explicit emphasis on involving developer teams from the beginning, including in on-call rotations, and on assigning work based on expertise, not job titles.

Incident response and continuous improvement

We started the article by talking about cycles vs. straight lines. You’ll notice something all these incident management approaches have in common: they are not linear. Each of them includes the same basic components: ways of defining, detecting, and identifying incidents; ways of quickly responding and taking action to mitigate incidents; and ways of analyzing incidents to improve future detection and response. There is no point in analyzing an incident that has already happened just for the sake of that incident. You can’t go back in time and change what happened. You’re learning from the incident to improve future detection and response. Constant, continuous learning and improvement is how teams close that cycle.

There are many moving parts to the (nonlinear) incident response process. Keeping track of each step with integrated collaboration and communication tools is easy with an incident management solution like Jira Service Management. Centralize alerts and unify teams with flexibility to respond to and resolve incidents quickly.
