How to choose incident management tools

Categories, key features, and what to look for

There is no single, one-size-fits-all tool for incident management.

The best-performing incident teams use a collection of the right tools, practices, and people.

Some tools are specific to incident management; others are general-purpose tools your team also uses for other tasks. Some might be a fully bespoke experience built on layers of integrations and customization.

No matter the use case, good incident management tools have a few things in common: they are open, reliable, and adaptable.

Open: In a high-pressure environment like an incident, it’s key that the right people have access to the right tools and information immediately. This goes not only for incident responders but also for company stakeholders who need visibility into response efforts.

Reliable: Few things are worse during incident response than having your key response tools go down too. Using cloud-hosted tools, like Slack and Opsgenie, reduces the risk that an outage on your own infrastructure also takes down your response tools.

Adaptable: Integrations, workflows, add-ons, customization, and APIs all extend what a product can do. You may want to get started with an out-of-the-box configuration, but as your practices and processes mature, you'll want your tools to be flexible enough to support changing needs.

Incident management cycle illustration

Before the incident

Monitoring

Monitoring systems let DevOps and IT Ops teams collect and aggregate data from thousands of different services in real time, and trigger alerts based on that data. These systems are critical to providing full visibility into the health of your services and often sound the first alarm during an incident.

Benefits

Monitoring tools give your team constant insight into the health of the infrastructure. Modern monitoring tools also proactively trigger alerts during unexpected activity.

Features

| Feature set | Questions to ask |
| --- | --- |
| 24/7 coverage and analytics | Does the tool have visibility into all my servers and infrastructure? Can my team see real-time analytics and dashboards and set alerting thresholds? |
| Integrates with alerting tools | Does the product integrate with my alerting and on-call tool? |
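
As a rough illustration of what setting an alerting threshold involves, the sketch below fires only after a metric stays above a limit for several consecutive samples. The `Threshold` class, the `first_breach` function, the metric name, and the sample readings are all hypothetical and not taken from any particular monitoring product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Threshold:
    """Hypothetical alert rule: fire when a metric stays above `limit`."""
    metric: str
    limit: float
    min_consecutive: int = 3  # samples in a row required before alerting


def first_breach(samples: list[float], rule: Threshold) -> Optional[int]:
    """Return the index of the sample that triggers the alert, or None."""
    streak = 0
    for i, value in enumerate(samples):
        # Extend the streak on a breach; reset it on any healthy sample.
        streak = streak + 1 if value > rule.limit else 0
        if streak >= rule.min_consecutive:
            return i
    return None


# Illustrative data only: CPU percentages sampled at a fixed interval.
cpu_rule = Threshold(metric="cpu_percent", limit=90.0)
readings = [72.0, 95.0, 88.0, 93.0, 96.0, 97.0, 91.0]
print(first_breach(readings, cpu_rule))  # prints 5
```

Requiring several consecutive breaches is one simple way to avoid alerting on a single spike. When evaluating a tool, it is worth checking which comparable controls it exposes.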

Service desk

Service desk software gives customers and employees a place to report incidents and potential incidents.

Benefits

Along with their many other use cases (service requests, IT help desk), service desks help your team learn quickly about incidents from the people who matter most: your users and customers.

Features

| Feature set | Questions to ask |
| --- | --- |
| Enables self-service | Can customers quickly file tickets through a service portal? Can customers find the help they need with automated knowledge-based suggestions? |

Our recommendation: Jira Service Desk

Alerting and on call

Prompt and reliable alerting is a critical step in incident response. This is how teams make sure the right people are made aware of an incident.

Benefits

Alerting tools notify designated on-call responders through a sophisticated combination of scheduling, escalation paths, and notifications.

Features

| Feature set | Questions to ask |
| --- | --- |
| Works globally | Can I send notifications (SMS, voice, email) to almost anywhere in the world? |
| Multiple notification methods | Can I send notifications using multiple methods, like email, SMS, phone, and mobile app push, and retry them multiple times? |
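
To make escalation paths and multi-method retries more concrete, here is a minimal sketch. It tries each responder in order, and for each responder tries every notification method a fixed number of times. `EscalationPolicy`, the channel names, and the two fake senders are assumptions made for illustration; they do not reflect how Opsgenie or any other product is implemented. For simplicity, a successful delivery stands in for an acknowledgement.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# A sender takes (responder, message) and returns True if delivery succeeded.
Sender = Callable[[str, str], bool]


@dataclass
class EscalationPolicy:
    """Hypothetical policy: ordered responders, ordered channels, retries."""
    responders: list[str]
    channels: dict[str, Sender]
    attempts_per_channel: int = 2
    log: list[tuple[str, str, bool]] = field(default_factory=list)

    def notify(self, message: str) -> Optional[str]:
        """Try each responder in order; return the first one reached."""
        for responder in self.responders:
            for name, send in self.channels.items():
                for _ in range(self.attempts_per_channel):
                    delivered = send(responder, message)
                    self.log.append((responder, name, delivered))
                    if delivered:
                        return responder
        return None  # nobody could be reached


def flaky_sms(responder: str, message: str) -> bool:
    return False  # simulate an SMS gateway outage


def push(responder: str, message: str) -> bool:
    return responder == "secondary"  # only the secondary's phone is reachable


policy = EscalationPolicy(
    responders=["primary", "secondary"],
    channels={"sms": flaky_sms, "push": push},
)
reached = policy.notify("Checkout error rate above threshold")
print(reached, len(policy.log))  # prints: secondary 7
```

The log of attempts shows why retries and multiple methods matter: six attempts fail before the seventh reaches someone.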

Our recommendation: Opsgenie

During the incident

Team communication

Clear and reliable communication is undeniably critical during incident management.

Benefits

A solid communication platform helps teams communicate and share observations, links, and screenshots in a way that’s timestamped and preserved. This brings the right information and people together during an incident, and creates a rich record to learn from after the incident.

Features

| Feature set | Questions to ask |
| --- | --- |
| Multiple channels | Can my incident response team quickly spin up a dedicated channel for an incident? |
| Integrations | Can other tools in my incident toolchain post into my team's communication channel? |

Our recommendation: Slack (text), Zoom (video)

Customer communication

Customer communication tools help keep customers informed during an incident.

Benefits

There’s no getting around it: incidents are typically a bad experience for your customers. Keeping customers informed builds trust and speeds up response efforts. Communicating with customers lets them know you’re aware of the incident and working on a fix.

Features

| Feature set | Questions to ask |
| --- | --- |
| Off of my infrastructure | Will my communication tool be operational and accessible even if my internal infrastructure is down? |
| Subscribers and notifications | Can customers opt in to get notifications when I post about an incident? |

Our recommendation: Statuspage

Incident command center

An incident command center is wherever your canonical record of the incident and its key details lives. This could be an incident tool like Opsgenie, or an issue tracking tool like Jira.

Benefits

A command center tool offers one place to get everyone up to speed during and after an incident, listing key details like incident status, associated alerts, updates, and more. It also provides a historical record of the incident and its associated response effort.

Features

| Feature set | Questions to ask |
| --- | --- |
| Source of truth | Can team members and stakeholders quickly get up to speed on the incident? Can team members and stakeholders use this record to locate all the other details of the incident and response activities? |
| Timeline | Does the tool aggregate a chronological timeline of key events? |
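
Aggregating a chronological timeline amounts to merging events from several tools and ordering them by time. The sketch below shows the idea under simple assumptions: the `Event` type, the `build_timeline` function, and the sample events and timestamps are invented for illustration and do not come from any real incident.

```python
from datetime import datetime
from itertools import chain
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    at: datetime
    source: str  # which tool reported the event
    text: str


def build_timeline(*feeds: Iterable[Event]) -> list[Event]:
    """Merge per-tool event feeds into one list ordered by timestamp."""
    return sorted(chain.from_iterable(feeds), key=lambda e: e.at)


# Illustrative feeds; the chat feed is deliberately out of order.
alerts = [Event(datetime(2024, 1, 5, 9, 2), "alerting", "High error rate alert fired")]
chat = [
    Event(datetime(2024, 1, 5, 9, 10), "chat", "Incident channel opened"),
    Event(datetime(2024, 1, 5, 9, 4), "chat", "On-call acknowledged"),
]
status = [Event(datetime(2024, 1, 5, 9, 15), "status page", "Customers notified")]

timeline = build_timeline(alerts, chat, status)
for event in timeline:
    print(event.at.strftime("%H:%M"), event.source, event.text)
```

Sorting the combined feeds, rather than assuming each feed is already ordered, keeps the result correct even when a tool reports events late.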

Our recommendation: Opsgenie

After the incident

Postmortem and analysis

A postmortem is a written record of what happened during the incident and of any follow-up actions taken to prevent it from happening again.

Benefits

After an incident is resolved, teams often still don’t know the root causes and risk the same incident happening again. Postmortems help prevent that by bringing the team together for a post-incident analysis.

Features

| Feature set | Questions to ask |
| --- | --- |
| Templates | Can my team use a template to fill out a postmortem? |
| Map out next actions | Can my team plan out next actions and remediation work during a postmortem? |
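
A postmortem template can be as simple as a fixed set of headings with placeholders, plus a checklist of next actions. The following sketch uses Python's standard `string.Template`. The section names, the `render_postmortem` function, and the sample incident are hypothetical; they are not a standard format or the template of any specific product.

```python
from string import Template

# Hypothetical template; the section names are illustrative only.
POSTMORTEM = Template(
    "Postmortem: $title\n"
    "Summary: $summary\n"
    "Root cause: $root_cause\n"
    "Next actions:\n"
    "$actions"
)


def render_postmortem(title: str, summary: str, root_cause: str,
                      actions: list[str]) -> str:
    """Fill in the template, rendering each next action as a checklist item."""
    action_lines = "\n".join(f"- [ ] {action}" for action in actions)
    return POSTMORTEM.substitute(
        title=title,
        summary=summary,
        root_cause=root_cause,
        actions=action_lines,
    )


report = render_postmortem(
    title="Checkout outage",
    summary="Checkout requests failed for a period of time.",
    root_cause="Connection pool exhausted after a configuration change.",
    actions=["Add pool saturation alert", "Review configuration rollout process"],
)
print(report)
```

Each checklist item can then become a ticket in the team's issue tracker so the remediation work is planned alongside other priorities.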

Our recommendation: Opsgenie

Issue tracking

An issue tracking tool helps the team map out future remediation work that needs to be done.

Benefits

In many cases, resolving the incident brings the service back online without addressing the root cause. Typically, more engineering work is needed to remediate root causes and make sure the incident doesn’t repeat itself. Issue and work tracking tools, which your team is hopefully already using for other development work, help make sure this work is prioritized and doesn’t fall through the cracks.

Features

| Feature set | Questions to ask |
| --- | --- |
| Shared workflow pipeline | Can my team plan any incident remediation work alongside their other work and priorities? |
| Integrations | Can my team pull in data and context from my other incident tools? |

Our recommendation: Jira Software
