Incident management for high-velocity teams
What is incident management?
Incident management is the process used by DevOps and IT Operations teams to respond to an unplanned event or service interruption and restore the service to its operational state.
At Atlassian, we define an incident as an event that causes disruption to or a reduction in the quality of a service which requires an emergency response. Teams who follow ITIL or ITSM practices may use the term major incident for this instead.
Get our Incident Management Handbook in print or PDF
We've got a limited supply of print versions of our Incident Management Handbook that we're shipping out for free. Or download a PDF version.
An incident is resolved when the affected service resumes functioning in its intended state. This includes only those tasks required to mitigate impact and restore functionality.
These types of incidents can vary widely in severity, ranging from an entire global web service crashing to a small number of users having intermittent errors.
Incident management topics
Want to see how Atlassian handles major incidents? We’ve published our internal incident management handbook. Anyone is welcome to learn from it, adapt it, and use it however they see fit.
Setting up an on-call schedule with Opsgenie
In this tutorial, you’ll learn how to set up an on-call schedule, apply override rules, configure on-call notifications, and more, all within Opsgenie.Read this tutorial
Pros and cons of different approaches to on-call management
On call teams are rapidly evolving. Explore the pros and cons of different approaches to on call management.Read this article