Incident management for high-velocity teams

Understanding the key incident response roles and responsibilities

An incident is no time to have multiple people doing duplicate work. It’s also a terrible time to have important tasks ignored because everyone thought somebody else was handling them. Incidents get worse when incident response team members can’t communicate, can’t cooperate, and don’t know what the others are working on. Work gets repeated, work gets ignored, and customers and the business suffer.

That’s why effective incident response teams designate clear roles and responsibilities. Team members know what the different roles are, what they’re responsible for, and who is in which role during an incident.

Here are a few of the most common incident management roles. Several of them, like major incident manager, are key to our own incident response strategy.

Role: Incident manager

Primary responsibility: The incident manager has overall responsibility and authority during the incident. They coordinate and direct all facets of the incident response effort. As a rule of thumb, the incident manager is responsible for all roles and responsibilities until they delegate a given role to someone else. At Atlassian, the incident manager can also devise and delegate ad hoc roles as required by the incident. For example, they could appoint multiple tech leads if more than one stream of work is underway, or create separate internal and external communications managers.

Secondary responsibilities: Anything that has not been assigned to someone else.

Also known as: Incident commander, major incident manager
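
The rule of thumb above (the incident manager holds every role until it is delegated) can be sketched as a small data structure. This is only an illustration, not Atlassian's tooling: the `Incident` class, its `delegate` and `owners` methods, and the responder names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Incident:
    """Hypothetical model of who holds which role during an incident."""

    incident_manager: str
    # Maps a role name to the people it has been delegated to.
    assignments: Dict[str, List[str]] = field(default_factory=dict)

    def delegate(self, role: str, person: str) -> None:
        """Delegate a role; calling again adds another holder (e.g. a second tech lead)."""
        self.assignments.setdefault(role, []).append(person)

    def owners(self, role: str) -> List[str]:
        """Return whoever holds the role, falling back to the incident manager."""
        return self.assignments.get(role) or [self.incident_manager]


# Hypothetical usage: two tech leads for two streams of work, no scribe yet.
incident = Incident(incident_manager="alex")
incident.delegate("tech lead", "sam")
incident.delegate("tech lead", "kim")
print(incident.owners("tech lead"))  # ['sam', 'kim']
print(incident.owners("scribe"))     # ['alex']
```

The fallback in `owners` is the point of the sketch: an undelegated role never ends up without an owner, because it resolves to the incident manager.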

Role: Tech lead

Primary responsibility: The tech lead is typically a senior technical responder. They are responsible for developing theories about what's broken and why, deciding on changes, and running the technical team during the incident. This role works closely with the incident manager.

Secondary responsibilities: Communicate updates to incident manager and other team members, document key theories and actions taken during the incident for later analysis, participate in incident postmortem, page additional responders and subject matter experts.

Also known as: On-call engineer, subject matter expert

Role: Communications manager

Primary responsibility: The communications manager is a person familiar with public communications, possibly from the customer support or public relations team. They are responsible for writing and sending internal and external communications about the incident. This is usually also the person who updates the status page.

Secondary responsibilities: Collect customer responses, interface with executives and other high-level stakeholders.

Also known as: Communications officer, communications lead

Role: Customer support lead

Primary responsibility: The person in charge of making sure incoming tickets, phone calls, and tweets about the incident get a timely, appropriate response.

Secondary responsibilities: Pass customer-sourced details to the incident-response team.

Also known as: Help desk lead, customer support agent

Role: Subject matter expert

Primary responsibility: A technical responder familiar with the system or service experiencing an incident. Often responsible for suggesting and implementing fixes.

Secondary responsibilities: Provide context and updates to the incident team, page additional subject matter experts.

Also known as: Technical lead, on-call engineer

Role: Social media lead

Primary responsibility: A social media pro in charge of communicating about the incident on social channels.

Secondary responsibilities: Update the status page, share real-time customer feedback with the incident response team.

Also known as: Social media manager, communications lead

Role: Scribe

Primary responsibility: A scribe is responsible for recording key information about the incident and its response effort.

Secondary responsibilities: Maintain an incident timeline, keep a record of key people and activities throughout the incident.

Role: Problem manager

Primary responsibility: The person responsible for going beyond the incident’s resolution to identify the root cause and any changes that need to be made to avoid the issue in the future.

Secondary responsibilities: Coordinate, run, and record an incident postmortem, log and track remediation tickets.

Also known as: Root cause analyst
