Identify what your team values most during incident response and create a plan to live those values consistently.
USE THIS PLAY TO...
Uncover what really matters to your team during an incident.
Build out a prioritized plan for living your ideal values.
If you're struggling with Health Monitor, running this play might help.
5 - 10
Running the play
Your team can run this play quarterly or twice a year to track progress on living out your ideal incident values.
Whiteboard or butcher's paper
Prep (30 mins)
Book a room 30-60 minutes early. Write these 5 basic incident values on a whiteboard or butcher paper.
Draw a long line below each value to act as a sliding scale. Mark each line with 6 evenly spaced notches, numbered 0-5.
We know there is a problem before our customers do.
Escalate, escalate, escalate (and communicate with customers).
$#!% happens, clean it up quickly.
Always blameless.
Never have the same incident twice.
(optional) Copy this Trello template and invite play participants to join the board so they can reference more information about each incident value, take notes during the play, and record goals/action items at the end.
Note that these are the values that Atlassian incident management teams have converged on. Feel free to copy them verbatim. Or, personalize them for your team/organization. This exercise is all about deciding what your team values.
It's key to have folks from different roles represented (support, dev, etc.) so there are different perspectives coming together in conversation that might not otherwise take place.
Set the stage (5 min)
Welcome everyone and establish the rules of engagement:
- Embrace a positive spirit of continuous improvement and share whatever you think will help the team improve.
- Don't make it personal, don't take it personally.
- Listen with an open mind, and remember that everyone's experience is valid (even those you don't share).
Introduce the basic incident values (see above), describing each one as you wrote it on your whiteboard or butcher paper during prep.
If these incident values don't quite fit your team, take a few minutes now to add, remove, or adjust them.
Talk it out (25 min)
The real meat of this play comes from discussing why people rated the team on each value the way that they did. Guide the discussion using the questions below. Feel free to add other questions, too.
- Which values have the most consensus when it comes to ranking? Why do you think that is?
- Which values have the most discrepancy when it comes to ranking? Why do you think that is?
- Any major outliers? Give the people who placed outliers an opportunity to explain their thinking.
- What values do you need to work on?
Write down any issues you uncover (e.g. "We never detect incidents before our customers do" or "We usually blame someone during our post-incident reviews") on the whiteboard or on a poster board next to your sliders.
Take a picture of your sliding scales so you can remember where you ranked during this exercise and compare if you run this play again in the future.
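If you would rather keep the numbers alongside the photo, the ratings can be typed up and summarized. The following is a minimal sketch, not part of the play itself: the ratings are made-up examples, and the `summarize` helper and its `outlier_gap` threshold are hypothetical choices. It uses the spread of ratings (highest minus lowest) as a rough stand-in for consensus versus discrepancy, and flags ratings far from the median as outliers worth discussing.

```python
from statistics import median

# Hypothetical ratings (0-5) transcribed from the sliders, one list per value.
ratings = {
    "We know there is a problem before our customers do.": [1, 2, 1, 2, 5],
    "Escalate, escalate, escalate.": [3, 3, 4, 3, 3],
    "Never have the same incident twice.": [0, 4, 2, 5, 1],
}


def summarize(scores, outlier_gap=2):
    """Return the median, the spread (max - min), and scores far from the median."""
    mid = median(scores)
    spread = max(scores) - min(scores)
    # Assumption: a rating more than `outlier_gap` notches from the median is an outlier.
    outliers = [s for s in scores if abs(s - mid) > outlier_gap]
    return {"median": mid, "spread": spread, "outliers": outliers}


summary = {value: summarize(scores) for value, scores in ratings.items()}

# Smallest spread suggests the most consensus; largest spread the most discrepancy.
most_consensus = min(summary, key=lambda v: summary[v]["spread"])
most_discrepancy = max(summary, key=lambda v: summary[v]["spread"])

for value, stats in summary.items():
    print(value, stats)
print("Most consensus:", most_consensus)
print("Most discrepancy:", most_discrepancy)
```

Saving the `ratings` dictionary from each session makes it easy to compare medians the next time you run the play. The numbers only point at where to start talking; the discussion is still where the value comes from.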
It could be that you don't actually value the values you are ranking low on. Decide whether you want to improve these values or ditch them in favor of something you care more about.
Boost your values (10 min)
Referring back to the list of issues you uncovered, discuss how you can better live your values. Use questions like these to guide the discussion:
- How can we improve our monitoring and alerting systems so we find out about incidents before our customers do?
- What tools or processes are we lacking right now?
- What is not working about our current escalation process?
- How can we share the burden of middle-of-the-night alerts and escalation work?
- How do we train new members of the team on our escalation process?
- How are we keeping our customers in the loop during an incident?
- Who is ultimately responsible for incident recovery?
- How long does it usually take for us to resolve incidents?
- How do we determine if an incident requires a post-mortem or post-incident review?
- If we were customers of our own service, would we be satisfied with the level of detail we give out about incidents?
- Who is responsible for mitigation post-incident?
- How do we hold each other accountable for making sure the same bug doesn't bite twice?
Be sure to run a full Health Monitor session or checkpoint with your team to see if you're improving.
Leadership or project teams
Congruent values are important for all teams in an organization. Leadership and project teams can run this play by adjusting the values to reflect the work and culture they aspire to. Tweak the value names and descriptions to make them relevant for any team at any org.
Share your results and action plan with your broader team/company to stay accountable.
Re-run this play in a few months to see how you did with achieving the goals in your action plan.