Understanding and fighting alert fatigue
In 2013, a 16-year-old boy at one of the US’s top hospitals was given a 3800% overdose of his medication.
The hospital’s built-in alert system noticed the overdose order and sent alerts to a doctor and a pharmacist. And yet, a short time later, the overdose was administered and the seizures, full-body numbness, and struggle for the boy’s life began.
How could this happen—especially when the safety system caught the problem before the medication arrived at the boy’s bedside?
The answer is alert fatigue.
Both the doctor and the pharmacist ignored the system’s alert because that same system generated alerts for about half of the hundreds of prescriptions they handled each day. They had learned that most of those alerts were false alarms and, as a coping mechanism, had started giving them a cursory glance at best.
And so a boy who should have taken a single pill took 38. And while he ultimately survived, the consequences to his health were significant.
Stories like these are common—and too often fatal—in hospitals and the aviation industry. In fact, a 2013 survey found that 19 out of 20 hospitals rank alert fatigue as their number one safety concern.
And while the risks are different, alert fatigue is also common for IT and DevOps teams as they monitor the always-on technology that drives our businesses.
What is alert fatigue?
Alert fatigue (also known as alarm fatigue) occurs when an overwhelming number of alerts desensitizes the people tasked with responding to them, leading to missed or ignored alerts and delayed responses.
By most accounts, the main problem is the sheer number of alerts. A single alert is easy to respond to, even if it interrupts the normal work or free time of an on-call employee. A dozen alerts in succession is harder. And the higher the number climbs, the more likely it is that an employee will miss something important.
This issue is compounded by the fact that many alerts are false alarms. In the medical industry, research shows that anywhere from 72% to 99% of all clinical alarms are false. In security, one survey found that 52% of alerts were false and 64% were redundant.
This high proportion of false alerts trains workers to assume that most alerts are false and to act accordingly, just as the doctor and pharmacist above both dismissed the system’s overdose alert, assuming it was yet another insignificant alarm.
The psychology of alert fatigue
Alert fatigue is one of the top 10 safety concerns of hospitals because mentally shutting out frequent alarms is a typical psychological response to an overwhelming number of alerts.
The reason for this is what we call normalization, desensitization, or habituation—three concepts that essentially mean the same thing: The more you’re exposed to something, the more you tolerate, normalize, and ignore it.
This applies both at work and outside it. For example, romantic movies with overly persistent male leads measurably affect women’s tolerance for stalking behavior in real life. Normalization of recurring damage to the primary O-rings on earlier shuttle flights led to the Challenger explosion in 1986. And when Arizona’s Petrified Forest National Park put up signs to discourage people from stealing the park’s petrified wood, the signs backfired, normalizing the theft and increasing it.
Just like an endless stream of prank calls might lead you to block a number or turn off your phone, an endless stream of false, redundant, or unimportant alarms often leads to ignoring them. It’s human nature.
And it’s not just normalization of alerts in general that’s at work here. Repetition of the same alert causes even greater alert fatigue. One study found that, for clinicians, the likelihood of accepting an alert dropped 30% for each reminder.
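To get a feel for how quickly repetition erodes attention, here is a back-of-the-envelope calculation. It assumes, purely for illustration, that each reminder multiplies the remaining likelihood of acceptance by 0.7 (a 30% drop) and that a first-time alert starts at a hypothetical 90% acceptance rate; neither the compounding model nor the starting rate comes from the study.

```python
def acceptance_after_reminders(initial: float, drop: float, reminders: int) -> float:
    """Likelihood of accepting an alert after `reminders` repeats.

    Assumes (hypothetically) that every repeat multiplies the
    likelihood by (1 - drop), i.e. the decline compounds.
    """
    return initial * (1 - drop) ** reminders


# Hypothetical starting point: 90% of first-time alerts are accepted.
for n in range(5):
    print(f"{n} reminders: {acceptance_after_reminders(0.9, 0.3, n):.0%}")
```

Under these assumptions, acceptance falls from 90% to about 22% by the fourth reminder.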
The risks of alert fatigue
Missed or ignored alerts
As in the example of the unfortunate hospitalized teen above, the greatest risk of alert fatigue is missed or ignored alerts. When an alert system has cried wolf too many times or doesn’t differentiate dangerous alerts (such as a 3800% overdose) from minor alerts (such as a 0.1% overdose), workers condition themselves to pay less attention to those alerts.
In DevOps and IT operations, this can lead to more incidents and major consequences for revenue, costs, and brand reputation.
Slow response times
Alert fatigue also impacts response times. Even if alerts aren’t missed or permanently ignored, they may be temporarily ignored. After all, if the last 10 alerts that came in were false alarms, is an on-call employee as likely to abandon their dinner or sleep for the 11th alert as they were for the first? Or might they justify finishing dinner first?
Employee burnout
Constant alerts, sleep interruptions, and full inboxes are a recipe for employee burnout and can lead to higher turnover, lower job satisfaction, and lower productivity.
How to avoid alert fatigue
Alert fatigue is a significant problem across a variety of industries—and one that comes with some dire consequences. So, how do we avoid the ignored alerts, slow response times, and employee burnout? Experts point to alert processes and policies themselves as the way forward.
Set intelligent thresholds
One way to prevent alerts from overwhelming your on-call professionals is to set intelligent thresholds for them. The key question here is this: Does every alert need immediate attention? Are all alerts created equal? Which issues require an immediate alert and which can be dealt with during normal working hours?
The answer is always a balancing act: too few alerts can mean missed incidents, but too many can also lead to missed incidents through alert fatigue.
The balancing act is a tough one for any tech company, but without attempting to find that balance, systems usually err on the side of too many alerts and create situations like the one that led to the 3800% overdose.
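One common way to make a threshold more intelligent is to require a condition to persist before anyone is paged, so a single momentary spike doesn't wake someone up. (Prometheus alerting rules, for instance, offer a `for` duration for this purpose.) The sketch below is tool-agnostic and assumes a hypothetical CPU metric sampled once a minute, a 90% limit, and a five-sample window; all of these numbers are placeholders to tune for your own systems.

```python
from collections import deque


class SustainedThreshold:
    """Fires only when every sample in the last `window` samples exceeds `limit`."""

    def __init__(self, limit: float, window: int) -> None:
        self.limit = limit
        self.window = window
        # Keeps only the most recent `window` samples.
        self.samples: deque[float] = deque(maxlen=window)

    def observe(self, value: float) -> bool:
        """Record a sample and return True if an alert should fire."""
        self.samples.append(value)
        return len(self.samples) == self.window and all(
            v > self.limit for v in self.samples
        )


# Hypothetical CPU readings (percent), one per minute.
cpu = SustainedThreshold(limit=90.0, window=5)
readings = [95, 40, 92, 93, 95, 97, 99]
fired = [cpu.observe(r) for r in readings]
print(fired)  # the first 95 is a lone spike; only the final reading fires
```

The trade-off is latency: a longer window means fewer false alarms but a slower page for a real problem, which is exactly the balancing act described above.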
Aviation seems to be leading the way in fighting alert fatigue, and part of the reason is that the industry has set its thresholds high. An aircraft’s computers may track over 10,000 data points, but fewer than 10% of flights generate any alerts at all, even minor ones.
As Captain Chesley “Sully” Sullenberger points out in an article for Backchannel on Medium:
“The warnings in cockpits now are prioritized so you don’t get alarm fatigue...We work very hard to avoid false positives because false positives are one of the worst things you could do to any warning system. It just makes people tune them out.”
Set tiered alert priorities
If not all alerts are created equal, they shouldn’t show up equally in a physician’s approval form, a developer’s inbox, or a pilot’s dashboard. Setting alert priorities and using visual, audible, and sensory cues to indicate importance can reduce alert fatigue by a large margin.
In the case of the 3800% overdose, a big part of the problem was that the system had very low alert thresholds and gave every alert equal priority. A 0.1% overdose alert looked the same as a 3800% overdose alert. And with half of all medication orders generating these alerts, clinicians had learned to ignore them all.
Again, the aviation industry sets a good example, not only by aggressively tiering its alerts but also by clearly indicating priority with a variety of visual and sensory cues. A red alert (with red lights, a red text message, a voice warning, and a vibrating control column) comes up on a pilot’s dashboard only when the plane is in immediate danger of a stall and the pilot must act right away. No one wants these alerts ignored, so they get their own special category.
Other alerts are downgraded to warnings (events that will affect a plane’s flight path), cautions (events that require immediate pilot awareness but may not require immediate action), and advisories (events where no action is required, but the pilot should know something happened). This includes alerts that sound alarming to those of us who fly often, like an engine fire or a loss of cabin pressure.
As the importance of an alert drops, so do the visual, audible, and sensory cues around it. Warnings merit red lights, text messages, and voice alerts (though not a vibrating control column). Cautions generally trigger amber lights and text messages. And advisories are amber text messages with no lights.
Pilots know instantly, based on these cues, which alerts need priority attention and which can reasonably be ignored for a moment if they’re dealing with another important task or a series of alerts that need to be prioritized.
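Translated to an on-call setup, the same idea might look like a table that maps each priority tier to progressively quieter notification channels. The tier names below borrow the aviation terms from this section; the channel names (phone call, SMS, chat, dashboard) and the after-hours rule are hypothetical stand-ins for whatever your paging tool and policy actually support.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Alert tiers; a lower value means more urgent."""
    RED_ALERT = 1  # immediate danger: act now
    WARNING = 2    # will affect service: act soon
    CAUTION = 3    # needs awareness, may not need action
    ADVISORY = 4   # informational only


# As urgency drops, so do the number and intrusiveness of the channels.
CHANNELS = {
    Tier.RED_ALERT: ["phone_call", "sms", "chat", "dashboard"],
    Tier.WARNING: ["sms", "chat", "dashboard"],
    Tier.CAUTION: ["chat", "dashboard"],
    Tier.ADVISORY: ["dashboard"],
}


def route(tier: Tier, during_business_hours: bool) -> list[str]:
    """Return the channels to notify for an alert of the given tier."""
    # Outside business hours, only red alerts and warnings interrupt anyone;
    # everything else waits on the dashboard until working hours.
    if not during_business_hours and tier > Tier.WARNING:
        return ["dashboard"]
    return list(CHANNELS[tier])


print(route(Tier.CAUTION, during_business_hours=False))  # ['dashboard']
print(route(Tier.RED_ALERT, during_business_hours=False))
```

Keeping the mapping in one table makes the policy easy to review: anyone can see at a glance which tiers are allowed to interrupt dinner or sleep.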
Make sure alerts are actionable
Vague alerts require more focus, attention, and time than specific, actionable alerts. For workers who are already fatigued by the sheer number of alerts, requiring more focus and attention is a recipe for low productivity and missed alerts.
This is another place where we can learn from the aviation industry. For each kind of alert that shows up on the pilot’s dashboard, there’s also a matching checklist of actions to take.
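An on-call equivalent of the pilot’s checklist is to attach next steps to the alert itself and to treat alerts without them as incomplete. The sketch below assumes a hypothetical alert record with a summary, an affected service, and a list of checklist steps; the field names, the service name, and the steps are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    summary: str  # what happened, in one line
    service: str  # which system is affected
    checklist: list[str] = field(default_factory=list)  # steps to take

    def is_actionable(self) -> bool:
        """Actionable only if it says what broke, where, and what to do."""
        return (
            bool(self.summary.strip())
            and bool(self.service.strip())
            and len(self.checklist) > 0
        )

    def render(self) -> str:
        """Format the alert with its numbered checklist for the responder."""
        steps = "\n".join(
            f"  {i}. {step}" for i, step in enumerate(self.checklist, start=1)
        )
        return f"[{self.service}] {self.summary}\n{steps}"


vague = Alert(summary="Something is wrong", service="")
specific = Alert(
    summary="Checkout error rate above 5% for 10 minutes",
    service="checkout-api",
    checklist=[
        "Check the most recent deployment and roll back if it correlates",
        "Check the payment provider's status page",
        "Escalate to the payments team if errors persist after 15 minutes",
    ],
)
print(vague.is_actionable())     # False
print(specific.is_actionable())  # True
print(specific.render())
```

A check like `is_actionable` could run when a new alert rule is created, so vague alerts are caught before they ever reach someone on call.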
Consolidate redundant alerts
Redundant alerts are one of the major culprits in alert fatigue. One study found that clinicians’ likelihood of accepting an alert dropped 30% with each reminder of the same alert. Another study found that over 60% of all alerts in security systems were redundant.
Consolidating these alerts and reducing reminders where possible can help keep the alert load more manageable, leading to better attention from workers.
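A common consolidation technique is deduplication: repeats of the same alert are folded into one open alert with a counter instead of notifying someone again. The sketch below assumes, hypothetically, that an alert’s identity is its service plus the check that fired, and that reminders are suppressed for 30 minutes. It is not any specific product’s API.

```python
from dataclasses import dataclass


@dataclass
class OpenAlert:
    """One consolidated alert standing in for every repeat of the same problem."""
    key: tuple[str, str]
    last_notified: float  # timestamp (seconds) of the last notification sent
    count: int = 1        # how many times this alert has fired


class Deduplicator:
    """Folds repeats of the same (service, check) alert into one open alert."""

    def __init__(self, suppress_seconds: float) -> None:
        self.suppress_seconds = suppress_seconds
        self.open: dict[tuple[str, str], OpenAlert] = {}

    def receive(self, service: str, check: str, now: float) -> bool:
        """Record an incoming alert; return True only if someone should be notified."""
        key = (service, check)
        existing = self.open.get(key)
        if existing is None:
            # First occurrence: open a new alert and notify.
            self.open[key] = OpenAlert(key, last_notified=now)
            return True
        existing.count += 1
        if now - existing.last_notified < self.suppress_seconds:
            # Repeat inside the window: count it, but stay quiet.
            return False
        # Window has passed: send a single reminder covering all repeats.
        existing.last_notified = now
        return True


dedup = Deduplicator(suppress_seconds=30 * 60)
times = [0, 60, 120, 600, 2400]  # seconds since the first alert
notified = [dedup.receive("db-primary", "disk_full", t) for t in times]
print(notified)                                       # [True, False, False, False, True]
print(dedup.open[("db-primary", "disk_full")].count)  # 5
```

Five raw alerts produce two notifications here. A real system would also close the open alert when the problem resolves, which this sketch leaves out.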
Create balanced schedules
Even with intelligent thresholds and tiered systems, companies (especially large companies) may be handling a high number of alerts.
Once your systems are optimized, it’s also valuable to look at process and people. Do you have enough on-call professionals? Is the burden of alerts falling too heavily on one person or team, and can that burden be shared? How frequent are alerts? Are there certain times that need greater or less coverage?
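One way to start answering these questions is to count pages per responder, including how many arrived outside working hours, and flag anyone well above the team average. The sketch below assumes a hypothetical export of (responder, hour-of-day) pairs, treats 08:00 to 18:00 as working hours, and uses an arbitrary 1.5× threshold; all three are illustrative choices.

```python
from collections import Counter


def page_load(pages: list[tuple[str, int]]) -> tuple[Counter, Counter]:
    """Count total and off-hours pages per responder.

    `pages` holds (responder, hour_of_day) pairs. Off-hours is assumed
    to mean before 08:00 or from 18:00 onward.
    """
    total = Counter(name for name, _ in pages)
    off_hours = Counter(name for name, hour in pages if hour < 8 or hour >= 18)
    return total, off_hours


def overloaded(total: Counter, factor: float = 1.5) -> list[str]:
    """Responders whose page count exceeds `factor` times the team average.

    Note: responders with zero pages are absent from `total`, so add them
    explicitly if they should count toward the average.
    """
    average = sum(total.values()) / len(total)
    return sorted(name for name, count in total.items() if count > factor * average)


# Hypothetical week of pages for a three-person rotation.
pages = [("ana", 3), ("ana", 23), ("ana", 2), ("ana", 14), ("ana", 1),
         ("ben", 10), ("ben", 15), ("cho", 11)]
total, off_hours = page_load(pages)
print(total)
print(off_hours)
print(overloaded(total))  # ['ana']
```

In this made-up week, one person absorbed five of eight pages, four of them at night, which is the kind of imbalance a schedule review should surface.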
The typical DevOps professional uses at least five tools to get to the bottom of performance issues. This means multiple alert locations, styles, and types. It also means a lot of duplicate work. If each of the five systems raises a similar alert, you’ve effectively quintupled your alert review workload (a 400% increase).
The more you can consolidate alerts and information, the more you can reduce the fatigue of sorting through those alerts and the accompanying information.
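Consolidating across tools usually starts with normalizing each tool’s alerts into one shape so that duplicates can be recognized. The sketch below assumes two hypothetical tools with different field names (`host`/`metric` versus `resource`/`signal`); neither schema is taken from a real product.

```python
from collections import defaultdict


def normalize(source: str, raw: dict) -> tuple[str, str]:
    """Map a tool-specific alert to a shared (resource, symptom) key."""
    if source == "tool_a":
        return raw["host"].lower(), raw["metric"].lower()
    if source == "tool_b":
        return raw["resource"].lower(), raw["signal"].lower()
    raise ValueError(f"unknown source: {source}")


def consolidate(alerts: list[tuple[str, dict]]) -> dict[tuple[str, str], list[str]]:
    """Group alerts from all tools by shared key, listing which tools reported each."""
    grouped: dict[tuple[str, str], list[str]] = defaultdict(list)
    for source, raw in alerts:
        grouped[normalize(source, raw)].append(source)
    return dict(grouped)


incoming = [
    ("tool_a", {"host": "web-01", "metric": "latency"}),
    ("tool_b", {"resource": "WEB-01", "signal": "Latency"}),
    ("tool_a", {"host": "db-01", "metric": "disk"}),
]
for key, sources in consolidate(incoming).items():
    print(key, "reported by", sources)
```

Three raw alerts become two items to review, and the list of reporting tools is kept so no context is lost in the merge.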
Prioritize continuous review and improvement
There is no one-time, one-size-fits-all fix to alert fatigue and the dangers that come with it. It’s essential to review your processes, alerts, and systems regularly to make sure you’re striking the right balance.
Are alerts getting missed? If so, why? Have you set your thresholds too high or too low? Are visual cues not working? Have workers normalized the alerts—and would changing their design increase attention? These questions—and others like them—should be regularly revisited.
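One measurable input to these reviews is how often each alert rule actually led to someone taking action. The sketch below assumes you can export a hypothetical history of (rule, acted_on) records from your alerting tool, and it flags rules acted on less than an arbitrary 20% of the time as candidates for retuning, downgrading, or deleting.

```python
from collections import defaultdict


def action_rates(history: list[tuple[str, bool]]) -> dict[str, float]:
    """Fraction of each rule's firings that someone actually acted on."""
    fired: dict[str, int] = defaultdict(int)
    acted: dict[str, int] = defaultdict(int)
    for rule, acted_on in history:
        fired[rule] += 1
        if acted_on:
            acted[rule] += 1
    return {rule: acted[rule] / fired[rule] for rule in fired}


def noisy_rules(history: list[tuple[str, bool]], min_rate: float = 0.2) -> list[str]:
    """Rules acted on less often than `min_rate`: review these first."""
    return sorted(rule for rule, rate in action_rates(history).items() if rate < min_rate)


# Hypothetical history: a disk alert that is almost always ignored,
# and a checkout alert that usually leads to action.
history = (
    [("disk_80_percent", False)] * 9 + [("disk_80_percent", True)]
    + [("checkout_errors", True)] * 3 + [("checkout_errors", False)]
)
print(action_rates(history))  # disk_80_percent: 0.1, checkout_errors: 0.75
print(noisy_rules(history))   # ['disk_80_percent']
```

Tracking this rate over time also shows whether a change to thresholds or alert design actually improved attention.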
Explore the consolidation benefits of Jira Service Management with Opsgenie, and see how its alert flexibility and customization features work in practice.