Incident management for high-velocity teams
Calculating the cost of downtime
Understanding the financial impact of major incidents
In March 2015, a 12-hour Apple store outage cost the company $25 million.
In August 2016, a five-hour power outage in an operations center caused 2,000 cancelled flights and an estimated loss of $150 million for Delta Air Lines.
In March 2019, a 14-hour outage cost Facebook an estimated $90 million.
And those are the big guys. The industry leaders. The ones with fat operating margins and millions in the bank. They can weather a one-day financial storm. The truth is that while smaller companies may face smaller losses during a major incident, those smaller numbers can have an even bigger effect on their bottom line.
In fact, one study of 101 startups found that 29% of those that fail do so because they run out of cash. If startups are already at risk, it’s hard to imagine most could weather a major incident without going under.
The moral of the story: downtime is a big deal. Anyone who says otherwise hasn’t been paying attention. Incidents are not only potentially toxic to customer trust and loyalty. They’re also the financial grim reaper.
The average cost of downtime
The average cost of downtime is $5,600 per minute, according to a 2014 study by Gartner. The research firm is quick to point out, however, that this is just an average. An Avaya report the same year found that averages ranged from $2,300 to $9,000 per minute depending on factors like company size and industry vertical. And since 2014, that figure has been rising. A more recent report (from Ponemon Institute in 2016) raises Gartner’s average from $5,600 per minute to nearly $9,000 per minute.
For small businesses, that number drops to a lower but still significant $137 to $427 per minute. Where your company falls on this very wide spectrum depends on a number of factors, including industry vertical, organization size, and business model.
The industries with the highest risk include banking/finance, government, healthcare, manufacturing, media and communications, retail, and transportation/utilities. One 2016 study found that the average cost for downtime in these industries was upward of $5 million per hour.
Organization size is also a key factor. For Fortune 1000 companies, downtime could cost as much as $1 million per hour, according to an IDC survey. And while the typical mid-sized company spends $1 million per year on incidents, large enterprises can spend up to $60 million or more, according to a research report from IHS.
Finally, business models also factor heavily into downtime cost calculations. An eCommerce site with no physical sales locations obviously has more to lose from a web outage than a business with physical sales locations. The more your business model relies on uptime, the more (logically) you have to lose from downtime.
For eCommerce giant Amazon, whose entire business model relies on uptime, estimated costs are around $13.22 million per hour. Facebook—whose revenue depends on ad impressions—is likewise looking at figures well into the millions.
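The figures above mix per-minute and per-hour units. As a rough sketch, converting the hourly estimates into per-minute costs (dividing by 60) makes them easier to compare with the per-minute averages quoted earlier. The dictionary labels are shorthand for the sources cited above, not official names.

```python
# Convert the hourly downtime figures quoted above into per-minute costs so they
# can be compared with the per-minute averages (e.g. nearly $9,000 per minute).
HOURLY_FIGURES = {
    "High-risk industries (2016 study)": 5_000_000,
    "Fortune 1000 (IDC, upper bound)": 1_000_000,
    "Amazon (estimate)": 13_220_000,
}


def per_minute(cost_per_hour: float) -> float:
    """Return the equivalent cost per minute for an hourly downtime cost."""
    return cost_per_hour / 60


for label, hourly in HOURLY_FIGURES.items():
    # Round to whole dollars and add thousands separators for readability.
    print(f"{label}: ${per_minute(hourly):,.0f} per minute")
# High-risk industries (2016 study): $83,333 per minute
# Fortune 1000 (IDC, upper bound): $16,667 per minute
# Amazon (estimate): $220,333 per minute
```

Seen this way, the hourly figures for high-risk industries and for Amazon sit an order of magnitude or more above the overall per-minute averages.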
Quick downtime calculator
To get a quick estimate of your company’s probable downtime costs, use the following formula, based on the size of your business and the number of minutes your most recent incident lasted:
Downtime cost = minutes of downtime × cost per minute.
For small businesses, use $427 as the cost per minute. For medium and large businesses, use $9,000.
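The formula is simple enough to script. The following is a minimal sketch using the per-minute rates suggested above; the tier names ("small", "medium", "large") are illustrative labels we have assumed, not an industry standard.

```python
# Per-minute rates suggested in the text. Tier names are hypothetical labels.
COST_PER_MINUTE = {
    "small": 427,     # upper end of the $137-$427 small-business range
    "medium": 9_000,  # roughly the 2016 Ponemon average
    "large": 9_000,
}


def downtime_cost(minutes: float, business_size: str) -> float:
    """Downtime cost = minutes of downtime x cost per minute."""
    if minutes < 0:
        raise ValueError("minutes of downtime cannot be negative")
    try:
        rate = COST_PER_MINUTE[business_size]
    except KeyError:
        raise ValueError(f"unknown business size: {business_size!r}") from None
    return minutes * rate


# Example: the same 45-minute incident at a small and a large business.
print(downtime_cost(45, "small"))  # 19215
print(downtime_cost(45, "large"))  # 405000
```

Because the small-business rate is the top of its range, treat the result as an upper-end estimate for small companies.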
Understanding the full cost of downtime
When the average person thinks about downtime costs, they're probably focused on lost revenue. Or perhaps a combination of revenue and employee productivity. But the truth is that the costs of downtime are much more far-reaching.
According to the Ponemon Institute, an independent data protection and security research firm, the largest share of downtime cost is business disruption, a category that includes reputational damage and customer churn. Revenue loss took second place in the firm's research, and the third-largest financial pain associated with incidents was end-user productivity.
Another common category of losses is lost internal productivity—of the IT team tasked with resolving your incident, of adjacent teams involved in incident management (like PR, social media managers, and customer service reps), and of other employees affected by the outage.
For software providers, SLA financial penalties, government fines (for any breach of regulatory requirements), and litigation and settlements are very real financial drains. And for companies dealing in physical products, depleted inventory is a significant risk.
That's not to mention contractor costs, equipment replacement, and employee retention problems. After all, incidents cause stress. Stress creates unhappy workers. And unhappy workers leave. Experts estimate the cost of replacing an employee at 33% of their annual salary.
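To go beyond the quick calculator, you can total the categories listed above. This is a hypothetical sketch: the category groupings and the example dollar amounts are placeholders we have made up for illustration, and only the 33%-of-salary replacement rate comes from the text.

```python
from dataclasses import dataclass

REPLACEMENT_COST_RATE = 0.33  # replacing an employee ~ 33% of annual salary


@dataclass
class IncidentCosts:
    """Hypothetical breakdown of one incident's full cost (all inputs assumed)."""
    lost_revenue: float = 0.0
    business_disruption: float = 0.0   # reputational damage, customer churn
    lost_productivity: float = 0.0     # IT, adjacent teams, affected employees
    penalties_and_legal: float = 0.0   # SLA penalties, fines, litigation
    other_direct: float = 0.0          # contractors, equipment, inventory
    departing_salaries: tuple[float, ...] = ()  # salaries of staff who leave

    def attrition_cost(self) -> float:
        """Estimated cost of replacing employees who leave after the incident."""
        return REPLACEMENT_COST_RATE * sum(self.departing_salaries)

    def total(self) -> float:
        """Sum every category, including attrition."""
        return (
            self.lost_revenue
            + self.business_disruption
            + self.lost_productivity
            + self.penalties_and_legal
            + self.other_direct
            + self.attrition_cost()
        )


# Placeholder numbers for illustration only.
costs = IncidentCosts(
    lost_revenue=50_000,
    business_disruption=80_000,
    lost_productivity=20_000,
    departing_salaries=(90_000,),
)
print(f"Estimated total: ${costs.total():,.0f}")  # Estimated total: $179,700
```

Even in this made-up example, a single departure adds nearly $30,000, a cost that a revenue-only estimate would miss.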
How to minimize downtime costs
Figures like those above make it clear that minimizing downtime should be a priority for companies of any size and across all industries. So, how do we go about mitigating our downtime risks and minimizing the costs? Here are a few tried-and-true ways:
Create a detailed disaster recovery plan
What will you do when downtime strikes? If you don’t already know the answer to that question, the default answer will be “waste precious time figuring out what to do.”
The better your incident response plan, the quicker and more effectively your teams will handle incidents. Which is why the first step of any new incident management program should be process and planning.
Communicate clearly and often
With business disruption accounting for a whopping 35% of downtime costs, it’s more important than ever to prioritize incident communication and customer service during and after incidents.
Eliminate single points of failure
Removing single points of failure from your existing infrastructure and processes is one of the quickest ways to reduce downtime and mitigate its costs. This means doing things like load balancing between servers, following good backup practices, and building peer review and technical fail-safes into your deployments.
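To make load balancing concrete, here is a minimal round-robin sketch that spreads requests across several servers and skips any marked unhealthy. With a single server, that one machine is a single point of failure. The class and server names are hypothetical; production load balancers add real health checks, retries, and weighting.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hypothetical balancer: rotates through servers, skipping unhealthy ones."""

    def __init__(self, servers: list[str]) -> None:
        if not servers:
            raise ValueError("need at least one server")
        self._servers = list(servers)
        self._healthy = set(servers)
        self._ring = cycle(self._servers)  # endless round-robin iterator

    def mark_down(self, server: str) -> None:
        self._healthy.discard(server)

    def mark_up(self, server: str) -> None:
        if server in self._servers:
            self._healthy.add(server)

    def next_server(self) -> str:
        # Check each server at most once per call before giving up.
        for _ in range(len(self._servers)):
            server = next(self._ring)
            if server in self._healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
lb.mark_down("app-2")  # simulate one server failing
print([lb.next_server() for _ in range(4)])  # ['app-1', 'app-3', 'app-1', 'app-3']
```

Traffic keeps flowing while app-2 is down, which is the point of removing a single point of failure: one broken component degrades capacity instead of causing an outage.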
There’s no 100% fool-proof way to avoid incidents. But that doesn’t mean you can’t minimize them.
In fact, the high cost of downtime is a good motivator for leadership to prioritize replacing outdated systems and security features and fixing issues before they balloon into full-blown incidents.
Don’t skip the postmortem
When downtime does strike (and in our complex, technical world, it always eventually does), the best way to prevent future outages is to have a strong postmortem practice.
An incident postmortem brings teams together to discuss the details of an incident: why it happened, its impact, what actions were taken to mitigate it and resolve it, and—importantly—what should be done to prevent it from happening again.
At Atlassian, our postmortems are blameless—focused on getting to the root of the issue instead of passing the blame. We’re also advocates of smart documentation, designed to sum up what we’ve learned during our postmortem and suggest improvements that will help us avoid repeating the issue we’ve just scrambled to correct.
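One way to keep postmortems consistent is to capture them in a fixed structure. The sketch below is a hypothetical record mirroring the questions above: why it happened, impact, actions taken, and prevention. Note that it has no field for who caused the incident, in keeping with a blameless approach. The field names and example incident are our own assumptions, not Atlassian's template.

```python
from dataclasses import dataclass, field


@dataclass
class Postmortem:
    """Hypothetical postmortem record (no 'who was at fault' field, by design)."""
    title: str
    root_cause: str  # why it happened
    impact: str      # who or what was affected, and for how long
    mitigation_steps: list[str] = field(default_factory=list)    # actions taken
    prevention_actions: list[str] = field(default_factory=list)  # stop recurrence

    def is_complete(self) -> bool:
        """Done only when every section, including prevention, is filled in."""
        return all([self.root_cause.strip(), self.impact.strip(),
                    self.mitigation_steps, self.prevention_actions])

    def to_text(self) -> str:
        """Render the record as a plain-text summary to share with the team."""
        lines = [f"Postmortem: {self.title}",
                 f"Why it happened: {self.root_cause}",
                 f"Impact: {self.impact}",
                 "Actions taken:"]
        lines += [f"  - {step}" for step in self.mitigation_steps]
        lines.append("Prevention:")
        lines += [f"  - {action}" for action in self.prevention_actions]
        return "\n".join(lines)


# Made-up example incident.
pm = Postmortem(
    title="Checkout API outage",
    root_cause="Expired TLS certificate on the load balancer",
    impact="Checkout unavailable for 42 minutes",
    mitigation_steps=["Renewed certificate", "Restarted load balancer"],
    prevention_actions=["Alert 30 days before certificate expiry"],
)
print(pm.is_complete())  # True
print(pm.to_text())
```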
Jira Service Management is built to help teams deal with incidents quickly so they can minimize the cost of downtime.