5 tips for incident management when you’re suddenly remote

In a new survey commissioned by Atlassian, nearly all of the respondents – 96 percent – stated that some or all of their employees have transitioned to working remotely due to the COVID-19 pandemic. This near-instantaneous shift to remote work has changed the way organizations manage incidents. Atlassian set out to understand how teams tasked with keeping the lights on have been affected by the recent changes.

Over 500 IT and development professionals across the U.S. shared their experiences in our first-ever State of Incident Management survey. Our intention was to better understand how organizations run their incident management processes, the impact of automation on their incident response, and their future plans and investments. Here we’ll dive deeper into the impact of the global pandemic on incident management and operating always-on services.

Demand and usage of services have increased

Almost overnight, offices shifted into homes. This had a profound impact on work schedules and productivity, making the continuous availability of tools like Zoom and Slack absolutely critical. Our survey found that 73 percent of respondents saw increased demand for the services they work on, and this is consistent with our own experience.

In March, we unveiled free cloud editions across our core products – Jira Software, Confluence, and Jira Service Desk. Following this, we saw new sign-ups increase by approximately 125 percent. Despite this growth in demand for cloud and remote services, users expect the same level of availability. Now more than ever, IT and development teams need tools to respond to, resolve, and learn from incidents quickly so they can keep their services up and running. The teams tasked with this challenge are the unsung heroes of the services we depend on for work, socialization, and entertainment.

Delivery and response times have slowed despite demand

A little over half of survey respondents – 51 percent – reported that their incident response time has been slower since they began working remotely; the elevated demand is taking a toll on incident response teams.

This added pressure on systems and teams alike may have initially led to increased scrutiny of the frequency and rigor of deployments. Likely out of an abundance of caution during this time of uncertainty, 66 percent of respondents reported that their teams have slowed their frequency of software delivery. Teams are trying to add extra quality control by slowing their release cycles, but, as participants in our recent Ask-Me-Anything (AMA) panel discussed, this may actually introduce a higher risk of incidents due to slower feedback loops and bundled releases.

Speed of software delivery should be a focus

Engineering leaders from some of the world’s top companies shared their recent experience in the Atlassian-hosted live video AMA – “SRE in the spotlight: Operating always-on services when the world is counting on you.”

The consensus across the panel, which included top SREs from Slack and Honeycomb, was that rather than slowing down releases, organizations should increase their release frequency by investing in their deployment hygiene and infrastructure.

“You should be shipping more often, not less often at this time.” Liz Fong-Jones | Developer Advocate, Honeycomb 

“There’s a false dichotomy between reliability and shipping features.” Holly Allen | Engineering Director – Reliability, Slack

Moving forward in the new normal

There’s no doubt that the current environment has stress-tested not only the ability of many companies to remain in business, but also the practices used to continuously ship and operate their services. We’ve seen demand for services increase during these times, and now isn’t the time to hold back. As we heard in the live AMA, some of the most innovative companies in the world typically deploy dozens of times a day, and the panelists encouraged organizations to keep shipping at that pace. This agility has become increasingly important as industries try to keep up with the pace of change while consistently keeping their services online.

If you’d like to see how your company’s incident management process measures up to the teams we surveyed, we’re offering early access to the full report.

In the meantime, you can read our Atlassian Incident Management Handbook to learn more about how we’re responding to and managing incidents during these unprecedented times.

How companies are operating always-on services in the COVID-19 era