Incident management for high-velocity teams
Pros and cons of different approaches to on-call management
The world depends on always-on services more than ever before. An outage can affect millions of people, with real impact: they can’t pay their bills, they can’t book their flights, they can’t video call with their friends.
And whether you’re dealing with a major bug, capacity issues, or a complete outage, customers who depend on your services expect an immediate response. (The same is true for internal teams.)
Incidents take a toll not only in dollar terms (they cost businesses $700 billion per year in North America alone) but also on the reputation of your company, your product, and your team.
With so much at stake, organizations have turned to putting IT and developer teams on call to make sure the right people are available to address a problem during an incident, no matter when it occurs.
A fair on-call schedule, coupled with an on-call compensation plan, can even foster a culture of shared responsibility and help your teams learn what it takes to build resilient software and services, leading to a better overall product and fewer outages.