Always-on services require continuous response from agile and DevOps teams. These teams need to think beyond reacting to a single incident and align their structure, values, and tools so that operational excellence becomes a core competency.
Today’s users expect modern services to be always on and always available. Downtime damages both reputation and the bottom line, with average costs reported to be as high as $9,000 per minute.
In a cloud-native world, however, incidents are as much a fact of life as bugs in code. Incidents that result in downtime range from hardware and network failures to misconfiguration, resource exhaustion, data inconsistencies, and software bugs.
Making operational excellence a core competency entails adopting the practice of "you build it, you run it" (YBIYRI), in which the development team owns building, testing, deploying, and operating a service. This practice puts DevOps theory into action and reinforces the cycle of continuous deployment, feedback, and maintenance or incident response that teams need to keep always-on services always on.