The transition to agile development and continuous deployment gave rise to the DevOps movement, which aims to break down organizational walls. While there are many benefits to this approach, some best practices of traditional IT Service Management (ITSM) have been lost in the transition. Which ITSM processes and controls are still relevant, and how can you adapt them to the new agile world?
Agile development is all about speed, experimentation, and iteration. Many development teams now deploy releases to production daily. ITSM and ITIL, with their emphasis on planning, documentation, and functional silos, are often considered too slow to keep up in this new agile world.
However, agile development that doesn’t incorporate enough process carries risks. These aren’t criticisms of the DevOps philosophy itself, but of DevOps implementations that lack sufficient service focus, documentation, and controls. Here are four areas of traditional ITSM that can be adapted to make DevOps successful.
Service Level Objectives
ITSM puts heavy emphasis on service-oriented goals and metrics. One danger of agile methodology is that this service focus can be lost in the day-to-day drive to push out releases and clear the current short-term backlog. DevOps teams can gain a lot by looking at how ITSM frameworks create and measure service metrics. Check out our recent blog post about SLOs for DevOps. By setting the right objectives, and setting up monitoring and alerting against them, your team can keep the right focus on service delivery.
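As a rough illustration of measuring an objective and alerting on it, the sketch below tracks how much of an availability target's "error budget" is left. The error budget is one common way to express an SLO and is not something this post prescribes. The `Slo` class, the `checkout` service, the 99.9% target, and the 25% alert threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """A hypothetical availability objective for one service."""
    service: str
    target: float  # e.g. 0.999 means 99.9% of requests should succeed


def error_budget_remaining(slo: Slo, total: int, failed: int) -> float:
    """Return the unspent fraction of the error budget (negative if overspent)."""
    if total == 0:
        return 1.0  # no traffic in the window, so nothing has been spent
    allowed_failures = (1.0 - slo.target) * total
    if allowed_failures == 0:
        # A 100% target leaves no budget at all.
        return 1.0 if failed == 0 else float("-inf")
    return 1.0 - failed / allowed_failures


checkout = Slo(service="checkout", target=0.999)
remaining = error_budget_remaining(checkout, total=1_000_000, failed=400)

# Assumed alerting rule: warn once less than a quarter of the budget is left.
if remaining < 0.25:
    print(f"ALERT: {checkout.service} has {remaining:.0%} of its error budget left")
else:
    print(f"{checkout.service}: {remaining:.0%} of its error budget left")
```

In practice the request counts would come from your monitoring tool rather than literals, and the threshold would be tuned to how quickly the team can react.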
Change Management
Change Management is one discipline of ITSM that people love to hate. Submitting detailed change requests and waiting for a Change Advisory Board to review and approve them seems antithetical to the agile philosophy. It slows everything down. The problem with ignoring change management is that incident responders don’t know about undocumented changes, the service desk and other stakeholders are out of the loop, and preventable issues slip through.
With a few improvements, change management can work at the speed you need.
1. Pre-approve low-risk changes and empower SREs to automate them or run them in response to an incident.
2. Automate the documentation of changes – integrate your tools so that ticketing and service desk software is updated automatically when changes occur.
3. Modernize the Change Advisory Board and replace it with a chat channel instead of a meeting. Again, integrate your tools so that Slack or Teams is updated with change details automatically.
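The three improvements above could be sketched roughly as follows: a list of pre-approved change types, a record created automatically for every change, and a message posted to a channel instead of a meeting. Everything here is hypothetical, including the `PRE_APPROVED` set, the `ChangeRecord` fields, and the `notify` callback, which stands in for whatever chat or ticketing integration you actually use.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical catalogue of change types the team has agreed are low risk.
PRE_APPROVED = {"restart-service", "scale-out", "rotate-logs"}


@dataclass
class ChangeRecord:
    change_type: str
    service: str
    requested_by: str
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    @property
    def pre_approved(self) -> bool:
        return self.change_type in PRE_APPROVED


def route_change(change: ChangeRecord, notify) -> str:
    """Document every change; hold only the risky ones for review."""
    status = "approved" if change.pre_approved else "needs-review"
    notify(
        f"[{status}] {change.change_type} on {change.service} "
        f"by {change.requested_by}"
    )
    return status


posted = []  # stand-in for a chat channel or service desk integration
route_change(ChangeRecord("restart-service", "checkout", "sre-bot"), posted.append)
route_change(ChangeRecord("schema-migration", "orders", "alice"), posted.append)
```

The point of the design is that documentation is a side effect of making the change, so nobody has to remember to file it afterwards.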
Incident Management
Incident Management is another ITSM discipline that needs to be embraced by DevOps teams. With no documented incident response process, teams tend to get bogged down in firefighting. Responders jump from one alert to the next trying to keep up. Effectiveness tends to depend on who is on call during a particular incident, leading to inconsistent outcomes.
Taking a page from ITIL, it’s important to differentiate between “events” and “incidents”. Events are often alerts from monitoring tools that make you aware of something important. Whether an event constitutes an incident depends on whether a service is being disrupted or degraded in performance. You can only make that determination if you’ve defined your services and SLOs in advance (see Service Level Objectives above).
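To make the distinction concrete, here is a minimal sketch that promotes an event to an incident only when it breaches a predefined SLO. The `Event` fields, the metric names, and the limits in `SLO_LIMITS` are illustrative assumptions, not values taken from ITIL or from any monitoring tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    service: str
    metric: str
    value: float


# Hypothetical SLO limits keyed by (service, metric); each value is an upper bound.
SLO_LIMITS = {
    ("checkout", "error_rate"): 0.01,
    ("checkout", "p95_latency_ms"): 800.0,
}


def is_incident(event: Event) -> bool:
    """Treat an event as an incident only when it breaches a defined SLO."""
    limit = SLO_LIMITS.get((event.service, event.metric))
    if limit is None:
        # Assumption: with no SLO defined, the event cannot be classified
        # and is left as a plain event.
        return False
    return event.value > limit
```

With these limits, `Event("checkout", "error_rate", 0.05)` is classified as an incident, while the same metric at `0.002` remains an event.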
On-call engineers should have a playbook for how to respond to different types of incidents, ensuring consistent, repeatable responses no matter who is on call. Where applicable, those playbooks can be changes that have been pre-approved (see Change Management above). By all means, automate those playbook steps wherever you can with tools like Ansible or Puppet.
Don’t have a documented incident response plan yet? Check out the Atlassian Incident Response Handbook for helpful ideas.
Stakeholder Communication
Stakeholders should be closely involved while planning, deploying, and operating services. Make sure your SLOs are aligned with stakeholder requirements and then integrate your DevOps toolchain to automate the process of keeping stakeholders informed. For every new release, your deployment tools should automatically update service desk tools so that your first-line responders know about all changes. During incidents, your alerting and incident management tools should automatically update status pages, ticketing systems, and other stakeholder communications.
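One way to picture this kind of automation is a small publish/subscribe hub: deployment and alerting tools publish a message once, and every subscribed stakeholder channel receives it. The sketch below is a toy model under that assumption. `StakeholderBus`, the topic names, and the list-based "channels" are hypothetical stand-ins for real service desk and status page integrations.

```python
from typing import Callable, Dict, List

Handler = Callable[[str], None]


class StakeholderBus:
    """Tiny publish/subscribe hub; each handler stands in for one integration."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, message: str) -> int:
        """Deliver the message to every subscriber; return how many received it."""
        handlers = self._handlers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


# Plain lists stand in for the service desk and the public status page.
service_desk: List[str] = []
status_page: List[str] = []

bus = StakeholderBus()
bus.subscribe("release", service_desk.append)
bus.subscribe("incident", service_desk.append)
bus.subscribe("incident", status_page.append)

bus.publish("release", "checkout v2.4.1 deployed")
bus.publish("incident", "checkout degraded: elevated error rate")
```

Here the service desk sees both the release and the incident, while the status page only receives the incident update.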
Opsgenie service-aware incident management
The only way ITSM can keep up is through automation. Opsgenie helps you automate:
- SLO measurement through monitoring and built-in reports
- Collaboration through ChatOps and Service Desk
- Alert triage and incident response with problem reporting and incident templates
- Stakeholder communications with push notification and Status Page integration
With Opsgenie’s various bidirectional ITSM integrations, you can bring IT Service Management into the DevOps age.