AIOps is the application of analytics and machine learning to automate aspects of DevOps and IT operations management. As with any new technology, it may take time to discover the best ways to get practical results. Even so, AIOps is already showing a lot of promise in three key use cases:
- Metrics and Visualization – DevOps and SRE teams depend on real-time metrics to understand the current health of their services. Analytics can help build composite metrics that account for the dependencies between systems and services, and can provide visualizations that let DevOps personnel see service health at a glance. Splunk ITSI's glass tables are a good example of this kind of visualization.
- Logging and Anomaly Detection – Event logs contain a wealth of information, provided analysts have the time and skills to search them for patterns and anomalous events. Machine learning algorithms can help IT teams detect patterns in the data automatically and trigger alerts when anomalies indicate a potential issue. For example, Elastic's X-Pack adds built-in machine learning to the Elastic (ELK) Stack, which can run anomaly detection on log data indexed in Elasticsearch.
- Alert Correlation and Triage – Operations teams have more and more monitoring and alerting tools at their disposal. The result is a flood of alerts, many of which are just noise. AIOps is being used as a first-level responder that correlates and aggregates alerts before notifying human responders. BigPanda's L0 autonomous layer is a good example of this application.
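The anomaly-detection idea from the logging use case can be illustrated with a deliberately simple technique: count log events (for example, errors) per minute and flag any count that sits far outside the recent baseline. The sketch below is a rolling z-score, not the algorithm used by any product named above; the function name `detect_anomalies`, the 10-interval window, the threshold of 3 standard deviations, and the sample counts are all illustrative assumptions.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(counts, window=10, threshold=3.0, min_sigma=1.0):
    """Return the indices of counts that lie more than `threshold` standard
    deviations from the mean of the preceding `window` counts (a rolling z-score).

    `counts` is a sequence of log-event counts per fixed interval, e.g. errors
    per minute. All parameter defaults are illustrative, not tuned values.
    """
    history = deque(maxlen=window)  # trailing baseline; oldest value drops off automatically
    anomalies = []
    for i, value in enumerate(counts):
        if len(history) == window:  # only score once a full baseline exists
            baseline = list(history)
            mu = mean(baseline)
            # Floor the spread so a perfectly flat baseline does not turn
            # every tiny change into an enormous z-score.
            sigma = max(pstdev(baseline), min_sigma)
            if abs(value - mu) / sigma > threshold:
                anomalies.append(i)
        # In this simple version, anomalous values also enter the baseline.
        history.append(value)
    return anomalies


# Hypothetical per-minute error counts with one spike at index 10.
per_minute_errors = [5, 6, 5, 4, 5, 6, 5, 5, 4, 6, 40, 5, 5]
print(detect_anomalies(per_minute_errors))  # → [10]
```

Because the spike is appended to the baseline, the spread stays wide for the next ten intervals, which damps follow-up alerts but could also hide a second spike. A more careful detector might exclude flagged values from the baseline or model daily and weekly seasonality.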
How is AIOps changing incident management?
Instead of having responders interact directly with monitoring and logging tools, Dev and Ops teams are starting to place an AIOps layer between the alert sources and the responders. The hope is that the analytics and insights from this additional layer will free human responders to focus on incident response and remediation instead of on repetitive, manual tasks earlier in the alert cycle.
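A minimal sketch of such a layer, assuming every alert carries the name of the affected service: alerts for the same service that arrive within a few minutes of each other are folded into one incident, so responders are notified once per incident rather than once per alert. The `Alert` and `Incident` types, the service-based correlation key, the five-minute window, and the sample alerts are hypothetical; real correlation engines may also draw on signals such as topology or alert text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Alert:
    source: str       # hypothetical name of the tool that raised the alert
    service: str      # affected service; used here as the correlation key
    message: str
    timestamp: float  # seconds since epoch


@dataclass
class Incident:
    service: str
    alerts: list[Alert] = field(default_factory=list)

    @property
    def last_seen(self) -> float:
        return self.alerts[-1].timestamp


def correlate(alerts: list[Alert], window_seconds: float = 300) -> list[Incident]:
    """Group alerts for the same service that arrive within `window_seconds`
    of the previous alert in the group; each group becomes one incident."""
    open_incidents: dict[str, Incident] = {}  # service -> most recent incident
    incidents: list[Incident] = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        current = open_incidents.get(alert.service)
        if current is not None and alert.timestamp - current.last_seen <= window_seconds:
            current.alerts.append(alert)  # same burst: attach to the open incident
        else:
            current = Incident(service=alert.service, alerts=[alert])
            open_incidents[alert.service] = current
            incidents.append(current)  # new incident: notify a responder once
    return incidents


# Hypothetical sample: five raw alerts collapse into three incidents.
sample = [
    Alert("metrics", "checkout", "high latency", 0),
    Alert("uptime-check", "checkout", "HTTP 503", 60),
    Alert("logs", "checkout", "error rate up", 200),
    Alert("metrics", "search", "error rate up", 100),
    Alert("uptime-check", "checkout", "HTTP 503", 4000),
]
for incident in correlate(sample):
    print(incident.service, len(incident.alerts))
```

Measuring the window from the last alert in a group, rather than the first, keeps a continuing burst together as one incident; the trade-off is that a service that never goes quiet would accumulate a single, ever-growing incident.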
What’s not changing?
Dev and Ops teams still need to be notified about the right issues, at the right time, with enough information to take action. They also need to collaborate with subject matter experts and keep disparate tools such as chat, ticketing, and status pages working together during an incident.
AIOps can help deliver more actionable insights and alerts, but human responders still need to take action to resolve issues and keep services available.
Opsgenie + AIOps
Atlassian's Opsgenie takes a best-of-breed approach to AIOps, so you can choose your preferred AIOps platform and easily pair it with Opsgenie's modern incident management platform. Together with our partners, we empower Dev and Ops teams to plan for service disruptions and stay in control during incidents.