The Atlassian Support team is happy to welcome a new member of our family: Hercules, the Atlassian Support Robot! We recently turned him loose on our support instance of Jira, where he analyzes uploaded logs and reports any known issues he finds. This post describes some of the problems he solves and how he works.
The Challenge: Diagnosing Previously Known Issues
Even a good support engineer can spend a lot of time rediscovering a known problem. Support engineers pore over logs, find the one important line amidst the noise of other activity in the log files, and then search for existing bugs or Knowledge Base articles. Our engineers have to be really good at searching Jira and Confluence as a result.
If an engineer encounters the same problem a second time, at best they remember what they’re looking for and are able to start searching directly for the solution, which can save them a few minutes. If the engineer doesn’t remember the solution, they go through the same research process, often with a nagging sense of deja vu. The second engineer to encounter the problem has none of that history, and has to start from scratch. What we needed was a way to identify known problems and flag them up front when they happen again.
Hercules to the Rescue
A lot of problems result in pretty consistent log messages. Engineers learn to recognize some of these connections almost without thinking: got a customer reporting a product hanging or behaving erratically? Just search the logs for OutOfMemory and you’ll often find the culprit straight away. Hercules takes this “aha” moment, this insight into the connection between log messages and problems, and automates it.
When an engineer encounters a problem that results in a clear log message (stack trace, et cetera), they create a regular expression, a pattern that can be used to search logs for a variation on the same message, and link this pattern to a relevant KB article or bug report. Hercules uses these patterns to scan all of the log files submitted to support, and comes up with a list of known problems.
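The idea of linking patterns to articles can be sketched in a few lines of Python. This is only an illustration of the technique, not Hercules' actual code: the `KnownIssue` type, the `scan_log` function, the two patterns, and the `KB-EXAMPLE-…` references are all made up for the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownIssue:
    pattern: re.Pattern  # regular expression written by a support engineer
    title: str           # short description of the problem
    reference: str       # hypothetical KB article or bug key


# Hypothetical pattern database; real patterns and article keys would differ.
KNOWN_ISSUES = [
    KnownIssue(
        re.compile(r"java\.lang\.OutOfMemoryError: (Java heap space|PermGen space)"),
        "JVM ran out of memory",
        "KB-EXAMPLE-1",
    ),
    KnownIssue(
        re.compile(r"Too many open files"),
        "File handle limit reached",
        "KB-EXAMPLE-2",
    ),
]


def scan_log(lines, issues=KNOWN_ISSUES):
    """Return (issue, line_number, line) for the first line matching each issue."""
    hits = {}
    for number, line in enumerate(lines, start=1):
        for issue in issues:
            # Report each known issue once, at its first occurrence in the log.
            if issue.reference not in hits and issue.pattern.search(line):
                hits[issue.reference] = (issue, number, line.rstrip("\n"))
    return list(hits.values())


sample_log = [
    "2010-01-01 10:00:00 INFO  Starting up",
    "2010-01-01 10:05:12 ERROR java.lang.OutOfMemoryError: Java heap space",
    "2010-01-01 10:05:13 ERROR java.lang.OutOfMemoryError: Java heap space",
]
results = scan_log(sample_log)
for issue, number, line in results:
    print(f"line {number}: {issue.title} -> {issue.reference}")
```

Reporting each issue once, rather than once per matching line, keeps the list short even when the same error is logged thousands of times.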
These results are used in a number of ways. First, the top results are automatically mailed to our customers (this is built on top of Jira notifications), giving them a list of solutions, and often allowing them to resolve their own case well before an engineer is available to help them. Even when the top problems aren’t relevant, the customer is able to research those and rule them out, so that their conversation with our support engineer is a lot more focused.
When our support engineers pick up tickets, they see a report detailing the matches Hercules has found. Because it’s using regular expressions and analyzing Java stack traces, Hercules’ answers should be highly accurate: it’s matching an exact line of code or logged error, which means a support engineer can spot an issue much more quickly than by trawling through logs. Hercules also has “thumbs up, thumbs down” voting on matches, so that over time we can refine the patterns we use and the articles returned to be even more accurate.
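One way votes like these could feed back into pattern maintenance is to flag patterns whose matches are mostly voted down. The sketch below is an assumption about how that might look, not a description of what Hercules does: the function name, the vote format, and both thresholds are invented for illustration.

```python
from collections import Counter


def patterns_needing_review(votes, min_votes=5, min_approval=0.6):
    """Return sorted ids of patterns with enough votes and a low thumbs-up ratio.

    votes is an iterable of (pattern_id, is_thumbs_up) pairs.
    The thresholds are illustrative defaults, not values from Hercules.
    """
    ups, totals = Counter(), Counter()
    for pattern_id, is_thumbs_up in votes:
        totals[pattern_id] += 1
        if is_thumbs_up:
            ups[pattern_id] += 1
    return sorted(
        pattern_id
        for pattern_id, total in totals.items()
        # Skip patterns with too few votes to judge fairly.
        if total >= min_votes and ups[pattern_id] / total < min_approval
    )


votes = (
    [("oom", True)] * 5
    + [("vague", False)] * 4
    + [("vague", True)]
    + [("new", False)]
)
flagged = patterns_needing_review(votes)
print(flagged)
```

Requiring a minimum number of votes keeps a single thumbs down on a brand-new pattern from sending it straight back for rework.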
We’d like to get Hercules into a state where we can distribute it for customers as a Jira plugin. If you’re interested in getting a copy and using it for your own product support, vote for the feature.
We’re also building tools that let you use Hercules and our database of known problems and patterns from within your local instances of our products. Stay tuned for more news on that.
Want to keep up with Hercules? Friend him on his Facebook page!