During development of Jira 3.13, one of our engineers set up a number of automated performance tests for Jira that run every night in Bamboo. We also created a performance dashboard in Confluence to provide a summary view, allowing us to monitor the performance of Jira over time. The performance tests fire up a Jira instance with real-world data (we use data from http://jira.atlassian.com) and hit the web interface using JMeter. To get a fairly realistic usage profile, we looked at the access logs for http://jira.atlassian.com for a couple of days and got JMeter to replicate this usage pattern, simulating Jira running under load.
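To illustrate how a usage profile can be derived from access logs, here is a minimal Java sketch. It is not the tooling we actually used: the class name `UsageProfile`, its methods, and the assumption that the logs are in common log format (with the request line in double quotes) are all hypothetical. It simply counts hits per URL path and turns them into fractions that could be used to weight the requests in a load test.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: derives a request mix (share of hits per URL path) from access-log lines. */
class UsageProfile {

    /** Extracts the request path from a common-log-format line, or returns null if it cannot be parsed. */
    static String pathOf(String logLine) {
        int start = logLine.indexOf('"');
        int end = logLine.indexOf('"', start + 1);
        if (start < 0 || end < 0) {
            return null;
        }
        // The quoted section is assumed to look like: GET /secure/Dashboard.jspa?x=y HTTP/1.1
        String[] parts = logLine.substring(start + 1, end).split(" ");
        if (parts.length < 2) {
            return null;
        }
        String path = parts[1];
        int query = path.indexOf('?');
        // Drop the query string so that the same page is counted once.
        return query >= 0 ? path.substring(0, query) : path;
    }

    /** Returns the fraction of all parsed requests that went to each path. */
    static Map<String, Double> requestMix(List<String> logLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        int total = 0;
        for (String line : logLines) {
            String path = pathOf(line);
            if (path != null) {
                counts.merge(path, 1, Integer::sum);
                total++;
            }
        }
        Map<String, Double> mix = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            mix.put(entry.getKey(), entry.getValue() / (double) total);
        }
        return mix;
    }
}
```

A real profile would also need to group URLs that differ only by issue key (every /browse/... URL is the same kind of page), which this sketch leaves out.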
Once we released Jira 4.0, it became our new baseline for monitoring performance. When developing Jira 4.0.1 we noticed a pretty large performance regression (Blue represents Jira 4.0, Orange is Jira 4.0.x and Green is Jira 4.1.x.):
Uh oh! The Browse Issue page is roughly 8x slower than in Jira 4.0?!
Read on for the investigation that followed and what caused this huge slowdown.
Our performance tests have two separate plans in Bamboo: one that’s profiled and one that runs without profiling. The profiled run usually doesn’t produce very realistic results (due to the overhead from profiling), but it does produce a CPU snapshot that can be analyzed using JProfiler to figure out what’s causing slowdowns.
Investigation of such a snapshot showed that a lot of the performance regression was due to a bugfix in the dashboards plugin to avoid loopback requests (see the knowledge base article for more info on the bug). As a result of that fix, local gadget specs were no longer being cached, and all translations were now being inlined in the gadget specs, resulting in significant contention on the ResourceBundle (used when looking up translations).
We first tried to fix this from Jira’s side in two ways: by caching the i18n keys that the GadgetProcessor (responsible for translating specs) was looking up, and by creating I18nBeans in SAL through a caching I18nBeanFactory rather than creating new instances every time (which was expensive, since all plugin i18n resources had to be looked up). Eventually, however, there was nothing more we could do in Jira, and the Atlassian Gadgets (AG) team in our San Francisco office did an awesome job of fixing the problem from their side. The AG plugin now caches local gadget specs quite aggressively, which resulted in the following drop in response times:
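As an aside, the idea behind a caching bean factory can be sketched in a few lines of Java. This is only an illustration under assumptions, not the actual Jira or SAL code: `TranslationBean`, `TranslationBeanFactory` and `CachingTranslationBeanFactory` are hypothetical stand-ins, and keying the cache by locale is a guess at a sensible cache key.

```java
import java.util.Locale;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical stand-in for a translation bean; NOT the real Jira/SAL type. */
interface TranslationBean {
    String getText(String key);
}

/** Hypothetical stand-in for a factory whose getInstance call is expensive. */
interface TranslationBeanFactory {
    TranslationBean getInstance(Locale locale);
}

/** Caches one bean per locale so the expensive delegate is only consulted on a cache miss. */
final class CachingTranslationBeanFactory implements TranslationBeanFactory {
    private final TranslationBeanFactory delegate;
    private final ConcurrentMap<Locale, TranslationBean> cache = new ConcurrentHashMap<>();

    CachingTranslationBeanFactory(TranslationBeanFactory delegate) {
        this.delegate = delegate;
    }

    @Override
    public TranslationBean getInstance(Locale locale) {
        // computeIfAbsent creates the bean on the first request for a locale and reuses it afterwards.
        return cache.computeIfAbsent(locale, delegate::getInstance);
    }

    /** Drops all cached beans, e.g. when the set of available translations changes. */
    void clear() {
        cache.clear();
    }
}
```

The trade-off with any cache like this is invalidation: if plugins (and their i18n resources) can be enabled or disabled at runtime, something has to call `clear()` when that happens, or stale translations will be served.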
Unfortunately, we were still ~30% slower than Jira 4.0. After some more investigation, I found that in order to clean up some code (and, incidentally, improve JQL autocomplete performance) we had changed a fairly heavily called method to use permissionManager.getProjectObjects() instead of the deprecated permissionManager.getProjects(). It turns out, however, that the older (GenericValue-based) method was cached in a ThreadLocal, while the newer method was not. This has now been fixed and the result is pretty awesome:
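For readers unfamiliar with the technique, here is a minimal sketch of what caching an expensive lookup in a ThreadLocal can look like. It is not Jira's implementation: `RequestLocalCache` and its methods are hypothetical, and the sketch assumes that one thread handles one request at a time, so a per-thread map behaves like a per-request cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-thread cache; a sketch of the technique, not Jira's actual code. */
final class RequestLocalCache<K, V> {
    // Each thread gets its own map, so no locking is needed on lookups.
    private final ThreadLocal<Map<K, V>> store = ThreadLocal.withInitial(HashMap::new);

    /** Returns the cached value for key, calling loader only on the first lookup in this thread. */
    V get(K key, Function<K, V> loader) {
        Map<K, V> map = store.get();
        V value = map.get(key);
        if (value == null) {
            // Note: null results are not cached and will be loaded again next time.
            value = loader.apply(key);
            map.put(key, value);
        }
        return value;
    }

    /** Must be called when the request ends, because container threads are pooled and reused. */
    void clear() {
        store.remove();
    }
}
```

With something like this in place, a heavily called method can ask for the same user's projects many times within one request and only pay for the permission check once. Forgetting to call `clear()` at the end of a request is the classic pitfall, since the next request on that thread would see stale data.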
If you look at the overall stats (particularly the 95th percentile), Jira 4.0.1 looks to be about 20-30% faster than Jira 4.0. The Dashboard is where most of the improvement is, now that we got rid of the loopback request on the server and i18n requests are being cached much more aggressively:
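For reference, a 95th percentile comparison like the one above can be computed along these lines. This is a sketch only: `ResponseTimeStats` is a hypothetical class, the nearest-rank definition of a percentile is one of several in common use, and it is not necessarily the one our dashboard or JMeter applies.

```java
import java.util.Arrays;

/** Hypothetical helper for summarising response times; not part of Jira or JMeter. */
final class ResponseTimeStats {

    /** Nearest-rank percentile: the smallest sample such that at least p percent of samples are <= it. */
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samplesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    /** Relative change from baseline to candidate; negative means the candidate is faster. */
    static double relativeChange(long baselineMillis, long candidateMillis) {
        return (candidateMillis - baselineMillis) / (double) baselineMillis;
    }
}
```

Comparing `percentile(samples, 95)` for two releases with `relativeChange` gives a "percent faster" figure of the kind quoted here. The 95th percentile is more telling than the mean under load because it shows how bad the slow requests get, rather than letting many fast requests hide them.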
Disclaimer: Whether or not Jira 4.0.1 is really ~20% faster for all of our customers is difficult to say. Our performance tests take a specific set of data with a specific set of requests and measure the performance under load. With a different data set and usage pattern, results may obviously vary. Also, these results don’t mean that the Dashboard will be 50% faster for a single request, but it will be about that much faster on average when Jira is under heavy load!