Atlassian developers are pretty big fans of Continuous Integration, and none more so than the Jira developers. We have in excess of 4000 acceptance tests, each of which restores a full database backup before going off and testing some aspect of Jira. Having this many tests gives us a lot of confidence that we haven’t broken anything while developing the latest features, but it has a downside: you can easily end up with a monster build that takes 6 or 7 hours to run.
Splitting the build
The natural solution to this problem is a divide-and-conquer approach: split the tests into batches of approximately equal running time and run those batches in parallel. Something like this:
Because each job runs in parallel and on a separate machine, the turnaround time is greatly reduced. We did this for the Jira continuous integration build using 20 parallel jobs, and it reduced the build time from 6 hours to around 1 hour. But even 1 hour is hardly ideal, and the diagram does highlight some rather glaring inefficiencies: the Checkout and Compile steps are being run too many times, and unnecessarily at that! It’s time for some build pipelining…
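To make the splitting step concrete, here is a minimal Python sketch of one way to form batches of roughly equal running time. It is not how the Jira build or Bamboo does it. It assumes per-test durations from a previous run are available, and the function name and sample timings are made up for illustration. The heuristic is greedy: take the tests from longest to shortest and always hand the next one to the batch with the smallest total so far.

```python
import heapq


def split_into_batches(durations, num_batches):
    """Greedily split tests into batches of roughly equal total duration.

    durations: dict mapping a test name to its running time in minutes
               (assumed to come from a previous build; hypothetical input).
    Returns (batches, totals): the test names per batch and each batch's total.
    """
    if num_batches < 1:
        raise ValueError("num_batches must be at least 1")

    batches = [[] for _ in range(num_batches)]
    totals = [0.0] * num_batches
    # Min-heap of (current total, batch index) so the lightest batch pops first.
    heap = [(0.0, index) for index in range(num_batches)]
    heapq.heapify(heap)

    # Longest tests first; ties are broken by name to keep the result stable.
    ordered = sorted(durations.items(), key=lambda item: (-item[1], item[0]))
    for name, minutes in ordered:
        total, index = heapq.heappop(heap)
        batches[index].append(name)
        totals[index] = total + minutes
        heapq.heappush(heap, (totals[index], index))

    return batches, totals


# Made-up timings, purely for illustration.
sample_durations = {"a": 5, "b": 4, "c": 3, "d": 3, "e": 1}
sample_batches, sample_totals = split_into_batches(sample_durations, 2)
```

The greedy rule does not guarantee a perfect split, but it is simple and tends to keep the slowest batch, which determines the turnaround time, close to the average.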
Pipelining the build
As with instruction pipelining on CPUs, the idea is to divide the build into smaller jobs and store the result at the end of each job. Subsequent jobs can then access these previously computed results instead of having to bootstrap everything from scratch. This is commonly known as artifact sharing or copying, and luckily our build server Bamboo supports it out of the box. It’s as easy as ticking a checkbox:
The trickiest part is often creating a ZIP file that contains everything necessary to run the tests offline, so that the testing jobs run as quickly as possible. If you are using Maven, like we are, this is a bit more difficult than it should be, but it’s still possible if you don’t mind creating a custom Maven plugin. After some Maven wizardry and some juggling around of the build plan, we ended up with the following setup.
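The bundling idea itself is independent of Maven. As a rough illustration only, and not the custom Maven plugin mentioned above, the following Python sketch packs a set of directories into a single ZIP file that a downstream job could unpack and test against. The function name and the directory names in the usage comment are hypothetical.

```python
import zipfile
from pathlib import Path


def bundle_for_offline_tests(source_dirs, archive_path):
    """Pack every file under the given directories into one ZIP archive.

    Each directory is stored under its own name inside the archive, so the
    layout can be recreated by simply extracting the ZIP on the test machine.
    The archive should be written outside the source directories.
    """
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for source_dir in map(Path, source_dirs):
            # Sort for a reproducible entry order.
            for path in sorted(source_dir.rglob("*")):
                if path.is_file():
                    relative = path.relative_to(source_dir)
                    arcname = (Path(source_dir.name) / relative).as_posix()
                    archive.write(path, arcname)
    return archive_path


# Hypothetical usage: compiled classes plus already-downloaded dependencies.
# bundle_for_offline_tests(["target/classes", "target/dependency"], "tests.zip")
```

The point of the sketch is only that the archive must carry both the compiled output and its dependencies; what exactly belongs in it depends on the build.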
The testing jobs still run in parallel, but they no longer have to redo the checkout and compilation steps before they start testing. This change knocked a further 25 minutes off our build time, reducing the turnaround time to 35 minutes. That is still a considerable amount of time to wait, but it is getting closer to our goal of 10 to 15 minutes. Here’s what our continuous integration build looks like at the moment:
So by using build splitting and pipelining we have squeezed 374 minutes’ worth of tests into a 35-minute window, reducing our continuous integration build’s turnaround time from 6 hours to 35 minutes. We also lessened the load on our Subversion and Maven repositories, since with artifact sharing the 20 parallel jobs no longer need to check out the code and download dependency JARs over and over again.
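A quick back-of-the-envelope check puts those numbers in perspective. The sketch below uses only the figures quoted above (374 minutes of tests, 20 parallel jobs, a 35-minute turnaround). The function name is made up, and the "ideal" figure assumes perfectly balanced batches with no other overhead, which no real build achieves.

```python
def turnaround_summary(total_test_minutes, parallel_jobs, turnaround_minutes):
    """Compare the measured turnaround with a perfectly balanced ideal."""
    # Ideal case: tests split evenly across jobs, no checkout/compile/transfer cost.
    ideal_batch_minutes = total_test_minutes / parallel_jobs
    return {
        "ideal_batch_minutes": ideal_batch_minutes,
        # How many times faster than running all tests back to back.
        "speedup": total_test_minutes / turnaround_minutes,
        # Fraction of the total machine time that is spent running tests.
        "efficiency": total_test_minutes / (parallel_jobs * turnaround_minutes),
        # Time not explained by test execution in the ideal case.
        "unaccounted_minutes": turnaround_minutes - ideal_batch_minutes,
    }


summary = turnaround_summary(374, 20, 35)
```

With these inputs the ideal batch is under 19 minutes, so roughly 16 of the 35 minutes go to something other than evenly divided test execution, such as the shared checkout and compile stage, artifact transfer, or uneven batches. That gap is where the remaining savings toward a 10 to 15 minute build would have to come from.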
Do you have monster builds of your own that are keeping you up at night? What strategies have you found for keeping them in check?