Goodbye Subversion, Hello Mercurial: A Migration Guide

Published March 24, 2011 in Archives

Jason Hinch

Migrating to Mercurial may seem daunting. Find out how the Fisheye and Crucible team migrated their repository, and learn from our experience.

In my previous post, I discussed some of the factors that led Atlassian to move away from Subversion and start using Mercurial. This post gets into the nitty-gritty of how to migrate an existing project’s Subversion repository and how to prepare your development team for the move.

Setting some Goals

The Fisheye and Crucible team had a few goals for the migration to Mercurial.

Bring our history with us. The team wanted the Mercurial repository to resemble its Subversion counterpart as much as possible. We wanted all the history contributing to our development branch (trunk) as well as our two supported release branches (2.2 and 2.3 at the time). Other branches, such as ShipIt branches and 20% time, were less important but would ideally be converted as well. We were happy to exclude accidental commits and commits of large files from the conversion.
Tool Integration. We wanted to make sure the tools and software that we used for development provided the same level of integration with Mercurial as they did for Subversion. These were mainly Fisheye, Crucible, Bamboo, Eclipse and IntelliJ IDEA.
Incremental. The conversion should be performed incrementally. This was incredibly important for ensuring that the history was what we expected, that system integration worked, and that everyone could perform their development tasks without running into problems due to Mercurial. It was also important for minimising disruption to developers.
Replicate team process. At least initially, we wanted to replicate the current Subversion development workflow using Mercurial. We were happy to experiment with other workflows that aren’t possible in Subversion after the initial migration.

The right tool for the job

Mercurial comes bundled with the command hg convert, which can migrate many different types of SCM repositories to Mercurial, including Subversion. We followed the helpful guide that is provided on the website. At first it looked quite promising: the conversion didn’t produce any errors. But as we started to look at the resulting Mercurial repository, some things just didn’t look right.

We decided to do a script-based comparison of the Mercurial and Subversion heads of each branch and tag. We found that many tags and branches were very different and, on top of this, Mercurial’s incremental conversions of new commits were taking at least 3 minutes. This was far from acceptable, as we were used to committed changesets being available almost instantly in Fisheye/Crucible.
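A comparison of this kind can be sketched in a few lines of shell. This is not the script the team used; it assumes you have already checked out the same branch or tag from each system into two separate directories, and that GNU diff is available. The function name compare_trees is hypothetical.

```shell
#!/bin/sh
# compare_trees SVN_DIR HG_DIR
# Lists the files that differ between two checkouts of the same branch,
# skipping the metadata each tool keeps in the working copy (.svn, .hg)
# and the .hgtags file that only exists on the Mercurial side.
# Exit status is 0 when the trees match and non-zero when they differ.
compare_trees() {
    diff -rq -x .svn -x .hg -x .hgtags "$1" "$2"
}
```

Running compare_trees once per branch and tag, and collecting the non-zero results, gives a quick list of the heads that need a closer look.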

We spoke to a few members of the Bitbucket team and they suggested that we give hgsubversion a try. The conversion using this tool was far more accurate and was faster at doing incremental conversions.

Converting using hgsubversion

Step 1: Create an SVN mirror

Regardless of the tool you decide to use for the conversion, the first thing you should do is set up a local mirror of your Subversion repository. The conversion tools will make numerous calls to the repository for data, and you will most likely perform more than one conversion.

[cc lang=’bash’]
# create the empty repository that will hold the mirror
svnadmin create fe-mirror
# we need to enable the pre-revprop-change hook for sync to work
echo '#!/bin/sh' > fe-mirror/hooks/pre-revprop-change
chmod +x fe-mirror/hooks/pre-revprop-change
svnsync init file://`pwd`/fe-mirror https://svn.example.com/svn/FECRU
svnsync sync file://`pwd`/fe-mirror
[/cc]
The initial synchronisation will take a while and is dependent on the size of your project as well as the size of your repository. For example, the Fisheye team checks in jar dependencies, which meant that every version of those jars over time needed to be synchronised. Also, if there are multiple projects in your SVN repository, the synchronisation needs to go through all the revisions, including revisions that weren’t made by your project. These are later ignored by the conversion process but are still important in the Subversion repository. The Fisheye/Crucible repository took a few days for the initial sync.

Step 2: Create author mappings

Mercurial (like most Distributed Version Control Systems) has a different standard for author names than Subversion. In Subversion, author names are controlled centrally and their structure is dictated by the administrator of that repository. A DVCS does not enforce such restrictions (because people can commit to their local repository), so we used the recommended convention of Full Name <Email address>. For example: John Doe <jdoe@example.com>. To get a list of all authors in your repository, execute the following:
[cc lang=’bash’]
svn log file://`pwd`/fe-mirror/FECRU --quiet --xml | grep author | sed -E "s:</?author>::g" | sort | uniq
[/cc]
Both hgsubversion and hg convert use a file to map between the Subversion author name and the new Mercurial author name. The file uses the following pattern:
[cc lang=’bash’]
jdoe=John Doe <jdoe@example.com>
msmith=Mary Smith <msmith@example.com>
…
[/cc]
hgsubversion also provides a default host setting for repositories where the Subversion author name maps directly to an email address with the same user portion.
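If the author list is long, a skeleton of the mapping file can be generated from it and then edited by hand. The following is a minimal sketch, not part of either tool: the function name make_authormap is hypothetical, and it assumes every username maps to an address at a single host, which you pass as the argument. The display name is simply set to the username as a placeholder.

```shell
#!/bin/sh
# make_authormap HOST
# Reads one Subversion username per line on stdin and writes one
# "user=user <user@HOST>" line per distinct username. The name before
# the angle brackets is only a placeholder to be replaced by hand.
make_authormap() {
    sort -u | awk -v host="$1" '
        length($0) > 0 { printf "%s=%s <%s@%s>\n", $0, $0, $0, host }'
}
```

Piping the author list from the svn log command above into make_authormap example.com and redirecting the output to usermapping.txt gives a file in the expected pattern, ready for the full names to be filled in.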

Step 3: Create an initial Mercurial conversion

Provided you have hgsubversion installed and on your PYTHONPATH, you should be able to convert the Subversion repository. Make sure you also include the path to the project in the mirrored repository URL.
[cc lang=’bash’]
hg clone --config extensions.hgsubversion= --config hgsubversion.authormap=usermapping.txt \
--config hgsubversion.defaulthost=atlassian.com file://`pwd`/fe-mirror/FECRU fe-mirror-hgsvn
[/cc]
The initial conversion to Mercurial for Fisheye/Crucible took a few hours, but other projects in Atlassian only took a few minutes.

Step 4: Sanitising the Conversion

In the history of most projects, someone always ends up screwing up a repository’s history: copying the wrong directory to the wrong location, or adding massive log files or binaries to the repository. The list goes on. You don’t necessarily want these mistakes in your new repository, so you might want to exclude them from the history. We do this by creating a file mapping file. It looks something like this:
[cc lang=’bash’]
exclude crucible-1.0-beta
exclude crudev-1.1
exclude trunk
exclude etc/clover/hist
exclude etc/clover/beac
[/cc]
You can choose to include or exclude certain paths. In the above instances, we’re not excluding the trunk itself; we are excluding a time when someone copied the trunk directory into a branch directory instead of copying the contents of trunk. Another example from above is when someone committed the Clover results from our Bamboo server into the repository. These consumed a lot of space and weren’t needed. If you don’t want to look through your entire history, you can look into Mercurial’s internal data storage to find potential files for exclusion. The following command will list the files in descending size order:
[cc lang=’bash’]
cd fe-mirror-hgsvn
du -ak .hg/store/data | egrep -e '\.i$' -e '\.d$' | sort -rn | less
cd ..
[/cc]
Which will look something like this:
[cc lang=’bash’]
49132 .hg/store/data/lib/maven-dependencies-build
38140 .hg/store/data/lib/maven-dependencies/fastutil-6.1.0.jar.d
31092 .hg/store/data/test/svnrepos/checkstyle.zip.d
[/cc]
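To turn a listing like this into candidate lines for the file mapping file, something along the following lines may help. It is only a sketch under stated assumptions: the function name exclude_candidates and the size threshold are hypothetical, and the input is expected in du’s native format (size, a tab, then the path). Mercurial encodes some characters in store file names (for example, an uppercase letter is stored as an underscore followed by the lowercase letter), so review each printed path against the real repository path before using it.

```shell
#!/bin/sh
# exclude_candidates MIN_KB
# Reads "SIZE_KB<TAB>PATH" lines (du -ak output) on stdin and prints an
# "exclude" line for every .i or .d store file of at least MIN_KB kilobytes.
exclude_candidates() {
    awk -F '\t' -v min="$1" '
        $1 + 0 >= min + 0 && $2 ~ /\.[id]$/ {
            path = $2
            sub(/^\.hg\/store\/data\//, "", path)   # drop the store prefix
            sub(/\.[id]$/, "", path)                # drop the .i or .d suffix
            print "exclude " path
        }' | sort -u
}
```

For example, du -ak .hg/store/data | exclude_candidates 30000 prints one candidate line for each stored file larger than roughly 30 MB.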

Step 5: Creating a Working repository

Now that we have a user mapping file and a file mapping file, we clone the Subversion repository again:
[cc lang=’bash’]
# hg clone will not write into a non-empty directory, so move or delete the Step 3 clone first
hg clone --config extensions.hgsubversion= --config hgsubversion.filemap=filemapping.txt \
--config hgsubversion.authormap=usermapping.txt \
--config hgsubversion.defaulthost=atlassian.com file://`pwd`/fe-mirror/FECRU fe-mirror-hgsvn
[/cc]
This can be considered the full conversion. You may just want to use this repository, but the Fisheye/Crucible team found that this repository:

Contained many branches that we were not interested in for day-to-day development
Contained conversions of complex tags, which created large revisions that weren’t part of the mainline development
Was large. The full conversion for the Fisheye/Crucible repository was 2 GB. This is because we commit jar files to our repository, which take up a lot of space. Other repositories converted to Mercurial weren’t nearly as big.

We decided to create a working repository with a subset of the branches.

Step 5a: Excluding unrelated and irrelevant branches

Subversion does not model branches in a repository as a first-class entity, but rather as a subdirectory. This means that branches can easily be created incorrectly, and the converter is sometimes unable to identify the branch correctly. As a result, you sometimes end up with multiple, unrelated histories in your repository. There were no occurrences of these during the conversion of the Fisheye/Crucible repository, but there were definitely branches we didn’t care about having in our working repository. We decided to keep three branches: our main development branch (default) and the two most recent support branches (2.2 and 2.3).
[cc lang=’bash’]
mkdir working-repository
cd working-repository
hg init
hg pull -r default ../fe-mirror-hgsvn
hg pull -r 2.2 ../fe-mirror-hgsvn
hg pull -r 2.3 ../fe-mirror-hgsvn
cd ..
[/cc]

Step 5b: Closing non-topological heads (optional)

Additional head revisions on branches are sometimes created during the conversion process. A good example of when this would happen is when someone creates a branch from a tag of a project. Mercurial has the ability to mark these heads as closed. To get a list of these heads, we compare the list of most recent commits on all branches to the list of all heads:
[cc lang=’bash’]
cd working-repository
hg branches | egrep -o ":[0-9a-f]{12}( \(inactive\))?$" | egrep -o "[0-9a-f]{12}" | sort | uniq > branch-heads.txt
hg log -r "head() and not closed()" --template "{node}\n" | cut -b 1-12 | sort | uniq > non-closed-heads.txt
diff branch-heads.txt non-closed-heads.txt
cd ..
[/cc]
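If you only want the hashes of the extra heads, without diff’s markers, comm can print the lines that appear in the second file alone. This is a small sketch rather than part of the original process; the function name extra_heads is hypothetical, and both input files must already be sorted, as the two files written above are.

```shell
#!/bin/sh
# extra_heads BRANCH_HEADS_FILE OPEN_HEADS_FILE
# Prints the hashes found only in OPEN_HEADS_FILE, that is, open heads
# that are not the most recent commit of any branch.
# comm -13 hides the lines unique to the first file and the lines
# common to both, leaving the lines unique to the second file.
extra_heads() {
    comm -13 "$1" "$2"
}
```

Calling extra_heads branch-heads.txt non-closed-heads.txt from inside working-repository prints one changeset hash per line, each of which is a candidate for closing.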
To close these heads, we first update to each one and then commit using the --close-branch flag:
[cc lang=’bash’]
cd working-repository
hg update CHANGESET
hg commit -m "Closing head" --close-branch
cd ..
[/cc]

Step 6: Rinse and repeat

In order to do this incrementally, you will need to first synchronise the Subversion mirror again:
[cc lang=’bash’]
svnsync sync file://`pwd`/fe-mirror
[/cc]
Then pull the latest changes into the full conversion.
[cc lang=’bash’]
cd fe-mirror-hgsvn
hg pull --config extensions.hgsubversion=
cd ..
[/cc]
You may need to close any newly created heads, which can happen under the conditions mentioned in step 5b, but this is unlikely. Finally, pull any changes made on the important branches into the working repository.
[cc lang=’bash’]
cd working-repository
hg pull -r 2.2 ../fe-mirror-hgsvn
hg pull -r 2.3 ../fe-mirror-hgsvn
hg pull -r default ../fe-mirror-hgsvn
[/cc]
For the Fisheye/Crucible migration, we used a shell script to automate this process which ran every minute.
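The team’s actual script isn’t reproduced here, but one pass of it might look roughly like the sketch below. The variable names (MIRROR, FULL, WORKING, BRANCHES, DRY_RUN) and the functions run and sync_once are assumptions made for illustration; the directory names match the ones used in the steps above.

```shell
#!/bin/sh
# Locations of the Subversion mirror, the full conversion and the working
# repository, plus the branches to carry over. All are assumed defaults.
MIRROR=${MIRROR:-$PWD/fe-mirror}
FULL=${FULL:-$PWD/fe-mirror-hgsvn}
WORKING=${WORKING:-$PWD/working-repository}
BRANCHES=${BRANCHES:-"default 2.2 2.3"}

# run CMD...: execute the command, or only print it when DRY_RUN=1
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# sync_once: one incremental pass, stopping at the first failing step
sync_once() {
    # 1. bring the Subversion mirror up to date
    run svnsync sync "file://$MIRROR" || return 1
    # 2. convert the new revisions into the full Mercurial conversion
    run hg --cwd "$FULL" pull --config extensions.hgsubversion= || return 1
    # 3. pull only the branches we care about into the working repository
    for branch in $BRANCHES; do
        run hg --cwd "$WORKING" pull -r "$branch" "$FULL" || return 1
    done
}
```

Calling sync_once with DRY_RUN=1 prints the commands without running them, which is a cheap way to check the paths first. If the script is scheduled every minute, it is worth making sure a slow pass cannot overlap with the next one, for example with a lock file.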

All systems are go

In addition to converting the repository to Mercurial, we also needed to set up our development environments for Mercurial development. Mac users used the default Mercurial distribution for Mac OS X, with either MacHg or Sourcetree as a GUI client. Windows users used TortoiseHg, which bundles the Mercurial command line tool as well as a GUI. We use Eclipse and IntelliJ IDEA for development. MercurialEclipse works well and is the de facto standard for Mercurial integration in Eclipse. IDEA X supports Mercurial through its hg4idea plugin. Using the combination of these tools, we saw little to no difference in tool usage. Many operations are actually faster because they interact with a local repository as opposed to going over a network connection.

Integrating with our products was fairly straightforward. In Fisheye/Crucible 2.5 and Bamboo 2.7+ you can add a Mercurial repository using ssh, which works well with Bitbucket. However, to minimise the amount of data being sent over the internet, we set up a clone of the repository inside the network where our Bamboo and Fisheye servers run. We were able to look at diffs, annotations and build results and see if they were consistent with each other.

Prior to the move, all the developers took a bit of time to clone the repository and set up their development environment in order to make sure it all worked as they expected.

On the actual day that we moved over, there was very little difference in the routine, thanks to all the preparation before the conversion. The only difference was that in the morning, everyone pulled in the most recent changes from the central Mercurial repository. After that it was business as usual. No development speed was lost, which was a huge win for the Fisheye/Crucible team. A big thanks to Matt Watson for converting the repository, preparing the team for the migration, and supplying a lot of the information in this blog.

What’s Next

In my next post, I’ll be talking about how the Fisheye & Crucible team reacted to the switch to Mercurial. What was different? What did we learn? Stay tuned.