Confluence moved from Subversion to git 3 weeks ago and we were finally in a good position to merge a fairly significant reorganisation of our source code into master.

Maven artifact and directory names now match and it is possible to rebuild Confluence without having to rebuild the world.
Our architect Charles had this in the pipeline for a while but decided to wait until after we moved, as SVN isn’t known to handle situations like this very well:

The moral of this story is that until Subversion improves, be very careful about merging copies and renames from one branch to another.

http://svnbook.red-bean.com/en/1.7/svn.branchmerge.advanced.html
Although git’s support for handling renamed files during a merge is much better, there are still a few caveats when it comes to merging changes across branches with a lot of renamed files. To avoid laborious manual merge resolution, it is worth taking a closer look at how git handles file renames.

Rename handling in git

The most important thing to understand is that git does not track renames at all. Although it sports a git mv command, this is simply a more convenient way of removing the old path from the index and adding the new one. The fact that a directory or file was renamed is not explicitly recorded; git simply detects renames after the fact. This works pretty well in most cases, but there are a few things to watch out for, especially in a project with a considerable number of tracked files (~8770 in our case).
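To see that git mv is only a shortcut, the following sketch compares the index after git mv with the index after doing the same steps by hand. It runs in a throwaway repository; the file names and the demo identity are made up for illustration.

```shell
set -e

# Throwaway repository; file names and identity are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo "hello" > old.txt
git add old.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "Add old.txt"

# Variant 1: let git mv do the work, then record the resulting index as a tree.
git mv old.txt new.txt
tree_with_mv=$(git write-tree)

# Variant 2: start over and do the same steps by hand.
git reset -q --hard
mv old.txt new.txt
git rm -q --cached old.txt
git add new.txt
tree_by_hand=$(git write-tree)

# If both hashes match, nothing about the rename itself was stored.
echo "git mv:  $tree_with_mv"
echo "by hand: $tree_by_hand"
```

Both variants should yield the same tree, which is why git has to work out renames later by comparing content.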

Merging changes across

Consider merging a change from stable to master. On stable the change affects a file in the old location:

[cc escaped=”true” line_numbers=”0″]
$> git show f795d7b
commit f795d7b3fdda5f389baf66af9793b029af30b07a
Author: Stefan Saasen <devnull@atlassian.com>
Date: Thu Oct 6 11:12:37 2011 +1100
CONF-23398: Bumped jira-connector from 1.2-beta3 up to 1.2-beta4
diff --git a/confluence-bundled-plugins-library/pom.xml b/confluence-bundled-plugins-library/pom.xml
index b94531f..4f5fbe2 100644
--- a/confluence-bundled-plugins-library/pom.xml
+++ b/confluence-bundled-plugins-library/pom.xml
@@ -314,7 +314,7 @@
<dependency>
<groupId>com.atlassian.confluence.plugins.jira</groupId>
<artifactId>jira-connector</artifactId>
- <version>1.2-beta3</version>
+ <version>1.2-beta4</version>
<scope>runtime</scope>
<exclusions>
<exclusion>

[/cc]

A simple one-line change that should be easy enough to merge, right?
Unfortunately, when we try to cherry-pick the changeset, the merge fails:

[cc lang=”bash” escaped=”true” line_numbers=”0″]
$> git cherry-pick -x f795d7b
warning: too many files (created: 864 deleted: 8584), skipping inexact rename detection
error: could not apply f795d7b... CONF-23398: Bumped jira-connector from 1.2-beta3 up to 1.2-beta4
hint: after resolving the conflicts, mark the corrected paths
hint: with 'git add <paths>' or 'git rm <paths>'
hint: and commit the result with 'git commit -c f795d7b'

$> git status
# On branch master
# Your branch is behind 'origin/master' by 16 commits, and can be fast-forwarded.
#
# Unmerged paths:
# (use "git reset HEAD <file>..." to unstage)
# (use "git add/rm <file>..." as appropriate to mark resolution)
#
# deleted by us: confluence-bundled-plugins-library/pom.xml
#
no changes added to commit (use "git add" and/or "git commit -a")
[/cc]

The pom.xml this changeset tries to update no longer exists at that path because it was renamed, so the changeset does not apply cleanly.
Git can do better than that though, and it gives us a hint by printing a warning: warning: too many files (created: 864 deleted: 8584), skipping inexact rename detection
As already mentioned, git tries to detect file renames after the fact, for example when using git log or git diff/merge.
When trying to detect renames, git distinguishes between exact and inexact renames: the former is a rename that leaves the content of the file unchanged, the latter a rename that might include changes to the content (e.g. renaming/moving a Java class). This distinction is important because the algorithm for detecting exact renames is linear and will always be executed, while the algorithm for inexact rename detection is quadratic (O(n^2)), so git does not attempt it if the number of files changed exceeds a certain threshold (1000 by default). As the number of files affected by the recent reorganisation exceeds this threshold, git simply gives up and leaves the merge resolution up to the developer.

In our case we can avoid doing manual merge resolution though by changing the threshold and executing the cherry-pick command again:

[cc lang=”bash” escaped=”true” line_numbers=”0″]
# Start over
$> git reset --hard
HEAD is now at 79c0094 CONFDEV-5533 Remove usages of the old user and group icons

# Set renamelimit high enough to make it work with the Confluence source
$> git config merge.renameLimit 10000

# Cherry pick again
$> git cherry-pick -x f795d7b
[master dc290bc] CONF-23398: Bumped jira-connector from 1.2-beta3 up to 1.2-beta4 (cherry picked from commit f795d7b3fdda5f389baf66af9793b029af30b07a)
1 files changed, 1 insertions(+), 1 deletions(-)

# Now the change applies to the renamed file
$> git show
commit dc290bc2f253381aeed2ce3d5e3b2bbae4339ef6
Author: Stefan Saasen <devnull@atlassian.com>
Date: Thu Oct 6 11:12:37 2011 +1100
CONF-23398: Bumped jira-connector from 1.2-beta3 up to 1.2-beta4
(cherry picked from commit f795d7b3fdda5f389baf66af9793b029af30b07a)
diff --git a/confluence-build/confluence-bundled-plugins-library/pom.xml b/confluence-build/confluence-bundled-plugins-library/pom.xml
index 79f8971..4590baf 100644
--- a/confluence-build/confluence-bundled-plugins-library/pom.xml
+++ b/confluence-build/confluence-bundled-plugins-library/pom.xml
@@ -320,7 +320,7 @@
<dependency>
<groupId>com.atlassian.confluence.plugins.jira</groupId>
<artifactId>jira-connector</artifactId>
- <version>1.2-beta3</version>
+ <version>1.2-beta4</version>
<scope>runtime</scope>
<exclusions>
<exclusion>

[/cc]

With this setting in place, the patch applies cleanly and updates the renamed file. Nice!
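If you would rather not persist the setting, git’s general -c option passes a configuration value to a single invocation only. A minimal sketch, run in a throwaway repository; 10000 is simply the value used above and <commit> is a placeholder:

```shell
set -e

# Throwaway repository, used only to show the scope of -c.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# -c applies a config value to this one git invocation only.
one_off=$(git -c merge.renameLimit=10000 config --get merge.renameLimit)

# Nothing was written to the repository's own config.
persisted=$(git config --local --get merge.renameLimit || true)

echo "during the command: $one_off"
echo "stored in the repo: ${persisted:-<unset>}"

# The same pattern applied to a cherry-pick (placeholder commit, not run here):
#   git -c merge.renameLimit=10000 cherry-pick -x <commit>
```

This keeps the higher limit out of the repository configuration, at the cost of having to remember it for every merge across the reorganisation.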


When not explicitly set, merge.renameLimit defaults to 1000 files or uses the value for diff.renameLimit if set.
The diff.renameLimit affects git diff, git show and git log while merge.renameLimit applies to merge attempts (git merge, git cherry-pick) only. It’s a good idea to change the merge.renameLimit as opposed to changing the diff.renameLimit so that git does not attempt to find renames during common operations like looking at the git diff output.
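The lookup order described above can be mirrored in a small helper. This is only a sketch of the fallback as stated here, not git’s own code; the function name is made up, and the built-in default is printed as a label rather than a number because it depends on the git version.

```shell
# Hypothetical helper: print the rename limit a merge would use, following
# the order merge.renameLimit -> diff.renameLimit -> built-in default.
effective_merge_rename_limit() {
    git config --get merge.renameLimit \
        || git config --get diff.renameLimit \
        || echo "built-in default"
}

# Try it in a throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q .

git config diff.renameLimit 5000
only_diff_set=$(effective_merge_rename_limit)   # falls back to diff.renameLimit

git config merge.renameLimit 10000
both_set=$(effective_merge_rename_limit)        # merge.renameLimit takes precedence

echo "only diff.renameLimit set: $only_diff_set"
echo "both set:                  $both_set"
```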


Detecting renamed files: git diff/log -M or the diff.renames setting

There are more settings that affect the rename detection in git.
Consider the following changeset:

[cc lang=”bash” escaped=”true” line_numbers=”0″]
$> git show --stat 0cf0db778d885e8b0bcedfcd8ff243dedb4581bb
commit 0cf0db778d885e8b0bcedfcd8ff243dedb4581bb
Author: Daniel Kjellin
Date: Thu Oct 6 12:30:30 2011 +1100
CONFDEV-5127
Codereview improvements
.../confluence/core/persistence/SearchableDao.java | 3 +-
.../hibernate/HibernateSearchableDao.java | 11 +-
.../search/lucene/MultiThreadedIndexRebuilder.java | 124 ++++++++------
.../search/lucene/reindex/ReindexWorkBatch.java | 181 ++++++++++++++++++++
.../search/lucene/reindex/WorkBatch.java | 175 -------------------
5 files changed, 262 insertions(+), 232 deletions(-)

[/cc]

With the default setting, git doesn’t show what one would consider to be a rename (WorkBatch -> ReindexWorkBatch) with a few changes in this changeset. By default, git does not attempt rename detection here, even if the diff.renameLimit is sufficient. To show renames, commands like git show or git log can be used with the -M[<n>] option, which turns rename detection on. Git uses a similarity index and considers a delete/add pair to be a rename if more than n% of the file hasn’t changed. By default the threshold is 50%.

[cc lang=”bash” escaped=”true” line_numbers=”0″]
$> git show -M --stat 0cf0db778d885e8b0bcedfcd8ff243dedb4581bb
commit 0cf0db778d885e8b0bcedfcd8ff243dedb4581bb
Author: Daniel Kjellin
Date: Thu Oct 6 12:30:30 2011 +1100
CONFDEV-5127
Codereview improvements
.../confluence/core/persistence/SearchableDao.java | 3 +-
.../hibernate/HibernateSearchableDao.java | 11 +-
.../search/lucene/MultiThreadedIndexRebuilder.java | 124 ++++++++++++--------
.../{WorkBatch.java => ReindexWorkBatch.java} | 66 ++++++-----
4 files changed, 117 insertions(+), 87 deletions(-)
[/cc]
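The effect of -M and its similarity threshold can be tried out on a small scale. The sketch below builds a throwaway repository with a rename-plus-edit commit and compares the --name-status output with detection off, with the default threshold, and with -M100% (exact renames only). The file names merely echo the example above; the repository and its content are made up.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
# Helper so the demo does not depend on a configured git identity.
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

# Ten lines of content under the old name.
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > WorkBatch.txt
git add WorkBatch.txt
commit "Add WorkBatch.txt"

# Rename the file and change it slightly in the same commit.
git mv WorkBatch.txt ReindexWorkBatch.txt
echo "line 11" >> ReindexWorkBatch.txt
git add ReindexWorkBatch.txt
commit "Rename and extend"

# Rename detection off: the commit shows up as a delete/add pair.
without_m=$(git diff --no-renames --name-status HEAD~1 HEAD)
# Default -M threshold (50%): reported as a rename with a similarity score.
with_m=$(git diff -M --name-status HEAD~1 HEAD)
# -M100% accepts exact renames only, so the edited file is a delete/add pair again.
exact_only=$(git diff -M100% --name-status HEAD~1 HEAD)

printf '%s\n---\n%s\n---\n%s\n' "$without_m" "$with_m" "$exact_only"
```

A stricter threshold such as -M90% sits between the two extremes; the score git reports next to the R status tells you how similar it considered the two files.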


Set git config diff.renames true to always do rename detection when using git diff/show/log.


History beyond renames

There is one more thing though. If you now attempt to look at the history of the renamed file you might be disappointed to see that git log only seems to show the history since the rename happened.

[cc lang=”bash” escaped=”true” line_numbers=”0″]
$> git log -- confluence-core/confluence/src/java/com/atlassian/confluence/search/lucene/reindex/ReindexWorkBatch.java
commit 0cf0db778d885e8b0bcedfcd8ff243dedb4581bb
Author: Daniel Kjellin
Date: Thu Oct 6 12:30:30 2011 +1100
CONFDEV-5127
Codereview improvements
$>
[/cc]

In this case the option --follow can be used to list the history of a single file beyond renames.

[cc lang=”bash” escaped=”true” line_numbers=”0″]
$> git log --oneline --graph --follow -- ./confluence-core/confluence/src/java/com/atlassian/confluence/search/lucene/reindex/ReindexWorkBatch.java

* | 0cf0db7 CONFDEV-5127 Codereview improvements
|/

| * cb9f7ba CONFDEV-6072 – move test and core files

* eb7f971 CONFDEV-5127 Restructured re-indexing to be faster and report errors better
[/cc]
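To check the difference without digging through a large history, the following sketch counts the commits git log reports for a renamed file with and without --follow. It uses a throwaway repository with made-up content; only the file names echo the example above.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
# Helper so the demo does not depend on a configured git identity.
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

# First commit: the file under its old name.
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > WorkBatch.txt
git add WorkBatch.txt
commit "Add WorkBatch.txt"

# Second commit: rename plus a small edit.
git mv WorkBatch.txt ReindexWorkBatch.txt
echo "line 11" >> ReindexWorkBatch.txt
git add ReindexWorkBatch.txt
commit "Rename and extend"

# Without --follow the history stops at the rename.
plain_count=$(git log --oneline -- ReindexWorkBatch.txt | wc -l | tr -d ' ')
# With --follow git continues under the old name.
follow_count=$(git log --oneline --follow -- ReindexWorkBatch.txt | wc -l | tr -d ' ')

echo "without --follow: $plain_count commit(s)"
echo "with --follow:    $follow_count commit(s)"
```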

Git blame

Regardless of any of the settings mentioned so far, git blame will always show the filename in the original commit if there are any renames:

[cc lang=”bash” escaped=”true” line_numbers=”0″]
$> git blame -- confluence-core/confluence/src/java/com/atlassian/confluence/search/lucene/reindex/ReindexWorkBatch.java
eb7f9719 confluence/src/java/com/atlassian/confluence/search/lucene/reindex/WorkBatch.java (Daniel Kjellin 2011-09-22 15:27:12 +1000 21) import java.util.LinkedList;
eb7f9719 confluence/src/java/com/atlassian/confluence/search/lucene/reindex/WorkBatch.java (Daniel Kjellin 2011-09-22 15:27:12 +1000 22) import java.util.List;
eb7f9719 confluence/src/java/com/atlassian/confluence/search/lucene/reindex/WorkBatch.java (Daniel Kjellin 2011-09-22 15:27:12 +1000 23) import java.util.ListIterator;
0cf0db77 confluence-core/confluence/src/java/com/atlassian/confluence/search/lucene/reindex/ReindexWorkBatch.java (Daniel Kjellin 2011-10-06 12:30:30 +1100 24) import java.util.concurrent.ConcurrentLinkedQueue;
eb7f9719 confluence/src/java/com/atlassian/confluence/search/lucene/reindex/WorkBatch.java (Daniel Kjellin 2011-09-22 15:27:12 +1000 25) import java.util.concurrent.atomic.AtomicInteger;
eb7f9719 confluence/src/java/com/atlassian/confluence/search/lucene/reindex/WorkBatch.java (Daniel Kjellin 2011-09-22 15:27:12 +1000 26)
0cf0db77 confluence-core/confluence/src/java/com/atlassian/confluence/search/lucene/reindex/ReindexWorkBatch.java (Daniel Kjellin 2011-10-06 12:30:30 +1100 27) public class ReindexWorkBatch implements Runnable
eb7f9719 confluence/src/java/com/atlassian/confluence/search/lucene/reindex/WorkBatch.java (Daniel Kjellin 2011-09-22 15:27:12 +1000 28) {
eb7f9719 confluence/src/java/com/atlassian/confluence/search/lucene/reindex/WorkBatch.java (Daniel Kjellin 2011-09-22 15:27:12 +1000 29) private final IndexTaskFactory indexTaskFactory;
[/cc]
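The same behaviour can be reproduced in a throwaway repository. The sketch below uses git blame --line-porcelain, which prints a filename entry for every blamed line, to count how many lines are still attributed to the old path. The repository and its content are made up; only the file names echo the example above.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
# Helper so the demo does not depend on a configured git identity.
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

# First commit: ten lines under the old name.
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > WorkBatch.txt
git add WorkBatch.txt
commit "Add WorkBatch.txt"

# Second commit: rename the file and append one line.
git mv WorkBatch.txt ReindexWorkBatch.txt
echo "line 11" >> ReindexWorkBatch.txt
git add ReindexWorkBatch.txt
commit "Rename and extend"

# Count blamed lines per originating path.
blame=$(git blame --line-porcelain -- ReindexWorkBatch.txt)
old_path_lines=$(printf '%s\n' "$blame" | grep -c '^filename WorkBatch.txt$')
new_path_lines=$(printf '%s\n' "$blame" | grep -c '^filename ReindexWorkBatch.txt$')

echo "lines blamed on the old path: $old_path_lines"
echo "lines blamed on the new path: $new_path_lines"
```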

Confluence, git, rename, merge oh my…