Git team workflows: merge or rebase?
The question is simple: In a software team using git and feature branching, what's the best way to incorporate finished work back into your main line of development? It's one of those recurring debates where both sides have strong opinions, and mindful conversation can sometimes be hard (for other examples of heated debate see: The Internet).
Should you adopt a rebase policy where the repository history is kept flat and clean? Or a merge policy, which gives you traceability at the expense of readability and clarity (going so far as forbidding fast-forward merges)?
The topic is a bit controversial; maybe not as much as classic holy wars between vim and Emacs, or between Linux and BSD, but the two camps are vocal.
My empirical pulse on all-things-git – scientific, I know! – is that the always merge approach has a slightly bigger mind share. But the always rebase camp is also pretty vocal online. For examples see:
- A rebase-based workflow
- A Rebase Workflow for Git
- A Simple Git Rebase Workflow, Explained
- A Git Workflow for Agile Teams
To be honest, the split in two camps – always rebase vs. always merge – can be confusing, because rebase as local cleanup is a different thing than rebase as team policy.
Rebase as team policy is a different thing than rebase as cleanup
Rebase as cleanup is a healthy part of the coding lifecycle of the git practitioner. Let me detail some example scenarios that show when rebasing is reasonable and effective (and when it's not):
- You're developing locally. You have not shared your work with anyone else. At this point, you should prefer rebasing over merging to keep history tidy. If you've got your personal fork of the repository and that is not shared with other developers, you're safe to rebase even after you've pushed to your fork.
- Your code is ready for review. You create a pull request, and others are reviewing your work and potentially fetching it into their fork for local review. At this point you should not rebase your work. Instead, create 'rework' commits and update your feature branch. This helps with traceability in the pull request and prevents accidental history breakage.
- Review is done and the work is ready to be integrated into the target branch. Congratulations! You're about to delete your feature branch. Given that other developers won't be fetch-merging in these changes from this point on, this is your chance to sanitize history. At this point you can rewrite history and fold the original commits and those pesky 'pr rework' and 'merge' commits into a small set of focused commits. Creating an explicit merge for these commits is optional, but has value: it records when the feature graduated to master.
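The last scenario can be sketched with git's fixup commits, which are one possible way to model 'rework' commits. This is a minimal sketch under stated assumptions, not the only way to clean up: the throwaway repository, the branch name feature/login, the file name and the commit messages are all hypothetical, and the sequence editor is set to `true` so the interactive rebase runs unattended.

```shell
#!/bin/sh
# Sketch: fold review 'rework' commits into the original commit before integrating.
# All names below (feature/login, app.txt, the identity) are made up for illustration.
set -eu
start_dir=$PWD
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch explicitly
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# The feature as first submitted for review.
git checkout -q -b feature/login
echo "login form" >> app.txt
git commit -q -am "Add login form"
feature_commit=$(git rev-parse HEAD)

# Two rework commits made during review, recorded as fixups of the original.
echo "fix typo" >> app.txt
git commit -q -a --fixup="$feature_commit"
echo "address review comment" >> app.txt
git commit -q -a --fixup="$feature_commit"

before=$(git rev-list --count master..HEAD)   # 3 commits on the branch

# Review is done: squash the fixups into the commit they belong to.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash master

after=$(git rev-list --count master..HEAD)    # 1 commit on the branch
subject=$(git log -1 --format=%s)
echo "commits before: $before, after: $after, subject: $subject"

cd "$start_dir"
rm -rf "$repo"
```

Only do this once nobody else is fetching the branch, as described above; the rebase replaces the reviewed commits with new ones.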
With this aside clear we can now talk about policies. I'll try to keep a balanced view on the argument, and will mention how the problem is dealt with inside Atlassian.
It's obviously hard to generalize since every team is different, but we have to start from somewhere. Consider this policy as a possible example: When a feature branch's development is complete, rebase/squash all the work down to the minimum number of meaningful commits and avoid creating a merge commit, either by making sure the changes fast-forward or by simply cherry-picking those commits into the target branch.
While the work is still in progress and a feature branch needs to be brought up to date with the upstream target branch, use rebase – as opposed to merge – so as not to pollute the history with spurious merges.
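The policy just described might look like this in practice. It is a rough sketch in a throwaway repository; the branch name feature/search, the file names and the commit messages are hypothetical. The feature branch is brought up to date with `git rebase`, then integrated with `git merge --ff-only`, which refuses to create a merge commit.

```shell
#!/bin/sh
# Sketch: rebase-based policy. Update the feature with rebase, integrate by fast-forward.
# All names below are made up for illustration.
set -eu
start_dir=$PWD
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

# Work starts on a feature branch.
git checkout -q -b feature/search
echo "search" > search.txt
git add search.txt
git commit -q -m "Add search"

# Meanwhile the target branch moves on.
git checkout -q master
echo "hotfix" > hotfix.txt
git add hotfix.txt
git commit -q -m "Hotfix on master"

# Bring the feature up to date with rebase instead of merging master into it.
git checkout -q feature/search
git rebase -q master

# Integrate: --ff-only fails rather than recording a merge commit.
git checkout -q master
git merge -q --ff-only feature/search

merge_commits=$(git rev-list --merges --count master)   # 0
total_commits=$(git rev-list --count master)            # 3
echo "merge commits: $merge_commits, total commits: $total_commits"
git log --oneline master

cd "$start_dir"
rm -rf "$repo"
```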
Pros:
- Code history remains flat and readable. Clean, clear commit messages are as much part of the documentation of your code base as code comments, comments on your issue tracker etc. For this reason, it's important not to pollute history with 31 single-line commits that partially cancel each other out for a single feature or bug fix. Going back through history to figure out when a bug or feature was introduced, and why it was done, is going to be tough in a situation like this.
- Manipulating a single commit is easy (e.g. reverting it).
Cons:
- Squashing the feature down to a handful of commits can hide context, unless you keep around the historical branch with the entire development history.
- Rebasing doesn't play well with pull requests, because you can't see what minor changes someone made if they rebased (incidentally, the consensus inside the Stash development team is to never rebase during a pull request).
- Rebasing can be dangerous! Rewriting the history of shared branches is prone to breaking your teammates' work. This can be mitigated by doing the rebase/squash on a copy of the feature branch, but rebasing still demands competence and care.
- It's more work: Using rebase to keep your feature branch updated requires that you resolve similar conflicts again and again. Yes, you can reuse recorded resolutions (rerere) sometimes, but merges win here: Just solve the conflicts one time, and you're set.
- Another side effect of rebasing with remote branches is that you need to force push at some point. The biggest problem we've seen at Atlassian is that people force push – which is fine – but haven't set git's push.default. This results in updates to all branches that have the same name both locally and remotely, and that is dreadful to deal with.
NOTE: When history is rewritten in a shared branch touched by multiple developers, breakage happens.
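Two of the drawbacks above come down to configuration. The sketch below enables rerere and sets push.default in a throwaway clone, then force pushes a single rewritten branch by naming it explicitly. The value `simple` is an assumption on my part (the text does not say which value the team uses), and the local bare "origin", the branch name feature/report and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: settings that soften the rebase drawbacks, plus a force push of ONE branch.
# The remote is a local bare repository; all names are made up for illustration.
set -eu
start_dir=$PWD
work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git init -q "$work/clone"
cd "$work/clone"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
git remote add origin "$work/origin.git"

# Reuse recorded conflict resolutions when the same conflict shows up again.
git config rerere.enabled true
# Assumed value: push only the current branch to its upstream, not every matching name.
git config push.default simple

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"
git push -q -u origin master

git checkout -q -b feature/report
echo "report" >> app.txt
git commit -q -am "Add report"
git push -q -u origin feature/report

# History is rewritten locally (here, just an amended message)...
git commit -q --amend -m "Add report (reworded)"
# ...so the push must be forced. Naming the branch keeps the blast radius to one ref.
git push -q --force origin feature/report

push_default=$(git config --get push.default)
rerere_enabled=$(git config --get rerere.enabled)
local_tip=$(git rev-parse feature/report)
remote_tip=$(git rev-parse origin/feature/report)
echo "push.default=$push_default rerere.enabled=$rerere_enabled"

cd "$start_dir"
rm -rf "$work"
```

This is only safe on a branch nobody else is building on, for the reason given in the note above.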
Always merge-based policies instead flow like this: When a feature branch is complete, merge it into your target branch. Make sure the merge is explicit with --no-ff, which forces git to record a merge commit in all cases, even if the changes could be replayed automatically on top of the target branch.
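As a small illustration of the explicit merge, the sketch below merges a branch that could have been fast-forwarded and checks that a merge commit with two parents was recorded anyway. The repository is a throwaway, and the branch name feature/export, the file names and the messages are hypothetical.

```shell
#!/bin/sh
# Sketch: --no-ff records a merge commit even when a fast-forward is possible.
# All names below are made up for illustration.
set -eu
start_dir=$PWD
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

git checkout -q -b feature/export
echo "export" > export.txt
git add export.txt
git commit -q -m "Add export"

# master has not moved, so a plain merge would simply fast-forward.
git checkout -q master
git merge -q --no-ff -m "Merge feature/export" feature/export

parent_count=$(git log -1 --format=%P | wc -w | tr -d ' ')   # 2
merge_commits=$(git rev-list --merges --count master)        # 1
echo "parents of HEAD: $parent_count, merge commits on master: $merge_commits"

cd "$start_dir"
rm -rf "$repo"
```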
Pros:
- Traceability: This helps keep information about the historical existence of a feature branch and groups together all commits that are part of the feature.
Cons:
- History can become intensely polluted by lots of merge commits, and visual charts of your repository can have rainbow branch lines that don't add too much information, if not outright obfuscate what's happening. (Now to be fair, confusion is easily solved by knowing how to navigate your history; the trick here is to use, for example, git log --first-parent to make sense of what happened.)
- Debugging using git bisect can become much harder due to the merge commits.
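To see what the --first-parent trick buys you, here is a rough sketch that merges two hypothetical feature branches (feature-a and feature-b, two commits each) with --no-ff and then compares the full history with the first-parent view of master. All names are made up for illustration.

```shell
#!/bin/sh
# Sketch: navigating a merge-heavy history with --first-parent.
# Branch names, file names and messages are hypothetical.
set -eu
start_dir=$PWD
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

# Two features, two commits each, both merged with an explicit merge commit.
for branch in feature-a feature-b; do
    git checkout -q -b "$branch" master
    for step in 1 2; do
        echo "$branch step $step" > "$branch-$step.txt"
        git add "$branch-$step.txt"
        git commit -q -m "$branch: step $step"
    done
    git checkout -q master
    git merge -q --no-ff -m "Merge $branch" "$branch"
done

all_commits=$(git rev-list --count master)                  # 7
mainline=$(git rev-list --count --first-parent master)      # 3

# One line per integration step: the two merges plus the initial commit.
git log --first-parent --oneline master
echo "all commits: $all_commits, first-parent commits: $mainline"

cd "$start_dir"
rm -rf "$repo"
```

The first-parent view follows only the target branch's own line, so each feature shows up as a single merge entry instead of its individual commits.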
So what's best? What do the experts recommend?
If you and your team are not familiar with, or don't understand the intricacies of, rebase, then you probably shouldn't use it. In this context, always merge is the safest option.
If you and your team are familiar with both options, then the main decision revolves around this: Do you place more value on a clean, linear history, or on the traceability of your branches? In the first case go for a rebase policy, in the latter go for a merge one.
Note that a rebase policy comes with small contraindications and takes more effort.
The policy inside Atlassian's Stash team is to always merge feature branches, and to require that branches are merged through a pull request for quality and code review. But the team is not too strict about fast-forward merges.