Alongside these conceptual challenges are numerous performance issues that can affect a monorepo setup.
Managing unrelated projects in a single repository at scale can prove troublesome at the commit level. Over time this can lead to a large number of commits with a significant rate of growth (Facebook cites “thousands of commits a week”). This becomes especially troublesome because Git uses a directed acyclic graph (DAG) to represent the history of a project. With a large number of commits, any command that walks the graph can become slow as the history deepens.
Examples include investigating a repository's history with git log or annotating changes to a file with git blame. With git blame, if your repository has a large number of commits, Git has to walk many unrelated commits to calculate the blame information. Another example is answering any kind of reachability question (e.g. is commit A reachable from commit B?). Add the many unrelated modules found in a monorepo and the performance issues compound.
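To make the reachability cost concrete, here is a minimal Python sketch of a history walk over a toy commit graph. The `parents` mapping, the commit names and the helpers `is_reachable` and `linear_history` are hypothetical illustrations, not Git's actual data structures or algorithm. The only point is that the number of commits visited grows with the depth of the history.

```python
from collections import deque


def is_reachable(parents, start, target):
    """Return (found, visited_count) for a walk from start back to target.

    parents maps a commit id to the list of its parent ids. This is a toy
    DAG for illustration, not Git's real object store.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        commit = queue.popleft()
        if commit == target:
            return True, len(seen)
        for parent in parents.get(commit, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False, len(seen)


def linear_history(length):
    """Build a hypothetical linear history c0 <- c1 <- ... <- c(length-1)."""
    return {f"c{i}": [f"c{i - 1}"] for i in range(1, length)}


if __name__ == "__main__":
    # The deeper the history, the more commits the walk has to visit.
    for depth in (10, 1_000, 100_000):
        history = linear_history(depth)
        found, visited = is_reachable(history, f"c{depth - 1}", "c0")
        print(depth, found, visited)
```

In this simplified model, asking whether the oldest commit is reachable from the newest one touches every commit in between, which is why such questions get more expensive as a monorepo accumulates history.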
A large number of refs (i.e. branches or tags) in your monorepo affects performance in many ways.
Ref advertisements contain every ref in your monorepo. As ref advertisements are the first phase in any remote Git operation, this affects operations like git clone, git fetch or git push. With a large number of refs, performance takes a hit when performing these operations. You can see the ref advertisement by using git ls-remote with a repository URL. For example, git ls-remote git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git will list all the references in the Linux kernel repository.
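To get a feel for how many refs a remote advertises, the output of git ls-remote can be summarized with a few lines of Python. This is a minimal sketch: the helper name `summarize_refs` and the sample object ids are made up for illustration, and it assumes the usual output format of one object id, a tab, and a ref name per line.

```python
from collections import Counter


def summarize_refs(ls_remote_output):
    """Count advertised refs by namespace from `git ls-remote` style text.

    Each non-empty line is expected to be: <object id><TAB><ref name>.
    """
    counts = Counter()
    for line in ls_remote_output.splitlines():
        if not line.strip():
            continue
        _object_id, ref_name = line.split("\t", 1)
        parts = ref_name.split("/")
        # "refs/heads/main" is grouped under "refs/heads"; "HEAD" stays "HEAD".
        namespace = "/".join(parts[:2]) if parts[0] == "refs" else ref_name
        counts[namespace] += 1
    return counts


# Hypothetical sample; the object ids are invented for illustration.
SAMPLE = (
    "1111111111111111111111111111111111111111\tHEAD\n"
    "1111111111111111111111111111111111111111\trefs/heads/main\n"
    "2222222222222222222222222222222222222222\trefs/heads/feature-x\n"
    "3333333333333333333333333333333333333333\trefs/tags/v1.0\n"
)

if __name__ == "__main__":
    for namespace, count in sorted(summarize_refs(SAMPLE).items()):
        print(namespace, count)
```

Against a real repository, the text could instead come from `subprocess.run(["git", "ls-remote", url], capture_output=True, text=True).stdout`, where `url` is whatever remote you want to inspect.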
If refs are stored loosely, listing branches can be slow. After a git gc, refs are packed into a single file, and even listing over 20,000 refs is fast (~0.06 seconds).
Any operation that needs to traverse a repository's commit history and consider each ref (e.g. git branch --contains SHA1) will be slow in a monorepo. In a repository with 21708 refs, listing the refs that contain an old commit (one that is reachable from almost all refs) took:
User time (seconds): 146.44*
*This will vary depending on page caches and the underlying storage layer.
The index or directory cache (.git/index) tracks every file in your repository. Git uses this index to determine whether a file has changed by calling lstat(2) on every single file and comparing file modification information with the information contained in the index.
Thus the number of files tracked impacts the performance* of many operations:
- git status could be slow (it stats every single file, and the index file will be large)
- git commit could be slow as well (it also stats every single file)
*This will vary depending on page caches and the underlying storage layer, and is only noticeable when there are a large number of files, in the realm of tens or hundreds of thousands.
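The stat-and-compare idea behind the index can be sketched in Python. This is a simplified model, not Git's index format: the helpers `snapshot` and `changed_files` are hypothetical and cache only file size and modification time, whereas the real index stores more fields per entry. It does show why the work is proportional to the number of tracked files even when nothing has changed.

```python
import os
import tempfile


def snapshot(root):
    """Record (size, mtime_ns) for every file under root, keyed by relative path."""
    cache = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)
            cache[os.path.relpath(path, root)] = (info.st_size, info.st_mtime_ns)
    return cache


def changed_files(root, cache):
    """Return paths whose stat data differs from the cached snapshot.

    One lstat call is made per tracked file, so the cost grows with the
    number of files even if none of them were modified.
    """
    changed = []
    for rel_path, cached in sorted(cache.items()):
        try:
            info = os.lstat(os.path.join(root, rel_path))
        except FileNotFoundError:
            changed.append(rel_path)  # deleted since the snapshot
            continue
        if (info.st_size, info.st_mtime_ns) != cached:
            changed.append(rel_path)
    return changed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for i in range(3):
            with open(os.path.join(root, f"file{i}.txt"), "w") as handle:
                handle.write("hello\n")
        cache = snapshot(root)
        # Appending changes the file size, so the change is detected
        # regardless of the filesystem's timestamp granularity.
        with open(os.path.join(root, "file1.txt"), "a") as handle:
            handle.write("more\n")
        print(changed_files(root, cache))
```

With tens or hundreds of thousands of entries in `cache`, the loop in `changed_files` becomes the dominant cost, which mirrors the behaviour described above for git status and git commit.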
Large files in a single subtree/project affect the performance of the entire repository. For example, large media assets added to an iOS client project in a monorepo are cloned even when a developer (or build agent) is working on an unrelated project.
Whether it's the number of files, how often they're changed or how large they are, these issues in combination have an increased impact on performance:
- Switching between branches/tags, which is most useful in a subtree context (e.g. the subtree I'm working on), still updates the entire tree. This process can be slow due to the number of files affected, or it requires a workaround. Using git checkout ref-28642-31335 -- templates, for example, updates the ./templates directory to match the given branch but without updating HEAD, which has the side effect of marking the updated files as modified in the index.
- Cloning and fetching slow down and are resource intensive on the server, as all information is condensed into a packfile before transfer.
- Garbage collection is slow and by default triggered on a push (if garbage collection is necessary).
- Resource usage is high for every operation that involves the (re-)creation of a packfile, e.g. git upload-pack, git gc.