Git submodules allow you to keep a git repository as a subdirectory of another git repository. Git submodules are simply a reference to another repository at a particular snapshot in time. Git submodules enable a Git repository to incorporate and track version history of external code.
What is a git submodule?
Often a code repository will depend upon external code. This external code can be incorporated in a few different ways. The external code can be directly copied and pasted into the main repository. This method has the downside of losing any upstream changes to the external repository. Another method of incorporating external code is through the use of a language's package management system like Ruby Gems or NPM. This method has the downside of requiring installation and version management at all places the origin code is deployed. Both of these suggested incorporation methods do not enable tracking edits and changes to the external repository.
A git submodule is a record within a host git repository that points to a specific commit in another external repository. Submodules are very static and only track specific commits. Submodules do not track git refs or branches and are not automatically updated when the host repository is updated. When adding a submodule to a repository a new .gitmodules file will be created. The .gitmodules file contains meta data about the mapping between the submodule project's URL and local directory. If the host repository has multiple submodules, the .gitmodules file will have an entry for each submodule.
When should you use a git submodule?
If you need to maintain a strict version management over your external dependencies, it can make sense to use git submodules. The following are a few best use cases for git submodules.
- When an external component or subproject is changing too fast or upcoming changes will break the API, you can lock the code to a specific commit for your own safety.
- When you have a component that isn’t updated very often and you want to track it as a vendor dependency.
- When you are delegating a piece of the project to a third party and you want to integrate their work at a specific time or release. Again this works when updates are not too frequent.
Common commands for git submodules
Add git submodule
git submodule add is used to add a new submodule to an existing repository. The following is an example that creates an empty repo and explores git submodules.
$ mkdir git-submodule-demo
$ cd git-submodule-demo/
$ git init
Initialized empty Git repository in /Users/atlassian/git-submodule-demo/.git/
This sequence of commands will create a new directory
git-submodule-demo, enter that directory, and initialize it as a new repository. Next we will add a submodule to this fresh new repo.
$ git submodule add https://bitbucket.org/jaredw/awesomelibrary
Cloning into '/Users/atlassian/git-submodule-demo/awesomelibrary'...
remote: Counting objects: 8, done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 8 (delta 1), reused 0 (delta 0)
Unpacking objects: 100% (8/8), done.
git submodule add command takes a URL parameter that points to a git repository. Here we have added the
awesomelibrary as a submodule. Git will immediately clone the submodule. We can now review the current state of the repository using
$ git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: .gitmodules
new file: awesomelibrary
There are now two new files in the repository
.gitmodules and the
awesomelibrary directory. Looking at the contents of
.gitmodules shows the new submodule mapping
path = awesomelibrary
url = https://bitbucket.org/jaredw/awesomelibrary
$ git add .gitmodules awesomelibrary/
$ git commit -m "added submodule"
[main (root-commit) d5002d0] added submodule
2 files changed, 4 insertions(+)
create mode 100644 .gitmodules
create mode 160000 awesomelibrary
Cloning git submodules
git clone /url/to/repo/with/submodules
git submodule init
git submodule update
Git submodule Init
The default behavior of
git submodule init is to copy the mapping from the
.gitmodules file into the local
./.git/config file. This may seem redundant and lead to questioning
git submodule init usefulness.
git submodule init has extend behavior in which it accepts a list of explicit module names. This enables a workflow of activating only specific submodules that are needed for work on the repository. This can be helpful if there are many submodules in a repo but they don't all need to be fetched for work you are doing.
Once submodules are properly initialized and updated within a parent repository they can be utilized exactly like stand-alone repositories. This means that submodules have their own branches and history. When making changes to a submodule it is important to publish submodule changes and then update the parent repositories reference to the submodule. Let’s continue with the
awesomelibrary example and make some changes:
$ cd awesomelibrary/
$ git checkout -b new_awesome
Switched to a new branch 'new_awesome'
$ echo "new awesome file" > new_awesome.txt
$ git status
On branch new_awesome
(use "git add <file>..." to include in what will be committed)
nothing added to commit but untracked files present (use "git add" to track)
$ git add new_awesome.txt
$ git commit -m "added new awesome textfile"
[new_awesome 0567ce8] added new awesome textfile
1 file changed, 1 insertion(+)
create mode 100644 new_awesome.txt
$ git branch
Here we have changed directory to the awesomelibrary submodule. We have created a new text file
new_awesome.txt with some content and we have added and committed this new file to the submodule. Now let us change directories back to the parent repository and review the current state of the parent repo.
$ cd ..
$ git status
On branch main
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: awesomelibrary (new commits)
no changes added to commit (use "git add" and/or "git commit -a")
git status shows us that the parent repository is aware of the new commits to the
awesomelibrary submodule. It doesn't go into detail about the specific updates because that is the submodule repositories responsibility. The parent repository is only concerned with pinning the submodule to a commit. Now we can update the parent repository again by doing a
git add and
git commit on the submodule. This will put everything into a good state with the local content. If you are working in a team environment it is critical that you then
git push the submodule updates, and the parent repository updates.
When working with submodules, a common pattern of confusion and error is forgetting to push updates for remote users. If we revisit the
awesomelibrary work we just did, we pushed only the updates to the parent repository. Another developer would go to pull the latest parent repository and it would be pointing at a commit of
awesomelibrary that they were unable to pull because we had forgotten to push the submodule. This would break the remote developers local repo. To avoid this failure scenario make sure to always commit and push the submodule and parent repository.
Git submodules are a powerful way to leverage git as an external dependency management tool. Weigh the pros and cons of git submodules before using them, as they are an advanced feature and may take a learning curve for team members to adopt.