Resolving Application Dependencies with Git Submodules
Last updated March 09, 2022
Most modern applications rely heavily on third-party libraries and must specify these dependencies within the application repository itself. Tools like Ruby’s RubyGems, Java’s Maven, or Python’s pip are dependency managers that translate a list of stated application dependencies into the code or binaries the application uses during execution.
However, in some cases the required third-party libraries can’t be resolved by the dependency manager. Examples include private libraries that aren’t publicly accessible, or libraries that their maintainers haven’t packaged for distribution through the dependency manager. In these situations, you can use git submodules to manage external dependencies manually.
This guide discusses the pros and cons of managing dependencies with git submodules, as well as some alternative approaches that avoid submodules altogether.
Git submodules
Git submodules are a feature of the Git SCM that allows you to include the contents of one repository within another by specifying the referenced repository’s location. The parent repository records the exact commit of the included repository to use, which provides a way to include an external library’s source in an application’s source tree at a fixed revision.
For example, to include the FooBar source into the heroku-rails project, use the git submodule add command:
$ cd ~/Code/heroku-rails
$ git submodule add https://github.com/myusername/FooBar lib/FooBar
Cloning into 'lib/FooBar'...
remote: Counting objects: 26, done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 26 (delta 8), reused 19 (delta 5)
Unpacking objects: 100% (26/26), done.
This creates a new submodule called FooBar and places a FooBar directory containing the library’s full source tree in the application’s lib directory.
Once a git submodule is added locally, you need to commit the new submodule reference to your application repository.
$ git commit -am "adding a submodule for FooBar"
[master 314ef62] adding a submodule for FooBar
2 files changed, 4 insertions(+)
create mode 160000 FooBar
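The commit does not contain the library’s files. It records two things: a .gitmodules entry that maps the submodule path to its repository URL, and an index entry with mode 160000 (a “gitlink”) that pins the submodule to one exact commit. The helper below is a minimal sketch for inspecting both; the function name show_submodule_pin is hypothetical, and it assumes it is run from the application repository root.

```bash
# Hypothetical helper: show what `git submodule add` recorded for a path.
# Assumes the current directory is the application repository root.
show_submodule_pin() {
  local path="$1"
  # List every submodule's path and URL as recorded in .gitmodules,
  # the file that is committed alongside the application code.
  git config --file .gitmodules --get-regexp '^submodule\..*\.(path|url)$'
  # Show the index entry for the submodule: mode 160000 marks a gitlink,
  # i.e. a pointer to a single commit of the library, not its files.
  git ls-files --stage -- "$path"
}
```

Running show_submodule_pin lib/FooBar in the example project should print the submodule’s path and url entries, followed by a line that starts with 160000 and the pinned commit hash. Because only that hash is stored, the application keeps building against the same library revision until the pin is deliberately moved.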
Heroku properly resolves and fetches submodules as part of deployment:
$ git push heroku
Counting objects: 13, done.
...
-----> Heroku receiving push
-----> Git submodules detected, installing
       Submodule 'FooBar' (https://github.com/myusername/FooBar.git) registered for path 'FooBar'
       Initialized empty Git repository in /tmp/build_2qfce3fkvrug9/FooBar/.git/
       Submodule path 'FooBar': checked out '667e0b5717631a8cca657a0aa306c045f06cfda4'
-----> Ruby/Rails app detected
...
Note that failures to fetch the submodules will cause the build to fail.
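Because a failed submodule fetch fails the whole build, it can help to confirm before pushing that a fresh clone of the committed application can fetch every submodule. The sketch below approximates that locally; it is not Heroku’s build logic, and check_submodules_fetchable is a hypothetical helper name.

```bash
# Hypothetical pre-push check: clone the committed application into a scratch
# directory and fetch all submodules, roughly as a fresh build would.
check_submodules_fetchable() {
  local repo="$1" scratch status=0
  scratch="$(mktemp -d)" || return 1
  # Only committed state is cloned, so an uncommitted submodule change is not tested.
  git clone --quiet "$repo" "$scratch/app" || status=1
  if [ "$status" -eq 0 ]; then
    # Initialize and check out every submodule (recursively) at its pinned commit.
    git -C "$scratch/app" submodule --quiet update --init --recursive || status=1
  fi
  # Clean up the scratch clone and report success (0) or failure (1).
  rm -rf "$scratch"
  return "$status"
}
```

For example: cd ~/Code/heroku-rails && check_submodules_fetchable . && git push heroku. Keep in mind that a local clone can use credentials available on your machine (such as SSH keys or a credential helper), so this check can pass even when a remote build would fail to authenticate.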
If at all possible, use your language’s preferred dependency resolution mechanism instead of submodules, which are often confusing and error-prone.
Heroku resolves submodules only for builds triggered by git pushes. Builds created with the API, and builds triggered through GitHub sync, do not resolve submodules.
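Part of what makes submodules error-prone is that updating one is a separate, explicit step: the application keeps pointing at the old commit until a new pin is committed, and other clones typically need git submodule update (or a pull with --recurse-submodules) before they see the new library code. A minimal sketch of the update step, assuming the submodule should follow its upstream branch; update_submodule_pin is a hypothetical helper name.

```bash
# Hypothetical helper: move a submodule to the latest upstream commit and
# commit the new pin. Assumes it runs from the application repository root.
update_submodule_pin() {
  local path="$1"
  # Fetch the submodule's remote and check out the tip of the branch it
  # tracks (see submodule.<name>.branch in .gitmodules).
  git submodule --quiet update --init --remote -- "$path" || return 1
  # Stage the new gitlink; until this is committed, the app uses the old pin.
  git add -- "$path" || return 1
  if git diff --cached --quiet -- "$path"; then
    echo "$path is already at the latest upstream commit" >&2
    return 0
  fi
  # Commit only the submodule path, leaving any other staged changes alone.
  git commit --quiet -m "Update $path submodule" -- "$path"
}
```

With the example project, update_submodule_pin lib/FooBar followed by git push heroku would deploy with the newer FooBar commit.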
Protected Git submodules
If the referenced git repository is protected by a username and password, it’s still possible to reference it with a submodule. Because remote environments like Heroku don’t have access to your locally available credentials, you need to embed the username and password in the repository URL.
For instance, the following command adds the FooBar submodule using an HTTP basic authentication URL (note the username:password tokens):
$ git submodule add https://username:password@github.com/myusername/FooBar
This adds a private submodule dependency to the application while still allowing it to resolve in non-local environments.
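Because the submodule URL, including the embedded username and password, is stored in plaintext in the .gitmodules file committed to your repository (and is also copied into .git/config when the submodule is initialized), consider whether this is acceptable for your particular security requirements.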
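If you do embed credentials, it can be worth checking which submodule URLs contain them before pushing or sharing the repository. The following sketch lists http(s) submodule URLs in a .gitmodules file that carry a user or user:password part; find_credential_urls is a hypothetical helper name, and the pattern only covers the HTTP basic authentication form described here.

```bash
# Hypothetical helper: print submodule URLs that embed HTTP(S) credentials,
# i.e. URLs of the form https://user[:password]@host/...
find_credential_urls() {
  local file="${1:-.gitmodules}"
  # Nothing to report if the file does not exist.
  [ -f "$file" ] || return 0
  # Print "submodule.<name>.url <url>" for every submodule, then keep only
  # http(s) URLs whose host part is preceded by "user...@".
  git config --file "$file" --get-regexp '^submodule\..*\.url$' |
    awk '$2 ~ "^https?://[^/@]+@" { print }'
}
```

Run from the application root after adding the protected FooBar submodule, it should print a line such as submodule.FooBar.url https://username:password@github.com/myusername/FooBar. An empty result means no URL in the file matches that pattern.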
Vendoring
While Git submodules are one way to quickly reference external library source, users often run into issues with their nuanced update lifecycle. If you find submodules counterproductive, you can vendor the code into the project instead.
Many frameworks support “vendored” code, where the source of the referenced library is simply copied into the application’s source tree:
$ git clone <remote repo> /path/to/some/directory
$ rm -rf /path/to/some/directory/.git   # remove the clone's history so Git tracks plain files
$ cp -R /path/to/some/directory app/vendor/directory
$ git add app/vendor/directory
A downside of this approach is that it requires a manual download and copy process whenever the external library is updated. However, for an external resource that changes very slowly, or whose upstream changes you don’t want to pull in, this is a reasonable option.
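Some of that manual work can be scripted. The sketch below clones a library at a chosen ref, strips its Git history, copies it into the vendor directory, and writes a small REVISION file recording the source URL and commit. The function name, the REVISION file, and the example paths are assumptions for illustration, not a framework convention.

```bash
# Hypothetical helper: vendor a library at a given ref as plain files.
# Usage: vendor_library <repo-url> <ref> <destination-dir>
vendor_library() {
  local repo="$1" ref="$2" dest="$3" scratch rev
  scratch="$(mktemp -d)" || return 1
  # Clone upstream into a scratch directory and check out the requested ref.
  if ! git clone --quiet "$repo" "$scratch/src" ||
     ! git -C "$scratch/src" checkout --quiet "$ref"; then
    rm -rf "$scratch"
    return 1
  fi
  rev="$(git -C "$scratch/src" rev-parse HEAD)"
  # Remove upstream history so Git stores the copy as ordinary files
  # instead of treating it as an embedded repository.
  rm -rf "$scratch/src/.git"
  # Replace any previously vendored copy with the fresh one.
  rm -rf "$dest"
  mkdir -p "$(dirname "$dest")"
  cp -R "$scratch/src" "$dest"
  # Record the source URL and commit so the next update has a known baseline.
  printf '%s %s\n' "$repo" "$rev" > "$dest/REVISION"
  rm -rf "$scratch"
  # Stage the vendored files (including deletions) in the application repo.
  git add -- "$dest"
}
```

For example, vendor_library https://github.com/myusername/FooBar <ref> app/vendor/FooBar, followed by a commit. Re-running it with a newer ref replaces the vendored copy, and the REVISION file shows which upstream commit the application currently ships.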
Private dependency repositories
A very robust and scalable approach to dependency management is to use a private package repository. For Ruby, Python, and Node.js, this is available on Heroku through the Gemfury add-on. For JVM-based languages, you can use a private S3 bucket with the s3-wagon-private tool. It may also be possible to host your dependencies on Heroku using custom buildpack functionality.
Private package repositories allow you to use your language’s dependency management tools while limiting access to only your application or organization. While this does incur the overhead of properly packaging your referenced libraries for broader distribution, it is a much more scalable approach that takes advantage of your language’s well-supported and vetted dependency toolset.