Versioning Large Files with Git LFS
Versioning large files can be problematic with distributed version control systems like Git. Git Large File Storage (or LFS) is a new, open-source extension to Git that aims to improve handling of large files.
It does this by replacing large files in your repository—such as graphics and videos—with simple text pointers. These pointers reference the large files, which are hosted elsewhere, either by GitHub or another external source such as an AWS bucket.
Initially this is confusing, but hopefully LFS will make more sense by the end of this tutorial.
Availability
LFS will be available to all GitHub users soon, but as I'm handling a lot of large files through GitHub with my Chip Shop board game, I signed up to the early access program to find out more.
Note: Unless you have Git LFS enabled on your account, this tutorial won't work for you yet, but you can get an idea of the future feature.
Installing
If you have a Linux- or Windows-based system, visit git-lfs.github.com, download the installer, unarchive it and run the installer script.
If you have a Mac, you can do the same, or install Git LFS via Homebrew with brew install git-lfs.
Getting Started
Note: Git LFS currently only works when using Git on the command line. I normally use Tower to manage my Git workflow, and LFS-related actions failed when I ran them from there.
Create a Git repository as you normally do, then tell LFS which file types to track by issuing commands such as:
git lfs track "*.psd"
git lfs track "*.mp3"
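Each git lfs track call records its pattern in a .gitattributes file at the root of the repository. In recent Git LFS releases an entry looks like `*.psd filter=lfs diff=lfs merge=lfs -text`, though the exact attribute list may differ between versions. As a minimal sketch, assuming only that every LFS entry carries `filter=lfs`, the tracked patterns can be read back out of that file. The helper name `list_lfs_patterns` is made up for illustration:

```shell
# Hypothetical helper: list the patterns that .gitattributes routes through LFS.
# Assumes every LFS entry carries the filter=lfs attribute.
list_lfs_patterns() {
  attributes_file="${1:-.gitattributes}"
  # The first whitespace-separated field of each matching line is the pattern.
  awk '/filter=lfs/ { print $1 }' "$attributes_file"
}
```

Running git lfs track with no arguments prints the tracked patterns as well, so the helper is mainly useful for seeing what the command wrote.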
Then use git as normal:
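# git lfs track records the patterns in .gitattributes, so commit that file too
git add .gitattributes
git add *.psd
git commit -m "Added Photoshop files"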
git push origin master
So far, not a lot is different, but if we look into the details of a file on GitHub, we can see a subtle difference. Here’s a file hosted on GitHub:
And here’s a file hosted externally via LFS:
In the first image (a traditional GitHub repository), the file is located in the repository. In the second image (an LFS-enabled repository) the file is located in an AWS bucket.
Now go ahead and create some branches, make file changes, commit them and push:
Hang on, that .git folder is still large: wasn't LFS supposed to handle files better?
This is where LFS gets somewhat confusing and possibly not as useful as you may have hoped.
To see what's happening more clearly, delete the repo folder you created and then re-clone it from GitHub. You should see something like the below, with an appropriately sized .git folder:
Those files are ridiculously small! Try opening one, and you'll likely see this message:
This is one of the aforementioned file references, and if you feel like cracking it open in a text editor you'll see something like this:
version https://git-lfs.github.com/spec/v1
oid sha256:128b446a2cd06dd3b4dc2e2fe3336426792425870c3ada44ae7684b8391dc04d
size 1036867
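The oid line is the SHA-256 hash of the real file's contents, and size is its length in bytes, so the pointer can be reproduced from the original file. The following is a minimal sketch of that relationship, not how Git LFS itself is implemented. The function name `make_pointer` is hypothetical, and it assumes either sha256sum or shasum is installed:

```shell
# Hypothetical helper: print LFS-style pointer text for a given file.
make_pointer() {
  file="$1"
  # SHA-256 of the file contents: sha256sum on Linux, shasum on macOS.
  if command -v sha256sum >/dev/null 2>&1; then
    oid=$(sha256sum "$file" | cut -d ' ' -f 1)
  else
    oid=$(shasum -a 256 "$file" | cut -d ' ' -f 1)
  fi
  # Size of the file in bytes (tr strips padding that some wc builds add).
  size=$(wc -c < "$file" | tr -d ' ')
  printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size"
}
```

Calling make_pointer on the original Photoshop file should print the same three lines as the pointer stored in the repository. Recent Git LFS releases also provide git lfs pointer --file=<path> for the same purpose.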
This is great if you're a developer on a team, as you probably don't need lots of media files cluttering your computer. And when it comes to deployment or testing, you likely have a build process that will assemble a project with real media. But what if you're a designer and need to make changes? How do you access the real file?
Let's see what commands are available to LFS by typing git lfs:
There are several that may be worth investigating in the future, but of most interest to us right now is fetch. By default, this will retrieve local copies of all files in the currently checked-out branch. You can narrow this down by passing a branch or a commit:
# Fetch all LFS files for the currently checked-out branch
git lfs fetch
# Fetch the files for a particular branch
git checkout -b new_image
git lfs fetch new_image
# Fetch the files for a particular commit
git lfs fetch 5a5c0ef0de779c9d4585320eab8d4a1bec696005
And the files are available locally:
I would love a specific command to then remove the local file and replace it with a reference. Maybe there is another way of accomplishing this, or it will come in a future version.
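In the meantime, it helps to know which state a given file is in. The sketch below is a rough check, assuming that a pointer always begins with the spec v1 version line shown earlier. The names `is_lfs_pointer` and `report_lfs_state` are made up for illustration and are not part of Git LFS:

```shell
# Hypothetical helper: succeed if the file looks like an LFS pointer.
# Assumes pointers start with the spec v1 version line.
is_lfs_pointer() {
  prefix='version https://git-lfs.github.com/spec/v1'
  # Read only as many bytes as the prefix is long, so large binaries stay cheap.
  head -c "${#prefix}" "$1" | grep -qxF "$prefix"
}

# Report the state of each file passed as an argument.
report_lfs_state() {
  for f in "$@"; do
    if is_lfs_pointer "$f"; then
      echo "$f: pointer"
    else
      echo "$f: real content"
    fi
  done
}
```

For example, report_lfs_state *.psd would list which Photoshop files in the current folder are still references and which have been fetched.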
Conclusion
Git LFS is a promising start, and I can see glimpses of genuine usefulness in the future. It needs better documentation and proper integration with third-party tools (the GitHub website included), which I'm sure are coming. If you have a decent Git, continuous integration and deployment setup in place, then LFS will make far more sense. If you're a small team without one, it may be more of a bottleneck.
What are your thoughts? Useful or unnecessary?