Versioning Large Files with Git LFS

    Chris Ward

    Versioning large files can be problematic with distributed version control systems like Git, because every clone carries the full history of every file. Git Large File Storage (or LFS) is a new, open-source extension to Git that aims to improve the handling of large files.

    It does this by replacing large files in your repository—such as graphics and videos—with simple text pointers. These pointers reference the large files, which are hosted elsewhere, either by GitHub or another external source such as an AWS bucket.

    This can be confusing at first, but LFS should make more sense by the end of this tutorial.

    Availability

    LFS will be available to all GitHub users soon, but as I'm handling a lot of large files through GitHub with my Chip Shop board game, I signed up to the early access program to find out more.

    Note: Unless you have Git LFS enabled on your account, this tutorial won't work for you yet, but you can get an idea of the future feature.

    Installing

    If you have a Linux- or Windows-based system, visit git-lfs.github.com, download the installer, unarchive it and run the installer script.

    If you have a Mac, you can do the same, or install Git LFS via Homebrew with brew install git-lfs.
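
    Whichever route you take, it is worth confirming that Git can find the extension before continuing. Here is a minimal sketch, assuming the installer put git-lfs on your PATH. The lfs_status helper name is made up for this example, while git lfs version is the extension's own version command:

```shell
# lfs_status is a hypothetical helper, not part of Git or Git LFS.
# It reports whether `git lfs` resolves to an installed extension.
lfs_status() {
  if git lfs version >/dev/null 2>&1; then
    echo "git-lfs: available"
  else
    echo "git-lfs: missing"
  fi
}

lfs_status
```

    If this reports the extension as missing, re-run the installer or check your PATH before trying the commands in the next section.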

    Getting Started

    Note: Git LFS currently only works when using Git on the command line. I normally use Tower for managing my Git workflow, but LFS-related actions failed when I tried them there.

    Create a Git repository as you normally would, then tell LFS which file types to track by issuing commands such as:

    git lfs track "*.psd"
    git lfs track "*.mp3"
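
    Each git lfs track call records its pattern in a .gitattributes file at the root of the repository. The sketch below writes by hand the lines that current Git LFS releases produce for the two commands above (the early-access build may have used slightly different attributes), so it runs in a scratch directory without Git LFS installed. The lfs_patterns helper is hypothetical, added only to show how to read the file back:

```shell
# Scratch directory so no real repository is touched.
workdir=$(mktemp -d)

# Lines that `git lfs track "*.psd"` and `git lfs track "*.mp3"` append
# to .gitattributes in current Git LFS releases (written by hand here).
cat > "$workdir/.gitattributes" <<'EOF'
*.psd filter=lfs diff=lfs merge=lfs -text
*.mp3 filter=lfs diff=lfs merge=lfs -text
EOF

# lfs_patterns is a hypothetical helper: print every pattern in a
# .gitattributes file that is routed through the LFS filter.
lfs_patterns() {
  grep 'filter=lfs' "$1" | cut -d' ' -f1
}

lfs_patterns "$workdir/.gitattributes"
```

    In a real repository, running git lfs track with no arguments lists the tracked patterns for you.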

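    Then use Git as normal, remembering to commit the .gitattributes file where git lfs track records these patterns:

    git add .gitattributes
    git add *.psd
    git commit -m "Added Photoshop files"
    git push origin master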

    So far, not a lot is different, but if we look into the details of a file on GitHub, we can see a subtle difference. Here’s a file hosted on GitHub:

    [Image: Normal GitHub file location]

    And here’s a file hosted externally via LFS:

    [Image: LFS GitHub file location]

    In the first image (a traditional GitHub repository), the file is located in the repository. In the second image (an LFS-enabled repository) the file is located in an AWS bucket.

    Now go ahead and create some branches, make file changes, commit them and push:

    [Image: Git folder size]

    Hang on, that .git folder is still large: wasn't LFS supposed to handle files better?

    This is where LFS gets somewhat confusing and possibly not as useful as you may have hoped.
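
    Part of the explanation is that Git LFS keeps a local cache of the large files you have added or downloaded inside the .git folder (under .git/lfs in the releases I am aware of), so the repository where the files originated still carries them. Here is a rough sketch for seeing where the space goes. The dir_kb helper is hypothetical, and the du -sk flags assume a POSIX-style du:

```shell
# dir_kb is a hypothetical helper: print a directory's size in kilobytes,
# or 0 if the directory does not exist.
dir_kb() {
  if [ -d "$1" ]; then
    du -sk "$1" | cut -f1
  else
    echo 0
  fi
}

# Run from the root of a repository; outside one, both lines report 0.
echo ".git total:     $(dir_kb .git) KB"
echo ".git/lfs cache: $(dir_kb .git/lfs) KB"
```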

    To see what's happening more clearly, delete the repo folder you created and re-clone it from GitHub. You should see something like the following, with a much smaller .git folder:

    [Image: PSD file size]

    [Image: New Git folder size]

    Those files are ridiculously small! Try opening one, and you'll likely see this message:

    [Image: Photoshop error message]

    This is one of the text pointers mentioned earlier. Open it in a text editor and you'll see something like this:

    version https://git-lfs.github.com/spec/v1
    oid sha256:128b446a2cd06dd3b4dc2e2fe3336426792425870c3ada44ae7684b8391dc04d
    size 1036867
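
    The three lines map directly onto the real file: oid is the SHA-256 hash of its contents and size is its length in bytes. As a rough sketch of that relationship (not how Git LFS itself is implemented), the hypothetical lfs_pointer helper below rebuilds a pointer by hand, and the equally hypothetical is_lfs_pointer checks whether a file on disk is a pointer or the real thing. It assumes sha256sum is available (on a Mac, shasum -a 256 is the usual substitute):

```shell
# Both helpers are hypothetical illustrations, not Git LFS commands.

# Print a pointer for the file at $1 in the three-line format shown above.
lfs_pointer() {
  oid=$(sha256sum "$1" | cut -d' ' -f1)   # hash of the file's contents
  size=$(wc -c < "$1" | tr -d ' ')        # length in bytes
  printf 'version https://git-lfs.github.com/spec/v1\n'
  printf 'oid sha256:%s\n' "$oid"
  printf 'size %s\n' "$size"
}

# Succeed if the file at $1 starts with the pointer version line.
is_lfs_pointer() {
  [ "$(head -n 1 "$1")" = "version https://git-lfs.github.com/spec/v1" ]
}

# Demo on a small scratch file standing in for a large asset.
demo=$(mktemp)
printf 'hello\n' > "$demo"
lfs_pointer "$demo" > "$demo.pointer"
cat "$demo.pointer"
is_lfs_pointer "$demo.pointer" && echo "pointer"
is_lfs_pointer "$demo" || echo "real file"
```

    Git LFS also ships its own git lfs pointer command for building and checking pointers, which is the better choice in a real workflow.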

    This is great if you're a developer on a team, as you probably don't need lots of media files cluttering your computer. And when it comes to deployment or testing, you likely have a build process that will assemble a project with real media. But what if you're a designer and need to make changes? How do you access the real file?

    Let's see what commands LFS offers by typing git lfs:

    [Image: LFS commands]

    There are several that may be worth investigating in the future, but of most interest to us right now is fetch. By default, this retrieves local copies of all files in the currently checked-out branch. You can narrow this by passing a branch or a commit:

    # Fetch files for the currently checked-out branch
    git lfs fetch
    # Fetch files for a particular branch
    git checkout -b new_image
    git lfs fetch new_image
    # Fetch files for a particular commit
    git lfs fetch 5a5c0ef0de779c9d4585320eab8d4a1bec696005

    And the files are available locally:

    [Image: Restored folder]
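
    The fetch commands above reflect the early-access build. In the Git LFS releases I am aware of since then, fetch expects a remote name before the ref and only downloads objects into the local cache, while a separate git lfs checkout step swaps the pointers in your working folder for real files (git lfs pull does both for the current branch). Here is a sketch under those assumptions. The lfs_materialize helper is hypothetical and origin is an assumed remote name:

```shell
# lfs_materialize is a hypothetical helper, not a Git LFS command.
# Usage: lfs_materialize [remote] [ref]
# Downloads LFS objects for the ref (or the current branch if no ref is
# given), then replaces pointer files in the working tree with real files.
lfs_materialize() {
  remote="${1:-origin}"   # assumed default remote name
  ref="${2:-}"
  if [ -n "$ref" ]; then
    git lfs fetch "$remote" "$ref"
  else
    git lfs fetch "$remote"
  fi && git lfs checkout
}

# Example (run inside an LFS-enabled clone):
#   lfs_materialize origin new_image
```

    Check git lfs help fetch on your installed version before relying on the argument order, as it may differ from what is assumed here.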

    I would love a specific command to then remove the local file and replace it with a reference. Maybe there is another way of accomplishing this, or it will come in a future version.

    Conclusion

    Git LFS is a promising start, and I can see glimpses of genuine usefulness in the future. It needs better documentation and proper integration with third-party tools (the GitHub website included), which I'm sure are coming. If you have a decent Git, continuous integration and deployment setup in place, then LFS will make far more sense. If you're a small team, it may be more of a bottleneck.

    What are your thoughts? Useful or unnecessary?