Versioning Large Files with Git LFS

By Chris Ward

Versioning large files can be problematic with distributed version control systems like Git. Git Large File Storage (or LFS) is a new, open-source extension to Git that aims to improve handling of large files.

It does this by replacing large files in your repository—such as graphics and videos—with simple text pointers. These pointers reference the large files, which are hosted elsewhere, either by GitHub or another external source such as an AWS bucket.

Initially this is confusing, but hopefully LFS will make more sense by the end of this tutorial.

Availability

LFS will be available to all GitHub users soon, but as I'm handling a lot of large files through GitHub with my Chip Shop board game, I signed up to the early access program to find out more.

Note: Unless you have Git LFS enabled on your account, this tutorial won't work for you yet, but you can get an idea of the future feature.

Installing

If you have a Linux- or Windows-based system, visit git-lfs.github.com, download the installer, unarchive it and run the installer script.

If you have a Mac, you can do the same, or install Git LFS via Homebrew with brew install git-lfs.
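Whichever installer you use, it's worth confirming that Git can actually see the extension before going further. The following is a minimal sketch: check_git_lfs is a hypothetical helper name, and it relies only on the git lfs version subcommand. Current Git LFS releases also expect a one-time git lfs install to register their filters in your Git configuration; the early access build used for this article may behave differently.

```shell
#!/bin/sh
# Report whether the git-lfs extension is visible to Git.
# check_git_lfs is a hypothetical helper name, not part of Git LFS.
check_git_lfs() {
    if git lfs version >/dev/null 2>&1; then
        echo "installed: $(git lfs version)"
    else
        echo "missing: git-lfs not found on PATH"
        return 1
    fi
}

# Run the check once; "|| true" keeps a missing install from aborting a script.
check_git_lfs || true
```

If the check reports "missing", the installer probably put the git-lfs binary somewhere that isn't on your PATH.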

Getting Started

Note: Git LFS will currently only work when using Git on the command line. I normally use Tower for managing my Git workflow, and this broke any LFS-related actions.

Create a Git repository as you normally do, and tell LFS which file types to track by issuing commands such as:

git lfs track "*.psd"
git lfs track "*.mp3"

Then use git as normal, remembering to commit the .gitattributes file that git lfs track creates:

git add .gitattributes
git add *.psd
git commit -m "Added Photoshop files"
git push origin master
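Under the hood, git lfs track records each pattern in a .gitattributes file, and that file is what tells Git to hand matching files to LFS. The sketch below writes the attribute line by hand in a throwaway repository, so it runs even without Git LFS installed, and then asks Git which filter applies. The file names cover.psd and notes.txt are made up for illustration.

```shell
#!/bin/sh
# Work in a throwaway repository so nothing real is touched.
demo_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1
git init -q .

# This is the line that `git lfs track "*.psd"` is expected to append
# (written by hand here so the sketch runs without Git LFS installed).
echo '*.psd filter=lfs diff=lfs merge=lfs -text' >> .gitattributes

# Ask Git which filter applies to two hypothetical file names.
# check-attr works on paths that do not exist yet.
psd_filter="$(git check-attr filter -- cover.psd)"
txt_filter="$(git check-attr filter -- notes.txt)"
echo "$psd_filter"
echo "$txt_filter"
```

Running git check-attr filter against a file in your own repository is a quick way to confirm that a pattern you tracked is actually matching it.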

So far, not a lot is different, but if we look into the details of a file on GitHub, we can see a subtle difference. Here’s a file hosted on GitHub:

[Image: Normal GitHub file location]

And here’s a file hosted externally via LFS:

[Image: LFS GitHub file location]

In the first image (a traditional GitHub repository), the file is located in the repository. In the second image (an LFS-enabled repository) the file is located in an AWS bucket.

Now go ahead and create some branches, make file changes, commit them and push:

[Image: Git folder size]

Hang on, that .git folder is still large: wasn't LFS supposed to handle files better?

This is where LFS gets somewhat confusing and possibly not as useful as you may have hoped.
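One likely reason for the size: the machine that commits a large file still keeps a local copy of it, which current Git LFS releases store under .git/lfs. The sketch below reports the size of .git and of that cache so you can see how much of the folder is LFS content. report_git_sizes is a hypothetical helper name, and the .git/lfs location is an assumption based on current releases.

```shell
#!/bin/sh
# Print the size in KiB of .git and of the LFS cache inside it.
# Assumes the cache lives at .git/lfs (true of current releases).
# report_git_sizes is a hypothetical helper name.
report_git_sizes() {
    repo="${1:-.}"
    for dir in "$repo/.git" "$repo/.git/lfs"; do
        if [ -d "$dir" ]; then
            # du -sk prints "<kilobytes><TAB><path>"; keep only the number.
            kib="$(du -sk "$dir" | awk '{print $1}')"
            echo "$dir: ${kib} KiB"
        else
            echo "$dir: not present"
        fi
    done
}

# Report on the repository in the current directory.
report_git_sizes .
```

If most of the total sits under .git/lfs, the repository history itself is small and the bulk is the locally cached large files.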

To see what's happening more clearly, delete the repo folder you created and then re-clone it from GitHub. You should see something like the below, with an appropriately sized .git folder:

[Image: PSD file size]

[Image: New Git folder size]

Those files are ridiculously small! Try opening one, and you'll likely see this message:

[Image: Photoshop error message]

This is one of the aforementioned file references, and if you feel like cracking it open in a text editor you'll see something like this:

version https://git-lfs.github.com/spec/v1
oid sha256:128b446a2cd06dd3b4dc2e2fe3336426792425870c3ada44ae7684b8391dc04d
size 1036867
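The pointer is just three key-value lines, so it's easy to read from a script. Here is a minimal sketch that checks the first line against the spec URL shown above and prints the object ID and size. read_lfs_pointer is a hypothetical helper name, and the demo writes the pointer from this article into a temporary file.

```shell
#!/bin/sh
# Print the oid and size from a Git LFS pointer file, or fail if the file
# does not start with the spec line. read_lfs_pointer is a hypothetical name.
read_lfs_pointer() {
    file="$1"
    # tr strips NUL bytes so binary files can be tested without shell warnings.
    first_line="$(head -n 1 "$file" 2>/dev/null | tr -d '\000')"
    if [ "$first_line" != "version https://git-lfs.github.com/spec/v1" ]; then
        echo "not a pointer: $file" >&2
        return 1
    fi
    awk '$1 == "oid"  { sub(/^sha256:/, "", $2); print "oid=" $2 }
         $1 == "size" { print "size=" $2 }' "$file"
}

# Demo: recreate the pointer shown above in a temporary file and read it back.
pointer_file="$(mktemp)"
cat > "$pointer_file" <<'EOF'
version https://git-lfs.github.com/spec/v1
oid sha256:128b446a2cd06dd3b4dc2e2fe3336426792425870c3ada44ae7684b8391dc04d
size 1036867
EOF
read_lfs_pointer "$pointer_file"
```

The size field is the byte count of the real file, so the roughly 1 MB Photoshop document here is represented in the repository by a pointer of about 130 bytes.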

This is great if you're a developer on a team, as you probably don't need lots of media files cluttering your computer. And when it comes to deployment or testing, you likely have a build process that will assemble a project with real media. But what if you're a designer and need to make changes? How do you access the real file?

Let's see what commands are available in LFS by typing git lfs:

[Image: LFS commands]

There are several that may be worth investigating in the future, but of most interest to us right now is fetch. By default, this will retrieve local copies of the LFS files in the currently checked-out branch. You can narrow this by naming a remote, followed by one or more branches or commits:

# Fetch the LFS files for the currently checked-out branch
git lfs fetch
# Switch to a particular branch and fetch its LFS files (the remote name comes first)
git checkout new_image
git lfs fetch origin new_image
# Fetch the LFS files for a particular commit
git lfs fetch origin 5a5c0ef0de779c9d4585320eab8d4a1bec696005

And the files are available locally:

[Image: Restored folder]
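To check which files in a working copy are still references rather than real content, you can scan for small files that begin with the LFS spec line. This is a rough sketch, not an official LFS feature: list_lfs_pointers is a hypothetical helper name, and the 1024-byte cutoff is an assumption based on how small pointer files are.

```shell
#!/bin/sh
# List files under a directory that are still LFS pointers rather than
# real content. list_lfs_pointers is a hypothetical helper name.
list_lfs_pointers() {
    root="${1:-.}"
    # -size -3 keeps files of at most two 512-byte blocks (1024 bytes);
    # the cutoff is an assumption, since pointer files are tiny.
    find "$root" -type f -size -3 ! -path '*/.git/*' | while IFS= read -r f; do
        # tr strips NUL bytes so small binary files don't trigger shell warnings.
        if [ "$(head -n 1 "$f" | tr -d '\000')" = "version https://git-lfs.github.com/spec/v1" ]; then
            echo "$f"
        fi
    done
}

# Scan the current directory.
list_lfs_pointers .
```

In current Git LFS releases, git lfs fetch only fills the local cache, while git lfs pull also swaps the pointers in the working tree for the real files. If this scan still lists files after a fetch, a pull is probably what you need.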

I would love a specific command to then remove the local file and replace it with a reference. Maybe there is another way of accomplishing this, or it will come in a future version.

Conclusion

Git LFS is a promising start, and I can see glimpses of genuine usefulness in the future. It needs better documentation and proper integration with third-party tools (the GitHub website included), which I'm sure are coming. If you have a decent Git, continuous integration and deployment setup in place, then LFS will make far more sense. If you're a small team, then it may be more of a bottleneck.

What are your thoughts? Useful or unnecessary?

More:
  • http://www.dfbgaming.com/ {dFb}eMac

    This reminds me of git-annex or git-bigstore but it’s good to see this officially supported on Github. I’ve mainly used git-annex before but I still prefer using other version control systems that properly support large binaries. I’ll have to try this.

    • Chris Ward

      Let me know your experiences :)

  • Elliot Birch

    Nice Chris!!! I thought when I was reading this “Oh, it’s another Chris Ward, so many of them”. Didn’t expect it to be one I knew ;) Great stuff!

    • Chris Ward

      Hello :)
