Introduction to Git – Round 2 (Advanced)

Git is an incredibly powerful, flexible, and capable distributed version control system. Unfortunately, it can also be off-putting and a bit terse in its documentation, something people have remarked about on a number of occasions.

You may have already read the series Introduction to Git by Sean Hudgston. If not, I really suggest that you do. Not only is it a great introduction, as the name implies, but it’s also a good primer for this article.

Here I’ll discuss some advanced topics that you may or may not come across as a part of your normal development workflow – features that allow you to do more than other systems, such as SVN. These include:

  • Exporting your repository
  • Basic Rebasing
  • Reordering a set of commits
  • Splitting a set of commits
  • Changing the commit message
  • Merging a set of commits

If you’ve spent your time exclusively with version control systems other than Git and worked by their philosophies, some of these concepts might be alien (or even taboo) to you. But don’t fear; I’ll explain how to perform such actions and show you times when they might be beneficial to use.

Exporting Your Repository

Let’s start out nice and easy. Exporting may not be that advanced, but it’s a really handy feature. Exporting allows us to deliver a release or snapshot of our code to others without all of the version control information we normally keep.

Archives from Local Repos

Archiving can be a handy part of our deployment processes, with tools such as Phing and Capistrano, or continuous integration tools such as Hudson and Xinc. (Optionally, we could trigger this in response to a commit hook but perhaps that’s a bit excessive.)

The options for creating an archive from a local repository are pretty simple. Here’s an example:

$ git archive --format=tar --prefix=sitepoint-git-advanced/ \
    HEAD | gzip > sitepoint-git-advanced.tar.gz

In the example above, I want the format of the archive to be TAR (you can also specify ZIP if you prefer), and the output is piped through gzip to compress it. I’ve then specified a prefix value of “sitepoint-git-advanced/”. Because the prefix ends with a slash, every file in the archive is placed under a top-level directory of that name. If the trailing slash is left off, the prefix is instead prepended to the name of each file in the archive.
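To see what the trailing slash does, here is a minimal sketch that builds a throwaway repository and lists the archive contents for both forms of the prefix. The demo prefix, the index.php file, and the author details are placeholders I made up for illustration; they are not part of the project used in this article.

```shell
# Build a throwaway repository; every name here is a placeholder.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"
echo "<?php echo 'hello';" > index.php
git add index.php
git commit -q -m "initial commit"

# Trailing slash: the prefix becomes a top-level directory in the archive.
with_slash=$(git archive --format=tar --prefix=demo/ HEAD | tar -tf -)

# No trailing slash: the prefix is prepended to every file name instead.
without_slash=$(git archive --format=tar --prefix=demo- HEAD | tar -tf -)

printf 'with slash:\n%s\n\nwithout slash:\n%s\n' "$with_slash" "$without_slash"
```

The first listing should contain demo/index.php, while the second should contain demo-index.php, which is rarely what you want in a release archive.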

Now let’s look at another example, one in which the output is a ZIP file. Instead of specifying the format and then piping the output to gzip, we can use the --output parameter and provide a name with .zip as the file extension.

$ git archive --output=sitepoint-git-advanced.zip \
    --prefix=sitepoint-git-advanced/ -9 HEAD

Because the output file name ends in .zip, git archive infers the ZIP format and writes the archive to that file instead of sending the result to stdout. When using ZIP, you can also specify the compression level using switches -0 (store only) to -9 (smallest, slowest), which is handy if the resulting file would be rather large.
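As a rough illustration of the compression switches, the sketch below archives the same commit at levels -0 and -9 and compares the file sizes. The repository, the data.txt file, and the archive names are all hypothetical; the file is deliberately repetitive so the difference is easy to see.

```shell
# Throwaway repository with one highly compressible file (placeholder names).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"
yes "the same line over and over" | head -n 5000 > data.txt
git add data.txt
git commit -q -m "add compressible data"

# -0 stores the files without compressing them; -9 compresses as hard as it can.
git archive --output=stored.zip --prefix=demo/ -0 HEAD
git archive --output=best.zip --prefix=demo/ -9 HEAD

# Arithmetic expansion strips any padding that wc may add on some systems.
stored_size=$(($(wc -c < stored.zip)))
best_size=$(($(wc -c < best.zip)))
echo "level 0: $stored_size bytes, level 9: $best_size bytes"
```

On real projects the gap depends entirely on the content; already-compressed assets such as images gain very little from -9.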

Archives from Remote Repos

Local repos aren’t the only type of repositories we can archive, though. With the --remote switch, we are able to archive repositories from Gitosis and other Git servers our team may be using to store collective code.

The example below attempts to archive the Joind.In project straight from GitHub. Unfortunately, to be honest, it doesn’t work (from searching forums I believe that remote archiving has been deliberately disabled by GitHub to help them ensure a better user experience), but I’ve included it here to show you an example of the command.

$ git archive --remote=git://github.com/settermjd/joind.in.git \
    --format=tar --prefix=joind.in/ HEAD |
    gzip > joindin.tar.gz
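Since the GitHub example can't be run, here is a sketch that tries the same --remote switch against a repository on the local file system instead. I'm assuming a plain path is accepted as the remote, which has worked on the Git installations I've used; the README file and the temporary directories are placeholders.

```shell
# A stand-in for the "remote" repository (placeholder content).
origin=$(mktemp -d)
cd "$origin" || exit 1
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"
echo "hello" > README
git add README
git commit -q -m "initial commit"

# Run the archive command from an unrelated directory, as a client would.
out=$(mktemp -d)
cd "$out" || exit 1
git archive --remote="$origin" --format=tar --prefix=joind.in/ HEAD |
    gzip > joindin.tar.gz

# List what ended up in the compressed archive.
listing=$(gzip -dc joindin.tar.gz | tar -tf -)
echo "$listing"
```

Whether the same command works against a hosted server depends on whether that server allows the upload-archive service, which is the part GitHub appears to have switched off.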

Basic Rebasing

Now let’s get into some of the juicier aspects of working with Git. According to the Git SCM book, with the rebase command we can “take all the changes that were committed on one branch and replay them on another one.” In short, this lets us integrate branches and rework changes in a powerful and flexible way. It’s also here that a bit of a debate begins. If you’re familiar with other version control tools such as CVS, SVN, and Mercurial, then you may be used to a model where once a commit is made, that’s it: it can’t be changed. With Git, this isn’t necessarily the case.

Reordering Commits

Suppose the order of our commits wasn’t quite what it should have been. We’ve made a set of commits, but it would be more practical going in a sequential order. Perhaps they all relate to the same code module and there’s a change for an unrelated module in the middle. It’s not a show stopper, but it might be helpful to have them together when reviewing the history months later.

$ git rebase -i HEAD~5

Using the command above, I’ve started a rebase session covering the last five commits. git rebase lists them oldest first, which is the reverse of the order that git log uses:

pick da4fc8e changing default to add iteration
pick 003ab61 adding even iteration script
pick 701e132 reworking directory structure
pick c722b9f removing old files
pick 3a10e1c Added in files that were forgotten in the previous commit

Suppose commit 5 (3a10e1c) should be after commit 2 (003ab61). We can change the commit order in the editor as follows:

pick da4fc8e changing default to add iteration
pick 003ab61 adding even iteration script
pick 3a10e1c Added in files that were forgotten in the previous commit
pick 701e132 reworking directory structure
pick c722b9f removing old files

Save the buffer and exit the editor and we receive the following:

Successfully rebased and updated refs/heads/testing.

Running git log now produces the following output:

commit ab97ffb2fe87c25ca48116ed03cdbe486dc0abb5
Author: Matthew Setter 
Date:   Fri Jan 18 12:54:00 2012 +0000

    removing old files

commit 3b5d6c5b0c6a866f426ff5bf412508ce9d9ae864
Author: Matthew Setter 
Date:   Fri Jan 18 12:53:30 2012 +0000

    reworking directory structure

commit d9ab87677f8e5692b649b650a6de6ce836c72531
Author: Matthew Setter 
Date:   Mon Jan 21 10:11:33 2012 +0100

    Added in files that were forgotten in the previous commit

commit 003ab6124b7f7a59be4fe152df28b9de706383ed
Author: Matthew Setter 
Date:   Fri Jan 18 12:44:08 2012 +0000

    adding even iteration script

We can see by the new commit order that the re-order was successful. Git rewound back to the first commit specified, then replayed the history with the changes we’ve just specified.
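If you'd like to try a reorder without touching a real project, the sketch below scripts the same edit in a throwaway repository. It relies on the GIT_SEQUENCE_EDITOR environment variable, which Git runs in place of your editor for the rebase list. The swap.sh helper, the file names, and the commit messages are hypothetical, and each commit touches a different file so the reorder can't conflict.

```shell
# Throwaway repository with three independent commits (placeholder names).
work=$(mktemp -d)
cd "$work" || exit 1
git init -q repo
cd repo || exit 1
git config user.name "Example Author"
git config user.email "author@example.com"
for name in first second third; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "$name"
done

# Hypothetical helper: swaps the first two lines of the rebase list,
# the same change we made by hand in the editor above.
cat > "$work/swap.sh" <<'EOF'
awk 'NR == 1 { held = $0; next }
     NR == 2 { print; print held; next }
     { print }' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
EOF

# Git calls the helper with the path of the rebase list as its argument.
GIT_SEQUENCE_EDITOR="sh $work/swap.sh" git rebase -i HEAD~2

# Show the commit subjects oldest first, on one line.
new_order=$(git log --reverse --format=%s | paste -sd' ' -)
echo "$new_order"
```

The last two commits should now appear in swapped order. In day-to-day work you would simply move the lines in your editor; the scripted form is only here so the example runs unattended.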

Changing the Commit Message

What if you’re used to writing commit messages in a particular way that, for example, you learned from a previous job, but now you’re at a new organization and they write them differently? You’ve just committed a change and the senior developer, upon reviewing it, sees the message and wants it changed to comply with the company standard. With Git, you can go back and change the message.

First, we have to start the rebase process by issuing the following command:

$ git rebase -i HEAD~2

We’re asking to rebase the last two commits, that is, everything after the commit two revisions before HEAD. The terminal displays the following in the editing buffer:

pick c722b9f removing old files
pick bcc1dff adding some files

# Rebase 701e132..bcc1dff onto 701e132
#
# Commands:
#
#  p, pick = use commit
#  r, reword = use commit, but edit the commit message
#  e, edit = use commit, but stop for amending
#  s, squash = use commit, but meld into previous commit
#  f, fixup = like "squash" but discard the commit's log message
#  x, exec = run command (the rest of the line) using shell
#
# If you remove a line here THAT COMMIT WILL BE LOST.
# However, if you remove everything, the rebase will be aborted.
#

We see the two commits at the top. In the list of commands below that, we have the option for reword which allows us to change the commit message.

Here I’ve changed pick to reword:

pick c722b9f removing old files
reword bcc1dff adding some files

# Rebase 701e132..bcc1dff onto 701e132
#
# Commands:
#
...

Saving and exiting the buffer will then allow us to re-specify the commit message.
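The same reword can be sketched end to end in a scratch repository. Here one hypothetical helper script marks the second line of the list as reword, and another stands in for the editor that Git opens for the new message (set through GIT_EDITOR). The file names and the replacement message are made up for illustration.

```shell
# Throwaway repository with three commits (placeholder names and messages).
work=$(mktemp -d)
cd "$work" || exit 1
git init -q repo
cd repo || exit 1
git config user.name "Example Author"
git config user.email "author@example.com"
for name in base cleanup extras; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "commit for $name"
done

# Hypothetical helper: change "pick" to "reword" on the second line of the list.
cat > "$work/mark-reword.sh" <<'EOF'
sed '2s/^pick/reword/' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
EOF

# Hypothetical helper: plays the part of the editor and writes the new message.
cat > "$work/new-message.sh" <<'EOF'
echo "Add the remaining project files" > "$1"
EOF

GIT_SEQUENCE_EDITOR="sh $work/mark-reword.sh" \
GIT_EDITOR="sh $work/new-message.sh" \
    git rebase -i HEAD~2

# The most recent commit should now carry the new message.
last_subject=$(git log -1 --format=%s)
echo "$last_subject"
```

Only the message of the reworded commit changes; its content is untouched, although the commit does get a new hash because the message is part of what Git hashes.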

Merging a Set of Commits

You’re probably pretty diligent in your commit practices, ensuring commits only include a single change or a group of related changes for one purpose. But there’s always something that can upset our normal rhythm.

Let’s say we made a commit for a change, did some work and made another commit, and then did some further work and committed that. After the third commit, we discover that commits 1 and 3 really were related to the same update and should go together.

If we were working with SVN or CVS, that would be it; the revision history would be fixed. With Git, we can open up the history and merge commits 1 and 3 into a single commit prior to commit 2. Let’s go through the process.

We start off the rebase process looking at the last five commits which look as follows:

$ git rebase -i HEAD~5

pick da4fc8e changing default to add iteration
pick 003ab61 adding even iteration script
pick d9ab876 Added in files that were forgotten in the previous commit
pick 3b5d6c5 reworking directory structure
pick ab97ffb removing old files

We then change the order so that 3 comes before 2 as follows:

pick da4fc8e changing default to add iteration
pick d9ab876 Added in files that were forgotten in the previous commit
pick 003ab61 adding even iteration script
pick 3b5d6c5 reworking directory structure
pick ab97ffb removing old files

We then change pick to squash on the new commit number 2:

pick da4fc8e changing default to add iteration
squash d9ab876 Added in files that were forgotten in the previous commit
pick 003ab61 adding even iteration script
pick 3b5d6c5 reworking directory structure
pick ab97ffb removing old files

We’re then put into editor mode again, showing us the previous commit messages. We add in a new commit message that summarizes the two commits before they’re merged together. So, I replace the following:

# This is a combination of 2 commits.
# The first commit's message is: 

changing default to add iteration

# This is the 2nd commit message:

Added in files that were forgotten in the previous commit

With a new message:

# This is a combination of 2 commits.
# The first commit's message is: 

Merging two changes together as they should be together.

After exiting the editor, the rebase process completes.

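If we start the rebase process again for the last five commits, this time leaving the list untouched so that nothing changes, we see that the two commits have been merged. The list still shows five commits because it now reaches one commit further back: the first entry is the commit that came before the one that used to be first.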

pick f5cac8d adding top documentation
pick e50970f changing default to add iteration
pick 3c2b3a5 adding even iteration script
pick 3b136c1 reworking directory structure
pick 01ad46f removing old files
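Here is a scripted sketch of the same reorder-and-squash in a throwaway repository, in case you want to experiment safely. The helper scripts, file names, and commit messages are hypothetical. The third commit appends to the file created by the first, and the commit in between touches a different file, so moving the third commit up doesn't cause a conflict.

```shell
# Throwaway repository: a base commit, then commits 1, 2 and 3 (placeholders).
work=$(mktemp -d)
cd "$work" || exit 1
git init -q repo
cd repo || exit 1
git config user.name "Example Author"
git config user.email "author@example.com"
echo "base" > base.txt;      git add base.txt;    git commit -q -m "base"
echo "part one" > feature.txt;  git add feature.txt; git commit -q -m "feature part one"
echo "other" > other.txt;    git add other.txt;   git commit -q -m "unrelated change"
echo "part two" >> feature.txt; git add feature.txt; git commit -q -m "feature part two"

# Hypothetical helper: move line 3 up to position 2 and mark it as "squash".
cat > "$work/squash.sh" <<'EOF'
awk 'NR == 1 { print; next }
     NR == 2 { held = $0; next }
     NR == 3 { sub(/^pick/, "squash"); print; print held; next }
     { print }' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
EOF

# Hypothetical helper: supplies the summary message for the combined commit.
cat > "$work/message.sh" <<'EOF'
echo "complete feature" > "$1"
EOF

GIT_SEQUENCE_EDITOR="sh $work/squash.sh" \
GIT_EDITOR="sh $work/message.sh" \
    git rebase -i HEAD~3

# Show the resulting history oldest first, on one line.
history=$(git log --reverse --format=%s | paste -sd'|' -)
echo "$history"
```

If you don't need to write a new message, the fixup command from the list of rebase commands does the same job while keeping only the first commit's message.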

Splitting a Set of Commits

But what if we want to go in the opposite direction? What if there’s a developer who’s young and maybe a bit careless, or what if you, yourself, have been a bit undisciplined? I know I suffer from this from time to time.

Let’s say that we have a commit with a number of changes that we’ve just thrown in because it’s late or we’re in a hurry. Now we’re fresh and want to fix up the mess. How do we do that?

$ git commit -m "need to finish quickly"
[testing e6a98bb] need to finish quickly
 0 files changed
 create mode 100644 about.php
 create mode 100644 contact-us.php
 create mode 100644 default.php
 create mode 100644 login.php

In the example above, I’ve committed a set of files, three business module pages (about.php, contact-us.php and default.php) and a user module page (login.php). Let’s separate them and do things properly.

Starting rebase with the last 3 commits, we see the following:

pick 3b136c1 reworking directory structure
pick 01ad46f removing old files
pick e6a98bb need to finish quickly

Change pick to edit on the last commit (e6a98bb) and exit the editor to receive the following output:

Stopped at e6a98bb... need to finish quickly
You can amend the commit now, with

    git commit --amend

Once you are satisfied with your changes, run

    git rebase --continue

With git reset HEAD^ we rewind the commit: the branch pointer moves back one commit while the files stay in the working directory, unstaged. Now we are able to commit them properly, adding the business and user pages in two separate commits.

$ git add about.php contact-us.php default.php
$ git commit -m "Adding new business pages"
[detached HEAD a189c9b] Adding new business pages
 0 files changed
 create mode 100644 about.php
 create mode 100644 contact-us.php
 create mode 100644 default.php

$ git add login.php
$ git commit -m "adding new user login page"
[detached HEAD 9efcab9] adding new user login page
 0 files changed
 create mode 100644 login.php

Then, finish up by running git rebase with the --continue switch:

$ git rebase --continue
Successfully rebased and updated refs/heads/testing.
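The whole split can be rehearsed in a scratch repository with the sketch below. It reuses the file names and commit messages from this section, but the repository itself, the README, and the mark-edit.sh helper are placeholders, and the PHP files are left empty.

```shell
# Throwaway repository: one base commit plus the rushed commit to be split.
work=$(mktemp -d)
cd "$work" || exit 1
git init -q repo
cd repo || exit 1
git config user.name "Example Author"
git config user.email "author@example.com"
echo "base" > README
git add README
git commit -q -m "base"
touch about.php contact-us.php default.php login.php
git add about.php contact-us.php default.php login.php
git commit -q -m "need to finish quickly"

# Hypothetical helper: change "pick" to "edit" on the first line of the list.
cat > "$work/mark-edit.sh" <<'EOF'
sed '1s/^pick/edit/' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
EOF

# The rebase stops right after applying the commit marked "edit".
GIT_SEQUENCE_EDITOR="sh $work/mark-edit.sh" git rebase -i HEAD~1

# Undo that commit but keep its files, then commit them in two groups.
git reset -q HEAD^
git add about.php contact-us.php default.php
git commit -q -m "Adding new business pages"
git add login.php
git commit -q -m "adding new user login page"

# GIT_EDITOR=true guards against an editor opening in an unattended run.
GIT_EDITOR=true git rebase --continue

subjects=$(git log --reverse --format=%s | paste -sd'|' -)
echo "$subjects"
```

The rushed commit should be gone from the history, replaced by the two focused commits.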

A Word of Warning

Now to quote the oft quoted phrase:

With Great Power comes Great Responsibility

Please remember that just because you can do something doesn’t mean that you should. Especially if you’re collaborating in a team environment, rewriting history in your repository and then merging with the master repository is likely to, amongst other things, cause confusion within your team. It’s fine if there’s a valid reason and it doesn’t happen too often. But if it happens repeatedly, I’m sure you won’t be all that popular for too long.

Also, you need to appreciate the potential conflicts that these types of changes may introduce into the code as well. While these features can be fun, interesting, and even exciting, be aware of the ramifications of what you’re doing.

Conclusion

I hope that after reading this article you see just how much power and flexibility Git provides, and that you also appreciate that with that power must come an understanding of when, and when not, to apply it.

Happy Git-time, and be sure to share your thoughts and ideas in the comments so we can all benefit. For more information, here are some links for further reading:

  • http://7fff.com john

    Not sure about your tar/gzip example. You may need a tee in there…

    git archive --format=tar --prefix=something/ HEAD | tee something.tar | gzip > something.tar.gz

    • http://www.maltblue.com Matthew Setter

      Hey there John,
      thanks for the feedback. I’ll check the code over shortly and submit a correction for the article if needs be. Thanks for picking me up on this.
      Matt