Git is an incredibly powerful, flexible, and capable distributed version control system. Unfortunately, it can also be off-putting, and its documentation can be a bit terse, something people have remarked on many times.
You may have already read the series Introduction to Git by Sean Hudgston. If not, I really suggest that you do. Not only is it a great introduction, as the name implies, but it’s also a good primer for this article.
Here I’ll discuss some advanced topics that you may or may not come across as a part of your normal development workflow – features that allow you to do more than other systems, such as SVN. These include:
- Exporting your repository
- Basic Rebasing
- Reordering a set of commits
- Splitting a set of commits
- Changing the commit message
- Merging a set of commits
If you’ve spent your time exclusively with version control systems other than Git and worked by their philosophies, some of these concepts might be alien (or even taboo) to you. But don’t fear; I’ll explain how to perform such actions and show you times when they might be beneficial to use.
Exporting Your Repository
Let’s start out nice and easy. Exporting may not be that advanced, but it’s a really handy feature. Exporting allows us to deliver a release or snapshot of our code to others without all of the version control information we normally keep.
Archives from Local Repos
Archiving can be a handy part of our deployment processes, with tools such as Phing and Capistrano, or with continuous integration tools such as Hudson and Xinc. (We could even trigger it from a commit hook, but perhaps that’s a bit excessive.)
The options for creating an archive from a local repository are pretty simple. Here’s an example:
$ git archive --format=tar --prefix=sitepoint-git-advanced/ HEAD | gzip > sitepoint-git-advanced.tar.gz
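In the example above, I’ve set the archive format to TAR (you can also specify ZIP if you prefer) and piped the result through gzip to compress it. I’ve also specified a prefix of “sitepoint-git-advanced/”, which is prepended to every path in the archive. Because of the trailing slash, the prefix acts as a directory: extracting the archive creates a single sitepoint-git-advanced folder containing all the files. Without the slash, the prefix would simply be glued onto the front of each file name.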
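To see what --prefix does in practice, here’s a minimal sketch, assuming a POSIX shell with git, gzip, and tar available. It builds a throwaway repository in a temporary directory (the repository and file names are hypothetical), exports HEAD through gzip, and lists the result so you can check that every path sits under the prefix directory and that nothing from .git is included.

```shell
#!/bin/sh
# Sketch only: a throwaway repo with hypothetical files, exported as .tar.gz.
set -eu

work=$(mktemp -d)          # scratch directory for the demo
cd "$work"
git init -q demo
cd demo
git config user.name "Example Dev"
git config user.email "dev@example.com"

mkdir src
echo "<?php echo 'hello';" > src/index.php
echo "# Demo project" > README.md
git add .
git commit -q -m "Initial commit"

# Stream the TAR archive straight into gzip; no intermediate .tar file.
git archive --format=tar --prefix=sitepoint-git-advanced/ HEAD \
  | gzip > "$work/sitepoint-git-advanced.tar.gz"

# Entries should start with sitepoint-git-advanced/ and none comes from .git.
tar -tzf "$work/sitepoint-git-advanced.tar.gz"
```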
Now let’s look at another example, one in which the output is a ZIP file. Instead of specifying the format and then piping the output to gzip, we can use the --output parameter and provide a file name with a .zip extension; git archive infers the format from it.
$ git archive --output=sitepoint-git-advanced.zip --prefix=sitepoint-git-advanced/ -9 HEAD
Because we used --output, git archive writes the ZIP archive straight to that file instead of sending the result to stdout. When using ZIP, you can also set the compression level with the switches -0 (store the files uncompressed) through -9 (best compression), which is handy if the resulting file would otherwise be rather large.
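The sketch below, again using a hypothetical throwaway repository, compares the two ends of the compression range. It assumes a POSIX shell with git, seq, and head available; the sample file is deliberately repetitive so the difference between -0 and -9 is easy to see.

```shell
#!/bin/sh
# Sketch only: compare ZIP compression levels on a hypothetical repo.
set -eu

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example Dev"
git config user.email "dev@example.com"

# A large, repetitive file, so the effect of the level is easy to see.
seq 1 20000 > numbers.txt
git add numbers.txt
git commit -q -m "Add numbers"

# No --format: git archive infers ZIP from the .zip extension.
git archive --output="$work/store.zip" --prefix=sitepoint-git-advanced/ -0 HEAD
git archive --output="$work/best.zip" --prefix=sitepoint-git-advanced/ -9 HEAD

# ZIP files start with the bytes "PK"; -9 should be far smaller than -0.
head -c 2 "$work/best.zip"; echo
ls -l "$work/store.zip" "$work/best.zip"
```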
Archives from Remote Repos
Local repos aren’t the only type of repositories we can archive, though. With the --remote switch, we can archive repositories from Gitosis and other Git servers our team may use to store shared code.
The example below attempts to archive the Joind.In project. Unfortunately, it doesn’t work (from searching forums, I believe GitHub has deliberately disabled it to help ensure a better user experience), but I’ve included it here to show you an example of the command.
$ git archive --remote=git://github.com/settermjd/joind.in.git --format=tar --prefix=joind.in/ HEAD | gzip > joindin.tar.gz
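Since the GitHub example fails, here’s a sketch that exercises --remote against something we control: a bare repository on the local disk standing in for a team server such as Gitosis. The paths, the main branch name, and the file are hypothetical; a local path is simply the easiest “remote” to try the switch with.

```shell
#!/bin/sh
# Sketch only: --remote against a local bare repo standing in for a team server.
set -eu

work=$(mktemp -d)
cd "$work"
git init -q --bare team-server.git   # hypothetical "server" repository

git init -q dev
cd dev
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "<?php echo 'hello';" > index.php
git add index.php
git commit -q -m "Initial commit"
git push -q "$work/team-server.git" HEAD:refs/heads/main

# Ask the "server" to build the archive of its main branch.
git archive --remote="$work/team-server.git" --format=tar --prefix=project/ main \
  | gzip > "$work/project.tar.gz"

tar -tzf "$work/project.tar.gz"
```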
Basic Rebasing
Now let’s get into some of the juicier aspects of working with Git. According to the Git SCM book, with the rebase command we can “take all the changes that were committed on one branch and replay them on another one.” In short, this lets us combine branches and changes in a powerful and flexible way. It’s also here that a bit of a debate begins. If you’re familiar with other version control tools such as CVS, SVN, and Mercurial, then you may be used to a model where once a commit is made, that’s it – it can’t be changed. With Git, this isn’t necessarily the case.
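To make the book’s definition concrete, here’s a minimal sketch of a plain, non-interactive rebase in a hypothetical throwaway repository: a feature branch gets a commit, main moves on, and git rebase main replays the feature commit on top of main’s latest commit. It assumes a POSIX shell and a reasonably recent git; the branch and file names are made up for the demo.

```shell
#!/bin/sh
# Sketch only: replay a feature branch onto main in a hypothetical repo.
set -eu

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example Dev"
git config user.email "dev@example.com"
git symbolic-ref HEAD refs/heads/main   # name the first branch "main"

echo "base" > base.txt
git add base.txt && git commit -q -m "Base commit"

# A feature branch with one commit of its own.
git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt && git commit -q -m "Add feature"

# Meanwhile, main moves on.
git checkout -q main
echo "fix" > fix.txt
git add fix.txt && git commit -q -m "Fix on main"

# Replay the feature commit on top of main's latest commit.
git checkout -q feature
git rebase -q main

# History is now linear: Add feature, Fix on main, Base commit.
git log --format=%s
```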
Reordering a Set of Commits
Suppose the order of our commits wasn’t quite what it should have been. We’ve made a set of commits, but they would make more sense in a different order. Perhaps most of them relate to the same code module, but there’s a change for an unrelated module in the middle. It’s not a show stopper, but it might be helpful to have the related commits together when reviewing the history months later.
$ git rebase -i HEAD~5
Using the command above, I’ve started an interactive rebase session covering the last five commits, which git rebase lists oldest first (the reverse of git log’s order):
pick da4fc8e changing default to add iteration
pick 003ab61 adding even iteration script
pick 701e132 reworking directory structure
pick c722b9f removing old files
pick 3a10e1c Added in files that were forgotten in the previous commit
Suppose commit 5 (3a10e1c) should be after commit 2 (003ab61). We can change the commit order in the editor as follows:
pick da4fc8e changing default to add iteration
pick 003ab61 adding even iteration script
pick 3a10e1c Added in files that were forgotten in the previous commit
pick 701e132 reworking directory structure
pick c722b9f removing old files
Save the buffer and exit the editor, and Git responds with the following:
Successfully rebased and updated refs/heads/testing.
git log now produces the following output:
commit ab97ffb2fe87c25ca48116ed03cdbe486dc0abb5
Author: Matthew Setter
Date:   Fri Jan 18 12:54:00 2012 +0000

    removing old files

commit 3b5d6c5b0c6a866f426ff5bf412508ce9d9ae864
Author: Matthew Setter
Date:   Fri Jan 18 12:53:30 2012 +0000

    reworking directory structure

commit d9ab87677f8e5692b649b650a6de6ce836c72531
Author: Matthew Setter
Date:   Mon Jan 21 10:11:33 2012 +0100

    Added in files that were forgotten in the previous commit

commit 003ab6124b7f7a59be4fe152df28b9de706383ed
Author: Matthew Setter
Date:   Fri Jan 18 12:44:08 2012 +0000

    adding even iteration script
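The new commit order shows that the reordering was successful. Git rewound the branch to the commit just before the first one listed, then replayed each commit in the order we specified. Note that every commit from the first changed position onwards now has a new hash; 3a10e1c, for example, is now d9ab876.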
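If you want to try a reordering without opening an editor, Git lets you replace the program it runs on the todo list through the GIT_SEQUENCE_EDITOR environment variable. The sketch below uses that to copy a prepared todo list into place; this scripted approach stands in for the manual editing described above, and the repository, file names, and messages are hypothetical. Each commit touches its own file, so the reorder can’t conflict.

```shell
#!/bin/sh
# Sketch only: reorder commits non-interactively in a hypothetical repo.
set -eu

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Hypothetical helper: create a file named after the message and commit it.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit base
commit module-a-part-1
commit unrelated-change
commit module-a-part-2

# The todo list we would otherwise type in the editor:
# move module-a-part-2 up so it directly follows module-a-part-1.
a1=$(git rev-parse --short HEAD~2)
other=$(git rev-parse --short HEAD~1)
a2=$(git rev-parse --short HEAD)
printf 'pick %s\npick %s\npick %s\n' "$a1" "$a2" "$other" > "$work/todo"

# Git runs GIT_SEQUENCE_EDITOR with the todo file's path appended, so this
# copies our prepared list over it instead of opening an editor.
GIT_SEQUENCE_EDITOR="cp '$work/todo'" git rebase -q -i HEAD~3

git log --format=%s
```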
Changing the Commit Message
What if you’re used to writing commit messages in a particular way that, for example, you learned from a previous job, but now you’re at a new organization and they write them differently? You’ve just committed a change and the senior developer, upon reviewing it, sees the message and wants it changed to comply with the company standard. With Git, you can go back and change the message.
First, we have to start the rebase process by issuing the following command:
$ git rebase -i HEAD~2
We’re asking to rebase the last two commits, that is, everything after HEAD~2. Git opens the following in the editing buffer:
pick c722b9f removing old files
pick bcc1dff adding some files

# Rebase 701e132..bcc1dff onto 701e132
#
# Commands:
#
# p, pick = use commit
# r, reword = use commit, but edit the commit message
# e, edit = use commit, but stop for amending
# s, squash = use commit, but meld into previous commit
# f, fixup = like "squash" but discard the commit's log message
# x, exec = run command (the rest of the line) using shell
#
# If you remove a line here THAT COMMIT WILL BE LOST.
# However, if you remove everything, the rebase will be aborted.
#
We see the two commits at the top. In the list of commands below them, we have the reword option, which allows us to change the commit message.
Here I’ve changed pick to reword on the line for bcc1dff:
pick c722b9f removing old files
reword bcc1dff adding some files

# Rebase 701e132..bcc1dff onto 701e132
#
# Commands:
# ...
Saving and exiting the buffer then opens the editor again with the message for bcc1dff, which we can now rewrite to match the company standard.
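The same scripting trick works for reword. In this sketch (hypothetical repository and messages), GIT_SEQUENCE_EDITOR supplies a todo list that marks the second commit reword, and GIT_EDITOR, which Git runs on the commit message when reword stops, copies in the replacement message.

```shell
#!/bin/sh
# Sketch only: reword a commit message non-interactively in a hypothetical repo.
set -eu

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > old.txt; git add old.txt; git commit -q -m "base"
echo a > a.txt; git add a.txt; git commit -q -m "removing old files"
echo b > b.txt; git add b.txt; git commit -q -m "adding some files"

# Keep the first commit as-is and mark the second one "reword".
first=$(git rev-parse --short HEAD~1)
second=$(git rev-parse --short HEAD)
printf 'pick %s\nreword %s\n' "$first" "$second" > "$work/todo"

# The replacement message; the wording here is purely illustrative.
printf 'Add some files\n' > "$work/message"

# GIT_SEQUENCE_EDITOR edits the todo list; GIT_EDITOR edits the message.
GIT_SEQUENCE_EDITOR="cp '$work/todo'" GIT_EDITOR="cp '$work/message'" \
  git rebase -q -i HEAD~2

git log --format=%s
```

If the commit you need to fix is the most recent one, git commit --amend -m "new message" achieves the same thing without a rebase.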
Merging a Set of Commits
You’re probably pretty diligent in your commit practices, ensuring commits only include a single change or a group of related changes for one purpose. But there’s always something that can upset our normal rhythm.
Let’s say we made a commit for a change, did some work and made another commit, and then did some further work and committed that. After the third commit, we discover that commits 1 and 3 really were related to the same update and should go together.
If we were working with SVN or CVS, that would be it; the revision history would be fixed. With Git, we can open up the history and merge commits 1 and 3 into a single commit prior to commit 2. Let’s go through the process.
We start the rebase process on the last five commits, which look as follows:
$ git rebase -i HEAD~5

pick da4fc8e changing default to add iteration
pick 003ab61 adding even iteration script
pick d9ab876 Added in files that were forgotten in the previous commit
pick 3b5d6c5 reworking directory structure
pick ab97ffb removing old files
We then change the order so that 3 comes before 2 as follows:
pick da4fc8e changing default to add iteration
pick d9ab876 Added in files that were forgotten in the previous commit
pick 003ab61 adding even iteration script
pick 3b5d6c5 reworking directory structure
pick ab97ffb removing old files
We then change pick to squash on the new commit number 2:
pick da4fc8e changing default to add iteration
squash d9ab876 Added in files that were forgotten in the previous commit
pick 003ab61 adding even iteration script
pick 3b5d6c5 reworking directory structure
pick ab97ffb removing old files
Git then opens the editor again, showing us the messages of both commits. We replace them with a single message that summarizes the two commits being merged together. So, I change the following:
# This is a combination of 2 commits.
# The first commit's message is:

changing default to add iteration

# This is the 2nd commit message:

Added in files that were forgotten in the previous commit
With a new message:
# This is a combination of 2 commits.
# The first commit's message is:

Merging two changes together as they should be together.
After exiting the editor, the rebase process completes.
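If we start the rebase process again, this time exiting the editor without changing anything, we can see that the two commits have been merged. The list still shows five commits, but because the squash removed one commit from the history, HEAD~5 now reaches one commit further back, so the list starts with an earlier commit: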
pick f5cac8d adding top documentation
pick e50970f changing default to add iteration
pick 3c2b3a5 adding even iteration script
pick 3b136c1 reworking directory structure
pick 01ad46f removing old files
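Here’s a sketch of the same reorder-and-squash, scripted with GIT_SEQUENCE_EDITOR and GIT_EDITOR so it runs without an editor. The repository is a hypothetical throwaway one whose commit messages mirror the example above, and each commit adds its own file so the reorder can’t conflict; the last command lists the files in the combined commit.

```shell
#!/bin/sh
# Sketch only: move commit 3 under commit 1 and squash it, in a hypothetical repo.
set -eu

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > base.txt; git add base.txt; git commit -q -m "base"
echo 1 > one.txt; git add one.txt; git commit -q -m "changing default to add iteration"
echo 2 > two.txt; git add two.txt; git commit -q -m "adding even iteration script"
echo 3 > three.txt; git add three.txt
git commit -q -m "Added in files that were forgotten in the previous commit"

c1=$(git rev-parse --short HEAD~2)
c2=$(git rev-parse --short HEAD~1)
c3=$(git rev-parse --short HEAD)

# Reorder so commit 3 follows commit 1, and meld it in with "squash".
printf 'pick %s\nsquash %s\npick %s\n' "$c1" "$c3" "$c2" > "$work/todo"
printf 'Merging two changes together as they should be together.\n' > "$work/message"

# GIT_EDITOR supplies the combined commit message when squash asks for one.
GIT_SEQUENCE_EDITOR="cp '$work/todo'" GIT_EDITOR="cp '$work/message'" \
  git rebase -q -i HEAD~3

git log --format=%s
git diff-tree --no-commit-id --name-only -r HEAD~1
```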
Splitting a Set of Commits
But what if we want to go in the opposite direction? What if there’s a developer who’s young and maybe a bit careless, or what if you, yourself, have been a bit undisciplined? I know I suffer from this from time to time.
Let’s say that we have a commit with a number of changes that we’ve just thrown in because it’s late or we’re in a hurry. Now we’re fresh and want to fix up the mess. How do we do that?
$ git commit -m "need to finish quickly"
[testing e6a98bb] need to finish quickly
 0 files changed
 create mode 100644 about.php
 create mode 100644 contact-us.php
 create mode 100644 default.php
 create mode 100644 login.php
In the example above, I’ve committed a set of files: three business module pages (about.php, contact-us.php, and default.php) and a user module page (login.php). Let’s separate them and do things properly.
Starting an interactive rebase over the last three commits (git rebase -i HEAD~3), we see the following:
pick 3b136c1 reworking directory structure
pick 01ad46f removing old files
pick e6a98bb need to finish quickly
We change pick to edit on the last commit (e6a98bb), then exit the editor to receive the following output:
Stopped at e6a98bb... need to finish quickly
You can amend the commit now, with

    git commit --amend

Once you are satisfied with your changes, run

    git rebase --continue
By running git reset HEAD^, we rewind the commit: HEAD moves back to the previous commit, and the files are left in the working tree, unstaged. Now we can commit them properly, adding the business and user pages in two separate commits.
$ git add about.php contact-us.php default.php
$ git commit -m "Adding new business pages"
[detached HEAD a189c9b] Adding new business pages
 0 files changed
 create mode 100644 about.php
 create mode 100644 contact-us.php
 create mode 100644 default.php
$ git add login.php
$ git commit -m "adding new user login page"
[detached HEAD 9efcab9] adding new user login page
 0 files changed
 create mode 100644 login.php
Then, finish up by running git rebase with the --continue switch:
$ git rebase --continue
Successfully rebased and updated refs/heads/testing.
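Finally, here’s a sketch of the whole split, scripted so it can run unattended. As before, the repository is a hypothetical throwaway one, and GIT_SEQUENCE_EDITOR stands in for marking the commit edit by hand. Once the rebase stops, the steps are the ones shown above: git reset HEAD^, two separate commits, then git rebase --continue.

```shell
#!/bin/sh
# Sketch only: split one hurried commit into two, in a hypothetical repo.
set -eu

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > base.txt; git add base.txt; git commit -q -m "base"

# The hurried commit: four unrelated pages at once.
for page in about contact-us default login; do
  echo "<?php // $page" > "$page.php"
done
git add ./*.php
git commit -q -m "need to finish quickly"

# Mark the hurried commit "edit" so the rebase stops on it.
printf 'edit %s\n' "$(git rev-parse --short HEAD)" > "$work/todo"
GIT_SEQUENCE_EDITOR="cp '$work/todo'" git rebase -q -i HEAD~1

# Undo the commit but keep its files in the working tree, unstaged.
git reset -q HEAD^

git add about.php contact-us.php default.php
git commit -q -m "Adding new business pages"
git add login.php
git commit -q -m "adding new user login page"

# GIT_EDITOR=true keeps any message Git might ask about unchanged.
GIT_EDITOR=true git rebase --continue
git log --format=%s
```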
A Word of Warning
Now, to quote the oft-quoted phrase:
With Great Power comes Great Responsibility
Please remember that just because you can do something doesn’t mean that you should. This is especially true in a team environment: if you rewrite commits that you’ve already shared with the master repository, your teammates’ copies of the history will no longer match yours, which, amongst other things, is likely to cause confusion. It’s fine if there’s a valid reason and it doesn’t happen too often. But if it happens repeatedly, I’m sure you won’t be all that popular for too long.
You also need to appreciate the potential conflicts that these kinds of changes may introduce into the code. While these features can be fun, interesting, and even exciting, be aware of the ramifications of what you’re doing.
I hope that after reading this article you see just how much power and flexibility Git provides, and that you also appreciate that with that power must come an understanding of when, and when not, to apply it.
Happy Git-time, and be sure to share your thoughts and ideas in the comments so we can all benefit. For more information, here are some links for further reading:
Matthew Setter is a software developer, specialising in reliable, tested, and secure PHP code. He’s also the author of Mezzio Essentials (https://mezzioessentials.com), a comprehensive introduction to developing applications with PHP's Mezzio Framework.