Track Your Hacks with CVS

The following is republished from the Tech Times #130.

Quite by coincidence, three times in the past week I have had to hack the code of some open source software that went into the site I was working on. First I had to modify phpBB to include an embedded calendar on the home page of a private forum I administer. Next I made some custom tweaks to the code of the K2 theme for WordPress. Finally, I had to hack phpAdsNew to produce XHTML Strict output.

In each case, the hack required me to actually modify the code of the software. Obviously I prefer not to do this, because when the next release comes along the updated files will overwrite my hacks, and I’ll need to implement them all over again.

Normally I’d just document my hacks someplace and grumble about the lack of customization features in the software, but three times in a week was too much. Let me show you how I solved the problem using a common development tool in an unconventional way!

CVS (Concurrent Versions System) is a system for tracking changes made to files in a project over time, potentially by multiple developers, each working on his or her own copy of the project files at the same time. As it turns out, it’s extremely useful for managing the custom hacks you make to open source software.

In the past few years, Subversion has sprung up as an alternative to CVS that eliminates some of the headaches in CVS to do with things like moving or renaming files. Since such changes don’t usually happen when you’re hacking an existing script, and since SitePoint already has a decent introduction to CVS, I’ll stick with CVS for this discussion. If you know Subversion, you can use it instead.

Setting Up

You first need to create a CVS repository for yourself (if you don’t already have one). Because I use Windows, I set up CVSNT to do this. If you’re on Linux, you can use the original CVS software. You’ll also want to get an easy-to-use client program (I recommend SmartCVS), unless you particularly like working from the command prompt, in which case I’ve included all the commands below.

When your CVS server is set up, store a “clean” copy of the software version that you have hacked for use on your site as a new module (or project) in the repository (e.g. cvs import phpBB2 phpBB2 init_ver, run from the directory that holds the clean files). Check out a working copy of the new module, then tag this “clean” version to indicate the version number of the software it represents (e.g. cvs tag release-2-0-17 .).
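If you prefer to script this step, here is a rough sketch of how it might look. It assumes the CVSROOT environment variable already points at your repository; the function name and the “-clean” directory suffix are my own inventions for illustration, not anything CVS requires.

```shell
# Sketch only: import a clean release, check it out, and tag it.
# Assumes $CVSROOT is set. Function and directory names are hypothetical.
import_clean_release() {
    src_dir=$1      # unpacked, unmodified release files
    module=$2       # module name in the repository, e.g. phpBB2
    release_tag=$3  # e.g. release-2-0-17

    # cvs import reads the files from the current directory.
    (cd "$src_dir" && cvs import -m "Clean $release_tag" "$module" "$module" init_ver) || return 1

    # Check out a trunk working copy; this becomes the "clean" copy.
    cvs checkout -d "${module}-clean" "$module" || return 1

    # cvs tag works on a working copy, so tag from inside the checkout.
    (cd "${module}-clean" && cvs tag "$release_tag" .)
}
```

Calling import_clean_release phpBB-2.0.17 phpBB2 release-2-0-17 (where phpBB-2.0.17 stands for wherever you unpacked the release) would leave a tagged trunk working copy in phpBB2-clean.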

Immediately create a branch in the repository (e.g. cvs tag -b custom-mods-branch .) from this initial version, and then check out a copy of the files from the branch into a convenient working directory (e.g. cvs checkout -r custom-mods-branch phpBB2). This copy is where you’ll keep track of your hacks.
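As a sketch, the branch-and-checkout step could be wrapped up like this. The function name and the “-hacked” directory suffix are assumptions made for the example; use whatever names suit you.

```shell
# Sketch only: branch off the clean release and check the branch out.
# Run from the directory that contains the clean trunk working copy.
start_hack_branch() {
    clean_dir=$1   # trunk working copy of the clean release
    module=$2      # e.g. phpBB2
    branch=$3      # e.g. custom-mods-branch

    # -b makes the tag a branch rooted at the working copy's revisions.
    # The clean working copy itself stays on the trunk.
    (cd "$clean_dir" && cvs tag -b "$branch" .) || return 1

    # -r selects the branch; -d names the new working directory.
    cvs checkout -r "$branch" -d "${module}-hacked" "$module"
}
```

For example, start_hack_branch phpBB2-clean phpBB2 custom-mods-branch would create a second working copy, phpBB2-hacked, that follows the branch.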

Copy your site’s (hacked) copy of the software’s files on top of the “clean” copy you just checked out from the branch, and then perform a CVS update to identify the files that have been modified with hacks (e.g. cvs update .). Review these changes to make sure they are all wanted.
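Here is one way this review might be scripted, as a sketch rather than a recipe. The function name is hypothetical. The global -n option asks CVS to report what an update would do without touching any files, which is a slightly more cautious variant of the plain update mentioned above.

```shell
# Sketch only: overlay the live (hacked) files and list what differs.
review_hacks() {
    site_dir=$1   # live site copy containing your hacks
    work_dir=$2   # working copy checked out from the branch

    # Copy the site's files over the checkout ("/." copies the contents,
    # including dot files, rather than the directory itself).
    cp -R "$site_dir/." "$work_dir/" || return 1

    (
        cd "$work_dir" || exit 1
        # -n: change nothing; -q: less chatter.
        # "M" marks modified files, "?" marks files CVS does not know.
        cvs -nq update
        # Show the hacks themselves as a unified diff.
        # (cvs diff exits non-zero when it finds differences.)
        cvs diff -u
    )
}
```

Files flagged with “?” are often site-specific extras such as configuration or uploaded content, so check them before deciding whether they belong in the repository.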

Fig 1. Review hacks in your CVS client: hacks appear as changes in the working copy.

You can now track changes to your hacks as you make them in this branch. Simply hack the files in this working copy of the branch to your liking and commit your changes to the repository. To update your site with these hacks, delete everything in the destination directory and then export the latest version of your branch files to your site (e.g. cvs export -r custom-mods-branch -d /home/www/htdocs phpBB2). The -r option names the branch to export, and the lowercase -d option sets the destination directory.
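A commit-and-deploy cycle might be sketched as follows. Keep in mind that cvs export insists on a tag or date (-r or -D), and that the lowercase -d option is the one that names the output directory. The function name is hypothetical, and two choices here are my own assumptions: the sketch removes the destination directory itself and lets export recreate it, and it passes a relative name to -d in case your CVS version rejects absolute paths.

```shell
# Sketch only: commit hacks on the branch, then redeploy the site from CVS.
deploy_hacks() {
    work_dir=$1   # branch working copy
    dest=$2       # site directory, e.g. /home/www/htdocs
    module=$3     # e.g. phpBB2
    branch=$4     # e.g. custom-mods-branch
    message=$5    # commit log message

    (cd "$work_dir" && cvs commit -m "$message") || return 1

    # Clear out the old deployment. ${dest:?} aborts if dest is empty,
    # so a missing argument can never turn this into "rm -rf /".
    rm -rf "${dest:?}" || return 1

    # Export the tip of the branch into a freshly created directory.
    (cd "$(dirname "$dest")" && cvs export -r "$branch" -d "$(basename "$dest")" "$module")
}
```

An exported tree contains no CVS administrative directories, which is what you want on a live site.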

Merging with New Releases

When a new version of the software comes out, extract its files into the “clean” copy you made at the beginning, add any files that are new in the release (cvs add), commit all the changes to CVS, and then tag the updated files for the release (e.g. cvs tag release-2-0-19 .). These updates will be stored in the trunk of your repository, so they won’t affect your hacked version (which is tracked in the branch).
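One wrinkle is that CVS only commits files it already knows about, so files that first appear in the new release have to be added before the commit. The sketch below shows one way to spot them and then record the release; both function names are hypothetical.

```shell
# Sketch only: helpers for recording a new official release on the trunk.

# List files that CVS does not know yet ("?" lines in update output).
# After extracting a new release, these are usually the newly added files.
list_new_files() {
    (cd "$1" && cvs -nq update | sed -n 's/^? //p')
}

# Commit the extracted release on the trunk and tag it.
record_release() {
    clean_dir=$1     # trunk working copy of the clean files
    release_tag=$2   # e.g. release-2-0-19

    (
        cd "$clean_dir" || exit 1
        cvs commit -m "Official release $release_tag" || exit 1
        cvs tag "$release_tag" .
    )
}
```

A typical run would be: extract the release over the clean copy, run list_new_files and cvs add each file it reports, cvs remove anything the new release dropped, and finish with record_release. New directories need to be added before the files inside them.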

Now, here’s the payoff: to update your hacked version of the software with the changes in the latest official release, just go to your working copy of the branch and merge in all the changes from the trunk (e.g. cvs update -j release-2-0-17 -j release-2-0-19). Your hacked files should be updated seamlessly with changes that were made in the official update(s).
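Wrapped in a small helper, the merge might look like the sketch below. The function name is made up for the example; the cvs command inside is the one given above.

```shell
# Sketch only: merge the changes between two official releases
# into the working copy of the hack branch.
merge_release() {
    work_dir=$1   # branch working copy
    old_tag=$2    # release the branch is currently based on
    new_tag=$3    # newly tagged official release

    # The two -j options ask CVS to apply the differences between the
    # two tagged revisions to the files in the working copy.
    (cd "$work_dir" && cvs update -j "$old_tag" -j "$new_tag")
}
```

For instance, merge_release phpBB2-hacked release-2-0-17 release-2-0-19, where phpBB2-hacked stands for wherever you checked out the branch. Nothing is committed by the merge itself; the changes sit in your working copy until you commit them.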

Files where your hacks occurred close to or on the same line as a change in an official release will report conflicts when you perform the merge. You’ll have to open these files and resolve the conflicts yourself (CVS will helpfully include both versions of the code at the point of conflict) before committing the corrected versions to the branch. You should then set a tag on the branch to indicate where you did the merge (e.g. cvs tag merge-2-0-19 .).
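CVS marks each conflict with a line that begins with seven “<” characters, so a quick search can tell you whether any file still needs attention. The sketch below refuses to commit and tag until the markers are gone. The function names are hypothetical, and it relies on the recursive -r option of grep, which common grep implementations provide.

```shell
# Sketch only: finish a merge once every conflict has been resolved.

# List files under a directory that still contain a CVS conflict marker.
list_conflicts() {
    grep -rl '^<<<<<<< ' "$1"
}

# Commit the merged files and tag the branch, unless conflicts remain.
finish_merge() {
    work_dir=$1    # branch working copy
    merge_tag=$2   # e.g. merge-2-0-19

    conflicts=$(list_conflicts "$work_dir")
    if [ -n "$conflicts" ]; then
        echo "Resolve these files first:" >&2
        echo "$conflicts" >&2
        return 1
    fi

    (
        cd "$work_dir" || exit 1
        cvs commit -m "Merge official release ($merge_tag)" || exit 1
        cvs tag "$merge_tag" .
    )
}
```

Searching for the marker is only a safety net: a file can be free of markers and still be merged wrongly, so it’s worth testing the site before you tag.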

In effect, CVS will perform all the updates that it can for you, and then will pick out those updates that appear to interfere with your hacks so that you can deal with them. If that isn’t useful, I don’t know what is!

Fig 2. A typical CVS tree for hack management, showing merged releases.

The next time an official update comes out, you can perform another merge, but remember to update the starting tag for the merge so you only get the changes since the last time you did a merge (e.g. cvs update -j release-2-0-19 -j release-2-0-21).

A Note on the Vendor Branch

The CVS gurus in the audience may be up in arms at this point, because CVS has a built-in feature for exactly this job. In fact, CVS automatically creates a vendor branch for every module you import into it. You can import each new version of the software into this special branch, and then merge the changes from the branch into your (hacked) trunk.
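For comparison, importing a new release onto the vendor branch might look something like this sketch. The function name is hypothetical, and the vendor tag is whatever name you gave as the second argument to your first cvs import.

```shell
# Sketch only: import a new official release onto the vendor branch.
# Assumes $CVSROOT is set and the module was first created with cvs import.
vendor_import() {
    src_dir=$1       # unpacked official release
    module=$2        # e.g. phpBB2
    vendor_tag=$3    # vendor tag used for the original import
    release_tag=$4   # tag for this release, e.g. release-2-0-19

    # Importing into an existing module adds the files to the vendor
    # branch and tags them with the release tag.
    (cd "$src_dir" && cvs import -m "Import $release_tag" "$module" "$vendor_tag" "$release_tag")
}
```

You would then merge the difference between the two release tags into your trunk working copy with the same kind of cvs update -j ... -j ... command used for an ordinary branch.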

So why didn’t I use this? The truth is, it’s just as easy to do all this using a normal branch as I described above, and it doesn’t require you to learn all about vendor branches and their pitfalls. Also, you can do it in simpler CVS clients that don’t support vendor branches (like the free version of SmartCVS).