Now Where did I Put that File?
In your travels on the Internet, you may have come across the acronym CVS, which is used with a kind of fanaticism by software developers who work on Open Source projects.
If you’ve been too shy to ask, CVS stands for Concurrent Versions System (not to be confused with CSV – the Comma Separated Value format for data files), and is a tool for controlling the version/revision of any type of file. The basic idea behind CVS is that there’s a central place (or repository) where all revisions of a file are stored, along with comments on what they are and what changed with each new version. The repository can then be accessed from more or less anywhere you like — whether it’s your PC at home, the one at the office, or by another developer working on the other side of the world. So CVS ensures that everyone involved with the project always works on the latest version of a file.
Why should this interest you? If you’ve recently started making your own Web pages or your first PHP scripts, just for fun, the idea of version/revision control may seem like overkill. But if you do anything more than dabble in Website building, eventually one or more of the following things will happen to you:
You work on two computers. You do a whole load of work on a Web page on the first and add it to your site. Then later, you do a whole load more work on the other computer, and add it to the site, only to discover that you started with an old version of the page on the second machine, so all the work you did earlier on the first computer is lost.
You make some major changes to a Web page and add it to your site, only to discover that there’s a major bug — your site grinds to a halt. Desperately you scramble for the previous working version of the page to no avail; you’ve over-written it. As if on cue, the phone starts ringing with the first of many angry customers…
You overhaul your site, making changes to more or less anything and everything, and it takes you three hours (during which time your site is down) to update your site with all the new files.
You start to collaborate with some other site developers and instantly work grinds to a halt as you spend 95% of your time just keeping up with all the changes other people are making.
An experiment with the rm -r * command in Linux, or a slip of the mouse in Windows Explorer, goes horribly wrong, and you erase your entire site from your Web server. The copy on your own PC is out of date and your Web host says "Sure we’ll restore your site from the backup. Tomorrow."
CVS can help prevent any of these scenarios from happening. And even better news: it’s free (Open Source)! So whether you’re a graphics designer or writing the latest database abstraction class in PHP, hopefully this article will help de-mystify CVS and show you how to get the most from it. CVS is a big subject so we’ll have to stay away from the fine print, to avoid getting bogged down, but you’ll find relevant links to further reading at all times.
So here’s tonight’s billing:
- Absolute beginners: the basic concepts behind CVS
- Hugging the tree: Getting down to business by checking out our first CVS repository
- SitePoint members hit Sourceforge: HarryF and Glenplake run amok with CVS in the Open Source community
- Wrap up: some final words and pointers to useful documentation, such as setting up your own CVS server on Windows (it’s not just *Nix types who can do it)
If you’ve worked in any distributed computing environment (and, as you’re reading this Web page, you have), you’ve probably come across version control in some form or other. It may be that you’ve tried to update a Word document from the central NT server at work, only to be told it’s available in read-only format because someone’s working on it. Or perhaps you’ve had to deal with "record locking" on a database and had some bad experiences where two of your site administrators updated the same article at the same time. One way or another, you’ve probably come across version/revision control in some form and wished you could find a better way to handle it. Well now you can: with CVS.
Essentially, CVS provides two functions: record keeping and collaboration in a manner designed to solve all your headaches. The first thing that makes CVS special is that it doesn’t lock records. Instead it keeps track of who’s doing what, allowing everyone to work in their own style, but watches for potential conflicts, where two people try to update the same record with their own individual revisions.
But before we go any further, it’s time to introduce some terminology we’ll need to use if this explanation is to make sense. You’ll see these terms used all over the Internet, wherever CVS is concerned.
Record: I’ll be using this here to refer to any "object" you’re working with, be it a .gif image, a Shockwave file, a PHP script, a Word document …or whatever you want.
Revision: a change to a record (or group of records), while "work is in progress". For instance, you create a PHP script. Later, you come back and make some changes to it, thereby creating the next revision of the script.
Repository: this is the "mother ship" of the project you’re working on. All your work is stored here and fully tracked with revision history, allowing you to check out the latest records at a moment’s notice. Say you get a new client, "Buildmysite Inc." who wants you to build their Website. You’d create a repository called "buildmysite" where all the HTML pages, images, PHP scripts, documentation, queries for creating their MySQL database would be stored.
Working copy: refers to any record that you (or another developer) are working on. A working copy has been checked out of the repository and sits on your own hard disk while you edit it with Photoshop, Ultradev, etc.
Check out: what you have to do to get a working copy of a record from the repository. Once your repository is created and all the work you’ve done has been stored in it, you begin every working day by checking out the latest revision so you can work on it.
Commit (aka check in): once you’ve been working on a record and have done enough for the time being (yes! Time for a coffee break!), you commit the record back into the repository so that the latest revision is stored there. Another developer who later checks out the record gets the latest revision to work with.
Log message: the comment you supply every time you commit a record; usually a sentence or two that describes what you’ve done. This log message is then available for general viewing, so that everyone can see what’s changed.
Update: this updates your working copy with all the latest changes from the repository. For example, you start your day by checking out a newly created record to your computer, and work solidly on the HTML template till twelve noon. Your good buddy Bob then calls to say he’s made some changes to the CSS file and committed them to the repository, and he thinks Janet might also have altered some of the images. So you quickly perform an update to bring your working copy in line with everyone else’s work. Once you’ve seen how your HTML template looks with all the new work, you check it in to the repository.
Conflict: let’s say that Bob and Janet check out the same PHP script, each unaware that the other is working on the same file. After making some changes, Janet commits her work to the repository; no problem. Bob then tries to commit his changes, but is about to overwrite the work Janet has done. CVS spots the problem and warns Bob of conflicts. It’s then down to Bob and Janet to work out the best way to combine both their work. That may seem like an odd way to handle this situation, but it’s one of the things that makes CVS powerful. On big projects with many developers, or projects where the developers must collaborate over the Internet, this approach allows for far more flexibility.
Also, if the record you’re working with is in text form (such as a Web page or a PHP script), it’s possible for two people to work on different sections of the same record without conflicting with one another. When the second person updates their working copy before committing, CVS merges the first person’s changes into it, so the repository ends up with a single latest revision containing both sets of work.
Tree: Refers to everything stored under CVS. There could be multiple repositories stored on a given CVS server. The entire structure is referred to as a tree (like any directory structure on a hard disk).
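To make the Conflict entry above a little more concrete, here is a minimal sketch of a three-way merge. It assumes a Unix-like shell with GNU diff3 installed, and it uses diff3 purely as a stand-in: CVS performs its own merge when you update, and the file names and contents below are made up for illustration.

```shell
# Three-way merge sketch using GNU diff3 (a stand-in for CVS's own merge step).
# All file names and contents here are hypothetical.
WORK=$(mktemp -d)
cd "$WORK" || exit 1

# The revision both developers checked out.
printf '%s\n' '<html>' '<h1>Title</h1>' '<p>One</p>' '<p>Two</p>' \
  '<p>Three</p>' '<p>Footer</p>' '</html>' > base.html

# Janet edits the heading; Bob edits the footer (different sections).
sed 's|<h1>Title</h1>|<h1>New Title</h1>|' base.html > janet.html
sed 's|<p>Footer</p>|<p>New Footer</p>|' base.html > bob.html

# Non-overlapping edits: both changes are combined and diff3 exits with 0.
diff3 -m bob.html base.html janet.html > merged.html
clean_status=$?

# Now both developers change the same line.
sed 's|Title|Janet Title|' base.html > janet2.html
sed 's|Title|Bob Title|' base.html > bob2.html

# Overlapping edits: diff3 exits with 1 and writes conflict markers.
diff3 -m bob2.html base.html janet2.html > conflict.html
conflict_status=$?

echo "clean merge status: $clean_status"
echo "conflicting merge status: $conflict_status"
```

In the second case the output file contains both versions of the disputed line between marker lines, which is the point where Bob and Janet would have to talk to each other.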
You may have already realised that CVS operates on a "client-server" basis. For instance, you have a server where your CVS repository is stored (this is usually a machine with plenty of disk space and a fast network connection, which could be running some flavour of Unix like Linux, or a version of Windows NT). You access the server over the network using a client, which you run on your own workstation (there’s client support for more or less every operating system: Unix, Windows and Macintosh). This makes CVS extremely powerful — if your CVS server is connected to the Internet, you can check out the latest version of your project to anywhere in the world!
OK – they’re the basic terms and concepts you’ll need to get into CVS. Don’t panic! You need to know what they are, but we’ll be showing you CVS in action in a moment, which should make things a bit clearer.
Hugging the Tree
Right. Enough talk: it’s time to throw you right in at the deep end! We’re going to install a CVS client and check out a repository to our hard disk. For this example you’ll need to be running any version of Windows.
The Big Install
CVS is sounding pretty serious right? So you imagine it’s gonna be really hard to install the client? Well, brace yourself…
- Head to cvshome Windows downloads and click on the Win32 link (below the word "Platform"). Save the ZIP somewhere on your hard disk.
- Now extract the file (cvs.exe) to your Windows "system32" directory (usually c:\windows\system32)
That’s it: installation over! Well… almost. We’ve installed the DOS command line CVS client, which gives you a simple interface to CVS that you can use straight away.
But let’s not stop there: seize the moment and check out your first CVS repository! We’re going to check out the PHP source code for phpBB2. phpBB is an Open Source project hosted at Sourceforge, as you can see here. And because it’s Sourceforge, everyone has anonymous (read only) access to the project’s CVS tree. The details of the phpBB tree can be found here, and if you click on "Browse CVS Repository" you’ll see there are in fact two repositories: phpBB for the old version 1 of the code, and phpBB2 for the latest version. So let’s (literally) check it out…
Open up an MS DOS prompt (usually somewhere on your start menu, perhaps in "Accessories") and type the following. Note that you’ll need to be connected to the Internet for this to work:
cd c:\
mkdir cvs_root
cd cvs_root
cvs -d:pserver:email@example.com:/cvsroot/phpbb login
cvs -z3 -d:pserver:firstname.lastname@example.org:/cvsroot/phpbb co phpBB2
Here’s what we just did, line by line:
- [C]hange [d]irectory to the "root" of your c: drive
- [m]a[k]e a [dir]ectory called cvs_root
- Change to the cvs_root directory
- Log in to the phpBB CVS server – just press return when it asks you for a password
- Check out (co) the phpBB2 code – this is cAsE sEnSiTive, so be careful!
A list of files appears: these are being copied to your hard disk. Each entry in the list looks something like this:
U phpBB2/common.php
The U stands for "Updating". CVS is checking out the file from the phpBB2 repository and, as a quick examination of your hard disk revealed no existing version of common.php, it’s creating one in c:\cvs_root\phpBB2
Now if you look in the c:\cvs_root directory (you may prefer to do this part in Windows Explorer) you’ll find a subdirectory called phpBB2 that contains all the latest phpBB2 code. Congratulations! You’ve just checked out your first CVS repository!
Now, for the next trick. Using your preferred text editor, open the file c:\cvs_root\phpBB2\common.php and right at the top of the file enter "This is a test", then save the file under the same name, and close it. Also, just for fun, delete the file c:\cvs_root\phpBB2\install.php
Next, back at your MS DOS prompt, type this from the c:\cvs_root directory (type cd if you’re uncertain where you are):
cvs -z3 -d:pserver:email@example.com:/cvsroot/phpbb update phpBB2
This time you’ll see more or less the same thing as you did before when you checked out the repository, but with some subtle differences.
The entry for common.php now starts with M. The M says the file CVS found on your hard disk is different to the one in the CVS repository, so it will be left untouched.
CVS couldn’t find install.php, so it reports that it’s creating one (a U entry again).
Then the rest looks like this:
cvs server: Updating phpBB/admin
Lines like this show CVS working through each directory in turn; any file that isn’t listed was the same on your hard disk as the revision in the CVS repository.
Now go to your text editor again, and open up c:\cvs_root\phpBB2\common.php. Is your little message still at the top of the file? Yes it is!
Then, in Windows Explorer (or otherwise), look in the c:\cvs_root\phpBB2 directory. Can you see install.php? Yes you can. It’s back, even though you deleted it.
That’s it! You just performed your first update! Now you’ve seen it in action, remember the "direction" in which updates work. "Update" is for updating your own working copy of a project on your own computer, from the CVS repository. To "update" the repository itself with work you’ve done on your computer, you perform a commit, which we’ll look into later.
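The same behaviour can be tried without touching a public server. The following is a minimal sketch meant to be saved and run as a script, assuming a Unix-like shell with the cvs client installed. The local repository, the module name (demo) and the file contents are all made up for illustration, and the import step mirrors the "vendor start" form used later in this article.

```shell
# Throwaway local repository showing what "update" does to a working copy.
# Run as a script: it stops quietly if the cvs client is not installed.
command -v cvs >/dev/null 2>&1 || { echo "cvs not found, skipping"; exit 0; }

TOP=$(mktemp -d)
REPO="$TOP/repo"
mkdir "$REPO" "$TOP/src" "$TOP/work"

cvs -d "$REPO" init                         # create an empty repository

# Put two files under version control (hypothetical module "demo").
cd "$TOP/src" || exit 1
echo '<?php // common ?>'  > common.php
echo '<?php // install ?>' > install.php
cvs -d "$REPO" import -m "Initial import" demo vendor start

# Check out a working copy.
cd "$TOP/work" || exit 1
cvs -d "$REPO" co demo
cd demo || exit 1

sleep 1                                     # make sure the edit gets a later timestamp
echo 'This is a test' >> common.php         # a local edit
rm install.php                              # a local deletion

# Update: the edited file is left alone, the deleted file is fetched again.
cvs update
```

After the final command the local edit in common.php is still there and install.php is back, which is the same "direction" of travel described above: from the repository to the working copy.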
Starting to get a warm feeling yet?
Even with anonymous read-only access, we’ve got a very powerful tool at our disposal. Take the phpBB project, for example. For a standard installation you’d have only updated the file phpBB2/config.php with the local settings for your MySQL server etc. But, thanks to CVS, you can keep pace with the latest phpBB2 code using the update we did above, while keeping your local settings in config.php intact. You can install phpBB2 on your site and update it on a regular basis, without ever having to bat an eyelid.
Translate that to updating your live Website. You and a team of developers perform a major overhaul of your site, changing hundreds of files. You do all the work away from your live site, leaving the old work there untouched. Once you’re entirely happy with the new version, you log in to your live Website and use CVS to update it with all the new work in a matter of minutes: the major overhaul took your live site down for no more than five minutes! To see what a typical site owner thought about the ease and simplicity of using CVS on their Website, try this article.
The CVS client we installed was the command line version for DOS. But if you’re not into command lines, don’t despair! There’s a GUI version called WinCVS (or sometimes cvsgui), which you can download here for Windows, Macintosh and GTK (GTK is for those Linux fiends, though of course they probably already know all about CVS). Installing the WinCVS client can be a little more tricky, so I’ll leave the explanation of installation and usage to the documentation.
CVS has a number of authentication mechanisms. In the above example, we used the pserver method (Password Authentication Server). This is fine for anonymous access, but, like ftp, it sends your username and password in clear text over the Internet, so anyone with a sniffer who happens to be listening can steal them. For security’s sake, you’ll generally use ssh (secure shell) to access a CVS repository over the Internet for anything more serious than read-only access. Windows and Mac users should take a look at the WinCVS-SSH-Guide for more about this.
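The string passed to -d in the commands above follows the pattern :method:user@host:/path, and the method part is where pserver appears. As a small illustration, here is a sketch that pulls the login string apart using plain shell parameter expansion. This is not something CVS itself runs, and the variable names are made up.

```shell
# Split a CVSROOT-style string of the form :method:user@host:/path.
# The value is copied from the login command earlier in this article.
root=':pserver:email@example.com:/cvsroot/phpbb'

rest=${root#:}          # drop the leading colon
method=${rest%%:*}      # pserver
rest=${rest#*:}         # everything after "method:"
user=${rest%%@*}        # email
rest=${rest#*@}         # everything after "user@"
host=${rest%%:*}        # example.com
repo_path=${rest#*:}    # /cvsroot/phpbb

echo "method: $method"
echo "user:   $user"
echo "host:   $host"
echo "path:   $repo_path"
```

Only the method part changes when you move from anonymous pserver access to ssh access; the user, host and path parts keep the same shape.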
SitePoint Members Hit Sourceforge
This story is 100% true and the names have not been changed to protect the guilty!
Once upon a time, a PHP coder known in some circles as HarryF decided to write a PHP application (the name of which will be made available in some far-flung future when it’s finally finished!). While he was frantically coding his creation, another coder and Master of Photoshop, known to some as Glenplake, kindly offered to help HarryF design the user interface for his application.
So the two began to collaborate with much enthusiasm, but it soon became apparent that all was not well. HarryF’s weekly ZIP files containing his latest work were getting confused with Glenplake’s updates, and a great distance over land and sea separated the two, which only added to the chaos. Hair loss ensued.
Luckily, in the Sacred Mountains of Open Source there stood the Temple of Sourceforge, from which all good things do issue. Following the Ritual of Registering for a New Project, our two hardy PHP coders were able to gain access to the Fountain of CVS and thereby finally collaborate successfully. Things got back on track, the project steamed ahead, and the coders kept what remained of their hair.
Sourceforge requires us developers to access it using the ssh method. If you run Windows, it’s worth setting up some environment variables to make life easier, so before we go any further:
- Download ssh from: http://download.sourceforge.net/sfsetup/. The file will be named something like ssh-1.2.14-win32bin.zip
- Unzip to a directory: C:\program files\ssh
- Now open an MS DOS prompt and type the following to create the directory c:\program files\ssh\.ssh (Windows Explorer won’t let you make directories beginning with a full stop, or period):
md c:\progra~1\ssh\.ssh
- Update your system Environment Variables:
- Right-Click "My Computer" and select "Properties"
- Click the "Advanced" tab
- Click "Environment Variables"
- Select the "path" variable (under system variables) with your mouse and click "Edit"
- At the end of the box you're now editing, add "c:\progra~1\ssh" with a semicolon before it, i.e. ";c:\progra~1\ssh"
- Click "New System Variable"
- Name the new variable "HOME"
- Set the variable value to: "C:\progra~1\ssh"
- Click "OK"
- Click "New System Variable"
- Name the new variable "CVS_RSH"
- Set the variable value to: SSH
- Click "OK"
- Click "New System Variable"
- Name the new variable "CVSROOT"
- Set the variable value to: ":ext:firstname.lastname@example.org:/cvsroot/yourproject"
- Click "OK"
Assuming that you've already installed cvs.exe, you can now access your Sourceforge CVS tree.
Back in the times when HarryF and Glenplake took on the trying task of international collaboration, HarryF set up his environment variables just like this. And once that was done, he took all the work he'd completed so far and imported it into CVS like this:
cd c:\MyWork\the_project
cvs import the_project vendor start
Here's what he did, line by line:
- He changed to the directory where the project he wanted to add to CVS was located.
- He imported all the work from that directory -- and everything below it -- to the CVS repository "the_project". He had to use "vendor start" unquestioningly at this point (read more about it here).
Once it was imported into the repository, HarryF renamed the directory "c:\MyWork\the_project" to "c:\MyWork\the_project_old" so that it wouldn't be lost in the check out of his first copy of the new repository:
cvs co the_project
And as if by magic, the_project was restored to its original location! Notice that there are some extra directories within each subdirectory of the project, named CVS. These directories contain local information that CVS uses to keep track of your work -- don't mess with them unless you know what you're doing.
HarryF then discovered one of CVS's darker secrets: CVS doesn't handle binary files very well, including executables (.exe), images (.gif,.png,.jpg), and Shockwave (.swf) files etc. CVS assumes by default that all files are text files, and attempts to place things like revision code in them, which tends to have a disastrous effect on binary files. All of the .gif images HarryF had made and imported into the CVS repository were now scrambled and useless. What could he do?
Well, in general, before you import your project for the first time, it's a good idea to set up a cvswrappers file to tell CVS which file types are binaries. Alternatively, when you add new binaries to an existing project, you can also use "cvs add -kb myfile.gif" to tell CVS that the file is a binary.
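As a rough sketch of what such a file can contain, the snippet below writes a sample cvswrappers into a scratch directory. In a real repository the file lives in the CVSROOT administrative directory and is edited there; the scratch location and the particular list of extensions are assumptions made for illustration.

```shell
# Write a sample cvswrappers file into a scratch directory.
# In a real repository this file belongs in the CVSROOT administrative
# directory; the scratch path is only for illustration.
SCRATCH=$(mktemp -d)

cat > "$SCRATCH/cvswrappers" <<'EOF'
# Each line is a filename pattern followed by options.
# The option -k 'b' marks matching files as binary.
*.gif -k 'b'
*.png -k 'b'
*.jpg -k 'b'
*.swf -k 'b'
*.exe -k 'b'
EOF

cat "$SCRATCH/cvswrappers"
```

With entries like these in place, files matching the patterns are treated as binary when they are imported or added, so there is no need to remember -kb for each one.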
HarryF discovered this through feverish consultation of the manual on cvs admin and how to recover from this disaster. He implemented it and -- hey-presto! -- his binary files were restored to their former glory.
Finally happy with the CVS repository, HarryF emailed Glenplake to tell him that all was well, and that it was time to merge the work he'd done with the main code. HarryF then decided he needed a break, and went out to start yet another argument with the mechanic who was supposed to be fixing his motorbike.
In the meantime, Glenplake, being one of those experienced Linux types (as well as a Photoshop pro -- his talents were beyond compare!), logged into his Linux Web server and quickly set up the environment for use with their new CVS tree.
Most common Linux distributions already have ssh and the cvs client installed (Glenplake's certainly did) so you can set up the environment as we did in Windows, using the following commands:
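A minimal sketch of that setup for a bash-style shell follows. It assumes the same two variables as in the Windows steps, and it reuses the placeholder CVSROOT value from there, so substitute your own Sourceforge username and project.

```shell
# Tell the cvs client to connect over ssh.
export CVS_RSH=ssh

# Placeholder repository location, copied from the Windows steps above.
export CVSROOT=':ext:firstname.lastname@example.org:/cvsroot/yourproject'

echo "CVS_RSH=$CVS_RSH"
echo "CVSROOT=$CVSROOT"
```

Typed at the prompt, these settings last only for the current login session.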
Glenplake, being even more clever than that, added the same settings to the file .bash_profile in his Linux home directory.
This meant that every time he logged in, he would have the environment variables set correctly and not need to re-type them.
Now, moving to the Web root directory on his server, he renamed the version of the code he had installed there:
mv the_project the_project_old
Next, he grabbed the latest version from the repository with:
cvs -z3 co the_project
Success! Glenplake now had a working copy of the repository.
He swiftly copied the work he'd done from the old directory to the new working copy:
cp the_project_old/updated_file.php the_project/
cp the_project_old/new_image.gif the_project/
Here's what those commands did:
- Copied one of HarryF's files that Glenplake had updated to the working directory
- Copied a new image Glenplake had created to the working directory
Then he moved into the working directory:
cd the_project
...and introduced his work to the CVS repository. For the file he updated he used:
cvs commit -m "Fixed some bugs" updated_file.php
...to commit the file to the repository. He used the -m "Fixed some bugs" to add a log message to updated_file.php's revision history, so that later, both developers would be able to see what had happened to the file.
Then, for the new file he created, Glenplake typed:
cvs add -kb new_image.gif
cvs commit -m "The project logo" new_image.gif
...to add the new image to the repository. "cvs add" tells CVS to schedule this file to be added to the repository next time a commit is performed. The "-kb" tells CVS that the file is binary, and must be handled with care. "cvs commit -m "The project logo" new_image.gif" actually commits the new image to the repository, with the log message "The project logo".
On Windows, Glenplake could also have happily performed the same cvs commands using the "cvs.exe" client.
Now that he was finished, Glenplake went out to steal some bagels from his local bakery.
Meanwhile, in a land far, far away, HarryF returned home and got back to work on the_project.
Because he'd been paying attention to the documentation, he knew it was a good idea to begin with a "cvs update", to check that he had the latest version of the repository. So he typed:
cvs update
...and CVS responded by listing each file it brought up to date. It had checked to see if HarryF had the latest version of the repository, found the updates Glenplake had made, and downloaded them into HarryF's working copy, giving him the latest files to work with.
And so Glenplake and HarryF carried on, exceeding all their private expectations and completing a project of quite some magnitude.
If you'll forgive the poor prose, this last example should give you an idea of how CVS works, particularly when you're collaborating with someone else. This is only a simple demonstration of what CVS is capable of and I'm sure there's no need to tell you it can do loads more for you.
For starters, we haven't really looked at how CVS tracks revisions and how you can view the logs, view diffs (differences between one revision and another), restore older revisions of your work, resolve conflicts, and a whole host of other things. But with what you now know, you have enough to get started with CVS. As you delve further into it, you'll find making sense of the documentation a lot easier, and quickly pick up what you need.
Perhaps the best place to start is the CVS Book which -- though long -- takes time to explain concepts clearly. If you just take a quick glance at the contents page, you'll get an idea of some of the more advanced things you can do with CVS. Alternatively, you might like to try CVS Version Control for Web Site Projects which covers less ground, but is geared more to Web builders like us. The "official" documentation can be found either at http://www.cvshome.org or here. And of course, if you don't like any of these resources, a quick search on Google will find many alternative explanations, tutorials and guides. You are most definitely not alone with CVS.
CVS Server for Windows
As we mentioned at the start of this article, CVS isn't just for the *Nix types. It's widely used and, as you've already begun to see, well supported. It's also popular in many companies where it's typically used in software development. Without doubt, any skills you can develop with CVS are good news for your resume.
So, Windows users, wander over to http://www.cvsnt.org/ and grab yourself a copy. You'll find alternative step by step installation guides here and here. The CVSNT server is confirmed as being able to run on Windows NT4 Server/Workstation with Service Pack 6, Windows 2000 Server/Professional and Windows XP-Pro. There's also a cut down version able to run on Windows 95/98, so all in all, you should be able to get a server up somewhere.
Viewing CVS Online
Finally, there are also Web-based CVS viewers available that allow you to browse your CVS tree online -- very nice for catching up on what the tree contains, and what changes have been made. Three examples are Chora, written in PHP, ViewCVS, written in Python and cvsweb, written in Perl.
So, go forth boldly into the brave new world of CVS, and may all your revisions become releases!
The author would like to thank Dan, who was subjected to numerous bizarre experiments in the writing of this article.