Using SVN for Web Development

As our web applications grew and more and more developers started working on them, it became obvious that we needed some kind of revision control system to manage our code. As CVS is quite dated and Subversion (SVN) introduced some handy features (atomic transactions, Apache piggybacking, more convenient branching/tagging, tons of other improvements), we chose to go with SVN. The big question was: how to use it correctly? After coming up with some more or less weird ideas, I think we finally found a decent way to put a web application under source control. Keep in mind that this post is not about how to use SVN itself; it’s about the system architecture for using it in web development. If you need help with Subversion, you can find it in the manual and the FAQ.

Managing web applications in SVN is tricky for several reasons:

  • When using revision control, a programmer is always working on a ‘working copy’ of the project. In traditional software engineering, this copy sits somewhere on his own machine, as the product is a stand-alone application. In web development, however, we’re talking about webspace. Should every developer have a PHP environment on his machine then? Shouldn’t all programmers work on the exact same server configuration? What about Windows and Mac users?
  • This means the working copies are best kept on one webserver, along with an SVN client. Thus, we need some interface (the simplest one being SSH) to access it. Maybe a rich web client would be even better.
  • Most web applications use a database. Different versions of the software need different versions of the database design. Even worse, their operability may depend on data in the database. An application version without a matching database structure is no good.
  • The application needs to be deployed to a live webspace. This might happen quite often and should be as painless as possible.

So this is what we’re going to need to make it work:

  • A public webserver with the live webspace
  • A testing webserver with at least one webspace (working copy) for each developer
  • A MySQL server with two databases
  • An SVN server
  • Some discipline

Webservers

Each developer is now able to work on the project in his own working directory (webspace). They all share one development database though, as the code shouldn’t be dependent on data in the database. Whenever a programmer has finished his task, he commits his changes to the repository.

When the next version of your application is ready, it can be rolled out to the live server, which is a working copy as well. That way, updates can be released very quickly and efficiently without having to export the whole repository. All we need is a little script or webmin to trigger the rollout. For security reasons, there is no other way to access files on the live webspace. All that remains is to prevent Apache from descending into .svn folders. A little directive in our httpd.conf can help us here.

# Disallow browsing of Subversion working copy administrative dirs.
# The dot in ".svn" is escaped so that it only matches a literal dot.
<DirectoryMatch "^/.*/\.svn/">
    Order deny,allow
    Deny from all
</DirectoryMatch>
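
To get a feeling for what a pattern like `^/.*/\.svn/` (with the dot escaped, so that it only matches a literal `.svn`) covers, here is a small Python check. It is only an approximation: Apache uses its own regex engine rather than Python's `re`, and the sample directories are made up for illustration.

```python
import re

# Pattern for Subversion's administrative directories, dot escaped.
# Assumption: Python's re behaves like Apache's regex engine for this simple pattern.
SVN_ADMIN_DIR = re.compile(r"^/.*/\.svn/")


def is_svn_admin_dir(path: str) -> bool:
    """Return True if the directory path lies inside a .svn directory."""
    return SVN_ADMIN_DIR.search(path) is not None


# Hypothetical directories on the live webserver.
samples = [
    "/var/www/live/.svn/",
    "/var/www/live/img/.svn/text-base/",
    "/var/www/live/img/",
    "/var/www/live/xsvn/",  # would match if the dot were left unescaped
]
for directory in samples:
    print(directory, "->", "blocked" if is_svn_admin_dir(directory) else "served")
```

The last sample shows why the escape matters: an unescaped dot matches any character, so an unrelated directory such as `xsvn` would be caught as well.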

Databases

For each version, there is a matching database structure. How are we going to handle this? First of all it seems smart to keep things backwards compatible, as it’s important that you can roll back to a previous version without breaking your application. Which means: don’t delete anything anywhere, just extend existing structures.

On each software release, all changes to the database structure need to be applied to the live database server. At first thought, this should happen automagically. The problem is: how? We don’t just want to “replicate” all of the development server’s differences to the live server, because there are certainly test tables or other half-baked things lying around there. That makes it necessary to select the changes carefully by hand, which is painful.

That’s why you should keep an SQL change file to which you append every database change that is meant for live use. It takes some discipline, but has the advantage that you always have a ready SQL file to build your database from. If properly commented, it doubles as a changelog.
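
As a rough sketch of how such an append-only change file could be applied on release, the Python snippet below records which entries have already run and executes only the new ones. Everything specific here is an assumption for illustration: the entry ids, the `schema_changes` bookkeeping table, and the use of an in-memory SQLite database as a stand-in for the live MySQL server.

```python
import sqlite3

# Hypothetical changelog: an append-only list of (id, SQL) entries.
# In practice these would be read from the SQL change file in the repository.
CHANGELOG = [
    ("001", "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"),
    ("002", "ALTER TABLE users ADD COLUMN email TEXT"),
]


def apply_changelog(conn, changelog):
    """Apply entries not yet recorded in schema_changes; return their ids."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_changes (id TEXT PRIMARY KEY)")
    done = {row[0] for row in conn.execute("SELECT id FROM schema_changes")}
    applied = []
    for change_id, sql in changelog:
        if change_id in done:
            continue  # already applied on an earlier release
        conn.execute(sql)
        conn.execute("INSERT INTO schema_changes (id) VALUES (?)", (change_id,))
        applied.append(change_id)
    conn.commit()
    return applied


conn = sqlite3.connect(":memory:")  # stand-in for the live database
print(apply_changelog(conn, CHANGELOG))  # first run applies every entry
print(apply_changelog(conn, CHANGELOG))  # second run finds nothing new
```

Because entries are only ever appended and each one runs once, the same file can build a fresh database or bring an older one up to date.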

What if, for some odd reason, your code is dependent on data in the database? Say there’s some sort of sitemap stored in the database. This is painful because you have to update that table manually. Before each commit, whenever the data has changed, you’ll have to dump it to an SQL file and put that under revision control. This way your data becomes mergeable, too. When there are a lot of places where this is needed, you should consider scripts for freezing (writing database data into files) and unfreezing (vice versa) your web application.
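
A minimal sketch of such a freeze/unfreeze pair might look like the following. Note the substitutions: instead of an SQL dump it writes one JSON object per line (sorted by key, so diffs and merges stay line-based), and an in-memory SQLite database stands in for MySQL. The `sitemap` table and its columns are hypothetical, and the table and column names are assumed to come from trusted code, not from user input.

```python
import json
import sqlite3


def freeze(conn, table, key):
    """Dump a table as one JSON object per line, ordered by the key column."""
    cur = conn.execute(f"SELECT * FROM {table} ORDER BY {key}")
    columns = [d[0] for d in cur.description]
    return "".join(
        json.dumps(dict(zip(columns, row)), sort_keys=True) + "\n" for row in cur
    )


def unfreeze(conn, table, text):
    """Replace the table contents with the rows stored in a frozen dump."""
    conn.execute(f"DELETE FROM {table}")
    for line in text.splitlines():
        row = json.loads(line)
        cols = ", ".join(row)
        marks = ", ".join("?" for _ in row)
        conn.execute(
            f"INSERT INTO {table} ({cols}) VALUES ({marks})", list(row.values())
        )
    conn.commit()


# Hypothetical sitemap table on the development database.
dev = sqlite3.connect(":memory:")
dev.execute("CREATE TABLE sitemap (id INTEGER PRIMARY KEY, path TEXT)")
dev.executemany("INSERT INTO sitemap VALUES (?, ?)", [(2, "/about"), (1, "/")])

dump = freeze(dev, "sitemap", "id")  # this text is what gets committed
print(dump, end="")

# Another working copy loads the committed dump into its own database.
other = sqlite3.connect(":memory:")
other.execute("CREATE TABLE sitemap (id INTEGER PRIMARY KEY, path TEXT)")
unfreeze(other, "sitemap", dump)
```

Sorting the rows keeps the dump stable between runs, which is what makes the data reasonably mergeable in the first place.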

What to put in…

Of course, your code belongs in version control. So do database schemata, templates and graphics. What you shouldn’t put in is:

  • User data – user uploaded files and stuff don’t belong there.
  • Database data – don’t confuse revision control with a backup!
  • Cached data – it’s temporary data.
  • Configuration files – they’re to be created manually.

Don’t even think about putting configuration files into revision control. First of all, they tend to get overwritten, so that, for example, your live server suddenly points at a different database. Second, passwords don’t belong in revision control! Instead, you should add default config files with changed filenames, like database.conf.default.
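
One way to work with such a template is a tiny setup step that creates the real config from the `.default` file only when it is missing. The sketch below assumes the `database.conf` / `database.conf.default` naming from above; the file contents are placeholders.

```python
import shutil
import tempfile
from pathlib import Path


def ensure_config(directory, name="database.conf"):
    """Create the local config from its .default template unless it already exists."""
    target = Path(directory) / name
    template = Path(directory) / (name + ".default")
    if target.exists():
        return False  # never overwrite a hand-edited config
    shutil.copyfile(template, target)
    return True


# Demonstration in a throwaway directory standing in for a working copy.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "database.conf.default").write_text("host=localhost\npassword=CHANGE_ME\n")
    print(ensure_config(tmp))  # creates database.conf from the template
    print(ensure_config(tmp))  # leaves the existing file alone
```

The real `database.conf` can then be listed in the directory's `svn:ignore` property so that nobody commits it by accident.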

Branching and tagging

It’s important to remember that only working code belongs in the trunk. After all, it needs to be ready for deployment all the time. Because of this, you need to create a new branch whenever:

  • You’re working on a big project that requires its own versioning.
  • The project takes a rather long time to complete. Your code is safer in the repository than on a developer’s laptop.
  • More than one person is working on the project. In their own branch, developers can share their code without damaging the trunk.

These branches are merged back into the trunk after completion. About tagging, the only thing I can say is that it’s smart to create a tag whenever you’re updating your live webserver to keep track of your releases.
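
Tagging and rolling out can be scripted together. The helper below only builds the two `svn` command lines; it assumes the conventional `trunk`/`tags` repository layout and a live working copy checked out from trunk, and the URL, path and tag name are placeholders.

```python
import shlex


def release_commands(repo_url, live_path, tag, revision="HEAD"):
    """Return the svn invocations for tagging a release and updating the live copy."""
    tag_cmd = [
        "svn", "copy", "-r", str(revision),
        f"{repo_url}/trunk", f"{repo_url}/tags/{tag}",
        "-m", f"Tag release {tag}",
    ]
    update_cmd = ["svn", "update", "-r", str(revision), live_path]
    return [tag_cmd, update_cmd]


# Hypothetical repository URL, live path, tag name and revision.
for cmd in release_commands(
    "https://svn.example.com/repo", "/var/www/live", "release-1.0", 1234
):
    print(" ".join(shlex.quote(part) for part in cmd))
```

Passing an explicit revision number rather than `HEAD` makes sure the tag and the live server end up on exactly the same revision, even if someone commits in between. To actually run the commands, each list could be handed to `subprocess.run(cmd, check=True)` on a machine with the `svn` client installed.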

Security

A nice side-effect of this whole approach is that no developer needs access to the live server anymore. When done right, live server updates are triggered over a webmin interface and that’s about all administrative access you need on that machine. Which is good. And don’t forget to use SSL for your SVN transactions to be safe!

  • http://www.phpism.net Maarten Manders

    Thanks to my colleague Franky for making this up with me. ;)

  • pallan

    This is an article that could not have come at a more perfect time. My only question is what would be your recommendation if you don’t have webmin, but still want the developers to have an easy way to sync changes to the live site? Peer

  • JMF

    Why don’t you use the “export” command instead of the “checkout” one for the live webspace?

    That would be cleaner and would avoid the need for the trick to hide .svn directories.

  • http://www.phpism.net Maarten Manders

    My only question is what would be your recommendation if you don’t have webmin, but still want the developers to have an easy way to sync changes to the live site? Peer

    The syncing works like this: As the live server is a working copy of your project, you can just call an SVN update to bring it up to date.

    I took a look at WebSVN but it’s horribly slow, as it uses the command line SVN client. Instead, you could take a look at Alan and Wez’ libsvn bindings. At the moment, we’re triggering live server updates over SSH. In addition to that, we’re working on a little tool which allows us to do those updates and a little bit more (like rollbacks, logging, etc.) over a secure SSL website.

    Why don’t you use the “export” command instead of the “checkout” one for the live webspace?

    An export seems nicer, yes. The httpd.conf directive may slow our Apache down a bit, too. But we’re dealing with thousands of files and quite some data volume here, which makes exports inefficient: they take several minutes. We’d like to be able to update frequently though.

  • pallan

    …At the moment, we’re triggering live server updates over SSH…

    I assume this means you are logging into the command line via SSH to run the svn update command then?

  • http://www.phpism.net Maarten Manders

    I assume this means you are logging into the command line via SSH to run the svn update command then?

    Exactly. Sorry for being ambiguous. :)

  • http://fairsky.us/home Joshua Paine

    An export seems nicer, yes. The httpd.conf directive may slow our apache down a bit, too. But we’re dealing with thousands of files and quite some data volume here which makes exports inefficient and take several minutes.

    You can export to a tmp directory (on any machine, really) and then rsync the exported files into the web root. Or if that’s still too slow, maintain a working copy somewhere outside the webroot and use rsync to copy the files to the webroot, telling it to ignore .svn files.

  • http://www.peterbailey.net beetle

    I work at a large online marketing/advertising agency – you might have heard of us since we made AdWeek’s Global Interactive Agency of the Year for 2005 – and we use SVN for all of our development.

    First of all, our server environment is a WebDAV setup with Dev/Staging/Live for each client and/or subdomain. We check out repositories to our local machines, edit, and then post to the DEV server. Changes are then committed back into the trunk.

    Once code makes it past staging (client-review) it then goes live, and a branch is created to coincide with that “live version” of the site.

    I don’t have the space to fully describe every nuance of the system, but that’s a general overview of how we do it.

  • s21825

    I use a post-commit hook script to export the trunk to the live dir.

  • http://www.lopsica.com BerislavLopac

    Thus, we need some interface (the most simple one being SSH) to access it. Maybe a rich web client would be even better.

    Actually, in my experience Samba shares seem to be the best access interface. Windows users can then use the excellent TortoiseSVN client on the shares mapped as network drives, while Linux users may mount the shares like any other disks.

    If that fails for some reason, the good old FTP is still an option, and if all else fails SFTP with a GUI client still beats the command line (except for the die hard CLI fans who develop in joe nevertheless :) ).

  • soenke

    Thx for this article. I thought of something like linking the svn server with the PEAR packager. If a version tag is done in SVN, some script could automagically build pear packages (if all tests were passed).

    So the live server(s) could just do a ‘pear upgrade-all’ on our own pear channel and that’s it. With PEAR, we can use cross-package-dependency-checking (wh00t new word-creation) as well as very easy up- and downgrading.

    Does anyone have experiences with an environment like this?

    Thx.

  • soenke

    Addition: How do you handle include_paths with more than one dev path?

  • http://zaskoda.com Zaskoda

    Perfect timing on this article. It does seem, however, there’s not an automated elegant solution for keeping the database up to clean revisions. This seems like the kind of problem someone would have addressed by now. Perhaps this is a good space for a new innovative product?

  • http://www.lopsica.com BerislavLopac

    It does seem, however, there’s not an automated elegant solution for keeping the database up to clean revisions.

    Daversy is a try in that direction. I have had a discussion with Ely Golovinsky, Daversy developer, about an automated system built around SQL dump scripts and a traditional versioning system.

    Also, if you’re using MySQL, the DBDesigner4 (recently acquired by MySQL) stores its data in XML files which can be easily versioned, and it has a synch feature which makes it simple to modify the database.

  • shea

    That Daversy looks interesting Berislav, cheers!

  • http://www.dvdverdict.com/ mjackson42

    How do you handle include_paths with more than one dev path?

    Personally, I’d do it something like this:

    define('INCLUDEPATH',dirname(__FILE__) . '/inc');

    That way you wouldn’t have to worry about the path changing.

  • Corigo

    This is very nice for PHP, Apache, etc. but not applicable in Java, where files need to be compiled and the Java server restarted. In the Java environment it is still necessary for each developer to have their own server on their development machine, so they can restart the Java server without interrupting other developers’ work.

  • Jason Stirk

    We are currently trialling SVN with a continuous integration system. The idea is that every time you commit code, a test server checks out the code, runs the tests and, if it’s all good, can tell the live server to update its code base.

    We are spoilt though, as our current test projects are using Ruby on Rails and we can take advantage of the new Migrations support. I’d be surprised if someone hadn’t written a PHP-ish (or even language agnostic) approach to this – a friend of mine wrote and tested a MS Access variant of the system in only a few days, so I would suspect the PHP variant would be even quicker to write.

  • http://www.lopsica.com BerislavLopac

    I’d be surprised if someone hadn’t written a PHP-ish (or even language agnostic) approach to this

    I don’t think there is a complete equivalent, but ADOdb comes quite close, I think.

  • Ryan B

    Great timing, I too am looking into using SVN for our website.

    You mentioned you create a new tag for each update to the live web server. Why is that? Wouldn’t the live web server get updated with pretty much every commit made to the trunk (live) directory of the repository? That right there keeps track of the releases.

  • http://www.lopsica.com BerislavLopac

    Wouldn’t the live web server get updated with pretty much every commit made to the trunk (live) directory of the repository?

    Of course not, because live server is not the same as your repository. As mentioned above, you either export or checkout/update to the Web server, just as you would with any other working copy.

  • Ryan B

    Sorry I didn’t make myself clear. I was assuming the update of the live workspace would happen automatically through a post-commit hook or shortly after the commit was made (after a final check). But I can understand using tags if the updating process happens more rarely (say once a week, after several commits have built up).

  • http://www.phpism.net Maarten Manders

    People tend to see post-commit-hooks as a golden hammer to make life more comfortable than it needs to be. I don’t think it’s smart to update the live server on each commit – I’d like to control that myself: in some sort of webmin, which is the same place to do rollbacks, see who did the last updates, etc.

  • Jason Stirk

    Maarten: I’m inclined to agree, especially if you aren’t running some sort of test suite on each commit – the last thing you want is for some code to be rolled out onto a production environment in an incomplete state. It’s too easy to make mistakes like forgetting to add that new file into the repository before committing, or other “minor” things which may completely break the production environment.

  • http://www.silentflute.co.uk worchyld

    Do you put the code under the sentence “A little directive in our httpd.conf can help us here.” in your testing server or live web server?

    Is SVN meant to be put on the live server or test server or both?

  • http://www.whitelionsoft.com veslach

    I try to do a checkin at the end of the day (or before I get started in the morning) whether or not the code is in a currently working condition. This should limit the amount of code loss in case of hard drive, filesystem, or OS failure (or programmer error). I also do a checkin once I have something working or before I modify a directory structure.

    We tag a release that then gets moved to a testing server (haven’t automated this yet). Once QA process is done, the “bug free” version of the tag gets tagged as a live version, is moved to the live server, & the modifications get merged into the trunk.

    We’ve set up a directory structure to keep include files out of the “wild”. Under trunk there’s an html, an includes, & a libs directory. The html directory is the site root & the libs directory is a collection of svn-externals.

    Considering Track Your Hacks with CVS, does anyone have any suggestions for tracking vendor code?

  • Kyle Mulka

    Hmm… that code for denying .svn directories doesn’t seem to work at all for me. Is there a specific place in the httpd.conf file where you need to put it? I did restart the web server after editing the conf file.

  • Johnny

    cool site by the way =)

  • paul

    we use SVN at nurvex.co.uk website development and it’s well worth it, I really recommend it

  • Jan Pekar

    Nice article,
    we are using SVN the same way for web development.
    For keeping SQL in sync on 2 databases (live + testing), every developer must create an SQL patch which takes the database to the next release. This patch is stored in Subversion. The developer must leave a note (for example in the bug tracker, for us Mantis) that when the admin updates the live website, he must execute the SQL patch in the sql/ directory, named 2006-XX-YY_descr.sql.

    —-
    Now I’m thinking about an autocheckout webmin (I hope I will create it soon):
    - authorization (via one password or LDAP)
    - list of the latest svn log entries
    - display the currently checked-out version on the site
    - choose the version to check out (or export)
    - callable from post-commit (wget …) so you can always keep the testing sites up to date

    and

    a webmin which will create all the infrastructure for new projects at once
    - repository for developers, testing
    - add users into LDAP project group
    - create databases
    - create websvn
    - create wiki page for project
    - create project in bugtrack, in timesheet program

    If I succeed I can share it of course, but time is my enemy.

    One question: is it possible to store the current revision of the repository in a file in the repository?
    I was thinking about the $Id$ keyword, but it only changes when the file itself is modified. Is there any way to update a special “version” file with every commit? Or is there another way to do it?
    It is important to know which revision you are running when you are exporting to the live site.

  • john

    Question regarding log messages for web development projects. When the changes are minor (textual, links, etc.), do you add an SVN message for them? I also use a PHP comment history at the top of each page. I find that my page comments are similar to the SVN log comments. How do most people handle this? Examples or links, specific to web site/development, would be appreciated.

  • Ax

    Thanks for the post! I like the ‘remote checkout’ idea and I think I’ll use it in the future.

    About the rest:
    I don’t agree that test data doesn’t belong in the repository. It just doesn’t belong on the production server.
    I’m actually dumping the testing database every time I do a commit and importing it back into the database right after checkout (automatically through a script, of course).

    This means that I can take a fresh laptop (with LAMP installed of course), check out one of the projects and work on it without a connection to some central server. Which means I can work wherever I want without hassle.
    Of course, I have to install PHP, Apache & MySQL locally – but that’s not really difficult to do.

    The same goes for configuration files – of course they belong in SVN, where else would you keep them?

    I agree with cache files though, they are easily recreated. ;)

    I try to keep things simple, and one of the most important things for me is to be able to:
    - get the whole project with a single command
    - work remotely, disconnected from Internet

    Your goals were (obviously) different, but there is no right or wrong way. So – keep an open mind. There are many ways to skin a cat. :)

    Happy coding,

    Anze

  • Rob

    Order deny,allow
    Deny from all

    Did not work for me but this did:


    #
    # The following line prevents Subversion .svn files from being viewed by
    # Web clients.
    #
    RedirectMatch 404 /\.svn(/|$)

  • karthik

    Hi,
    I want to know how to create an SSL connection through a proxy server. I am using the TortoiseSVN client to check out sources, but I am facing the following error:

    Error: PROPFIND request failed on ‘/sdas’
    Error: PROPFIND of ‘/dsds’: Could not create SSL connection through proxy server

  • Slava

    Hi!
    I have the same trouble with TortoiseSVN!!
    It does not create a connection through the proxy (WinGate).
    Help please! How do I resolve it?

  • Slava

    I resolved it!
    1. On the proxy (WinGate), create an additional WWW Proxy service which goes to the internet directly, not through the proxy of my provider.
    2. In TortoiseSVN -> Settings -> Network
    set the proxy to your own, for example 192.168.0.1 (address of the local proxy) and port 733 (the port you opened for it in the WWW proxy service)
    Regards

  • Tom

    In my opinion, trunk is for active development. It’s definitely not a production branch. IMO development is an ongoing and iterative process:

    1) Branch trunk for mini project
    2) Code new feature/bug fix on devel branch, periodically merging changes from trunk onto the branch.
    3) Merge back changes from devel branch to trunk
    4) Branch trunk for next mini project

    At this point, trunk has your latest changes on it. When they have been acceptance tested, merge the specific changes from trunk to release.

    YMMV, but I cannot think of a worse system than one automatically putting trunk live..

  • me, duh

    ok, so lets say I have a project called. ‘projectA’, and a user ‘me’
    would I check out the project to ‘/var/www/dev/me/projectA/’ ? is there anything wrong with this setup?

    the ‘dev/’ folder would hold all user’s ‘working copies’, and most likely an .htaccess file to require valid users.

    how would one modify/work on that ‘working copy’? I mean, you can’t really load it into say, subclipse running on a different machine… can you?

    something doesn’t click right in my head about this setup, i think i missed something but i can’t figure out what. I’ve been reading as much as i can from that ‘Version Control with Subversion’ book to no avail.

    note: new linux user here, its been 4 days since i started using linux, 2 days for subversion

  • mocha

    me, duh

    I use sftpdrive to map my linux server to a network drive (via ssh).

    After much thought I’m trying for the same setup you propose.

    Unfortunately tortoisesvn does not work in conjunction with the mapped drive. But it does work fine via samba when I am in the office.

  • Kostya

    Hi,
    I have configured the post-commit:

    svn up $FROM
    rsync -p -g -avr --delete --cvs-exclude $FROM $TO

    It works well for the root user, but I have a problem when other users commit.
    How can I use svn and rsync with two different users? daemon is the Apache user, and my $FROM directory is owned by the “kostya” user. I put these users into the same groups (daemon and kostya) and changed the mode to 775 for the $FROM directories, but I still get errors:
    rsync: failed to set permissions on “$FROM/somefile.php”: Operation not permitted (1)
    rsync: failed to set permissions on “$FROM/somefile2.php”: Operation not permitted (1)

    Could somebody help me?

  • anonymous

    What branching patterns work best for web development?

  • Jay

    Hi, thanks for the great article! The web site I’m currently administering has about 17G of data… some of this content is media which can be moved I guess… but still, how do you recommend maintaining a large working space for users?

  • jdelisabeth

    Hello, this article is very interesting!!!

    But I don’t understand this one:

    A testing webserver with at least one webspace (working copy) for each developer

    How do you create a webspace (working copy)?

    How does the developer access the working copy?

    Thx.

  • earpick

    To mount the remote shares as local ones you can always use SftpDrive for Windows machines and MacFUSE+SSHFS for Mac. No need to FTP over!

  • Bart

    GIT!!

  • Robert Mońka

    There is a small problem with that way of working: sometimes you have to fix an error which you can only see with user data (e.g. photos) uploaded to the main server.

    On the working copy you cannot see it. It is a common problem for me.
    So I have to manually copy the current database plus the uploaded files to the test server (not to svn).

    Do you have any idea how to solve this in a simple / automatic way?


    Robert Mońka