As our web applications grew and more and more developers started working on them, it became obvious that we needed some kind revision control system to manage our code. As CVS is quite dated and Subversion (SVN) introduced some handy features (atomic transactions, Apache piggybacking, more convenient branching/tagging, tons of other improvements), we chose to go with SVN. The big question was: how to use it correctly? After coming up with some more or less weird ideas I think we finally found a decent solution to put a web application into source control. Keep in mind that this post is not about how to use SVN itself, it’s rather about the system architecture for using it in web development. If you need help with Subversion, you can find it in the manual and the FAQ.
Managing web applications in SVN is tricky for some reasons:
- When using revision control, a programmer is always working on a ‘working copy’ of the project. In traditional software engineering, this copy is somewhere on the his machine, as it’s a stand-alone application. In web development, however, we’re talking about webspace. Should every developer have a PHP environment on his machine then? Shouldn’t all programmers work on the exact same server configuration? What about Windows and Mac users?
- This means the working copies should be best on one webserver, along with an SVN client. Thus, we need some interface (the most simple one being SSH) to access it. Maybe a rich web client would be even better.
- Most web applications use a database. Different versions of software need different versions of database designs. Even worse, their operabililty may depend on data in the database. An application version without a matching database structure is no good.
- The application needs to be deployed to a live webspace. This might happen quite often and should be as painless as possible.
So this is what we’re going to need to make it work:
- A public webserver with the live webspace
- A testing webserver with at least one webspace (working copy) for each developer
- A MySQL server with two databases
- A SVN server
- Some discipline
Each developer is now able to work on the project in his own working directory (webspace). They all share one development database though, as the code shouldn’t be dependant on data in the database. Whenever a programmer has finished his task, he commits his changes to the repository.
When the next version of your application is ready, it can be rolled out to the live server, which is a working copy as well. That way, updates can be released very quickly and efficiently without having to export the whole repository. All we need is a little script or webmin to trigger the rollout. For security reasons, there is no other way to access files on the live webspace. All that needs to be done yet is to prevent Apache from descending into .svn folders. A little directive in our httpd.conf can help us here.
# Disallow browsing of Subversion working copy administrative dirs. <directorymatch "^/.*/.svn/"> Order deny,allow Deny from all </directorymatch>
For each version, there is a matching database structure. How are we going to handle this? First of all it seems smart to keep things backwards compatible, as it’s important that you can roll back to a previous version without breaking your application. Which means: don’t delete anything anywhere, just extend existing structures.
On each software release, all changes to the database structure need to be made on the live database server. On first thought this should be done automagically. The problem is: how? We don’t just want to “replicate” all the development server’s differences to the live server because there are certainly test tables or other half-baked things around there. Which makes it necessary to carefully select and that’s painful.
That’s why you should keep a changelog, where you can append all changes to the database which are meant for live use. It takes some discipline but has the advantage that you always have a ready SQL file to build your database from. If properly commented, you can use it as a changelog, too.
What if, for some odd reason, your code is dependant on data in the database? Say, the there’s some sort of sitemap stored in the database. This is painful because you have to update that table manually. Before each commit, when there’s been changes, you’ll have to dump it to an SQL file to put it under revision control. This way your data becomes mergeable, too. When there’s a lot of places to do so, you should consider scripts for freezing (writing database data into files) and unfreezing (vice versa) your web application.
What to put in…
Of course, your code belongs into version control. So do database schemata, templates and graphics. What you shouldn’t put in, is:
- User data – user uploaded files and stuff don’t belong there.
- Database data – don’t confuse revision control with a backup!
- Cached data – it’s temporary data.
- Configuration files – they’re to be created manually.
Don’t even think about putting configuration files into revision control. First of all, they tend to get overwritten and, for example, change your live server’s database. Second, passwords don’t belong into revision control! Instead, you should add default config files with changed filenames, like database.conf.default.
Branching and tagging
It’s important to remember that only working code belongs in the trunk. After all, it needs to be ready for deployment all the time. Because of this you need to create a new branch, whenever:
- You’re working on a big project that requires its own versioning.
- It’s a project that takes a rather long time to complete. Your code is safer in the repository than on a developer’s laptop.
- When more than one person is working on a project. In their own branch, developers can share their code without damaging the trunk.
These branches are merged back into the trunk after completion. About tagging, the only thing I can say is that it’s smart to create a tag whenever you’re updating your live webserver to keep track of your releases.
A nice side-effect of this whole approach is that no developer needs access to the live server anymore. When done right, live server updates are triggered over a webmin interface and that’s about all administrative access you need on that machine. Which is good. And don’t forget to use SSL for your SVN transactions to be safe!