Friendly URLs
So I hope everyone in the US has survived the Daylight Saving Time crisis of 2007! I don’t know about you, but I didn’t even feel it :)
I was browsing the forums recently, as I typically do when I’m looking for a topic to blog on, and I came across a post by forumposters entitled “Clean and descriptive url’s”. In this post forumposters asks:
“What have you fellow CF developers done to make your URLs look better? I’d like to see many examples and options if you would all be so kind to share”
I thought this was a good topic for me since I have a good bit of experience both historically and recently with this very issue.
For the longest time search engines treated URLs with query strings (aka dynamic URLs, where the query string is everything after the question mark (?) in the URL) differently. Mostly, pages which had these query strings would be ranked lower than pages which didn’t. So if you had the URL:
http://www.example.com/books/index.cfm?category=coldfusion&author=forta
it would rank lower in search results versus a URL formatted like so:
http://www.example.com/books/coldfusion/forta/
So it’s been a pretty big task for developers to try and get their URLs to be “clean”, meaning they wanted to remove the question marks (?), ampersands (&), and equal signs (=) from the mixture. This would result in what is commonly known as a Search Engine Friendly URL and allow your site to achieve better rankings. A quick tangent here: Google has said that it will better index dynamic URLs, so the issue of search engine friendly URLs is fading, but now it’s turning more into a user friendly URL issue. Developers, customers and users want URLs that other users can understand, remember and share more easily than those longer dynamic URLs. All in all, the idea of a friendly URL is much more accepted.
So what are your options when it comes to creating friendly URLs? Thankfully there are a lot of options to choose from depending on your setup, time, and abilities. The most widely accepted method for turning a dynamic URL into a friendly URL is to use a rewrite module. This rewrite module will do the heavy lifting of translating a friendly URL into a dynamic URL so your code can operate as expected. So using our example above, a rewrite module would turn
http://www.example.com/books/coldfusion/forta/
into
http://www.example.com/books/index.cfm?category=coldfusion&author=forta
Our users would only see the friendly URL but the server and our code would see the dynamic URL and all the associated URL variables would be created for our use as well.
Depending on your web server you might already have a rewrite module installed and ready to use. If you are running ColdFusion on Apache then you have the popular mod_rewrite at your disposal. For those on IIS it’s a bit more work, as IIS doesn’t support rewriting out of the box. Thankfully there are options out there, both for a fee and for free, which make IIS just as cool as Apache. The fee option I typically recommend is ISAPI_Rewrite by Helicon. It’s an ISAPI plug-in which acts very similar to mod_rewrite; in fact version 3 uses the same exact rules! They have a free version for 1 site, or for $99 you can get a license for unlimited IIS sites. If you are on a budget there is also Ionic’s ISAPI Rewrite Filter, which is totally free and pretty robust as well.
I’m not going to go into any real details about these products since they each have their own little ways about them but most rewrite modules use some form of Regular Expressions to translate a URL from one form to another. Most of the translations you’ll probably need to do can easily be written with just a basic knowledge of regular expressions.
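To give a feel for the kind of regular expression involved, here is a rough sketch written in CFScript rather than in any particular module's rule syntax. It captures the two path segments from our example URL and rebuilds the dynamic form. The variable names are my own, and the pattern assumes the /books/category/author/ layout used earlier; a real rewrite module would express the same idea in its own rule format.

```cfml
<cfscript>
// Hypothetical friendly path, roughly as a rewrite module would see it
friendlyPath = "/books/coldfusion/forta/";

// Capture the two segments after /books/ and rebuild the query string.
// \1 and \2 are back references to the two captured groups.
dynamicPath = reReplaceNoCase(
    friendlyPath,
    "^/books/([^/]+)/([^/]+)/?$",
    "/books/index.cfm?category=\1&author=\2"
);

// dynamicPath is now "/books/index.cfm?category=coldfusion&author=forta"
</cfscript>
```

Anchoring the pattern with ^ and $ keeps it from matching longer paths by accident, which is usually what you want when each rule maps to one specific template.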
But what if you don’t have Apache, and you cannot install a plug-in to IIS to get these great rewrite capabilities? Are you up the creek without a paddle? Left to suffer because of some corporate politics? Not at all! Before I go down this path let me just say that in my experience the rewrite modules are much more robust and will typically outperform anything we discuss here. That said, it’s not uncommon to see a programmatic way of dealing with the friendly URL issue.
One option which isn’t widely accepted and is typically frowned upon in the developer world is the 404 method. This is where you set up a 404 page for your site and, using some coding practice (like switch case statements), you check to see if you have a match and then include the proper code to make it work. I’m not going to go down this path because I don’t recommend it for a lot of reasons. The biggest reason is that it really messes up the statistics for your site, since everything is reported as a 404 error in the logs. It’s also my understanding that many search engines are removing 404 pages from their indexes. But hey, who needs the search engines? Yeah, right; try explaining that to a customer!
The other option is to use what I call a gateway script. This gateway script allows you to run everything from a central place and using some fancy coding you can make those friendly URLs without getting a bunch of 404 errors. In fact I’m sure you’ve already seen this in practice in a few of the ColdFusion blogs you read! Most users who use the wonderful BlogCFC have had friendly URLs for a while using this method. In a BlogCFC application the URLs typically look like:
http://ray.camdenfamily.com/index.cfm/2007/4/3/Did-you-know-about-the-Log-Viewer-Filter
In the URL above Ray has the gateway script running in the root index.cfm of his site. Then using a bit of CF code he’s able to extract the “/2007/4/3/Did-you-know-about-the-Log-Viewer-Filter” and do stuff with it. The benefit of using the gateway script is that index.cfm actually exists on the server, so your web server doesn’t return a 404 error AND it also records the full path as a unique URL, so you can still see what pages your users are visiting etc. So how did he do this? Let’s take a peek at his code:
/**
 * Parses my SES format. Demands /YYYY/MMMM/TITLE or /YYYY/MMMM/DDDD/TITLE
 * One line from MikeD
 *
 * @author Raymond Camden (ray@camdenfamily.com)
 * @version 1, June 23, 2005
 */
function parseMySES() {
    //line below from Mike D.
    var urlVars=reReplaceNoCase(trim(cgi.path_info), '.+.cfm/? *', '');
    var r = structNew();
    var theLen = listLen(urlVars,"/");
    if(len(urlVars) is 0 or urlvars is "/") return r;
    //handles categories
    if(theLen is 1) {
        urlVars = replace(urlVars, "/","");
        r.categoryName = urlVars;
        return r;
    }
    r.year = listFirst(urlVars,"/");
    if(theLen gte 2) r.month = listGetAt(urlVars,2,"/");
    if(theLen gte 3) r.day = listGetAt(urlVars,3,"/");
    if(theLen gte 4) r.title = listLast(urlVars, "/");
    return r;
}
The first thing you’ll notice is that Ray’s blog and function require your URLs to be in a specific format. This is a common practice when using a gateway script, but with a little work and a bit more code you can make your script and URLs be more versatile.
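As one example of what "more versatile" could look like, here is a minimal sketch that treats the extra path as alternating name/value segments, so a URL such as /index.cfm/category/coldfusion/author/forta would yield category and author values. This is my own illustration, not part of BlogCFC; the function name parsePathPairs and the URL layout are assumptions.

```cfml
<cfscript>
/**
 * Sketch only: reads the extra path as name/value pairs,
 * e.g. /index.cfm/category/coldfusion/author/forta
 * parsePathPairs is a hypothetical name, not a BlogCFC function.
 */
function parsePathPairs(pathInfo) {
    // Strip the script name if the server included it (same idea as
    // parseMySES, but with the dot before "cfm" escaped)
    var urlVars = reReplaceNoCase(trim(pathInfo), ".+\.cfm/? *", "");
    var r = structNew();
    var theLen = listLen(urlVars, "/");
    var i = 1;
    // Walk the list two items at a time: a name, then its value.
    // A trailing name with no value is simply ignored.
    for (i = 1; i lt theLen; i = i + 2) {
        r[listGetAt(urlVars, i, "/")] = listGetAt(urlVars, i + 1, "/");
    }
    return r;
}

// Usage: pull the pairs out of the current request
pathVars = parsePathPairs(cgi.path_info);
if (structKeyExists(pathVars, "category")) {
    category = pathVars.category;
}
</cfscript>
```

Because the names come straight from the URL, you would want to check them against a list of names you actually expect before using them for anything important.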
The basic idea of the function above is to take the CGI.PATH_INFO variable returned by ColdFusion, parse out everything after the “.cfm”, and use the forward slash “/” as the delimiter. The CGI.PATH_INFO variable returns the extra path information after a script name. So in our example above from Ray’s blog this would be “/2007/4/3/Did-you-know-about-the-Log-Viewer-Filter”, which is everything after the index.cfm. Now in versions prior to ColdFusion 7 the CGI.PATH_INFO variable would actually return the script name AND the extra path information, so it would look something like “index.cfm/2007/4/3/Did-you-know-about-the-Log-Viewer-Filter”. Ray handles this by stripping out the .cfm and everything in front of it before parsing the string, as seen in this line of code:
var urlVars=reReplaceNoCase(trim(cgi.path_info), '.+.cfm/? *', '');
Once he has a “clean” set of vars he can then treat it as a list using the forward slash “/” as the delimiter. You can see from the rest of his script above that he just does a simple list length check to determine which variables can be set up and used.
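To make the list handling concrete, here is a small trace of the same list functions run against the path from Ray's URL. The variable names are just for illustration. Note that ColdFusion list functions skip empty elements by default, which is why the leading slash doesn't produce a blank first item.

```cfml
<cfscript>
// The extra path from the example URL, script name already stripped
urlVars = "/2007/4/3/Did-you-know-about-the-Log-Viewer-Filter";

theLen   = listLen(urlVars, "/");      // 4
theYear  = listFirst(urlVars, "/");    // 2007
theMonth = listGetAt(urlVars, 2, "/"); // 4
theDay   = listGetAt(urlVars, 3, "/"); // 3
theTitle = listLast(urlVars, "/");     // Did-you-know-about-the-Log-Viewer-Filter
</cfscript>
```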
Using the CGI.PATH_INFO variable is a great way to create friendly looking URLs that any search engine and most people will enjoy. Now there are other “things” you can do to help remove the .cfm file extension but these typically require a bit more work from the web server side of the house and aren’t worth the extra effort. If you are in need of that “look” I’d recommend going with one of the modules mentioned above.
So there you have it, a few options to get you started down the path of friendlier URLs.