Friendly URLs

So I hope everyone in the US has survived the Daylight Saving Time crisis of 2007! I don’t know about you, but I didn’t even feel it :)

I was browsing the forums recently, as I typically do when I’m looking for a topic to blog on, and I came across a post by forumposters entitled “Clean and descriptive url’s”. In this post forumposters asks:

“What have you fellow CF developers done to make your URLs look better? I’d like to see many examples and options if you would all be so kind to share”

I thought this was a good topic for me since I have a good bit of experience both historically and recently with this very issue.

For the longest time search engines treated URLs with query strings (everything after the question mark in the URL), also known as dynamic URLs, differently from URLs without them. Pages with query strings would mostly be ranked lower than pages without. So if you had the URL:

http://www.example.com/books/index.cfm?category=coldfusion&author=forta

it would rank lower in search results versus a URL formatted like so:

http://www.example.com/books/coldfusion/forta/

So it’s been a pretty big task for developers to get their URLs “clean”, meaning they wanted to remove the question marks (?), ampersands (&), and equal signs (=) from the mixture. The result is what is commonly known as a Search Engine Friendly URL, which allows your site to achieve better rankings. A quick tangent here: Google has said that it will better index dynamic URLs, so the search engine side of the issue is fading, but it’s turning into more of a user friendly URL issue. Developers, customers and users want URLs that other users can understand, remember and share more easily than those longer dynamic URLs. All in all, the idea of a friendly URL is much more accepted.

So what are your options when it comes to creating friendly URLs? Thankfully there are a lot to choose from depending on your setup, time, and abilities. The most widely accepted method for turning a dynamic URL into a friendly URL is to use a rewrite module. The rewrite module does the heavy lifting of translating a friendly URL into a dynamic URL so your code can operate as expected. So using our example above, a rewrite module would turn

http://www.example.com/books/coldfusion/forta/

into

http://www.example.com/books/index.cfm?category=coldfusion&author=forta

Our users would only see the friendly URL but the server and our code would see the dynamic URL and all the associated URL variables would be created for our use as well.

Depending on your web server you might already have a rewrite module installed and ready to use. If you are running ColdFusion on Apache then you have the popular mod_rewrite at your disposal. For those on IIS it’s a bit more work, as IIS doesn’t support rewriting out of the box. Thankfully there are options out there, both for a fee and for free, which make IIS just as cool as Apache. The fee option I typically recommend is ISAPI_Rewrite by Helicon. It’s an ISAPI plug-in which acts very much like mod_rewrite; in fact version 3 uses the exact same rules! They have a free version for 1 site, or for $99 you can get a license for unlimited IIS sites. If you are on a budget there is also Ionic’s ISAPI Rewrite Filter, which is totally free and pretty robust as well.

I’m not going to go into any real details about these products since they each have their own little ways about them but most rewrite modules use some form of Regular Expressions to translate a URL from one form to another. Most of the translations you’ll probably need to do can easily be written with just a basic knowledge of regular expressions.
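To make the translation concrete, here is a minimal sketch in CFScript of the kind of regular expression a rewrite rule might use for the books example above. It is only an illustration: a real rewrite module applies its rules inside the web server, using its own rule syntax, before ColdFusion ever sees the request. The pattern and the variable names are assumptions made up for this example.

```cfml
<cfscript>
// Illustration only: this mimics what a rewrite rule does, it is not
// how you would configure mod_rewrite or an ISAPI filter.
friendlyPath = "/books/coldfusion/forta/";

// Hypothetical pattern: capture the two path segments after /books/,
// with an optional trailing slash.
pattern = "^/books/([^/]+)/([^/]+)/?$";

// \1 and \2 are backreferences to the two captured segments.
dynamicPath = reReplace(friendlyPath, pattern, "/books/index.cfm?category=\1&author=\2");

// Should print the dynamic form of the books URL used earlier in the article.
writeOutput(dynamicPath);
</cfscript>
```

The rule syntax differs from module to module, but the idea is the same: a pattern that captures the meaningful parts of the friendly URL, and a replacement that puts them back into the query string your code already expects.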

But what if you don’t have Apache, and you cannot install a plug-in on IIS to get these great rewrite capabilities? Are you up the creek without a paddle? Left to suffer because of some corporate politics? Not at all! Before I go down this path let me just say that in my experience the rewrite modules are much more robust and will typically outperform anything we discuss here. That said, it’s not uncommon to see the friendly URL issue dealt with programmatically.

One option which isn’t widely accepted and is typically frowned upon in the developer world is the 404 method. This is where you set up a 404 page for your site and, using some coding practice (like switch/case statements), check to see if you have a match and then include the proper code to make it work. I’m not going to go down this path because I don’t recommend it, for a lot of reasons. The biggest reason is that it really messes up the statistics for your site, since everything is reported as a 404 error in the logs. It’s my understanding that many search engines are removing 404 pages from their indexes. But hey, who needs the search engines? Yeah right, try explaining that to a customer!

The other option is to use what I call a gateway script. This gateway script allows you to run everything from a central place and using some fancy coding you can make those friendly URLs without getting a bunch of 404 errors. In fact I’m sure you’ve already seen this in practice in a few of the ColdFusion blogs you read! Most users who use the wonderful BlogCFC have had friendly URLs for a while using this method. In a BlogCFC application the URLs typically look like:

http://ray.camdenfamily.com/index.cfm/2007/4/3/Did-you-know-about-the-Log-Viewer-Filter

In the URL above Ray has the gateway script running in the root index.cfm of his site. Then using a bit of CF code he’s able to extract the “/2007/4/3/Did-you-know-about-the-Log-Viewer-Filter” and do stuff with it. The benefit of using the gateway script is that index.cfm actually exists on the server, so your web server doesn’t return it as a 404 error AND it also records the full page as a unique URL, so you can still see what pages your users are visiting etc. So how did he do this? Let’s take a peek at his code:



<cfscript>
/**
 * Parses my SES format. Demands /YYYY/MMMM/TITLE or /YYYY/MMMM/DDDD/TITLE
 * One line from MikeD
 *
 * @author Raymond Camden (ray@camdenfamily.com)
 * @version 1, June 23, 2005
 */
function parseMySES() {
	//line below from Mike D.: strips the script name (anything up to ".cfm") when the server includes it
	var urlVars=reReplaceNoCase(trim(cgi.path_info), '.+\.cfm/? *', '');
	var r = structNew();
	var theLen = listLen(urlVars,"/");

	//nothing after the script name, so return an empty struct
	if(len(urlVars) is 0 or urlvars is "/") return r;

	//handles categories
	if(theLen is 1) {
		urlVars = replace(urlVars, "/","");
		r.categoryName = urlVars;
		return r;
	}

	//otherwise treat the items as year, month, day and title, in that order
	r.year = listFirst(urlVars,"/");
	if(theLen gte 2) r.month = listGetAt(urlVars,2,"/");
	if(theLen gte 3) r.day = listGetAt(urlVars,3,"/");
	if(theLen gte 4) r.title = listLast(urlVars, "/");
	return r;
}
</cfscript>

The first thing you’ll notice is that Ray’s blog and function require your URLs to be in a specific format. This is a common practice when using a gateway script, but with a little work and a bit more code you can make your script and URLs be more versatile.

The basic idea of the function above is to take the CGI.PATH_INFO variable returned by ColdFusion, parse out everything after the “.cfm”, and use the forward slash “/” as the delimiter. The CGI.PATH_INFO variable returns the extra path information after a script name. So in our example above from Ray’s blog this would be “/2007/4/3/Did-you-know-about-the-Log-Viewer-Filter”, which is everything after the index.cfm. Now in versions prior to ColdFusion 7 the CGI.PATH_INFO variable would actually return the script name AND the extra path information, so it would look something like “index.cfm/2007/4/3/Did-you-know-about-the-Log-Viewer-Filter”. Ray handles this by stripping out the .cfm and everything in front of it before parsing the string, as seen in this line of code:

var urlVars=reReplaceNoCase(trim(cgi.path_info), '.+\.cfm/? *', '');

Once he has a “clean” set of vars he can treat it as a list using the forward slash “/” as the delimiter. You can see from the rest of his script above that he just does a simple list length check to determine which variables can be set up and used.
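If a fixed /year/month/day/title format is too rigid for your site, one way to make a gateway script more versatile is to read the extra path information as name/value pairs, for example /index.cfm/category/coldfusion/author/forta. The function below is a rough sketch of that idea and is not part of BlogCFC; the name parsePathPairs, its argument and the pair-based URL format are all assumptions for illustration.

```cfml
<cfscript>
/**
 * Hypothetical helper, not part of BlogCFC: reads name/value pairs from the
 * extra path info, e.g. /category/coldfusion/author/forta
 */
function parsePathPairs(pathInfo) {
	// Drop the script name (anything up to ".cfm") if the server includes it.
	var cleaned = reReplaceNoCase(trim(arguments.pathInfo), "^.*\.cfm", "");
	var result = structNew();
	var total = listLen(cleaned, "/");
	var i = 1;

	// Walk the list two items at a time: a name followed by its value.
	// A trailing name with no value is ignored.
	for (i = 1; i lt total; i = i + 2) {
		result[listGetAt(cleaned, i, "/")] = listGetAt(cleaned, i + 1, "/");
	}
	return result;
}

// Usage: build a struct from whatever follows the script name.
pathVars = parsePathPairs(cgi.path_info);
</cfscript>
```

With the example path above, pathVars should end up with a category key and an author key, much like the URL variables the dynamic version of the URL would have created. The trade-off is longer URLs in exchange for not having to hard-code the position of each value.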

Using the CGI.PATH_INFO variable is a great way to create friendly looking URLs that any search engine and most people will enjoy. Now there are other “things” you can do to help remove the .cfm file extension but these typically require a bit more work from the web server side of the house and aren’t worth the extra effort. If you are in need of that “look” I’d recommend going with one of the modules mentioned above.

So there you have it, a few options to get you started down the path of friendlier URLs.

  • Michael Dinowitz

    Ionic is great but it checks EVERY requested file, even images. Not what I want. I use it with a client using some tight regex, but that’s only because he’s still on ColdFusion 7.

    I use this technique for House of Fusion at the moment
    http://www.fusionauthority.com/techniques/4226-search-engine-safe-urls.htm
    but will be moving over to the technique on Fusion Authority very soon. What is the FA technique? onMissingTemplate()

    The above FA link does not exist as a physical file. When it’s requested, the onMissingTemplate() handler will look at the URL and return content as if the file existed. Very smooth, very cool and very search engine friendly.

    While the technique is described above, I have an article in the next FAQU that goes into deep detail on it.

    As an aside, I mapped the .htm extension into CF using the standard technique of editing web.xml and the webservers mappings.

  • http://www.webexc.com/ BrookeA

    Good info – thanks!

  • Richard Morton

    Simple URLs have to be better, especially in the slightly obscure case of cutting and pasting links, reducing the possibility of errors creeping in.

    Richard Morton
    www.qm-consulting.co.uk

  • bincom

    I use joomla very often and I learnt very painfully to always use Search engine friendly links. Apart from the lower ranks, most of the inner pages were not archived.

    bincom

  • Rob

    I recommend IIS Mod-Rewrite pro which is superior to all other similar rewriter modules for IIS.

  • James Allen

    Great article, although you are slightly mistaken concerning the 404 error handler method.

    Rather than every non-existent URL being logged with a 404 error code, it is quite the opposite – every non-existent URL is logged with a successful 200 code (as the web server is directed to a real page on the server, e.g. 404page.cfm). It is therefore important to ensure a 404 code is issued if the URL is NOT valid on the site, to avoid a search engine indexing your ‘page not found’ text and storing an invalid page in its index.

    On a site where I utilised this method I simply set the status code when a page was requested that wasn’t valid:

    I do think there is value in the 404 method. The only problem I found was that you can’t POST back to the SEF URL unless you can also set the 503 handler to the 404 error handler template. On the site I used – on shared hosting – I was unable to configure this so had to find an alternative way to handle form submissions.

  • James Allen

    In the above, the code sample for setting the statuscode is:

    <cfheader statuscode="404" statustext="Not Found" />

    I forgot to escape the html in the above post.. Doh.. ;)

  • http://solasproductions.com/ mcsolas

    If you are running ColdFusion on Apache then you have the popular mod_rewrite at your disposal for those on IIS it’s a bit more work as IIS doesn’t support rewriting out of the box.

    I am now and very happy to have this tool at my disposal. This exact reason is what made me migrate away from IIS.