Switching to Lower Case

My URLs look like this: mysite/People/George_A_Custer

I want to change everything to lower case and replace the underscores with dashes, like this:

mysite/people/george-a-custer

I’m also thinking of modifying my scripts so that URLs with spaces or %20 instead of dashes will default to the URLs stored in my database (e.g. george-a-custer). Of course, I want visitors to be forwarded from the old URLs to the new ones. I’d also like to avoid confusing statistics that show hits spread across endless variations of the same URL.

From this SitePoint thread:

“In terms of making sure that your hits track each page correctly in aggregate, the best way to do that is to set redirect rules in .htaccess to force the URL to lower case. Then even if your filenames are not case sensitive, you will get all the hits counted against the lower case format.”

So I’d like to ask if anyone can tell me how to modify the following .htaccess file if I want to go with lower case URLs. I assume I need to change Community, Genres and Music to community, genres and music. But what about [a-zA-Z0-9()/-] ? Would I change it to [a-z0-9()/-] ?

Is there anything else I need to do to “force the URL to lower case”? Thanks!


RewriteEngine On
RewriteRule ^test\.htm$ test.php [L]
Options -MultiViews

# php_value magic_quotes_gpc 0
php_flag magic_quotes_gpc Off

RewriteRule ^Topics/([a-zA-Z0-9()_/-]+)/?$ Topics/index.php?topic=$1 [L]
RewriteRule ^World/([a-zA-Z0-9()_/-]+)/?$ World/index.php?area=$1 [L]
RewriteRule ^Community/([a-zA-Z0-9()_/-]+)/?$ Community/index.php?community=$1 [L]

Chavista,

If you have access to the server or virtual host configuration files, you could use mod_rewrite’s RewriteMap with its internal tolower function to change all uppercase letters to lowercase. Unfortunately, RewriteMap cannot be declared in .htaccess: a mistake at that level can easily bring the server down, so it is reserved for sys admins (or dedicated server users).
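A minimal sketch of that RewriteMap approach, assuming you can edit the main server or virtual host configuration. The map name lc is an arbitrary label chosen for this example; int:tolower is mod_rewrite’s built-in lowercase function.

```apache
# Server or <VirtualHost> context only; RewriteMap cannot go in .htaccess.
RewriteEngine On

# "lc" is an arbitrary map name for this sketch; int:tolower is built in.
RewriteMap lc int:tolower

# Only act when the requested path contains an uppercase letter,
# then send a permanent redirect to the all-lowercase path.
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule ^(.*)$ ${lc:$1} [R=301,L]
```

With this in place, a request for /People/George_A_Custer would be redirected to /people/george_a_custer. Turning underscores into dashes would still need separate handling, and on a case-sensitive file system any directories with capital letters (Topics, World, Community) would presumably have to be renamed to match.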

To simulate the tolower() function, you could do what you’re already doing: Using index.php as a “handler” which can more easily use the strtolower() PHP function. It appears that you must already be doing this.

Unfortunately, looking at your regex, I saw three problems. First, the parentheses may be reserved characters in a URI (they’re not … but I had to check!) but they ARE in a regular expression as you’ve created an empty atom (best to escape them if you’re actually trying to match parentheses). Second, the /- at the end of the character range definition may be your attempt to escape the - character (it should be first but works quite well as the last character without escape). Finally, the /? will allow requests to be from two directory levels (which will destroy your relative links - after all, which level are they relative to?).
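Putting those points together, the Topics rule might be reshaped along these lines. This is a sketch rather than a drop-in fix: it assumes the section directory has been renamed to lowercase topics, that the site lives at the document root, and that the form without a trailing slash is the one to keep. As far as I know, ( and ) are matched literally inside square brackets in the PCRE syntax mod_rewrite uses, so they are left unescaped here.

```apache
# Assumed: directory renamed to "topics", site served from the document root.

# Send the trailing-slash variant to the slashless form so there is
# only one public URL per topic (and one base for relative links).
RewriteRule ^topics/([a-z0-9()_-]+)/$ /topics/$1 [R=301,L]

# Internal rewrite: "/" is no longer in the class, so only one level
# matches; the hyphen sits last, where it is literal without escaping.
RewriteRule ^topics/([a-z0-9()_-]+)$ topics/index.php?topic=$1 [L]
```

Because the class is lowercase only, a mixed-case request such as topics/New_York would not match either rule, so old URLs would still need a redirect from somewhere else (the index.php handler or a RewriteMap).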

If there is a problem with using your code (other than with the relative links), please show your test URIs.

Regards,

DK

Thanks for all the tips! I’m a little confused, though.

You wrote, “Using index.php as a ‘handler’ which can more easily use the strtolower() PHP function. It appears that you must already be doing this.”

I’m not sure what you mean. I am aware of the strtolower function and can use it in any of my includes (not just index.php), but I suspect you’re referring to the index pages of my various sections (e.g. MySite/World.php, MySite/Topics.php, etc.). If someone types MySite/World/New_York into their browser, and I have a script in the static page at MySite/World/index.php that changes it to lower case and replaces the underscore with a dash, then that would be a very easy solution, as all my section index pages include a common file where I could put that script. (Now I’m thinking, duh…why didn’t I think of that before?)

Do you happen to know how that would affect my statistics? If someone types in New_York and it defaults to new-york, would my stats list one hit for each (which I don’t want) or just one hit for new-york? However, if there are problems with stats, I can always work on that later. Right now, I’m just trying to get the basics.

You also wrote…

  1. “The parentheses may be reserved characters in a URI (they’re not … but I had to check!) but they ARE in a regular expression as you’ve created an empty atom (best to escape them if you’re actually trying to match parentheses).”

Sorry, I don’t understand what you’re saying there. Very few of my URLs will actually contain parentheses, unless I use something like this:

world/georgia-(state) vs world/georgia-(republic)

  2. “The /- at the end of the character range definition may be your attempt to escape the - character … it should be first.”

So I should change this…


RewriteRule ^Topics/([a-zA-Z0-9()_/-]+)/?$ Topics/index.php?topic=$1 [L]

…to this?


RewriteRule ^Topics/([/a-zA-Z0-9()_-]+)/?$ Topics/index.php?topic=$1 [L]

  3. “The /? will allow requests to be from two directory levels (which will destroy your relative links - after all, which level are they relative to?).”

So I should further delete the first question mark, changing it to this?


RewriteRule ^Topics/([/a-zA-Z0-9()_-]+)/$ Topics/index.php?topic=$1 [L]

I’m going to try your first tip right now. That sounds like a really easy fix…

Actually, I haven’t been able to change the URL through any of my index pages. I can change the VALUE of the URL (where $MyURL is the URL in my browser window, $MyURL = strtolower($MyURL) works), but I can’t change the URL in the browser window. I think that’s something that can only be done in an .htaccess file, right?

Chavista,

Correct (UNLESS you go to the bother of sending a header("Location: …") [and a header("Status: …")] from your index.php script). IMHO, it’s not worth the bother once you’re in your index.php script.
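For comparison, the .htaccess route to the same visible change is an external redirect via the R flag. A minimal sketch for a single old URL, assuming the site is served from the document root and using the example paths from the first post:

```apache
# External (301) redirect: the browser's address bar changes to the new URL.
# Place this before the internal rewrite rules so it runs first.
RewriteRule ^People/George_A_Custer$ /people/george-a-custer [R=301,L]
```

One rule per URL obviously does not scale to a whole site; it only illustrates that the R flag is what makes the address in the browser change, whereas the existing rules rewrite internally and leave it untouched.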

Regards,

DK

Wow, great tips! I’ll play around with them and see what I come up with. Thanks.