  1. #1
    SitePoint Addict
    Join Date
    Jan 2012
    Posts
    261
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Switching to Lower Case

    My URLs look like this: mysite/People/George_A_Custer

    I want to change everything to lower case and replace the underscores with dashes, like this:

    mysite/people/george-a-custer

    I'm also thinking of modifying my scripts so that URLs with spaces or %20 instead of dashes will default to the URLs stored in my database (e.g. george-a-custer). Of course, I want visitors to be forwarded from old URLs to my new URLs. I'd also like to avoid confusing statistics that show the number of hits for endless variations of the same URL.
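For reference, a commonly used mod_rewrite pattern for the underscore-to-dash part is sketched below. It assumes the rules live in a .htaccess file in the document root and sit before any other rewrite rules. The first rule replaces one underscore per pass and restarts the ruleset with Apache's [N] flag; the second replaces the last underscore and sends a single 301, so old links and statistics collapse onto one URL. It does not handle spaces/%20, and it would also redirect requests for real files whose names contain underscores, so treat it as a starting point rather than a drop-in rule.

```apache
# Sketch: .htaccess in the document root, placed before the other rules.
RewriteEngine On

# While two or more underscores remain, replace the first one with a dash
# and restart the ruleset ([N]) with the partially cleaned path.
RewriteRule ^([^_]*)_([^_]*_.*)$ $1-$2 [N]

# Replace the last remaining underscore and send ONE permanent redirect,
# e.g. /People/George_A_Custer -> /People/George-A-Custer
RewriteRule ^([^_]*)_([^_]*)$ /$1-$2 [R=301,L]
```

Note that this keeps the original capitalization; lowercasing is a separate step.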

    From this SitePoint thread:

    "In terms of making sure that your hits track each page correctly in aggregate, the best way to do that is to set redirect rules in .htaccess to force the URL to lower case. Then even if your filenames are not case sensitive, you will get all the hits counted against the lower case format."

    So I'd like to ask if anyone can tell me how to modify the following .htaccess file if I want to go with lowercase URLs. I assume I need to change Community, Genres and Music to community, genres and music. But what about [a-zA-Z0-9()_/-]? Would I change it to [a-z0-9()_/-]?

    Is there anything else I need to do to "force the URL to lower case"? Thanks!

    Code:
    RewriteEngine On
    RewriteRule ^test\.htm$ test.php [L]
    Options -MultiViews
    
    # php_value magic_quotes_gpc 0
    php_flag magic_quotes_gpc Off
    
    RewriteRule ^Topics/([a-zA-Z0-9()_/-]+)/?$ Topics/index.php?topic=$1 [L]
    RewriteRule ^World/([a-zA-Z0-9()_/-]+)/?$ World/index.php?area=$1 [L]
    RewriteRule ^Community/([a-zA-Z0-9()_/-]+)/?$ Community/index.php?community=$1 [L]
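If an earlier rule already redirects every uppercase request to its lowercase form, then only lowercase paths ever reach these rules, so dropping A-Z from the class as suggested is enough. A sketch of the three rules written that way follows. It assumes the directories are still named Topics, World and Community on disk: on a case-sensitive (Linux) file system, lowercasing the URL does not rename the directory, so the substitution keeps the on-disk names.

```apache
# Sketch: assumes an earlier rule has already redirected uppercase URLs to lowercase.
# Patterns match lowercase URLs; targets keep the directory names as they exist on disk.
RewriteRule ^topics/([a-z0-9()_/-]+)/?$ Topics/index.php?topic=$1 [L]
RewriteRule ^world/([a-z0-9()_/-]+)/?$ World/index.php?area=$1 [L]
RewriteRule ^community/([a-z0-9()_/-]+)/?$ Community/index.php?community=$1 [L]
```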

  2. #2
    dklynn (Certified Ethical Hacker)
    Join Date
    Feb 2002
    Location
    Auckland
    Posts
    14,653
    Mentioned
    19 Post(s)
    Tagged
    3 Thread(s)
    Chavista,

    IF you have access to the server or virtual host configuration files, you could use mod_rewrite's RewriteMap with its internal tolower function (int:tolower) to change all uppercase letters to lowercase. Unfortunately, because that kind of access can easily bring the server down, those files are normally reserved for sysadmins (or dedicated-server users).
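For readers who do have that access, a sketch of the RewriteMap approach follows. It assumes the map is declared in the virtual host (Apache rejects RewriteMap in .htaccess, although a map declared there can be used from .htaccess), that the redirect rule sits in the document-root .htaccess ahead of the Topics/World/Community rules, and that "lc" is simply a name picked for the map. The REDIRECT_STATUS check keeps internal rewrites such as Topics/index.php from being lowercased and redirected in turn. Be aware that it lowercases every path, including static files with uppercase names, which would then 404 on a case-sensitive file system.

```apache
# 1) In httpd.conf or the site's <VirtualHost> block (not allowed in .htaccess).
#    "lc" is an arbitrary map name; int:tolower is mod_rewrite's built-in lowercase function.
RewriteMap lc int:tolower

# 2) In the document-root .htaccess, before the other rules.
RewriteEngine On
# Only act on the visitor's original request, not on internal rewrites.
RewriteCond %{ENV:REDIRECT_STATUS} ^$
# Only act when the path actually contains an uppercase letter.
RewriteCond %{REQUEST_URI} [A-Z]
# /People/George-A-Custer -> 301 -> /people/george-a-custer
RewriteRule ^(.*)$ /${lc:$1} [R=301,L]
```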

    To simulate the tolower() function, you could do what you're already doing: Using index.php as a "handler" which can more easily use the strtolower() PHP function. It appears that you must already be doing this.

    Unfortunately, looking at your regex, I saw three problems. First, the parentheses: they may be reserved characters in a URI (they're not ... but I had to check!), but they ARE metacharacters in a regular expression, and it looks as if you've created an empty atom (best to escape them if you're actually trying to match parentheses). Second, the /- at the end of the character range definition may be your attempt to escape the - character ... it should be first (although it works quite well, unescaped, as the last character). Finally, the /? will allow requests to be from two directory levels (which will destroy your relative links - after all, which level are they relative to?).

    If there is a problem with using your code (other than with the relative links), please show your test URIs.

    Regards,

    DK
    David K. Lynn - Data Koncepts is a long-time WebHostingBuzz (US/UK)
    Client and (unpaid) WHB Ambassador
    mod_rewrite Tutorial Article (setup, config, test & write
    mod_rewrite regex w/sample code) and Code Generator

  3. #3
    SitePoint Addict
    Thanks for all the tips! I'm a little confused, though.

    You wrote, "Using index.php as a 'handler' which can more easily use the strtolower() PHP function. It appears that you must already be doing this."

    I'm not sure what you mean. I am aware of the strtolower function and can use it in any of my includes (not just index.php). However, I suspect you're referring to the index pages of my various sections (e.g. MySite/World.php, MySite/Topics.php, etc.). If someone types MySite/World/New_York into their browser, and I have a script in the static page at MySite/World/index.php that changes it to lower case and replaces the underscore with a dash, then that would be a very easy solution, as all my section index pages include a common file where I could put that script. (Now I'm thinking, duh...why didn't I think of that before?)

    Do you happen to know how that would affect my statistics? If someone types in New_York and it defaults to new-york, would my stats list one hit for each (which I don't want) or just one hit for new-york? However, if there are problems with stats, I can always work on that later. Right now, I'm just trying to get the basics.

    You also wrote...

    1) "The parentheses may be reserved characters in a URI (they're not ... but I had to check!) but they ARE in a regular expression as you've created an empty atom (best to escape them if you're actually trying to match parentheses)."

    Sorry, I don't understand what you're saying there. Very few of my URLs will actually contain parentheses, unless I use something like this:

    world/georgia-(state) vs world/georgia-(republic)

    2) "The /- at the end of the character range definition may be your attempt to escape the - character ... it should be first."

    So I should change this...

    Code:
    RewriteRule ^Topics/([a-zA-Z0-9()_/-]+)/?$ Topics/index.php?topic=$1 [L]
    ...to this?

    Code:
    RewriteRule ^Topics/([/a-zA-Z0-9()_-]+)/?$ Topics/index.php?topic=$1 [L]
    3) "The /? will allow requests to be from two directory levels (which will destroy your relative links - after all, which level are they relative to?)."

    So I should further delete the first question mark, changing it to this?

    Code:
    RewriteRule ^Topics/([/a-zA-Z0-9()_-]+)/$ Topics/index.php?topic=$1 [L]
    I'm going to try your first tip right now. That sounds like a really easy fix...

  4. #4
    SitePoint Addict
    Join Date
    Jan 2012
    Posts
    261
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Actually, I haven't been able to change the URL through any of my index pages. I can change the VALUE of the URL (e.g. $MyURL = strtolower($MyURL), where $MyURL holds the URL shown in my browser window), but I can't change the URL in the browser window itself. I think that's something that can only be done in an .htaccess file, right?

  5. #5
    dklynn (Certified Ethical Hacker)
    Chavista,

    Quote (originally posted by Chavista):
    Actually, I haven't been able to change the URL through any of my index pages. I can change the VALUE of the URL (e.g. $MyURL = strtolower($MyURL), where $MyURL holds the URL shown in my browser window), but I can't change the URL in the browser window itself. I think that's something that can only be done in an .htaccess file, right?
    Correct (UNLESS you go to the bother of sending a redirect from your index.php script with header(), e.g. header("Location: ...", true, 301), so that it is a permanent 301 rather than PHP's default 302). IMHO, it's not worth the bother once you're in your index.php script.

    Quote (originally posted by Chavista):
    Thanks for all the tips! I'm a little confused, though.

    You wrote, "Using index.php as a 'handler' which can more easily use the strtolower() PHP function. It appears that you must already be doing this."

    I guessed that, because you were capturing all the crud with ([a-zA-Z0-9()_/-]+), you were making the capitalization change in your index.php file(s). My bad (guess).

    I'm not sure what you mean. I am aware of the strtolower function and can use it in any of my includes (not just index.php). However, I suspect you're referring to the index pages of my various sections (e.g. MySite/World.php, MySite/Topics.php, etc.). If someone types MySite/World/New_York into their browser, and I have a script in the static page at MySite/World/index.php that changes it to lower case and replaces the underscore with a dash, then that would be a very easy solution, as all my section index pages include a common file where I could put that script. (Now I'm thinking, duh...why didn't I think of that before?)

    Actually, I stopped to think (once) and nothing happened!

    Do you happen to know how that would affect my statistics? If someone types in New_York and it defaults to new-york, would my stats list one hit for each (which I don't want) or just one hit for new-york? However, if there are problems with stats, I can always work on that later. Right now, I'm just trying to get the basics.

    I believe that using a single file as a handler (and providing different content) WILL change your page rankings. Have you looked at my client's website at http://wilderness-wally.com? All you (and search engines) will see is the (modified for URI) title of each article (and I suspect you won't be able to discover the name of the handler file). That's a slightly different problem from your tolower() redirection question, though.

    You also wrote...

    1) "The parentheses may be reserved characters in a URI (they're not ... but I had to check!) but they ARE in a regular expression as you've created an empty atom (best to escape them if you're actually trying to match parentheses)."

    Sorry, I don't understand what you're saying there. Very few of my URLs will actually contain parentheses, unless I use something like this:

    world/georgia-(state) vs world/georgia-(republic)

    What I was saying was that I believed parentheses were illegal in a URI. I then checked and discovered that they're not, BUT they are metacharacters within regex, so they must be escaped ("\(" and "\)" without the quotes) when used ... unless within a character range definition (e.g., your ([a-zA-Z0-9()_/-]+) ). Color me wary of "unusual" characters in a URI, then ignore my concerns about them since you've used them.

    2) "The /- at the end of the character range definition may be your attempt to escape the - character ... it should be first."

    The / character is okay, but the - character is a metacharacter within a character range definition. Apache says it should be the first character IF you need it to match a literal -, but it also works as the last character. Presumably, it should also work when escaped (i.e., "\-" without the quotes).

    So I should change this...

    Code:
    RewriteRule ^Topics/([a-zA-Z0-9()_/-]+)/?$ Topics/index.php?topic=$1 [L]
    ...to this?

    Code:
    RewriteRule ^Topics/([/a-zA-Z0-9()_-]+)/?$ Topics/index.php?topic=$1 [L]
    As above, ignore my paranoia ... but I was suggesting RewriteRule ^Topics/([-a-zA-Z0-9()_/]+)/?$ Topics/index.php?topic=$1 [L]

    3) "The /? will allow requests to be from two directory levels (which will destroy your relative links - after all, which level are they relative to?)."

    So I should further delete the first question mark, changing it to this?

    IMHO, trailing /'s should be reserved to denote a directory (as intended) so, no, I'd recommend deleting the /? entirely. The worst thing, though, is to make the directory level optional (file or directory) which makes relative links WRONG in one case and okay in the other. PPPPPPP (Proper Prior Planning Prevents Piss Poor Performance).
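One way to follow that advice without breaking links that already end in a slash is to pick the slashless form as canonical and 301 the other one to it, which also keeps statistics on a single URL per page. The sketch below assumes lowercase section names, a document-root .htaccess, and that this rule comes before the handler rules; "history" is only a made-up example path.

```apache
# Sketch: make the no-trailing-slash form canonical.
# /topics/history/ -> 301 -> /topics/history
RewriteRule ^(topics|world|community)/(.+)/$ /$1/$2 [R=301,L]

# The handler rule then matches only the slashless form (no optional "/?").
# Repeat the same pattern for world/ and community/.
RewriteRule ^topics/([-a-z0-9()_/]+)$ Topics/index.php?topic=$1 [L]
```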

    Code:
    RewriteRule ^Topics/([/a-zA-Z0-9()_-]+)/$ Topics/index.php?topic=$1 [L]
    I'm going to try your first tip right now. That sounds like a really easy fix...
    Regards,

    DK

  6. #6
    SitePoint Addict
    Wow, great tips! I'll play around with them and see what I come up with. Thanks.

