  1. #1
    SitePoint Member
    Join Date
    Feb 2013
    Posts
    10
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Another URL rewrite question

    I just can't seem to learn regex. It's not close enough to my main work, and I seldom get back to this stuff regularly enough. Yet I do have to try to fix some issues from a menu change that is giving a lot of errors.
    I feel a 301 redirect in .htaccess is the right thing to do.

    I am trying to get rid of old URLs that finish with .html, so site.com/page1/article304.html needs to become site.com/page1/article304

    I have looked around but not found anything similar enough, so I was wondering if I could get a direct solution here. Thanks to those with the knowledge who like to share.

  2. #2
    SitePoint Enthusiast tentonjim
    Join Date
    Aug 2003
    Location
    Seattle, WA USA
    Posts
    84
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Scroll down near the bottom here; this may help:
    http://www.seomoz.org/learn-seo/redirection
    Jim Summer
    ~ Twitter SEO_Web_Design or Google+

  3. #3
    SitePoint Member
    Not a bad starting point - some useful things to remember here

    RedirectMatch 301 /(.*)\.(html) http://www.site.org/$1

    It doesn't quite have the regex I need though. This is close, but it doesn't redirect properly and is a very dangerous line! It breaks the whole site.
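
    A note on why that line is risky: RedirectMatch patterns are unanchored, so /(.*)\.(html) matches ".html" anywhere in the path, not just at the end. A minimal anchored sketch, assuming the same http://www.site.org target as above and that every URL ending in .html really should lose its extension:

    ```apache
    # Anchored at both ends: only paths that END in .html are redirected.
    # (.+) requires at least one character before the extension.
    RedirectMatch 301 ^/(.+)\.html$ http://www.site.org/$1
    ```

    mod_alias cannot test whether the requested file still exists, so if some real .html files have to keep working, a mod_rewrite rule with RewriteCond checks is probably the better fit.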

  4. #4
    Certified Ethical Hacker dklynn
    Join Date
    Feb 2002
    Location
    Auckland
    Posts
    14,672
    Mentioned
    19 Post(s)
    Tagged
    3 Thread(s)
    ARGH! Another BAD referral as the EVERYTHING atom is the bane of all mod_rewrite newbies!

    landed, you might benefit from reading the mod_rewrite tutorial linked in my signature as it contains explanations and sample code. It's helped many members and should help you, too.

    Regards,

    DK
    David K. Lynn - Data Koncepts is a long-time WebHostingBuzz (US/UK)
    Client and (unpaid) WHB Ambassador
    mod_rewrite Tutorial Article (setup, config, test & write
    mod_rewrite regex w/sample code) and Code Generator

  5. #5
    SitePoint Member
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(.*)\.html$ /$1 [L,R=301]

    The above worked. I am not sure how you can do this without the global (.*) in this case, but I do take your point about its dangers.

    I have already read your tutorials, but this subject is too big, or my grey matter too small, for dipping in and out as a web producer. These subjects need specialists, for sure. Thanks for posting, everyone.
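
    For anyone else dipping in and out, the three lines above can be read as follows. This is only an annotated sketch of the same rule, assuming the .htaccess file sits in the site's document root (RewriteEngine on is added here because the rules do nothing without it):

    ```apache
    RewriteEngine on

    # Skip the rule when the request maps to a real file on disk,
    # so .html files that still exist are served as before ...
    RewriteCond %{REQUEST_FILENAME} !-f
    # ... and also when it maps to a real directory.
    RewriteCond %{REQUEST_FILENAME} !-d

    # Capture everything before a trailing .html and send a permanent
    # redirect to the same path without the extension.
    # R=301 = external redirect, L = stop processing rules for this pass.
    RewriteRule ^(.*)\.html$ /$1 [L,R=301]
    ```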

  6. #6
    Certified Ethical Hacker dklynn
    landed,

    I would advise learning some regex as it can be critical:

    page1/article304.html => page1/article304

    Code:
    RewriteEngine on
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^([a-z0-9/]+)\.html$ $1 [R=301,L]
    That will do it very nicely and avoid any problems with dot (and other misc) characters.

    Regards,

    DK

  7. #7
    SitePoint Member
    Thank you for the improved lines of code! I am using them and am grateful, as I am sure others who come by will be too.
    To continue my learning, I wondered what happens if I have further rules, as I understand that the L means don't process further.

    So page.html changes to page and then processing stops. What if I wanted a later rule to be picked up as well?

    So the request URL is initially

    http://site.com/honduras.html

    then becomes

    http://site.com/honduras

    but I want that to redirect again (or to respect another rule), such as

    redirect 301 /honduras http://site.com/central-america/honduras

    Is a loop going to be possible? That is, does Apache first strip the .html and then go round again to add central-america, or is it not like a loop and we get ONE pass?

  8. #8
    SitePoint Member
    Originally Posted by dklynn:
    landed,

    page1/article304.html => page1/article304

    Code:
    RewriteEngine on
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^([a-z0-9/]+)\.html$ $1 [R=301,L]
    DK
    I got a full path in the URL here, so this didn't work for me. That is, it was matching the file path instead of the URL, so I got /public/g/ etc., which is normally hidden in URLs.
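
    The filesystem path showing up in the redirect is a known side effect of combining a relative substitution ($1) with the R flag in a .htaccess file: without a RewriteBase, mod_rewrite puts the directory's physical path back in front of the result. A sketch of two common ways around it, assuming the .htaccess file lives in the document root:

    ```apache
    RewriteEngine on

    # Option 1: declare the URL base so relative substitutions map to URLs.
    RewriteBase /

    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^([a-z0-9/]+)\.html$ $1 [R=301,L]

    # Option 2 (instead of RewriteBase): make the substitution root-relative.
    # RewriteRule ^([a-z0-9/]+)\.html$ /$1 [R=301,L]
    ```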

  9. #9
    Certified Ethical Hacker dklynn
    landed,

    No problem. I really loathe the inappropriate use of (.*) so my signature's tutorial does let you know how to get around its pitfalls.

    No, the Last flag ends the current pass, and mod_rewrite then restarts from the beginning with the new %{REQUEST_URI}. Without it, processing continues to the end of the rules and, because there was a match/redirection, starts over once it gets there. You're merely saving a few microseconds of processing time (where speed is essential).

    OMG! A new requirement in the middle of a thread? Oh, well, if you have a list of CA countries to which you need to add the CA subdirectory, then you'll need to provide the list, match the country and redirect. If this is what you REALLY want, please provide the list you're using in your database and an attempt to accomplish my "pseudo code" and I'll be back later to help. Don't forget that this new rule is more specific than the "general rule" so it has to precede it.

    mod_rewrite is only a one pass proposition if there are no matches.

    You might benefit from reading the mod_rewrite tutorial linked in my signature as it contains explanations and sample code. It's helped many members and should help you, too.
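
    The ordering advice above (specific rule before the general one) could be sketched like this for the honduras case. The URLs come from the earlier post; the document-root location of the .htaccess file and the dash added to the character class (needed because central-america contains one) are assumptions:

    ```apache
    RewriteEngine on

    # Specific rule first: old /honduras or /honduras.html goes straight
    # to its new home in a single redirect.
    RewriteRule ^honduras(\.html)?$ /central-america/honduras [R=301,L]

    # General rule last: strip .html from anything that is not a real
    # file or directory.
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^([a-z0-9/-]+)\.html$ /$1 [R=301,L]
    ```

    Because both rules use R=301, the browser makes a fresh request after each redirect, and that new request is checked against the rules from the top again. Putting the specific rule first simply avoids a two-step chain (extension stripped, then page moved).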

    Regards,

    DK

  10. #10
    SitePoint Wizard Jeff Mott
    Join Date
    Jul 2009
    Posts
    1,313
    Mentioned
    19 Post(s)
    Tagged
    1 Thread(s)
    Originally Posted by dklynn:
    landed,

    I would advise learning some regex as it can be critical:

    page1/article304.html => page1/article304

    Code:
    RewriteEngine on
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^([a-z0-9/]+)\.html$ $1 [R=301,L]
    That will do it very nicely and avoid any problems with dot (and other misc) characters.

    Regards,

    DK
    Unfortunately, with that rewrite rule, page1/article-304.html would not rewrite to page1/article-304. Your regexp doesn't match dashes, nor does it match many other valid URL characters. Certainly there are situations where matching with dot isn't appropriate, but this is a situation where matching with dot is absolutely appropriate. The OP wants to rewrite all URLs ending in .html, so it makes perfect sense to match on all characters.

    # Any RewriteConds here

    # Match any URL (dot means any) ending in .html
    RewriteRule ^(.+)\.html$ $1 [R=301,L]
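
    To make the difference concrete, here is how the patterns discussed in this thread behave on the two sample paths. The third pattern is only an illustrative middle ground, not something either poster proposed, and the root-relative /$1 substitution assumes a .htaccess file in the document root:

    ```apache
    # Matches page1/article304.html but NOT page1/article-304.html
    # (there is no dash in the character class).
    # RewriteRule ^([a-z0-9/]+)\.html$ /$1 [R=301,L]

    # Matches both, plus any other path ending in .html.
    # RewriteRule ^(.+)\.html$ /$1 [R=301,L]

    # Middle ground: still a whitelist, with dash and underscore added.
    # The NC flag makes the match case-insensitive.
    RewriteRule ^([a-z0-9/_-]+)\.html$ /$1 [R=301,L,NC]
    ```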
    "First make it work. Then make it better."

  11. #11
    Certified Ethical Hacker dklynn
    Jeff,

    Correct! The OP didn't specify dashes so why bother adding those (to expose his script to more than it can handle)?

    Ditto "many other valid URL characters." You certainly don't want to match : as it's not a valid URI character; ? as it's not a valid URI character; etc.

    The value of this is that mod_rewrite can do some limited error checking for you so your scripts don't have to (albeit, it would be a good idea for them to validate the input before accessing the database).

    Finally, yes, the dot character is specified following the character range definition and it's followed by html and the end anchor. What's your point? Do you want to allow multiple dots to be matched?

    Specificity makes a difference in mod_rewrite. The tighter you can specify your requirements the easier it is to write good mod_rewrite code.

    Okay, you do get a point for using the + metacharacter rather than the * I see all too often.

    You might benefit from reading the mod_rewrite tutorial linked in my signature as it contains explanations and sample code. It's helped many members and should help you, too.

    Regards,

    DK

  12. #12
    SitePoint Wizard Jeff Mott
    I disagree with your approach. I think your regexp becomes more complicated than it needs to be due to possibly long lists of characters in the class. I think you run the risk of introducing bugs by forgetting certain characters. And to boot, I think you get little to no benefit for it.

    The last point I'll leave you with is that the Apache documentation (which, in my opinion, is more authoritative than your personal tutorial) uses dot in situations exactly like this one.

    # example 1: file extension change
    RewriteRule ^(.+)\.html$ $1.php

    # example 2: parse out basename
    RewriteRule ^(.+)\.html$ $1

  13. #13
    SitePoint Member
    Thanks for your help, guys. I would like to say that I find the Apache docs hard to read (a matter of the little grey matter being too little). I have got further this time, and maybe little by little it is sinking in. I don't think you can oversimplify any tutorials, and any new ones are always a welcome read for me.

    Interesting that none seem to cover how a URL may loop through the .htaccess file. From the above I see that the URL will indeed pass through until no further matches happen, and that the L makes us go back to the start with the changed URL. There is also a strategy for handling specificity: DK was saying to do the more base changes (towards the left-hand side of the string) first, then get to the original question, the .html removal, which could therefore be the very last match we want to do. The same goes for removing or adding www as people want (more cosmetic).

    The .html or .php removal is useful, as it means the URL might have a better chance of living longer, since it's less specific in SEO terms.
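
    One possible ordering of the changes described above, in a single .htaccess file. It is a sketch only: removing www (rather than adding it), the http scheme, and the document-root location are all assumptions, not something settled in this thread:

    ```apache
    RewriteEngine on

    # 1. Host-level change first: drop a leading "www." from the hostname.
    #    %1 is the part of the host captured by the RewriteCond.
    RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
    RewriteRule ^ http://%1%{REQUEST_URI} [R=301,L]

    # 2. Path-level change last: strip .html unless a real file or
    #    directory exists at that path.
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(.+)\.html$ /$1 [R=301,L]
    ```

    With this layout a request for www.site.com/page.html takes two redirects (host first, then extension), each one a fresh request that starts again at the top of the file.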

