  1. #1
    SitePoint Wizard
    Join Date
    Oct 2004
    Location
    Newport Beach
    Posts
    1,761
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Changing Link Structure: 301 Redirect to Possible 404 or 404 First?

    Here is my issue. We are changing our linking structure from

    www.site.com/trailer/movie-title/video-title

    to just

    www.site.com/movie-title/video-title

    Obviously I want to 301 redirect (using nginx/mod_rewrite), but I do not know if I should care about poorly or incorrectly written incoming links, such as a site that miswrites our URL as

    /trailer/movie-title/video-titlez

    They accidentally put a z on the end of the video title, and we'd respond with a 404 when the page is requested. Now, with the rewrite, I would be 301 redirecting this bad URL to the PHP file that analyzes it and then gives the 404 response.

    Is this an okay way to do it? I've been told I don't need to worry about what SEs like Google think of a 301 to a 404, since the page was a longtime 404 anyway.

    The only other way is to leave a PHP file at the old URL that checks the database and confirms the link is valid, then either 301 forwards to the proper URL if the item exists or returns a 404 if not. Unfortunately, that means the database gets checked twice for every valid old link, since the PHP behind the new URL confirms the item again.
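To make the trade-off concrete, here is a minimal Python sketch (not the site's actual PHP) that counts database lookups under each strategy. The names `KNOWN`, `lookup`, `blind_redirect`, and `verified_redirect` are hypothetical, and the set merely stands in for the real database.

```python
# Hypothetical stand-in for the database of valid (movie, video) pairs.
KNOWN = {("movie-title", "video-title")}


def lookup(movie, video, counter):
    """Simulate one database check and record that it happened."""
    counter["lookups"] += 1
    return (movie, video) in KNOWN


def new_url_handler(movie, video, counter):
    """Handler behind the new URL: one lookup, then 200 or 404."""
    return 200 if lookup(movie, video, counter) else 404


def blind_redirect(movie, video):
    """Option 1: the server 301s every old URL without checking it."""
    counter = {"lookups": 0}
    statuses = [301, new_url_handler(movie, video, counter)]
    return statuses, counter["lookups"]


def verified_redirect(movie, video):
    """Option 2: the old URL checks the database before redirecting."""
    counter = {"lookups": 0}
    if not lookup(movie, video, counter):
        return [404], counter["lookups"]
    statuses = [301, new_url_handler(movie, video, counter)]
    return statuses, counter["lookups"]


if __name__ == "__main__":
    for video in ("video-title", "video-titlez"):
        print(video,
              blind_redirect("movie-title", video),
              verified_redirect("movie-title", video))
```

Under these assumptions the blind redirect costs one lookup per request chain, while the verified redirect costs two lookups for every valid old link and only saves the extra hop for broken ones.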

    Seeking feedback on this.

    Thanks!
    Ryan
    Upcoming Movies - Movie News. Updated Daily.
    Movie Trailers - Awesome trailer site. Nuff said.

  2. #2
    dklynn (Certified Ethical Hacker)
    Join Date
    Feb 2002
    Location
    Auckland
    Posts
    14,672
    Mentioned
    19 Post(s)
    Tagged
    3 Thread(s)
    Hi cb!

    You didn't show your .htaccess code, so no comment can deal with your entire question.

    Kudos for understanding that you probably don't need "trailer" in your URIs.

    It sounds like you've created what I've termed "A Poor Man's RewriteMap" with your request handler (the one that checks your database before redirecting to the requested file OR 404 handler). Kudos for that, too, EXCEPT that it means a delay for every file request (or every request meeting the ^([-a-z]+/[-a-z]+)$ format of your movie-title/video-title URIs).

    I have done something similar (article title as the request) at the wilderness-wally.com website, but my redirects are to the file handler which, if it finds the article, outputs it and, if it doesn't, redirects to the Home Page (hopefully with a "Requested Article Not Found - Please use the TOC on the left" comment). The obvious difference is that I've skipped the intervening file which does your -f check (a database is required for this when not using file names) and embedded the "Lost and Found" in the article script.

    The one caution is to check the allowable characters in URIs, per "Uniform Resource Identifiers (URI): Generic Syntax" (if you need my list of allowable characters, PM me), before redirecting to either the article handler or your request handler.

    Back to your question: IF your request handler cares about "trailer/" in the URI, have it check first. If not, strip the "trailer/" before the redirection (but only because that is your new format).

    WARNING: Before you change the format of your URIs, do NOT make the change if you're including the dot character in the movie-title or video-title as that is my "marker" to send !-f requests to my article handler. I believe that you've used trailer as your "marker" so, if that is suddenly missing, it could disable your website.

    Not enough information in the question => too much information in the response. Oh, well, at least you have the full story for your consideration.

    Regards,

    DK
    David K. Lynn - Data Koncepts is a long-time WebHostingBuzz (US/UK)
    Client and (unpaid) WHB Ambassador
    mod_rewrite Tutorial Article (setup, config, test & write
    mod_rewrite regex w/sample code) and Code Generator

  3. #3
    I have nginx rewrites, so you would need to look at them to get a good idea of what I'm trying.

    location /trailer {
        # Internally rewrite old trailer URLs to watchtrailer.php.
        rewrite ^/?trailer/([-0-9a-zA-Z,-]+)/([-0-9a-zA-Z,-]+)?/?([0-9a-zA-Z,-]+)?$ /watchtrailer.php?fkey=$1&tkey=$2&tres=$3;
    }

    Now, this is the original link that works. I could leave the watchtrailer.php file in place and have it test the supplied variables against the database. If the row exists, it can then 301 redirect to the proper URL (as rewritten below); otherwise it just sends a 404. All done from the PHP file itself.

    if (!-f $request_filename) {
        # Send any request that is not an existing file to index.php.
        rewrite ^/?([0-9a-zA-Z,-]+)?/?([0-9a-zA-Z,-]+)?/?([0-9a-zA-Z,-]+)?/?([0-9a-zA-Z,-]+)? /index.php?var1=$1&var2=$2&var3=$3&var4=$4;
    }

    Included in the index file is the proper new-watchtrailer.php to handle the variables. So, with this, I'd have PHP handling everything in terms of doing the 301 or 404. It's okay this way, but it sucks that I would be checking the database twice per request to the old URL (once from watchtrailer.php and then again from new-watchtrailer.php). Not an issue, except that the old URL will likely be hit about 1M times per day at the beginning.

    I was hoping to go easier, and just do

    location /trailer {
        # 301 old trailer URLs to the new structure (the third segment is dropped).
        rewrite ^/?trailer/([-0-9a-zA-Z,-]+)/([-0-9a-zA-Z,-]+)?/?([0-9a-zA-Z,-]+)?$ /$1/$2 permanent;
    }

    So just have nginx handle the 301 redirect right away, leaving new-watchtrailer.php to handle the request and decide whether to respond with a 200 or a 404 at that point.
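As a rough check of what that redirect rule would do with good and bad inbound links, the pattern can be exercised in Python. This is only a sketch: Python's `re` module stands in for nginx's PCRE engine, `redirect_target` is a made-up helper, and the `720` third segment is a hypothetical value.

```python
import re

# The rule from the post, with nginx's $1/$2 replaced by Python groups.
# Python's re module is used here only as a stand-in for nginx's PCRE.
OLD_RULE = re.compile(
    r"^/?trailer/([-0-9a-zA-Z,-]+)/([-0-9a-zA-Z,-]+)?/?([0-9a-zA-Z,-]+)?$"
)


def redirect_target(uri):
    """Return the Location the rule would produce, or None if it does not match."""
    match = OLD_RULE.match(uri)
    if match is None:
        return None
    first, second, _third = match.groups()
    # An unmatched optional group expands to an empty string in the replacement.
    return "/{}/{}".format(first, second or "")


if __name__ == "__main__":
    for uri in (
        "/trailer/movie-title/video-title",
        "/trailer/movie-title/video-titlez",      # mistyped inbound link
        "/trailer/movie-title/video-title/720",   # hypothetical third segment
        "/trailer/movie-title",                   # no second segment
    ):
        print(uri, "->", redirect_target(uri))
```

The mistyped link is redirected just like a valid one, so the 404 decision falls entirely to whatever handles the new URL.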

    Let me know if that makes sense.

    Cheers!
    Ryan

  4. #4
    cb,

    Ouch! Obviously, nginx rewrites are quite different than Apache's mod_rewrite! I can't help you within the nginx world.

    Taking a look at your code, though, it appears that you could do a better job of making atoms optional (rather than everything along the way, i.e., frequent use of /?). Examples embedded in your quote:

    Quote (originally posted by casbboy):
    I have nginx rewrites, so you would need to look at them to get a good idea of what I'm trying.

    location /trailer {
    rewrite ^/?trailer/([-0-9a-zA-Z,-]+)/([-0-9a-zA-Z,-]+)?/?([0-9a-zA-Z,-]+)?$ /watchtrailer.php?fkey=$1&tkey=$2&tres=$3;

    I would have placed the /? inside the next atom (and created a new atom for the one you already have - are you sure about the ","? It is a RESERVED character (a sub-delimiter) in a URI, so even though it is permitted in a path segment I would not rely on it!). That would leave $4 to replace $3 without confusing matters with an optional / embedded somewhere between your $2 and $3. Okay, I'm dizzy now, too, but (at least to me) it's important to keep the optional parts collected, i.e., you don't want a $3 without the /, so making nginx decide where to break $2 and $3 is not a smart thing to do.

    }

    Now, this is the original link that works. I could leave the watchtrailer.php file in place and have it test the supplied variables against the database. If the row exists, it can then 301 redirect to the proper URL (as rewritten below); otherwise it just sends a 404. All done from the PHP file itself.

    if (!-f $request_filename) {
    rewrite ^/?([0-9a-zA-Z,-]+)?/?([0-9a-zA-Z,-]+)?/?([0-9a-zA-Z,-]+)?/?([0-9a-zA-Z,-]+)? /index.php?var1=$1&var2=$2&var3=$3&var4=$4;

    Same comment, but more complex because there are two optional /'s, one between $2 and $3 and another between $3 and $4! If I were nginx, I would give up the ghost and do something wild with the input and directives you've issued ... just for fun!

    }

    Included in the index file is the proper new-watchtrailer.php to handle the variables. So, with this, I'd have PHP handling everything in terms of doing the 301 or 404. It's okay this way, but it sucks that I would be checking the database twice per request to the old URL (once from watchtrailer.php and then again from new-watchtrailer.php). Not an issue, except that the old URL will likely be hit about 1M times per day at the beginning.

    I was hoping to go easier, and just do

    location /trailer {
    rewrite ^/?trailer/([-0-9a-zA-Z,-]+)/([-0-9a-zA-Z,-]+)?/?([0-9a-zA-Z,-]+)?$ /$1/$2 permanent;
    }

    That's just your "delete trailer/" code (with the complication of the optional / between $1 and $2).

    So just have nginx handle the 301 redirect right away, leaving new-watchtrailer.php to handle the request and decide whether to respond with a 200 or a 404 at that point.

    Let me know if that makes sense.

    With all the /?'s in your code, no, it does not make sense to me (try a URI without the /'s and make a guess at what nginx will do and compare it with the output from your code).
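One way to take up that suggestion without touching the server is to run the catch-all pattern through Python's `re` module, used here purely as a stand-in for nginx's PCRE engine (that the two agree on a pattern this simple is an assumption of the sketch). `CATCH_ALL` and `captures` are names made up for the illustration.

```python
import re

# The catch-all rule from the post; Python's re stands in for nginx's PCRE.
CATCH_ALL = re.compile(
    r"^/?([0-9a-zA-Z,-]+)?/?([0-9a-zA-Z,-]+)?/?([0-9a-zA-Z,-]+)?/?([0-9a-zA-Z,-]+)?"
)


def captures(uri):
    """Return (var1, var2, var3, var4) as the rule would capture them."""
    match = CATCH_ALL.match(uri)
    return match.groups() if match else None


if __name__ == "__main__":
    for uri in (
        "/movie-title/video-title",
        "/movietitlevideotitle",  # no separating slash
        "/",
        "/a/b/c/d/e",             # fifth segment is ignored
        "/a.b",                   # matching stops at the dot
    ):
        print(repr(uri), captures(uri))
```

Because every piece is optional and there is no trailing `$`, the pattern matches any URI at all: a slash-less request lands entirely in the first capture, and anything after the fourth segment or an unexpected character is silently ignored.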

    Okay, I'm definitely biased: I prefer my method of matching ^([-a-z]+/[-a-z]+)$ and sending to handler.php?title=$1 OR, for your dual input: ^trailer/([-a-z]+)/([-a-z]+)$ watchtrailer.php?movie=$1&trailer_name=$2.

    For your if (!-f $request_filename) {
    rewrite ^/?([0-9a-zA-Z,-]+)?/?([0-9a-zA-Z,-]+)?/?([0-9a-zA-Z,-]+)?/?([0-9a-zA-Z,-]+)? /index.php?var1=$1&var2=$2&var3=$3&var4=$4;

    may I recommend you change the rewrite line to:

    rewrite ^/([0-9a-zA-Z-]+)(/([0-9a-zA-Z-]+)(/([0-9a-zA-Z-]+)(/([0-9a-zA-Z-]+))?)?)? /index.php?var1=$1&var2=$3&var3=$5&var4=$7;

    to group your optional atoms (note that I've removed the "reserved" commas: they are sub-delimiters, and although a path segment may contain them I'd rather not depend on them - see the referenced URI Generic Syntax from my first post and Find "Reserved").
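A quick way to check how grouped optional atoms number their captures is to try the pattern in Python's `re` module, used only as a stand-in for nginx. The sketch assumes the grouped pattern has balanced parentheses and a leading `/` (nginx matches against a URI that begins with a slash); `GROUPED` and `segments` are hypothetical names.

```python
import re

# Assumed form of the grouped pattern: balanced parentheses and a leading "/",
# since nginx matches against a URI that starts with a slash.
GROUPED = re.compile(
    r"^/([0-9a-zA-Z-]+)(/([0-9a-zA-Z-]+)(/([0-9a-zA-Z-]+)(/([0-9a-zA-Z-]+))?)?)?"
)


def segments(uri):
    """Return (var1, var2, var3, var4), taken from groups 1, 3, 5 and 7."""
    match = GROUPED.match(uri)
    if match is None:
        return None
    # The even-numbered groups only wrap "/segment" so it can be optional.
    return match.group(1, 3, 5, 7)


if __name__ == "__main__":
    for uri in ("/a", "/a/b", "/a/b/c/d", "/"):
        print(repr(uri), segments(uri))
```

Each slash now travels with the segment it introduces, so a later segment can only be captured when the earlier ones are present, and a bare `/` no longer matches.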

    Cheers!
    Ryan
    Regards,

    DK
    Last edited by dklynn; Dec 15, 2013 at 00:41. Reason: Spacing for clarity of embedded responses

  5. #5
    I actually figured it out, and I did it the same way you mentioned. One of the few things that was simpler with nginx:

    location /trailer {
        # 301 old two-segment trailer URLs to the new structure.
        rewrite ^/?trailer/([-0-9a-zA-Z,-]+)/([-0-9a-zA-Z,-]+)$ /$1/$2 permanent;
    }

    Then on to new-watchtrailer.php, which supplies the 200 or 404. Works great. I also discovered I was not doing the 404 in PHP entirely right, which was good to find.

    I'll test your rewrite structure fix and removal of the commas.

    Cheers!
    Ryan


    Pulled out the "/trailer" level. Which leads to having the index.php page to open, include
    I didn't need the third level/variable. The addition of "permanent" works perfectly for the redirect, and I've tested in a 301/302 checker and it does a clean 301 to a 200. And it does do the 301 to the

