  1. #1
    Foozle Reducer ServerStorm's Avatar
    Join Date
    Feb 2005
    Location
    Burlington, Canada
    Posts
    2,699
    Mentioned
    89 Post(s)
    Tagged
    6 Thread(s)

    Best method to not dilute search indexing

    Hi,

    I have two domains, a.com and b.com, and both route to the same site. The a.com domain is the one that search engines should index. Many people may still have the b.com domain.

    What is the least resource-intensive way to make this happen? The options I see are domain maps, redirects, DNS web forwarding, or just sticking with two Apache virtual servers and having them answer when they're called.

    Currently there is not a deep page structure; however, there will be one in January, so the method I use should scale well as the page structure grows.

    I do have access to the Apache configuration file so a rewrite map is possible.
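
    For comparison, the two-virtual-host option could look roughly like the sketch below. It assumes name-based virtual hosts on port 80 and that the www variants should also end up on a.com; the DocumentRoot path is a placeholder, not something taken from this thread. mod_alias's Redirect keeps the requested path, so deeper pages added later are carried over without extra rules.

    ```apache
    # Hypothetical server-config sketch (not .htaccess); the path is a placeholder.
    # On Apache 2.2 a "NameVirtualHost *:80" line is also needed.

    # Canonical site: the only host that serves content.
    <VirtualHost *:80>
        ServerName a.com
        DocumentRoot /path/to/site
    </VirtualHost>

    # Every other name only redirects: /some/page on b.com
    # is answered with a 301 to http://a.com/some/page.
    <VirtualHost *:80>
        ServerName b.com
        ServerAlias www.b.com www.a.com
        Redirect permanent / http://a.com/
    </VirtualHost>
    ```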

    Your thoughts are appreciated!

    Regards,
    Steve
    Last edited by ServerStorm; Dec 20, 2012 at 16:10.
    ictus==""

  2. #2
    Certified Ethical Hacker silver trophybronze trophy dklynn's Avatar
    Join Date
    Feb 2002
    Location
    Auckland
    Posts
    14,644
    Mentioned
    19 Post(s)
    Tagged
    3 Thread(s)
    Hi Steve!

    I'm not sure what you're asking. Are a and b co-located and serving the same files? If so, use a mod_rewrite redirection to get rid of the (secondary) domain (eventually, you can use it for something else) but, as you're aware, having the same content served by two domains is punished by Google (et al).

    Let me know if you need help with the trivial mod_rewrite to do this for you.

    On the other hand, did I miss the point of the question and head off into left field?

    Regards,

    DK
    David K. Lynn - Data Koncepts is a long-time WebHostingBuzz (US/UK)
    Client and (unpaid) WHB Ambassador
    mod_rewrite Tutorial Article (setup, config, test & write
    mod_rewrite regex w/sample code) and Code Generator

  3. #3
    Foozle Reducer ServerStorm's Avatar
    Hi David,

    No you did not miss the point.

    I simply did not know what would be the best long-term way to handle this. The a and b files are the same on the same server, and you are right that I want to do this so that no Google punishment ensues.

    I've got the rewrite down, but I won't hesitate to ask for your help should I run into a problem.

    Many thanks!
    Steve

  4. #4
    Foozle Reducer ServerStorm's Avatar
    Hi,

    Here is my rewrite so far.

    Code:
    RewriteEngine on
    # Map an extensionless lowercase alphanumeric name to its .php file
    RewriteRule ^([a-z0-9]+)$ $1.php 
    # Match either domain
    RewriteCond %{HTTP_HOST} !^a\.com$ [NC] [OR]
    RewriteCond %{HTTP_HOST} !^b\.com$ [NC] 
    # Permanently redirect either domain to a non www version
    RewriteRule .? http://a.com%{REQUEST_URI} [R=301,L]
    It works the way it is written, however I would like to try to tighten it up against a few things and have been running into infinite loops in doing so.

    In the browser, if I type http://b.com/home it redirects to http://a.com/home.php instead of http://a.com/home. However, if I type http://b.com/contact I get http://a.com/contact, which is correct. Since rules are processed sequentially, I thought the non-.php version would be served whether the request URI is a.com/something.php or b.com/something.php?
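
    Two things that may be worth checking in the first code block of this post. The mod_rewrite documentation writes RewriteCond flags as one comma-separated bracket, [NC,OR], rather than [NC] [OR]. Also, no host name can equal a.com and b.com at the same time, so "not a.com OR not b.com" holds for every request. A minimal sketch of a single-condition form, assuming a.com is the only host that should stay put:

    ```apache
    # Sketch: redirect any host that is not exactly a.com.
    # One negated condition replaces the OR pair.
    RewriteCond %{HTTP_HOST} !^a\.com$ [NC]
    RewriteRule .? http://a.com%{REQUEST_URI} [R=301,L]
    ```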

    Furthermore, I tried different ways to get the custom 404.php file working, but no matter what sequence I put the rules in, I got the generic 500 Internal Server Error (infinite loop). I was using this code for the 404.php file.

    Code:
    # Check it is not a file
    RewriteCond %{REQUEST_FILENAME} !-f
    # Check it is not a directory
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule .? /404.php [L]
    The 404.php file is located in the root directory of the site. I've checked the error and access logs and they don't provide any meaningful feedback on this.

    I hope I've been clear.

  5. #5
    Foozle Reducer ServerStorm's Avatar
    Well I got the 404.php error resolved. I changed the .htaccess code to:
    Code:
    # Check it is not a file
    RewriteCond %{REQUEST_FILENAME} !-f
    # Check it is not a directory
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule .? 404.php [L]
    I removed the / that used to read /404.php; I thought the slash was needed to define it was in the root directory, but in my case it was causing the problem?

    I still am working on the other issue defined in Post #4.

    Regards,
    Steve

  6. #6
    Foozle Reducer ServerStorm's Avatar
    Ok,

    I figured out a way to simplify this. It occurred to me that, since the two virtual hosts for a.com and b.com point to the same files, including the .htaccess file, I don't need to mention b.com in the .htaccess. Instead, this code does all of the following:
    • Cleans .php from all files (wanted to not needlessly tell bad people that PHP is used)
    • Removes www
    • Redirects b.com/$1 to a.com/$1 (where $1 is the request uri)
    • Traps all missing pages in requests to b.com/$1 and redirects them to a custom 404.php file.


    Here is the code:
    Code:
    RewriteEngine on
    
    
    # Internally map an extensionless lowercase alphanumeric name to its .php file
    RewriteRule ^([a-z0-9]+)$ $1.php 
    
    # Match any host other than a.com (b.com, www.a.com, ...)
    RewriteCond %{HTTP_HOST} !^a\.com$ [NC] 
    # Permanently redirect it to the non-www a.com
    RewriteRule .? http://a.com%{REQUEST_URI} [R=301,L]
    
    # 404 error matching
    # Not a file
    RewriteCond %{REQUEST_FILENAME} !-f
    # Not a directory
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule .? 404.php [L]

  7. #7
    Certified Ethical Hacker silver trophybronze trophy dklynn's Avatar
    Hi Steve,

    Sorry for my delay returning to your post - don't be afraid to PM me (I don't usually bite ... too hard)!

    I did look over your code and only had one serious comment about it: Before redirecting (.*) to $1.php, check that the file exists (as a .php file).

    Your last post, however, is the most critical: It attempts to give your "specifications."

    With the a.com and b.com domains sharing the files, there's a need (on your part) to pick the preferred domain name (so you're not penalized by SE's). I'll assume a.com is preferred.

    • Cleans .php from all files (wanted to not needlessly tell bad people that PHP is used)
    • Removes www
    • Redirects b.com/$1 to a.com/$1 (where $1 is the request uri)
    • Traps all missing pages in requests to b.com/$1 and redirects them to a custom 404.php file.


    Okay, I'd reorder that to:

    • Redirect b.com to a.com with 301
    • Remove www (subdomain) with 301
    • Strip .php file extension (ONLY if not {IS_SUBREQ}) with 301
    • Redirect extensionless filenames to .php version (ONLY if it exists) - hidden
    • Handle 404s (hidden) - personally, I'd redirect to a sitemap or simply use an ErrorDocument statement instead


    Let me code in the modified order (for simplicity's sake):

    Code:
    RewriteEngine on
    
    # Redirect b.com to a.com with 301 AND
    RewriteCond %{HTTP_HOST} b\.com [NC,OR]
    # Remove www (subdomain) with 301
    RewriteCond %{HTTP_HOST} ^www\. [NC]
    # Note: {HTTP_HOST} is not case sensitive while mod_rewrite is so the No Case flags are needed
    RewriteRule .? http://a.com%{REQUEST_URI} [R=301,L]
    # Note: Both the {REQUEST_URI} and {QUERY_STRING} are preserved; {IS_SUBREQ} is not set
    
    # Strip .php file extension (ONLY if not {IS_SUBREQ}) with 301
    RewriteCond %{IS_SUBREQ} !true
    RewriteRule ^(.*)\.php$ $1 [R=301,L]
    # Note: {REQUEST_URI} has .php file extension stripped; {IS_SUBREQ} is now true
    
    # Redirect extensionless filenames to .php version (ONLY if it exists) - hidden
    RewriteCond %{REQUEST_FILENAME}.php -f
    RewriteRule ^([^.]+)$ $1.php [L]
    # Note: {REQUEST_URI} regains .php file extension; {IS_SUBREQ} is now true
    
    # Handle 404s - personally, I'd redirect to a sitemap or simply use an ErrorDocument statement instead
    # Preferred - as core directive, I'd move before RewriteEngine on
    ErrorDocument 404 /404.php
    
    # Second Choice
    # ErrorDocument 404 /sitemap.php
    # Note that ErrorDocument requires the status code and an ABSOLUTE URI/URL - I used internal absolute URIs
    
    # Last Choice - but may be useful if you need to know what was requested in the 404 script
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule .? 404.php?request=%{REQUEST_URI} [L]
    
    # Remove the comments before uploading to the production server.
    Please note that the .? in the RewriteRules above will ALWAYS match, so whether a rule fires depends entirely on the RewriteCond statements preceding it. Also, I preach to avoid the villainy of (.*), so I use the Apache variable {REQUEST_URI}, which amounts to the same thing (when that's all that's in the regex for the RewriteRule).

    Not much there which is out of the ordinary except the {IS_SUBREQ}. Is Subrequest is only set when there has been an INTERNAL redirection made (I admit to not being sure whether it's null or false if not set) and it's only used to prevent looping between your visible and usable formats. My signature's tutorial shows that I first enabled "loopy code" by adding a marker (key) to a query string and tested for that ... until I discovered that the {IS_SUBREQ} "marker" is already available via Apache!
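
    If {IS_SUBREQ} turns out not to change after a mod_rewrite internal rewrite on a given server (the Apache documentation describes it in terms of sub-requests), a commonly used alternative guard is {THE_REQUEST}. It holds the request line as the browser sent it and is not altered by internal rewrites. A minimal sketch, assuming the .htaccess sits in the document root; this is a swapped-in technique, not the {IS_SUBREQ} method described above:

    ```apache
    # Sketch: strip .php only when the client itself asked for it.
    # THE_REQUEST looks like: GET /page.php?x=1 HTTP/1.1
    RewriteCond %{THE_REQUEST} ^[A-Z]+\s/[^?\s]*\.php[?\s]
    # Root-relative target (/$1) so the redirect stays a URL path.
    RewriteRule ^(.+)\.php$ /$1 [R=301,L]

    # Internally map the extensionless name back to the file, if it exists.
    RewriteCond %{REQUEST_FILENAME}.php -f
    RewriteRule ^([^.]+)$ $1.php [L]
    ```

    After the internal rewrite to page.php, the original request line is still "GET /page ...", so the first rule no longer matches on the second pass.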

    Regards,

    DK

  8. #8
    Foozle Reducer ServerStorm's Avatar
    Hi David,

    Thank you for your detailed explanation and rework of the rewriting code. Unfortunately the server's configuration may be getting in the way of your suggested changes as I am getting
    Code:
    The web page at http://a.com/var/www/ClientFolder/404 has resulted in too many redirects.
    This message is being generated with only this code:
    Code:
    # Strip .php file extension (ONLY if not {IS_SUBREQ}) with 301
    RewriteCond %{IS_SUBREQ} !true
    RewriteRule ^(.*)\.php$ $1 [R=301,L]
    # Note: {REQUEST_URI} has .php file extension stripped; {IS_SUBREQ} is now true
    
    
    # Redirect extensionless filenames to .php version (ONLY if it exists) - hidden
    RewriteCond %{REQUEST_FILENAME}.php -f
    RewriteRule ^([^.]+)$ $1.php [L]
    # Note: {REQUEST_URI} regains .php file extension; {IS_SUBREQ} is now true
    I understand your recommendation to test for a key match on loopy code, but I am not sure why I get a loop with the code above when I don't get one using the now modified code (based on some of your recommendations).

    Code:
    # 404.php is a sitemap with a message that the requested page was not found.
    ErrorDocument 404 /404.php
    
    
    RewriteEngine on
    
    
    # Redirect b.com to a.com with 301 and...
    RewriteCond %{HTTP_HOST} !^b\.com$ [NC] 
    # Remove www (subdomain) with 301
    RewriteCond %{HTTP_HOST} ^www\. [NC]
    # match request URIs 
    RewriteRule .? http://a.com%{REQUEST_URI} [R=301,L]
    
    
    # Redirect extensionless filenames to .php version (ONLY if it exists)
    RewriteCond %{REQUEST_FILENAME}.php -f
    RewriteRule ^([^.]+)$ $1.php [L]
    The one other issue, which I'm not sure you were trying to show me, is what happens when someone requests a.com/somepage.php: if the file exists, the URL is not rewritten to a.com/somepage but stays a.com/somepage.php, although the page does resolve if the request is made to a.com/somepage?

    Many thanks and Merry Christmas

  9. #9
    Certified Ethical Hacker silver trophybronze trophy dklynn's Avatar
    Steve,

    Quote Originally Posted by ServerStorm View Post
    Hi David,

    Thank you for your detailed explanation and rework of the rewriting code. Unfortunately the server's configuration may be getting in the way of your suggested changes as I am getting
    Code:
    The web page at http://a.com/var/www/ClientFolder/404 has resulted in too many redirects.


    Every time I've seen that, it's cured by restarting the server, i.e., it's a glitch in the server which has to be cleared. My hosts have been very cooperative about doing that - yours should be too. No sense in showing your full link via the host!
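
    A URL that contains a filesystem path such as /var/www/ClientFolder/ can also come from the rule itself: in .htaccess context a relative substitution ($1) has the directory path put back in front of it, and with [R] that path can end up in the redirect unless RewriteBase is set. A hedged sketch of two usual ways around it, assuming the .htaccess lives in the document root:

    ```apache
    # Option 1 (sketch): make the redirect target root-relative.
    RewriteCond %{IS_SUBREQ} !true
    RewriteRule ^(.*)\.php$ /$1 [R=301,L]

    # Option 2 (sketch): keep the relative target and declare the URL base.
    # RewriteBase /
    # RewriteCond %{IS_SUBREQ} !true
    # RewriteRule ^(.*)\.php$ $1 [R=301,L]
    ```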

    This message is being generated with only this code:
    Code:
    # Strip .php file extension (ONLY if not {IS_SUBREQ}) with 301
    RewriteCond %{IS_SUBREQ} !true
    RewriteRule ^(.*)\.php$ $1 [R=301,L]
    # Note: {REQUEST_URI} has .php file extension stripped; {IS_SUBREQ} is now true
    
    
    # Redirect extensionless filenames to .php version (ONLY if it exists) - hidden
    RewriteCond %{REQUEST_FILENAME}.php -f
    RewriteRule ^([^.]+)$ $1.php [L]
    # Note: {REQUEST_URI} regains .php file extension; {IS_SUBREQ} is now true
    I understand your recommendation to test for a key match on loopy code, but I am not sure why I get a loop with the code above when I don't get one using the now modified code (based on some of your recommendations).

    Code:
    # 404.php is a sitemap with a message that the requested page was not found.
    ErrorDocument 404 /404.php
    
    
    RewriteEngine on
    
    
    # Redirect b.com to a.com with 301 and...
    RewriteCond %{HTTP_HOST} !^b\.com$ [NC,OR] 
    
    1. You want the RewriteRule to redirect if EITHER condition matches (b.com or www.whatever.com), so LEAVE the OR flag!
    2. OMG! That should not be NOT b.com: drop the ! so the condition matches b.com.
    3. It should not have the start anchor, as that's taken care of in the next RewriteCond.

    Remember, you're redirecting away from b.com and to a.com. I had all that so I don't understand why you altered the code like this.
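
    Put together, the host test with the negation removed could be sketched as follows. The (^|\.) prefix and the $ anchor are an assumption added for this sketch, to avoid catching names that merely contain "b.com"; it also assumes {HTTP_HOST} carries no port number.

    ```apache
    # Sketch: redirect b.com (with or without a subdomain) or any www. host.
    RewriteCond %{HTTP_HOST} (^|\.)b\.com$ [NC,OR]
    RewriteCond %{HTTP_HOST} ^www\. [NC]
    RewriteRule .? http://a.com%{REQUEST_URI} [R=301,L]
    ```
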
    # Remove www (subdomain) with 301
    RewriteCond %{HTTP_HOST} ^www\. [NC]
    # match request URIs
    RewriteRule .? http://a.com%{REQUEST_URI} [R=301,L]
    
    # Redirect extensionless filenames to .php version (ONLY if it exists)
    RewriteCond %{REQUEST_FILENAME}.php -f
    RewriteRule ^([^.]+)$ $1.php [L]
    The one other issue, which I'm not sure you were trying to show me, is what happens when someone requests a.com/somepage.php: if the file exists, the URL is not rewritten to a.com/somepage but stays a.com/somepage.php, although the page does resolve if the request is made to a.com/somepage?

    Code:
    # Strip .php file extension (ONLY if not {IS_SUBREQ}) with 301
    RewriteCond %{IS_SUBREQ} !true
    RewriteRule ^(.*)\.php$ $1 [R=301,L]
    # Note: {REQUEST_URI} has .php file extension stripped; {IS_SUBREQ} is now true
    Well, that's pretty simple if you understand that {IS_SUBREQ} is an Apache variable which tells you whether there's been an internal redirection or not. Checking that it's NOT true means a match only if there has been no internal redirection; therefore, it's safe to strip the .php file extension in the RewriteRule. On any subsequent pass through the .htaccess, it will be true (no match), so the strip will NOT be performed and you'll go on to add the extension and serve the file (no loop).

    Many thanks and Merry Christmas
    Thanks! Christmas was great! 'Hope you enjoy yours as it's already the wee hours there and Santa's in the Mountain time zone.

    Regards,

    DK

