This has been going on too long! I have struggled with the same problem for several months in hopes of avoiding asking for help here… I guess since I can’t seem to solve it I will admit it! You’re smart, I’m dumb; you’re great looking, I am dumb; you’re good, I am bad; you’re rich, I am poor, etc. Now, can I ask for help?
I have a problem I have not found any info on out in the wide world of web: rewriting (or redirecting?) TLDs so that each one serves only a narrowly defined set of dynamic pages on our CMS.
An example: the main domain is domain.us. The majority of pages are served from this domain. I have, for this example, one page, let’s say index.pl?iid=5555, that is dedicated to instructional videos and needs to be accessible via domain.tv, and only that domain, not any other. The other domain.xx TLDs have more than one page but follow the same basic pattern.
I will include code from our hosting conf file; I don’t use .htaccess. I can, but would rather everything stays in the conf. Here are a couple of examples of my latest attempt at isolating these domains to specific pages of our CMS…
I am not really sure why it isn’t working, and I know it could be optimized and implemented in a much easier fashion, but this is a first stab at making sure we can avoid duplicate content while focusing our efforts on regional and topic-based pages. I am a real novice at this…
Why do you have the ^ in RewriteCond %{QUERY_STRING} ^i+d=87031 [NC] ? The ^ there means that the query string should start with the regex you specified, so for example ?a=1&iid=87031 won’t match.
Why the i+d btw?
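A minimal sketch of the anchoring difference described above. The `(^|&)` and `(&|$)` delimiters are an assumption added for illustration, not part of the posted config; they let the parameter sit anywhere in the query string while still matching it as a whole.

```apache
RewriteEngine On

# Anchored with ^ : matches "iid=87031" only as the first parameter.
# "a=1&iid=87031" does NOT match.
#RewriteCond %{QUERY_STRING} ^iid=87031 [NC]

# Delimited: matches the parameter in any position, e.g. "a=1&iid=87031",
# but not "xiid=87031" or "iid=870319".
RewriteCond %{QUERY_STRING} (^|&)iid=87031(&|$) [NC]
RewriteCond %{HTTP_HOST} !^www\.domain\.tv$ [NC]
RewriteRule .? http://www.domain.tv%{REQUEST_URI} [R=301,L]
```

Because the substitution contains no `?`, the original query string is carried over to the redirect target unchanged.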
The rest of your code looks good to me. Just a few pointers:
You can force www for both domains in one statement:
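The statement itself did not survive in this copy of the thread. Judging by the "Force www" block posted further down, it was presumably along these lines:

```apache
# Force www on any host that does not already start with "www."
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule .? http://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
```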
# called without query string send to video page
RewriteCond %{HTTP_HOST} ^domain\.tv [NC,OR]
RewriteCond %{HTTP_HOST} ^www\.domain\.tv [NC]
RewriteRule .* http://www.domain.tv/index.pl?iid=87031 [R,L]
The first condition (^domain\.tv, highlighted in red in the original post) is superfluous because you already forced www, so that condition will never ever be true. It’s only burning CPU cycles for nothing.
Avoid the use of (.*) at all costs. It’s the most dreaded regex construction ever and usually causes more problems than it solves.
Instead of using .* in rules, use .? and, instead of matching ^(.*)$, don’t capture anything: use .? and redirect to %{REQUEST_URI} instead of /$1.
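A short sketch of that advice, reusing the olddomain.com redirect from the posted conf as the example host. It assumes server/vhost context, where the matched path and %{REQUEST_URI} both start with a slash.

```apache
RewriteEngine On

# Capturing form: stores the whole path in $1 only to append it again.
#RewriteCond %{HTTP_HOST} ^www\.olddomain\.com$ [NC]
#RewriteRule ^(.*)$ http://www.domain.us$1 [R=301,L]

# Non-capturing form: ".?" matches any request without capturing anything,
# and %{REQUEST_URI} supplies the requested path.
RewriteCond %{HTTP_HOST} ^www\.olddomain\.com$ [NC]
RewriteRule .? http://www.domain.us%{REQUEST_URI} [R=301,L]
```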
The reason for the ^ at the start of the rewrite condition is that the query string never starts with anything but the iid. The reason for the pattern i+d is that there is still legacy code that uses id in the URL, so I was trying to make it match id or iid.
I will add your other comments and see how it turns out. I am hopeful I can get beyond the redirect that will never complete. It’s got to finish someday and I plan to be there when it does! Thanks again.
Ah, so the code is LOOPY eh? You didn’t say that in your OP. Let me go through it again.
Your reason for i+d makes sense, but if you only need to match id= or iid= I would change it to ii?d=. ii?d= will only match id= and iid=, while i+d= will match id=, iid=, iiid=, iiiid=, etc. It’s always better to specify exactly what you want, instead of specifying something that does what you want but can also do other stuff you don’t (necessarily) want.
Back to your code.
After the changes I suggested in the previous post and after changing all i+d= to ii?d= your code looks like this:
# Force www
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule .? http://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
# query string is .ca page, but called with a different domain, rewrite to the .ca domain
RewriteCond %{QUERY_STRING} ii?d=1092(12|32|28|19) [NC]
RewriteCond %{HTTP_HOST} !^www\.domain\.ca$ [NC]
RewriteRule ^(.*)$ http://www.domain.ca$1 [R=301,L]
# user called domain without query string, put them on .ca homepage
RewriteCond %{HTTP_HOST} ^www\.domain\.ca [NC]
RewriteRule . http://www.domain.ca/index.pl?id=109212 [L]
# called without query string send to video page
RewriteCond %{HTTP_HOST} ^www\.domain\.tv [NC]
RewriteRule .* http://www.domain.tv/index.pl?iid=87031 [R,L]
# query string is incorrect for domain specified, redirect to home of video domain
RewriteCond %{QUERY_STRING} ^ii?d=87031 [NC]
RewriteCond %{HTTP_HOST} !^www\.domain\.tv$ [NC]
RewriteRule .* http://www.domain.tv/index.pl?iid=87031 [L]
# .tv domain called with a non .tv page, send to page using main tld.
RewriteCond %{QUERY_STRING} !^ii?d=87031$ [NC]
RewriteCond %{HTTP_HOST} ^www\.domain\.tv$ [NC]
RewriteRule ^(.*)$ http://www.domain.us$1 [R=301,L]
The loop is caused by the last two blocks. Let me walk you through it.
From now on, everywhere I say id=something I mean id=something or iid=something.
Man, when I was younger, college age, I used to have to pay a fortune for instructional material as you have given. Thank you.
I am not sure what you mean by the “code is LOOPY”. I imagine it is an acronym I am unfamiliar with, but if I spent enough time thinking on it I could perhaps discover the meaning, just as we all have, over time, learned various great acronyms… One of my favorites, since many do not know it yet, is ROFLMMFAO. Anyway, I can’t believe I waste bytes like that.
I have a question first, before I relate my experiment this afternoon and my progress in whipping this beast into mashed potatoes (<-WTH?)
I do not think this statement is completing properly, and I had to fool with it a bit to try and keep my cookie-less domains from becoming www domains. I saw something today, and I may be mistaken as to what it is truly trying to tell us, but this anomaly was unknown to me until now, and perhaps to many others (these little quirks are the glue that binds things together, but they were never exposed in all my research into this hellish category). When you use a negated expression, the pattern’s captured groups cannot be used in the replacement, since there is nothing to substitute. I don’t know, you want the exact wording? Geez…
Note
When using the NOT character to negate a pattern, you cannot include grouped wildcard parts in that pattern. This is because, when the pattern does NOT match (ie, the negation matches), there are no contents for the groups. Thus, if negated patterns are used, you cannot use $N in the substitution string!
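A small sketch of what the note describes, using the /images/ path from the posted conf purely as a hypothetical example. A rule that negates its pattern cannot rely on $1, but it can still build the target from a server variable.

```apache
RewriteEngine On

# Broken: the pattern is negated, so whenever the rule fires nothing was
# captured and $1 expands to an empty string.
#RewriteRule !^/images/(.*)$ http://www.domain.us/$1 [R=301,L]

# Fine: no backreference; the path comes from %{REQUEST_URI} instead.
RewriteCond %{HTTP_HOST} !^www\.domain\.us$ [NC]
RewriteRule !^/images/ http://www.domain.us%{REQUEST_URI} [R=301,L]
```

The same reasoning applies to a negated RewriteCond and %N. A rule that negates a condition but uses no backreferences at all, like the force-www block, is not affected by this limitation.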
Could that be responsible for the www. replace algorithm you gave earlier not working properly?
As to my experiments, I was actually out of town today meeting my kids (I am sure they don’t appreciate that I still refer to them as kids, both attending college), so I spent what little time I had this morning breaking my website by improperly rewriting the www at the very start of the slew of rewrite rules (slew: more than one, but fewer than 50 URL rewrite rules… no? oh…). Right before I had to rush to meet them I was able to get the www form of my site showing up, but the non-www was throwing up the default page (built into the “hosting package”).
I did, however, have some success in forcing the .ca to behave on part of the rule. I think my discovery of the previous afternoon may be part of the answer, if you can confirm this sub-problem with NOT. I will make these other changes, test, and then post some results here in a short moment.
I am not going to be redirecting this page until later since this rewrite could change dependent on if they are trying to access a .mx page with a .ca tld. It will happen, just down the road a bit…
A final thought (that’s a morbid saying): in the two opposing code snippets below, which I believe were causing the error genie to magically appear due to an endless redirect, there was a typo. The second statement should have redirected to http://www.domain.tv. Anywhoo… Thank you for your generous help. ttysttfnttyltitty
I just noticed that you have different match patterns in most of these rewrite directives. You start off the way you explained I should do it, with a .?
Then you use ^(.*)$ and . and .* etc. (I am most likely the cause of that and it was just overlooked.) But with the method you suggested, using %{HTTP_HOST}%{REQUEST_URI}, how would I go about including that in the URL rewrite directive?
I got different errors, ranging from an unknown page, to the website without styles since the CSS file didn’t get loaded, and/or images trying to resolve an address that does not exist, as would happen if the image URL did not undergo the required changes. (This is due to a rewrite I do as one of the first tasks, removing the timestamp from the URL so the image can display. The pattern is RewriteRule (.*)-jcb\d+\.(.*)$ $1.$2 [L] <– Can I use a last flag in a rewrite directive? I just put it in to see if I could head off these URLs being rewritten by the www process…)
LOOPY just means looping, or causing an infinite loop.
I’m not creating nor using any variables in the RewriteRule so that can’t be the problem here. Does it redirect you at all when you visit a non-www version of either domain?
Just tested the code here btw and it works fine (Apache 2.x)
Yeah I thought it was a bit odd
For future reference, please copy/paste your code. Copy/paste can’t make typos.
I am not sure, as I had to do emergency surgery this afternoon after the site wouldn’t resolve. I can’t remember which was ultimately successful. I have tried several methods to get around the problem and luckily this latest attempt is serving pages again, heheh.
Oh, but I did indeed copy and paste; the error existed in the conf file and was overlooked in my haste to roll out the code as quickly as possible.
I have had to jimmy some things, use a little duct tape, and slightly mod some others, but I believe I have a good start (better than it’s ever gone before).
I ran into a problem with the www force. I have two domains for static content; those domains should run without cookies, which also makes it easy to shut off specialized features by CSS file. And that was what was happening in a roundabout way. Let’s call these “extra” domains static1 and static3 (2 has enough popularity, what with “the two of us”, “it takes two baby”, two scoops of raisins, etc.).
These URLs show up a lot and somehow I had to kick them out of line and send them to the principal’s office so they could avoid the URL rewrite. I noticed a [PT] flag but it didn’t seem to apply, and I saw a dash also, but I have no idea where I was supposed to put it. I need to ensure no rewriting takes place and the two static domains get some sort of free pass to get to their seats without a cookie. What would be the best practice for ensuring a domain (or domains) doesn’t get rewritten?
I will post what I have but note I am going through line by line and testing to ensure I don’t shut down again. Oh, and this is a conf file as I am not using htaccess… Thanks
<Directory />
Order allow,deny
Allow from all
</Directory>
<Directory /var/www/vhosts/domain.us/httpdocs>
AllowOverride All
<IfModule sapi_apache2.c>
php_admin_value open_basedir none
</IfModule>
<IfModule mod_php5.c>
php_admin_value open_basedir none
</IfModule>
Options FollowSymLinks SymLinksIfOwnerMatch
</Directory>
<IfModule mod_headers.c>
BrowserMatch MSIE ie
Header set X-UA-Compatible "IE=Edge,chrome=1" env=ie
</IfModule>
RewriteEngine On
RewriteCond %{QUERY_STRING} id=-100
RewriteRule (.*) http://www.domain.us [L,R=301]
RewriteCond %{QUERY_STRING} iid=-100
RewriteRule (.*) http://www.domain.us [L,R=301]
RewriteCond %{HTTP_HOST} ^www\.olddomain\.com$ [NC]
RewriteRule .? http://www.domain.us%{REQUEST_URI} [R=301,L]
PerlRequire "/home/httpd/metapointcms/setcwd.pl"
Alias /index.pl /home/httpd/metapointcms/metapointcms/index.pl
Alias /userchannel.pl /home/httpd/metapointcms/metapointcms/userchannel.pl
Alias /newsexport.pl /home/httpd/metapointcms/metapointcms/newsexport.pl
Alias /images/ /home/httpd/metapointcms/html/images/
Alias /js/ /home/httpd/metapointcms/html/js/
Alias /htmlarea3/ /home/httpd/metapointcms/html/htmlarea3/
<IfModule mod_fcgid.c>
ProcessLifeTime 7200
IPCCommTimeout 7200
IPCConnectTimeout 300
</IfModule>
<Files ~ (\.pl$)>
SetHandler perl-script
PerlResponseHandler ModPerl::Registry
Options +ExecCGI
allow from all
</Files>
AddType application/octet-stream .pdf
AddType application/octet-stream .exe
# compress all text & html:
# gzip compression.
<IfModule mod_deflate.c>
# html, xml, css, and js:
AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css application/x-javascript text/javascript application/javascript application/json
# webfonts and svg:
<FilesMatch "\.(ttf|otf|eot|svg)$" >
SetOutputFilter DEFLATE
</FilesMatch>
</IfModule>
#####################################################
# CONFIGURE media caching
#
Header unset ETag
FileETag None
# Usual config
ExpiresActive On
ExpiresDefault "access plus 1 year"
<FilesMatch "\.(ico|gif|jpg|jpeg|png|flv|pdf|swf|mov|mp3|wmv|ppt)$">
Header unset Last-Modified
Header set Expires "access plus 1 year"
Header set Cache-Control "public, no-transform"
</FilesMatch>
<FilesMatch "\.(xml|txt|html|htm|js|css)$">
Header unset Last-Modified
Header set Expires "access plus 1 week"
Header set Cache-Control "private, must-revalidate"
</FilesMatch>
<FilesMatch "\.(cgi|php|pl)$">
ExpiresDefault A0
Header set Cache-Control "no-store, no-cache, must-revalidate, max-age=0"
Header set Pragma "no-cache"
</FilesMatch>
#RewriteLog "/tmp/rewrite.log"
#RewriteLogLevel 9
#RewriteLogLevel 5
RewriteRule (.*)-jcb\d+\.(.*)$ $1.$2
# cookieless domain for images and stuff
RewriteCond %{HTTP_HOST} ^domain\.me$ [NC,OR]
RewriteCond %{HTTP_HOST} ^static3\.me$ [NC]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} !-f
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} !-d
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} !-l
RewriteRule .? http://www.domain.us%{REQUEST_URI}
ServerAlias *.domain.us *.domain.ca *.domain.mx
#RewriteCond %{HTTP_HOST} !^domain\.me$ [NC,OR]
#RewriteCond %{HTTP_HOST} !^static3\.me$ [NC]
# Force www
#RewriteCond %{HTTP_HOST} !^www\. [NC]
#RewriteRule .? http://www.%{HTTP_HOST}%{REQUEST_URI} [R=301]
RewriteCond %{HTTP_HOST} ^domain.us$
RewriteRule .? http://www.domain.us%{REQUEST_URI} [R=301]
RewriteCond %{HTTP_HOST} ^domain.ca$
RewriteRule .? http://www.domain.ca%{REQUEST_URI} [R=301]
RewriteCond %{HTTP_HOST} ^domain.mx$
RewriteRule .? http://www.domain.mx%{REQUEST_URI} [R=301]
RewriteCond %{HTTP_HOST} ^domain.tv$
RewriteRule .? http://www.domain.tv%{REQUEST_URI} [R=301]
#French - Canada
# query string is .ca page, but called with a different domain, rewrite to the .ca domain
RewriteCond %{QUERY_STRING} ii?d=1092(12|32|28|19) [NC]
RewriteCond %{HTTP_HOST} !^www\.domain\.ca$ [NC]
RewriteRule ^(.*)$ http://www.domain.ca$1 [R=301,L]
# user called domain without query string, put them on .ca homepage
#RewriteCond %{HTTP_HOST} ^www\.domain\.ca [NC]
#RewriteRule . http://www.domain.ca/index.pl?id=109212 [L]
# called without query string send to video page
#RewriteCond %{HTTP_HOST} ^www\.domain\.tv [NC]
#RewriteRule .* http://www.domain.tv/index.pl?iid=87031 [R,L]
# query string is incorrect for domain specified, redirect to home of video domain
#RewriteCond %{QUERY_STRING} ^ii?d=87031 [NC]
#RewriteCond %{HTTP_HOST} !^www\.domain\.tv$ [NC]
#RewriteRule .* http://www.domain.tv/index.pl?iid=87031 [L]
# .tv domain called with a non .tv page, send to page using main tld.
#RewriteCond %{QUERY_STRING} !^ii?d=87031$ [NC]
#RewriteCond %{HTTP_HOST} ^www\.domain\.tv$ [NC]
#RewriteRule ^(.*)$ http://www.domain.us$1 [R=301,L]
#_________________________old________________________
#RewriteCond %{HTTP_HOST} ^www\.domain\.ca$ [NC]
#RewriteCond %{QUERY_STRING} !^ii?d=1092(12|32|28|19)$ [NC]
#RewriteCond %{QUERY_STRING} !^$ [NC]
#RewriteRule .? http://www.domain.us%{REQUEST_URI}
#RewriteCond %{HTTP_HOST} ^www\.domain\.ca$ [NC]
#RewriteCond %{QUERY_STRING} ^$ [NC]
#RewriteRule .? http://www.domain.ca/index.pl?id=109212 [L]
#RewriteCond %{HTTP_HOST} !^www\.domain\.ca$ [NC]
#RewriteCond %{QUERY_STRING} ^ii?d=1092(12|32|28|19)$ [NC]
#RewriteRule .? http://www.domain.ca%{REQUEST_URI}
# Spanish
# RewriteCond %{HTTP_HOST} ^www\.domain\.mx$ [NC]
# RewriteRule .? http://www.domain.mx/index.pl?id=109240 [R=301,L]
# RewriteCond %{QUERY_STRING} i+d=1092(40|52|53|54) [NC]
# RewriteCond %{HTTP_HOST} !^www\.domain\.mx$ [NC]
# RewriteRule .? http://www.domain.mx%{REQUEST_URI} [R=301,L]
# RewriteCond %{QUERY_STRING} !i+d=1092(40|52|53|54) [NC]
# RewriteCond %{QUERY_STRING} !^$ [NC]
# RewriteCond %{HTTP_HOST} ^www\.domain\.mx [NC]
# RewriteRule .? http://www.domain.us%{REQUEST_URI}
#youtube and Video
#RewriteCond %{HTTP_HOST} ^www\.domain\.tv$ [NC]
# RewriteCond %{QUERY_STRING} ^$ [NC]
#RewriteRule .? http://www.domain.tv/index.pl?iid=87031 [L]
#RewriteCond %{QUERY_STRING} ^ii?d=87031 [NC]
#RewriteCond %{HTTP_HOST} !^www\.domain\.tv$ [NC]
#RewriteRule .? http://www.domain.tv/index.pl?iid=87031 [L]
#RewriteCond %{QUERY_STRING} !^ii?d=87031$ [NC]
#RewriteCond %{HTTP_HOST} ^www\.domain\.tv$ [NC]
#RewriteRule .? http://www.domain.us%{REQUEST_URI} [R=301,L]
#RewriteCond %{HTTP_HOST} ^domain\.(us|ca|mx|tv)$
#RewriteRule .? http://www.domain.$1/$2 [R=301,L]
#RewriteCond %{HTTP_HOST} ^domain.us$
#RewriteRule ^(.*)$ http://www.domain.us$1 [R=301,L]
RewriteRule ^/$ /index.pl [PT]
I would create separate virtualhosts for the static domains that point to the same directory as the other domains. That way you can have completely clean virtualhost directives without any rewrites at all. That’s also a lot faster! Why haul every request for a domain through mod_rewrite if you know beforehand it will never be rewritten?
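A rough sketch of that suggestion, plus an in-place alternative using the dash substitution asked about earlier in the thread. The host names static1.me and static3.me are the placeholders used above; the `*:80` address and the DocumentRoot path (taken from the posted Directory block) are assumptions, and Option B is an addition for illustration, not something proposed in the post above.

```apache
# Option A: a separate vhost for the cookie-less domains. There is no
# RewriteEngine directive here, so no rewrite rules are evaluated for
# requests to these hosts.
<VirtualHost *:80>
    ServerName static1.me
    ServerAlias static3.me
    DocumentRoot /var/www/vhosts/domain.us/httpdocs
</VirtualHost>

# Option B: stay in the shared vhost and exempt the hosts up front.
# "-" means "no substitution"; [L] stops processing the remaining rules.
# This pair would have to come before every other RewriteRule.
#RewriteCond %{HTTP_HOST} ^static[13]\.me$ [NC]
#RewriteRule .? - [L]
```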
Also, could you please post your code again, without the comments, including the VirtualHost directive and using the original domain names, wrapped in [code] [/code] tags?
The hiding of the real domain names is confusing …
(Yes, posting your domain in [code] blocks is perfectly fine and won’t be seen as spamming, in case you were afraid of that :))
I guess I do not understand what you are proposing. Our static domains for static content are both domain aliases for the main .us domain name. Thus the URLs intellicad.me and progecad.me both point at the IP address and unfortunately get included on
I'll try to get that tomorrow, today is about then...
Thanks,
Scott
Sorry, had other tasks to complete before I could get back on this. I actually have much more to do but am playing hooky to get another crack at this. I like your idea of the virtual hosts but, alas, I am using Plesk to manage the hosting on our servers, so virtual host directives in the vhosts.conf file would essentially be virtual hosts inside of virtual hosts. As I understand it, Plesk causes the include file to be overwritten with the items in vhosts, but it sneaks in a virtual host statement right before it brings in the vhosts.conf. I may be able to load multiple vhosts.conf files, which would allow the directives you are talking about, but I am not entirely sure yet. The problem I keep having is getting the www force to not force itself on the cookieless domains. I have tried:
For some reason neither seems to work completely. For instance, requesting progecad.tv without www doesn’t add it with the first set of directives, and with the second set, the rewrite adds www. to intellicad.me and/or progecad.me despite the request to leave them out of it. Perhaps my logic is incorrect on the NOT statements.
I am trying to determine if they entered a TLD without a query string but am unsure how to do so. Actually, the rule is: did they enter only a host name without anything else? I thought I would try to detect an empty REQUEST_URI or an empty query string, but I didn’t see any way of doing so.
I think you’re close but have a problem understanding the different parts of the URL: protocol://domain/path/to/filename?query_string. Dealing with anything other than the URI (path/to/filename) requires a RewriteCond to match the correct Apache variable against your regex.
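As a sketch of that breakdown, the comment table below maps each URL part to the place where mod_rewrite tests it, followed by one way to detect a bare host name. The target page id=109212 is the .ca home page used elsewhere in this thread; the `^/?$` pattern assumes server/vhost context.

```apache
# Part          Example          Tested with
# scheme        http             RewriteCond %{HTTPS}        (on/off)
# host          www.domain.ca    RewriteCond %{HTTP_HOST}
# path (URI)    /index.pl        RewriteRule pattern
# query string  iid=109212       RewriteCond %{QUERY_STRING}

# Bare host: the path is "/" (or empty) and the query string is empty.
RewriteCond %{HTTP_HOST} ^www\.domain\.ca$ [NC]
RewriteCond %{QUERY_STRING} ^$
RewriteRule ^/?$ http://www.domain.ca/index.pl?id=109212 [R=301,L]
```

After the redirect the path is /index.pl, which no longer matches `^/?$`, so the rule cannot fire a second time.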
That said, it’s a simple matter to force www on all with one statement (a combined RewriteCond on the start of the %{HTTP_HOST} string and one RewriteRule for the redirection). What your last (new) bit is doing is adding another series of conditions where it could be handled simply by specifying the filename and query string in the first place (your regex says you don’t care about the requested URI, but that’s likely in error).
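A sketch of the single force-www statement with the cookie-less domains excluded, assuming they are intellicad.me and progecad.me as mentioned earlier in the thread. Note that two negated conditions joined with [OR] are always true (a host cannot equal both names at once), which may be why the commented-out attempt in the posted conf still added www. Leaving the conditions ANDed, which is the default, avoids that.

```apache
RewriteEngine On

# Redirect only if the host lacks "www." AND is not a cookie-less domain.
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTP_HOST} !^(intellicad|progecad)\.me$ [NC]
RewriteRule .? http://www.%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
```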
As for your first two code blocks, I’d put the second one in first and then change the first block from:
One question remains that I am still struggling with: the need to determine whether a URL is simply a host, or a host with a request URI and query string.
I can’t believe that I once again am here, testifying before you that the darned thing still isn’t working, despite help from some of the greatest minds in the universe (big grin).
So, I have one last condition that, despite my best efforts, seems to react differently than I had expected. To bring you up to speed: we have a CMS with multiple domains for country-specific URLs, which are pointed at various pages and their subpages, not defined in a directory structure but as dynamic pages of the form www.domain.ca/index.pl?iid=XXXX. So far we have been able to force pages which do not belong to .ca or .mx to .us, and those not belonging to .us to either .ca or .mx, and we can make sure that the domains always display the proper TLD for the intended country and language EXCEPT when the domain is entered without a request filename or without a query string. So my rule for this is as follows (modified with a / to try and emulate what it returns in the logs…)
The log is below. It seems to (A) have a problem recognizing the rule immediately below, which is matching the country TLD against what I thought was the pattern ca|us|mx|tv, and (B) match the / pattern but then try to rewrite to a URL while looking for a path (ARGH).
If you recall, the rules are run in a conf file, not in .htaccess. Where did I mess up now? I seem not to understand the output rules, or the items processed and what they return; of course, I don’t think I have ever seen anybody explain that clearly. What should one expect as the output of a rewrite, paths or URLs? And if paths, how does one work with dynamic pages… Just a sec while I smash my face against my keyboard… ah, that’s better…
It’s fairly simple actually. You can either specify a complete URL, or a path (see http://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewriterule). What you did is specify part of the URL, but you omitted the scheme (http://)!
That causes Apache to look for a directory by the name of your domain instead of treating the substitution as a URL.
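The original rule was not preserved in this copy of the thread, so the pair below is a reconstruction for illustration only (host conditions omitted for brevity). It shows the same substitution without and with the scheme.

```apache
# No scheme: the substitution is taken as a local path, not as another
# host, so "www.domain.ca" ends up being treated like a directory name.
#RewriteRule ^/$ www.domain.ca/index.pl?id=109212 [R=301,L]

# With scheme: an absolute URL, so the client is redirected to that host.
RewriteRule ^/$ http://www.domain.ca/index.pl?id=109212 [R=301,L]
```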
Finally, the struggle between dark and light, good and evil, and batman vs. the joker is over. The rules are mastered, the urls are re-wrotten, and the cows have come home to roost.
Thanks, you guys, couldn’t have done it without you. And, I’d like to thank my mother, my wife, my producer, bill, his nephew franklin, the deputy dawg, bugs bunny, michael jordan, and, oh, can’t forget my little dog toto, too. Oh, let’s see, the american public, superman, einstein, uh…