Trouble Rewriting From Sub-Domain

I’m trying to create a portable rewrite (Apache) for any time a request is made to a sub-domain without HTTPS and/or without a trailing slash (to enforce no trailing slash for non-directories). I have the following rules in place:

# Redirect any non-directory subdomain req w/ trailing slash to HTTPS URL w/out trailing slash
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{REQUEST_URI} (.*)/$ [NC]
RewriteCond %{HTTP_HOST} ^([a-z0-9]+\.[a-z0-9-.]{2,}\.[a-z]{2,6})$ [NC]
RewriteRule ^(.*)/$ https://%1/$1 [R=301,L]

# Redirect any non-directory and non-HTTPS req w/out trailing slash to HTTPS
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTPS} !On
RewriteCond %{HTTP_HOST} ^([a-z0-9]+\.[a-z0-9-.]{2,}\.[a-z]{2,6})$ [NC]
RewriteRule ^(.*) https://%1/$1 [R=301,L]

When there is a request to http://sub.example.com/aaa/ I would like it to be redirected to https://sub.example.com/aaa, but it ends up being redirected to https://sub.example.com/sub/aaa.

Does anyone know how I could change the above rules to achieve what I’m looking for or possibly change some other Apache setting maybe? Any help would be greatly appreciated. Sorry if this is something that could be found with search. I’ve tried, I’m just not sure exactly what to even search for I guess.


jj4,

What Apache is doing is exactly what you directed with your code.

First, bad form to remove a trailing / on a directory (some Apache installs may be configured to FORCE the trailing / … which is the proper thing to do, IMHO).

Breaking your TWO requirements down, you want to delete a trailing / on a directory and you want to force HTTPS on subdomain requests (file or directory).

The first is easily done using your first RewriteCond and your RewriteRule WITHOUT the https://%1/. You’re only making things very complex with the intervening RewriteCond statements which are NOT required.

The second is also easily done (in the subdomain’s .htaccess) with the first and third RewriteCond statements and the RewriteRule (using only https://%{HTTP_HOST}/$1).

Your flags are fine.

To prevent overthinking a problem, specify exactly what you want to do IN WORDS. From that point, the coding becomes trivial.

Regards

DK

Thanks for your reply, I appreciate you taking the time. I think I might have been a little unclear about a few things. I’m not trying to remove trailing slashes from directories. The https://sub.example.com/aaa/ example was meant as a page request using a “pretty” URL. The code I pasted was also just a part of what I was trying to accomplish, because this was the specific part I was having an issue with.

I didn’t know, and still don’t know, why a RewriteRule includes the name of the subdirectory where the subdomain content is stored at the beginning of the request string that it matches its regex against, while RewriteCond %{REQUEST_URI} does not. But I did find a workaround for it by not using captured groups from the subdomain RewriteRules.

I also understand that I’m making this a little more complicated than it needs to be, but I’m hoping the effort ends with a simple code snippet or template that I can just add to a .htaccess or .conf file for any site and have all URLs redirected to the www URL if the primary domain is requested, to https if it’s off (be it www or subdomain), and to remove the trailing slash for file or pretty URL requests and to enforce the trailing slash for directories. And to do all of this without multiple redirects.

I have been working on this more since my original post and the following is what I’ve arrived at, which seems to be working fine, though I need to test it more:

# Enforce no trailing slash, www, and https for non-directories or homepage
RewriteCond %{REQUEST_FILENAME} !-d [OR]
RewriteCond %{REQUEST_URI} ^/$
RewriteCond %{HTTPS} !on [OR]
RewriteCond %{HTTP_HOST} !^www\. [OR]
RewriteCond %{REQUEST_URI} .+/$
RewriteCond %{HTTP_HOST} ^www\.(.*)|(^[a-z0-9-]{2,}\.[a-z]{2,6})$ [NC]
RewriteRule (.*)/$|(.*) https://www.%1%2/$1$2 [R=301,L]

# Enforce trailing slash, www, and https for directories
RewriteCond %{REQUEST_FILENAME} -d
RewriteCond %{REQUEST_URI} !^/$
RewriteCond %{HTTPS} !on [OR]
RewriteCond %{HTTP_HOST} !^www\. [OR]
RewriteCond %{REQUEST_URI} !/$
RewriteCond %{HTTP_HOST} ^www\.(.*)|(^[a-z0-9-]{2,}\.[a-z]{2,6})$ [NC]
RewriteRule (.*)/$|(.*) https://www.%1%2/$1$2/ [R=301,L]

# Enforce no trailing slash and https for subdomain reqs to non-directories or homepage
RewriteCond %{REQUEST_FILENAME} !-d [OR]
RewriteCond %{REQUEST_URI} ^/$
RewriteCond %{HTTPS} !on [OR]
RewriteCond %{REQUEST_URI} .+/$
RewriteCond %{HTTP_HOST} ^(?!www\.)[a-z0-9-]+\.[a-z0-9-.]{2,}\.[a-z]{2,6}$ [NC]
RewriteCond %{REQUEST_URI} (.*)/$|(.*)
RewriteRule .* https://%{HTTP_HOST}%1%2 [R=301,L]

# Enforce trailing slash and https for subdomain reqs to directories
RewriteCond %{REQUEST_FILENAME} -d
RewriteCond %{REQUEST_URI} !^/$
RewriteCond %{HTTPS} !on [OR]
RewriteCond %{REQUEST_URI} !/$
RewriteCond %{HTTP_HOST} ^(?!www\.)[a-z0-9-]+\.[a-z0-9-.]{2,}\.[a-z]{2,6}$ [NC]
RewriteCond %{REQUEST_URI} (.*)/$|(.*)
RewriteRule .* https://%{HTTP_HOST}%1%2 [R=301,L]

I’m fairly new to this Apache configuration stuff, so if the direction I’m going in here is totally wrongheaded, let me know. Also, let me know if you see any holes in my logic / rules.

One other thing I was trying to search for was whether it is somehow possible to save or preserve capture groups through multiple RewriteCond’s? Just mentioning because you seem knowledgeable in this area.

Again, thanks for taking the time to make your reply, it’s most certainly appreciated!

jj4,

I’m glad you’re not trying to overrule the basics of URLs (directories followed by /) so let’s get into things.

Before I get into your code, I need to repeat my request that you look at the tutorial at http://dk.co.nz/seo and especially the examples (they solve typical problems and the code is explained).

Then I must stress that you need to specify the objectives of your redirections:

  1. Remove trailing slashes from non-directory requests

  2. Force trailing / for directories

  3. Force www

  4. Force https for files and directories (all)

IMHO, your mod_rewrite section titles, while helpful, were overly and unnecessarily complex which translated to overly and unnecessarily complex code.

I would handle the trailing slash problems first then the www and finally the https. My code would be:

RewriteEngine on

 # Strip trailing slash on files
 # If not a directory, remove the trailing /
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)/$ $1 [R=301,L]

 # Enforce trailing slash on directories
 # If a directory and no trailing /, add a trailing /
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(.+[^/])$ $1/ [R=301,L]

 # Force www on domain
 # If the {HTTP_HOST} does not start with www., add it via redirection
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule .? https://www.%{HTTP_HOST}/%{REQUEST_URI} [R=301,L]

 # Force https
 # Check for https port (more reliable - unless the port's been reassigned)
RewriteCond %{SERVER_PORT} !^443$
 # OR
 # RewriteCond %{HTTPS} !on
RewriteRule .? https://%{HTTP_HOST}/%{REQUEST_URI} [R=301,L]

Notes:

  1. ORDER IS IMPORTANT! The first two RewriteRule sets redirect to relative links, so they could go before or after the last two (my preference is before because I don’t want to redirect a secure page unnecessarily). I would keep the third before the fourth for the same reason.

  2. KISS! As mentioned before, you were making things overly complex looking at EACH of the multiple cases for the trivial redirections.

  3. I included the R=301 flag to force Apache to show each redirection (it will happen too fast to see anything but the last step - the first three are not really useful but a good troubleshooting technique if the entire list is not processed correctly so remove when you’re ready). It will not only display the redirected URL but advise search engines to update (an important thing for webmasters).

  4. The third rule’s condition is checking for NOT starting with www. (www{dot}) and does NOT care about the {HTTP_HOST} as it simply uses what is already there (in Apache’s variables) for the redirection.

  5. The regex, .?, will match any URI and effect the redirection (if the RewriteCond statements return TRUE). It’s just a handy shortcut.
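
One detail worth checking when adapting the last two rule sets: %{REQUEST_URI} already begins with a slash, so a literal / between the host and the path produces a double slash in the redirect target. A minimal sketch of the same two rules without the extra slash, assuming they live in the site's .htaccess and that %{HTTPS} is available on the server:

```apache
# Sketch only: %{REQUEST_URI} already starts with "/", so no literal
# slash is placed between the host and the path.

# Force www when the host does not start with www.
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule .? https://www.%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

# Force https for everything else
RewriteCond %{HTTPS} !on
RewriteRule .? https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
```

This is the form jj4's own later rules use for the https redirect.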

No, you’re not “wrongheaded,” just overly reliant on the :fire: EVERYTHING :fire: (or NOTHING) atom. For you, it was capturing EVERYTHING including the unwanted parts of the variable you were testing.

No, you cannot save “capture groups” but you can create your own grouping of RewriteRule sets using the Skip flag (which you should have learned about in the tutorial) to avoid retesting a complex set of variables.
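
On the capture-group question: per the mod_rewrite documentation, %1 to %9 refer to the groups of the last matched RewriteCond only, which is why groups from earlier conditions are lost. One commonly used workaround, offered here as a sketch rather than something from the tutorial, is to test a concatenated string in a single condition so that everything needed is captured at once. The # separator is an arbitrary choice, picked because it cannot appear in either value:

```apache
# Only act on requests that are not directories
RewriteCond %{REQUEST_FILENAME} !-d
# Capture the host (%1) and the path without its trailing slash (%2)
# in one condition; this is the last condition, so the rule sees both.
RewriteCond %{HTTP_HOST}#%{REQUEST_URI} ^([^#]+)#(.+)/$
RewriteRule ^ https://%1%2 [R=301,L]
```

Because the path is taken from %{REQUEST_URI} rather than from the RewriteRule pattern, the subdirectory that holds the subdomain's files never ends up in the redirect target.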

You are welcome. I had learned a great deal about mod_rewrite from a former Team Leader and followed up with many years as a TL myself. My tutorial was my attempt to avoid Carpal Tunnel Syndrome from repeating responses. However, I’m still here to help spread the knowledge as it is my belief that we all should pass the knowledge along. I hope you (and others) adopt that same philosophy.

Regards,

DK

Thanks again for your help and sorry for my slightly delayed response. Work and life and such. I definitely like your suggested code. I did have a couple questions about it though. Please forgive me if the questions are covered in the tutorial you suggested. I did read through a lot of it, but don’t think I came across anything that answered what I’m going to ask.

First question is about the first RewriteRule in your code suggestion. As you said, it uses a relative URL to redirect to. Doesn’t this make it so there could be multiple redirects, such as in the case when http://example.com/notdirectory/ is requested? Seems to me it would first be redirected to http://example.com/notdirectory and then to https://example.com/notdirectory by subsequent rules. Maybe this isn’t such a big deal and I’m being too nitpicky?

I know you also mentioned how you included the R=301 flags in each RewriteRule for testing and demonstration purposes only, and that in actuality you’d only apply it to the last RewriteRule. Does that mean that in my double rewrite scenario, the first rewrite would be instantaneous, and therefore inconsequential? Either way, wouldn’t using an absolute https URL eliminate the need for 2 redirects? I’m sorry if I’m being dense on this, but I also don’t understand why you would only need to apply the R=301 flag to the final RewriteRule, especially since that rule will not always even be applied (as it wouldn’t in my double rewrite scenario, where a previous rule rewrites the request to https, and so that final rule would not be applied). In the cases where that final RewriteRule with the R=301 flag is not applied, wouldn’t the search engines not be advised to update?

My other main question has to do with your 3rd RewriteRule, which would redirect https://sub.example.com to https://www.sub.example.com, where I’d prefer it just remain https://sub.example.com. Is one or the other of these preferable for some reason I’m not aware of? If not, I think some of my original code might be necessary to distinguish redirects for www/primary domains from subdomains.

One other little question. Is .? different from just a period in some way?

I also used the Skip flag, as you suggested, to cut out a lot of the code repetition, and the following is what I currently have going (I know I still have too much (.*) going on and I plan to fix that):

# First 3 RewriteRules for non-trailing slash requests
RewriteCond %{REQUEST_URI} .+/$
RewriteRule .? - [S=3]

# Next 2 RewriteRules for directories
RewriteCond %{REQUEST_FILENAME} !-d [OR]
RewriteCond %{REQUEST_URI} ^/$
RewriteRule .? - [S=2]

# Enforce trailing /, https, and www for directories
RewriteCond %{HTTP_HOST} ^www\.(.*)|(^[a-z0-9-]{2,}\.[a-z]{2,6})$ [NC]
RewriteRule .? https://www.%1%2%{REQUEST_URI}/ [R=301,L]

# Enforce trailing / and https for subdomain directories
RewriteCond %{HTTP_HOST} ^(?!www\.)[a-z0-9-]+\.[a-z0-9-.]{2,}\.[a-z]{2,6}$ [NC]
RewriteCond %{REQUEST_URI} (.*)/$|(.*)
RewriteRule .? https://%{HTTP_HOST}%1%2/ [R=301,L]

# Next 2 RewriteRules for non-directories
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule .? - [S=2]

# Enforce no trailing /, https, and www for non-directories
RewriteCond %{HTTP_HOST} ^www\.(.*)|(^[a-z0-9-]{2,}\.[a-z]{2,6})$ [NC]
RewriteRule (.*)/$ https://www.%1%2/$1 [R=301,L]

# Enforce no trailing / and https for subdomain non-directories
RewriteCond %{HTTP_HOST} ^(?!www\.)[a-z0-9-]+\.[a-z0-9-.]{2,}\.[a-z]{2,6}$ [NC]
RewriteCond %{REQUEST_URI} (.*)/$
RewriteRule .? https://%{HTTP_HOST}%1 [R=301,L]

# Enforce www (non-subdomains) and https
RewriteCond %{HTTP_HOST} ^([a-z0-9-]{2,}\.[a-z]{2,6})$ [NC]
RewriteRule .? https://www.%1%{REQUEST_URI} [R=301,L]

# Enforce https
RewriteCond %{SERVER_PORT} !^443$
RewriteRule .? https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

I definitely agree with you about sharing our knowledge. If I ever get to the point where I feel like I wouldn’t be giving poor mod_rewrite advice, I will definitely follow the same philosophy. Thanks again!

Hi jj4,

All questions are good … hopefully the answers will be, too! :wink: You also know who to blame if you don’t understand something in the tutorial … and where to go for answers.

There is the possibility of multiple redirects … because any redirect (to the same location) will cause the mod_rewrite to be RE-executed (from the start) until there are no further redirections (that is why it’s important NOT to write “loopy code”).

It’s NOT a big deal that mod_rewrite handles its code in a serial fashion because, as stated before, ORDER IS IMPORTANT. All I was doing was taking care of the redirections that would not have to be repeated before getting into the external redirections (http or https).

You’re not being dense. The question is good but shows that you’re not thinking like a computer (logically). I separated my first and second RewriteRules because they handled redirections differently (one adding and one removing a trailing / … POTENTIALLY). I was inclined to combine the third and fourth but did not want to mess with the www and https at the same time (the two RewriteCond statements do not combine logically to arrive at a single solution).

Okay, that was one gotcha for you! Okay, three. Leave the R=301 flags in all four RewriteRules because you may only need one (or two or three) redirections and not match the fourth RewriteRule.

Subdomains: I had interpreted your “specification” to require a leading www. My error. More information is required, then, for the code: How many subdomains are you using (including www)? If you can “list” them in your code (rather than use “catch all” code), and your domain name, yes, you can easily add www UNLESS the subdomain is {in the “list”} using

RewriteCond %{HTTP_HOST} !^((www|sub1|sub2|sub3)\.)example\.com [NC]

Here I need to add that hackers attempt to break your code to find a way inside your website. The more generic your code, the more opportunities for the hacker to find some combination which can break your code and allow access to your website. In other words, be as specific as you can … and a “list” does that perfectly (not to mention using machine cycles to redirect garbage)!
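
As a rough sketch of how the “list” condition could sit in a complete rule: sub1, sub2, sub3 and example.com are placeholders for your own names, and sending every unlisted host to www.example.com is an assumption about what you want to happen with them.

```apache
# Any host that is not www or one of the known subdomains
# is redirected to the www host over https.
RewriteCond %{HTTP_HOST} !^(www|sub1|sub2|sub3)\.example\.com$ [NC]
RewriteRule ^ https://www.example.com%{REQUEST_URI} [R=301,L]
```

Hard-coding the target host, instead of reusing %{HTTP_HOST}, means a garbage Host header never makes it into the redirect.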

Personally, I prefer subdomains and domains without www. but that’s just me. That said, if you’re using a secure server, the cert is normally for either the www or non-www as you requested it from the CA so be sure that your mod_rewrite matches!

.? is VERY different from . (dot) as the ? makes it optional. If someone requests just your domain (example.com or www.example.com, without the trailing /), there is NO character for the dot metacharacter to match, so {dot} would fail and the RewriteRule would not be executed (even though all the RewriteCond statements matched). In other words, .? will ALWAYS match.
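
A small sketch of the difference, assuming the rules are in a .htaccess file, where mod_rewrite strips the leading slash so that a request for the bare domain is matched against an empty string. The environment variable names are made up for illustration:

```apache
# Needs at least one character: not applied for a request to "/"
RewriteRule . - [E=MATCHED_DOT:1]

# Zero or one character: applied for every request, including "/"
RewriteRule .? - [E=MATCHED_OPTIONAL:1]
```

The - substitution leaves the URL untouched, so the only effect of each rule is to set its variable when the pattern matches.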

OMG! Is it logical to you to use NINE RewriteRules instead of FOUR? Remember my first admonition: SPECIFY your requirements for the redirections in words BEFORE you start coding. I don’t think that can be said of:

  1. Skip three rules if there is a trailing /

  2. If not a directory OR (no match: ^/$ is no longer valid as the / cannot be matched, which went away with Apache 1.x), skip two rules

  3. If directory (from #2), redirect to secure server without changing the domain but adding a trailing slash

  4. Same as #3 but for requests without www.

  5. If directory, skip twice

  6. Strip trailing / but add www to domain

  7. If not www, strip {REQUEST_URI} and send to secure server

  8. Add www for domain

  9. If not secure server, send to secure server

How convoluted was that!?! IMHO, back to the basics: Handle the directories and files (trailing /'s) then handle the www (with the “list” code rather than “catchall”) and then enforce the secure server (Google does not like it when you redirect secure server requests). Does that not sound more logical than a lot of if … then … else statements and structure?

Don’t worry, you’ll get there! I’m not trying to abuse you, just trying to have you “think” as logically as the mod_rewrite parser does (sequentially) and handling redirections in a logical, orderly manner.

Regards,

DK

Thanks again for your response! A couple examples to confirm something: Take my http://example.com/notdirectory/ example request again. Under your suggested code from before, mod_rewrite code would be executed once, then executed again using http://example.com/notdirectory instead, then the page would be redirected to https://www.example.com/notdirectory, then rewrite code executed again and would no longer match any RewriteRules. Is that correct?

And one more just to clarify. If we did the same as above, but for the trailing slash RewriteRule we had used an external URL with http (not https just for example’s sake) instead of the relative one, would it go: Mod_rewrite executed once, redirected to external http page with no trailing slash, Mod_rewrite executed again, page externally redirected again to https version of page, then rewrite code executed again and would no longer match any RewriteRules?

Assuming the above is all true, while internal redirects are not much of an issue, should multiple external redirects be avoided if possible for performance/pageload speed reasons, or are those not such a big deal either?

Definitely get the need to account for hackers as well. Along with this code I’m working on here, I am also integrating a somewhat customized form of https://perishablepress.com/6g/ to my conf/htaccess template to try to limit potentially harmful/wasteful requests.

I actually also would prefer domains without www, but I just go with that so I can set cookies to www only and not serve them with every request to subdomains as well.

Duh moment with the .? question. Guess it was late and I’ve been using .* too much and I kind of blended its meaning together with the period.

You’re definitely right that logic in my last posted code is convoluted, but it was mainly because I was trying to get everything to work without need to redirect more than once. In my mind I was okay with the complexity if it meant only one redirect no matter the request. But all of what you’ve explained makes sense and I’ll definitely go with your code. And I definitely don’t take any of what you’ve said as abuse. I appreciate all the information and thanks again.

jj4,

Almost. In my example, I didn’t try to adjust the https until the fourth RewriteRule. Otherwise, yes, if a redirection is made, the entire mod_rewrite starts from the beginning to access every directive and, if necessary, redirect again. If not, the redirected request is then fetched by Apache.

Actually, mod_rewrite will restart and process your RewriteRules until it finds a redirection then restart again and loop until no further redirections are made.

I tried to work out whether RELATIVE/ABSOLUTE made a difference. It certainly would if the absolute redirection sent the request to another server (or to another .htaccess on your server). Relative redirections must also cause a restart but not necessarily an immediate one (I’m fuzzy on this specific point … check your log file as every step is logged if you’re using log level 9). I doubt that the difference between relative and absolute redirections is measurable IF the same directory level is being accessed in the redirections.
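
For anyone wanting to follow the steps in the log, a minimal sketch of how rewrite tracing is usually switched on. “Log level 9” is the Apache 2.2 setting; Apache 2.4 replaced it with per-module trace levels. Both forms belong in the server or virtual host config rather than in .htaccess, and the log path below is a placeholder, not something from this thread:

```apache
# Apache 2.4: rewrite steps are written to the regular error log
LogLevel alert rewrite:trace3

# Apache 2.2 equivalent (directives removed in 2.4)
# RewriteLog "/var/log/apache2/rewrite.log"
# RewriteLogLevel 9
```

Tracing is verbose and slows the server down, so it is best turned off again once the rules behave.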

Personally, I prefer internal redirections UNLESS I’m changing the domain or protocol (less typing and keeps my head from spinning :grin: ).

Gudonya! ALWAYS keep security in mind when coding (and managing your server).

Hmmm, I thought cookies went to the domain name (www may be differentiated from non-www) but I choose www’d or not and stay with it throughout the website. As I said before, though, if you’ve requested a secure server certificate with www, you’re stuck. My mindset is non-www so I won’t try changing mid-stream!

Duh? No, that was a good question. In the Apache 1 days, you HAD to specify the / between the {HTTP_HOST} and {REQUEST_URI}. I’ve been told that it’s optional now (finally) with Apache 2 but it had not started as optional so, because it is NOT part of the {REQUEST_URI}, I keep away from it.

To use only one redirection, you’d have to evaluate -d, trailing /, www vs subdomain and https for 2^4 or 16 different blocks of very convoluted (well … repetitive) code. IMHO, best to let mod_rewrite work the way it’s supposed to and deal with each “situation” in a logical order. Glad you did get the message, though.

Ready to teach your own course in mod_rewrite yet … or just help others? I think you’ve got a good handle on the coding end of it and, hopefully, I’ve helped you with the logic at the beginning (before coding).

Regards,

DK

I was of the mind that any cookie set to example.com would also be sent with requests to any sub.example.com as well, though I could be wrong. I found a discussion on this at http://stackoverflow.com/questions/18492576/share-cookie-between-subdomain-and-domain if you’re interested.

And yes, I’m feeling a lot better about my mod_rewrite knowledge than before, thanks again for all the help. I’ll certainly take any opportunity I get to help others having the same issues.
