Basic rewrite http https www non-www

here goes-

I have a Zen Cart store on Linux. I have noticed on major e-commerce sites that certain rewrites exist for several simple behaviors. Here it is in a nutshell, followed by code I’ve put together that solves all the behaviors, but causes mixed encryption on secure pages, so I lose the padlock.

Goals-

  1. http to https if for whatever reason the ‘s’ is missing from the request (actually need secure page).
  2. https to http if for whatever reason the ‘s’ is included in the request (actually do not need secure page).
  3. addition of www wherever it is missing, for both protocols.
  4. no loss of padlock due to mixed encryption resulting from rewrite rules, specifically second rule below.

So, that is really all there is to the goals. Below is the code, but first a brief explanation of why the padlock – possibly due to mixed encryption – is seemingly getting lost.

On the second rule below, all is well with the URL in the browser address bar after testing the behaviors, but apparently some page elements are making their way through the second rule, and thus getting http instead of https, and thus causing mixed encryption leading to loss of padlock in IE. In FF, it is a warning padlock. This is my guess at this point about why there is an encryption problem with the second rule. I have not been able to determine if this is exactly true (mixed encryption), or which elements might be causing it, but I’m pretty sure ‘img src’ references are properly calling with relative references in the site directories, so maybe images are not the problem. There are some scripts on the secure pages, but I have not been able to determine if they are going through the second rule and thus causing the problem. I have pinned down that the query_string filter in the second rule seems to be ‘letting something through,’ which is why I figured it must be page elements.

One final note – it may be that major commerce sites use an entirely different method of achieving the behaviors outlined above, but I thought I would be able to do this all in the site’s root htaccess. After three days, this is where I am at, with all behaviors working but loss of padlock, possibly due to mixed encryption resulting from the second rule. So, here goes, and all help is, of course, profoundly appreciated. PS: I have to use QUERY_STRING for filtering because Zen Cart is php.

RewriteEngine On 
#
# Do not apply following rules to admin area of Zen Cart. 
RewriteRule ^(zc_admin) - [L]
#
# Redirect to https (port 443) and/or add www, when needed, for all secure pages in QUERY_STRING list.
RewriteCond %{SERVER_PORT} !^443$ [OR]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{QUERY_STRING} (log(in|off)|account|checkout|contact|address|time_out|password)
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L] 
#
# Redirect to http (port 80) and/or add www, when needed, for all pages other than those in QUERY_STRING list.
RewriteCond %{SERVER_PORT} ^443$ [OR]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{QUERY_STRING} !(log(in|off)|account|checkout|contact|address|time_out|password)
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
#

Thank you anyone for checking this code and/or offering advice. Hopefully, it might be something simple to fix it, as I’m not an experienced coder. :slight_smile:

Jim

Hi Jim,

Welcome to the SitePoint forums :wave:

Let’s walk through the .htaccess shall we?


RewriteRule ^(zc_admin) - [L]

That does absolutely nothing at all; more specifically, it doesn’t skip the following rules for URLs that start with zc_admin, as you seem to expect.
If you want to skip the two rules that follow this one, you should use the Skip flag:


RewriteRule ^(zc_admin) - [L,S=2]


RewriteCond %{SERVER_PORT} !^443$ [OR]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{QUERY_STRING} (log(in|off)|account|checkout|contact|address|time_out|password)
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]

RewriteCond %{SERVER_PORT} ^443$ [OR]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{QUERY_STRING} !(log(in|off)|account|checkout|contact|address|time_out|password)
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

That all seems fine to me, except that the rules could be made a little more efficient, like so:


RewriteRule .? http://www.example.com%{REQUEST_URI} [R=301,L]

The same goes for https, of course.

Note that the way you ordered the blocks will enforce https for non-www requests first, and later remove it if it finds it isn’t needed, causing an extra redirect. If you reverse the blocks, it goes the other way around. You may be better off separating the force-www logic out of the code, redirecting to http://www if the initial request was for http:// and to https://www if the initial request was for https://. To do that, you would change your code like so:


RewriteEngine On
RewriteCond s%{HTTPS} ^((s)on|s.*)$ [NC]
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule .? http%2://www.example.com%{REQUEST_URI} [L,R=301]

RewriteCond %{SERVER_PORT} !^443$
RewriteCond %{QUERY_STRING} (log(in|off)|account|checkout|contact|address|time_out|password)
RewriteRule .? https://www.example.com%{REQUEST_URI} [R=301,L]

RewriteCond %{SERVER_PORT} ^443$
RewriteCond %{QUERY_STRING} !(log(in|off)|account|checkout|contact|address|time_out|password)
RewriteRule .? http://www.example.com%{REQUEST_URI} [R=301,L]
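
For anyone puzzling over the first block, here is a rough annotated reading of how the s%{HTTPS} condition carries the protocol into the redirect target. The comments are only a gloss on the rules above, and they assume the usual mod_rewrite behavior that %N refers to groups captured by the last matched, non-negated RewriteCond.

```apache
# The test string is the literal letter "s" followed by %{HTTPS}, i.e. "son" or "soff".
#   "son"  matches the first alternative (s)on, so %2 captures "s".
#   "soff" matches the second alternative s.*, so %2 stays empty.
RewriteCond s%{HTTPS} ^((s)on|s.*)$ [NC]
# Negated conditions capture nothing, so %2 still comes from the line above.
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
# http%2 expands to "https" on a secure request and to "http" otherwise.
RewriteRule .? http%2://www.example.com%{REQUEST_URI} [L,R=301]
```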

As for the mixed security, the best way to tackle that is to install the Firebug plugin, open its “Net” tab, and see which requests are made over http instead of https.
Trying to solve this from a theoretical point of view is just too tedious IMO :slight_smile:

Hi Jim,

It’s 1:30AM so I won’t take long with this tonight - just a few comments on your goals and code:

finlanderid wrote:

Goals-

  1. http to https if for whatever reason the ‘s’ is missing from the request (actually need secure page).

ZenCart takes care of that for you, so it shouldn’t be necessary. If you believe it is, use the “trick” code to enforce HTTPS which is linked in my signature - but ONLY apply it to PHP scripts, not CSS, JS, images, etc.

  2. https to http if for whatever reason the ‘s’ is included in the request (actually do not need secure page).

Ditto.

  3. addition of www wherever it is missing, for both protocols.

Ditto the “takes care of that for you” but, if your secure server is www’d, then, by all means, check the secure server status, then the domain for www and send non-www’d domains to the www’d version.

  4. no loss of padlock due to mixed encryption resulting from rewrite rules, specifically second rule below.

If you’re mucking around with https/http, then you’re causing problems for ZenCart’s handling. If you insist, though, be sure you’re NOT redirecting css, js, images, et al., as they’re requested with relative links, which will retain the {HTTPS} status.

{snip}

Yes, it’s fine to do all this in the document root’s .htaccess.

RewriteEngine On 
#
# Do not apply following rules to admin area of Zen Cart. 
RewriteRule ^(zc_admin) - [PT]
#
# Redirect to https (port 443) and/or add www, when needed, for all secure pages in QUERY_STRING list.
# NO! (the [OR] on the next line was marked in red)
RewriteCond %{SERVER_PORT} !^443$ [OR]
# (the parentheses and the ? around the host name on the next line were marked in red)
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{QUERY_STRING} (log(in|off)|account|checkout|contact|address|time_out|password)
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]
    You KNOW the link is to index.php, so just say that! Unfortunately, (.*) will also redirect .css, .js, .yaddayadda, so use RewriteCond %{REQUEST_FILENAME} !-f (and !-d) to prevent redirecting your supporting files.
#
# Redirect to http (port 80) and/or add www, when needed, for all pages other than those in QUERY_STRING list.
    WHY? ZC does that for you.
RewriteCond %{SERVER_PORT} ^443$ [OR]
RewriteCond %{HTTP_HOST} !^(www\.)?example\.com$
RewriteCond %{QUERY_STRING} !(log(in|off)|account|checkout|contact|address|time_out|password)
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
#
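
A minimal sketch of what that advice could look like for the first block, assuming every Zen Cart page is served through index.php (an assumption taken from the comment above, untested here). Matching index.php by name means stylesheets, scripts, and images never reach the redirect. The www check is left out of this sketch.

```apache
# Hypothetical variant: only index.php requests for the listed pages are sent to https.
RewriteCond %{SERVER_PORT} !^443$
RewriteCond %{QUERY_STRING} (log(in|off)|account|checkout|contact|address|time_out|password)
# The substitution carries no query string of its own, so the original one is passed along.
RewriteRule ^index\.php$ https://www.example.com/index.php [R=301,L]
```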

G’nite!

Regards,

DK

Thank you, both, ScallioXTX and dklynn.

I should have clarified the situations in which I want the behaviors to occur, because dklynn quickly pointed out that ZC does most of this automatically if correctly configured. I wanted these behaviors to occur if a site visitor goes up to the address bar and removes the ‘s’ when it’s supposed to be there or puts it in when it’s not supposed to be there, or removes the www. Yes, I installed ZC to have the www prefix, so in moving about the site the www is always there, but what I was trying to accomplish with the behaviors was to mimic what I have seen at major e-commerce sites – their code handles this. I realize that it’s possible or probable that they use a completely different method – at least with available Apache methods and putting .htaccess in subs etc. – but once I got so close to being able to do it with the root htaccess, I didn’t want to give up just because of the mixed encryption problem.

So, writing rewrite rules that would prevent any tampering of the URL, I guess, may also have been a goal – I just wanted to see if I could do it. If I can solve the mixed encryption problem, then all is well.

okay, I am now going to work on all your suggestions. Thank you, again.

btw, ScallioXTX, I just wanted to mention that the rule to stop rules on the admin directory did work. I wasn’t able to access admin because of the other rules, and when I put that very first rule in to stop processing, the admin area was fine.

I read something once about the [L] flag being meaningless for redirects because it was designed to work only on rewrites. The very first rule is a rewrite (the second and third are redirects, and in my testing I did notice that the [L] flag does not stop processing when processing is moving through them). Since the [L] flag appears to work on the very first rule, it does appear that once zc_admin is in the URL, no rule processing is occurring, as evidenced by being able to get to and move about admin after adding that very first rule. I just wanted to mention this in case I am missing something, or to be helpful by pointing out something that people might be overlooking about the [L] flag.

To be clear, however, the initial problem I was having with admin was not being able to log in to admin. Whenever I tried to log in, the rules may have been causing a loop or something, and because my id and password needed to get moved in right away, so to speak, hitting ‘submit’ was just throwing the login page back up at me. After I added the first rule, I was able to log in. I don’t know what behavior would have occurred in admin, with or without the first rule, after logging in, but since lots of ‘submittal’ types of things get done in admin, the redirects still may have caused a problem. They still might; I haven’t tested, but I will also add your Skip flag, and that would seemingly take care of things, too. :slight_smile:

oh, ScallioXTX, also…

Note that the way you ordered the blocks will enforce https for non-www requests first, and later remove it if it finds it isn’t needed, causing an extra redirect.

I understand your point, but when https is forced for non-www requests in the first rule, that is only for the specific pages named in the query_string of the first rule (specific pages requested as http that also don’t have www), so the second rule would not then remove https, because the second rule does not apply to those specific pages. The second rule is meant to say “all other pages.” (Incidentally, this issue about the second rule processing leads into the mixed encryption problem. Page elements on the specific secure pages appear to get run through the second rule, because nothing flags them for exclusion from it; i.e., a page element on the login page does not have ‘login’ in the query_string, so it gets run through the second rule and loses its https, hence the loss of padlock. At least this seems to be what is happening. I’m not a coder and so am inferring many things.)

okay, off to work! I will post later today with results. The major problem at this point is the mixed encryption causing loss of padlock on secure pages, and even if I make the redirects only for php, then if there is any php being called as a page element, then … well, game over? I’ll try to avoid that ending!

dklynn,

I’m confused about your red highlight in the quote box you included in your answer. If you mean to say that the [OR] flag should not be used, then I need to clarify why it’s there.

The [OR] flag in the first rule causes the following operation (I don’t mean to sound condescending – just explaining why I wrote the code this way :)):

‘if request comes in for http and is one of these specific pages, then redirect to https://www’

OR

‘if request comes in under https, but does not have www and is one of these specific pages, then redirect to https://www’

That was the trick for ‘capturing’ the four various behaviors in two rules. If the [OR] flag is removed, then the trick is gone, and for a page to get https://www, it would have to come in under http and have to be missing www. That would work for those types of pages, but would not add the ‘s’ for a secure page when www is already present, and would not add the www for a secure page that comes in under https. So, that’s why the [OR] is an integral part of how I wrote the code.
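
The grouping described above can be sketched with comments. As far as I understand mod_rewrite, [OR] joins a condition only to the one directly after it, and everything else is ANDed, so the block reads as (A OR B) AND C:

```apache
# A: request did not arrive on the secure port
RewriteCond %{SERVER_PORT} !^443$ [OR]
# B: host is neither www.example.com nor empty
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
# C: request is for one of the secure pages
RewriteCond %{QUERY_STRING} (log(in|off)|account|checkout|contact|address|time_out|password)
# Fires when (A OR B) AND C holds.
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]
```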

I wanted to address your comment, because you took the time to make it, so thought I should give a proper discussion of the reason I included the [OR] flag in both rules – it causes capture of all incidents that need redirection.

Okay guys … First, thank you! I wish I had found this forum two days ago, but two days is better than 20 or 200…

I have two versions of the code that work perfectly, with one very minor inconvenience, which you will find as a bonus question at the end of this post.

The first version of the code places ScallioXTX’s canonical rule at the end (I’ve read the correct order is to put canonicalisation last; don’t know if it’s true or not). That canonical is quite elegant, btw, for handling http and https in a single leap.

The second version incorporates the canonical condition with conditions for changing http to https and vice-versa. This was how I had the code initially, and it seems to work fine, now that we have fixed the mixed encryption problem!

Ah, the mixed encryption problem. I located some of dklynn’s writings on the topic, and so I simply added a condition so that this http(s) to http(s) jazz would only be done for a request_uri containing the strings php or html. Page elements, whichever ones were causing the mixed encryption, are no longer getting changed to http on secure pages. It really seems to be that simple. The trouble with coding for beginners like me is that the simplest things are completely foreign until they are discovered.

Now to the code:

#
Options +FollowSymLinks
RewriteEngine On 
#
# Do not apply following rules to admin area of Zen Cart.
RewriteRule ^(zc_admin) - [L,S=3]
#
# http to https, only when needed
# target php|html only, to avoid causing mixed encryption pages
RewriteCond %{SERVER_PORT} !^443$ [NC]
RewriteCond %{QUERY_STRING} (log(in|off)|account|checkout|contact|address|time_out|password) [NC]
RewriteCond %{REQUEST_URI} \.(php|html) [NC]
RewriteRule .? https://www.example.com%{REQUEST_URI} [L,R=301]
#
# https to http, only when needed
# target php|html only, to avoid causing mixed encryption pages
RewriteCond %{SERVER_PORT} ^443$ [NC]
RewriteCond %{QUERY_STRING} !(log(in|off)|account|checkout|contact|address|time_out|password) [NC]
RewriteCond %{REQUEST_URI} \.(php|html) [NC]
RewriteRule .? http://www.example.com%{REQUEST_URI} [L,R=301]
#
# canonical addition of www, both protocols, only when needed
RewriteCond s%{HTTPS} ^((s)on|s.*)$ [NC]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
RewriteRule .? http%2://www.example.com%{REQUEST_URI} [L,R=301]
#

-or-

#
Options +FollowSymLinks
RewriteEngine On 
#
# Do not apply following rules to admin area of Zen Cart.
RewriteRule ^(zc_admin) - [L,S=2]
#
# http to https and/or add www, all only when needed
# target php|html only, to avoid causing mixed encryption pages
RewriteCond %{SERVER_PORT} !^443$ [OR]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{QUERY_STRING} (log(in|off)|account|checkout|contact|address|time_out|password)
RewriteCond %{REQUEST_URI} \.(php|html)
RewriteRule .? https://www.example.com%{REQUEST_URI} [L,R=301]
#
# https to http and/or add www, all only when needed
# target php|html only, to avoid causing mixed encryption pages
RewriteCond %{SERVER_PORT} ^443$ [OR]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{QUERY_STRING} !(log(in|off)|account|checkout|contact|address|time_out|password)
RewriteCond %{REQUEST_URI} \.(php|html)
RewriteRule .? http://www.example.com%{REQUEST_URI} [L,R=301]
#

ScallioXTX, I wanted to keep the regex ? (preceding optional) character on the http_host condition, because there is one way that it can be useful. If a request comes in with exactly the right www host name (which would result in Cond false and stop the redirect), BUT has a trailing port number, I think the ? will cause a (negation) MATCH in that instance, allowing Cond true and letting that request get through the rule and get a redirect, which may be desirable for trailing port numbers on incoming requests, even when the incoming request is perfect on the host name. That, I think, would be the only reason to leave the ?, so I left it in.

And the bonus question: the http(s) rules won’t work on the home page of my site, because it’s not allowed in the Cond (obviously does not contain php or html). Is there a simple way of allowing it, without breaking any of the code that has successfully been completed so far?

Once again, I could not have done this, with my current coding ability, had I not found this site and received the responses given by ScallioXTX and dklynn. I hope that this work may be of help to someone else.

Jim

Jim,

I’m really impressed with all the stuff you got done! Kudos!

First, about the last flag. The description that the Apache manual gives is vague, to say the least. What you need to know is that rewriting is a process done in multiple rounds. First a request is made, and Apache processes the RewriteRules one by one. When a rule’s pattern matches, it checks whether the RewriteConds grouped with that rule (if any) also match, and if they do, it fires the rule.
Without the last flag, it continues to process the next rule, and the one after that, and so on, until all rules are checked.
Once all the rules are checked, if the URL that comes “out” of the .htaccess (in a manner of speaking) is different from the URL that went “in”, a new round is started: starting at the first RewriteRule, checking that, and its conditions, etc.
Now, if you put the last flag on a rule, and that rule matches, it will not continue to check the other rules in that round, but will continue straight on to the next round. Since your rules are mutually exclusive (if rule 2 matches, rule 3 won’t, and vice versa), this is exactly what you want. If rule 2 matches, there is no need to check rule 3 in the same round; you’re sure it won’t match anyway, so it’s just burning CPU cycles for nothing.
There are instances where you don’t want to use the last flag, but yours isn’t one of them :slight_smile:
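
To make the rounds concrete, here is a small hypothetical pair of internal rewrites in a .htaccess file. The names old, new, and final are made up for illustration and have nothing to do with your store; treat it as a sketch of the behavior described above rather than something to paste in.

```apache
RewriteEngine On
# Round 1: a request for /old matches here and is rewritten to new; [L] ends the round.
RewriteRule ^old$ new [L]
# Round 2: the URL changed, so the rules run again from the top; this time new matches.
RewriteRule ^new$ final [L]
# Round 3: final matches neither rule, the URL comes out unchanged, and processing stops.
```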

As for the placement of the code to take care of the canonical URL, I don’t really think it matters. I personally prefer to put it at the very start of the .htaccess to make sure that I’ve at least got the canonicalization right before processing all the other rules, but it’s a matter of taste, I suppose.

The question mark in the following rule:


RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$

means: match “www.example.com” zero or one time, which is the same as saying: match “www.example.com” OR don’t match anything at all.
The %{HTTP_HOST} only contains the host name, and not the port number; that’s what %{SERVER_PORT} is for :slight_smile:
All in all, I don’t see a reason why you would want a ? in there.
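
A sketch of the same condition without the optional group, with the difference spelled out in comments:

```apache
# With the ?, the pattern matches either www.example.com or an empty host value,
# so the negated condition is false (no redirect) in both cases.
# RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$

# Without it, the pattern matches only www.example.com,
# so any other host value satisfies the condition.
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
```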

The bonus question. The special thing about the home page is that it doesn’t have a query string, and the request uri is just /
You could try to jump through all sorts of hoops to get that incorporated in the current rules, but I reckon it would be better (and more readable!) if you added a fourth RewriteRule to the .htaccess with two RewriteConds (one for the query string, one for the request URI).
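
One possible shape for that fourth rule, as an untested sketch. It assumes the home page should be served over plain http and that it is requested as / with an empty query string; the port check is carried over from the existing https-to-http rule, on top of the two conditions mentioned above.

```apache
# home page: https to http, only when needed
RewriteCond %{SERVER_PORT} ^443$
# no query string at all
RewriteCond %{QUERY_STRING} ^$
# request is for the site root
RewriteCond %{REQUEST_URI} ^/$
RewriteRule .? http://www.example.com/ [L,R=301]
```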

btw. The [NC] stands for NoCase, meaning the regex engine will test the regular expressions in a case-insensitive manner. Since a port number is a number and there are no such things as “uppercase numbers” and “lowercase numbers”, you can drop the [NC] on the RewriteConds that test for the SERVER_PORT.
Similarly, if you always use .php and .html (and not .Php, .pHp, .Html, HTML, or any other variant) you can drop the [NC] on those Conds as well.

:slight_smile:

Awesome, thanks, that all makes sense. Yeah, I got to thinking that the port number isn’t part of the http_host variable, so throwing the ? in there to try to get a ‘true’ result from the condition when there is a trailing port number just would not work. An expert on another forum recommended it, but when there is so much code flying around, and some things that to me at least seem to work on an atomic level, I can see that all types of code get discussed incorrectly, even by the experts.

I get what you said about the [L] flag and everything else. I should be paying for such solid education!

I’ll work on that fourth rule – should not be too much really. Just have request_uri check for ‘/’ like the other rules check for php|html? Sounds simple enough, but with something special like the root directory, I could probably mess it up!

Have a nice night, or day, over there,

Jim

Jim, I’m glad it all made sense.

As for the fourth rule, I’m sure you’ll figure that one out; it’s basically the same as the second and third rules, but with different parameters for the Conds

I probably don’t need to say this, but make sure you change [S=3] to [S=4] when you add the fourth rule.

You have a nice day/night as well! :slight_smile:

Jim,

Okay, you’re just looking at reining in the rogue visitors trying to upset the apple cart. I haven’t tested but I’d think that ZenCart would handle that, too, as part of its configuration, i.e., limiting access to its “secure modules” to HTTPS. I’d be as shocked if it doesn’t as I would be overwhelmed by finding all those “secure modules” within ZenCart’s structure (your list of query string keys should include most if not all of those).

I used to preach that the Last flag was the equivalent of ; or } in PHP coding. It’s not! It tells Apache that you’re done with the current processing of ALL RewriteRules to allow Apache to go back to restarting the request’s process with a new URI. Using a PT (pass through) flag at the start would cause a failure to match (and restart) and terminate mod_rewrite processing.

# Redirect to https (port 443) and/or add www, when needed, for all secure pages in QUERY_STRING list.
RewriteCond %{SERVER_PORT} !^443$ [OR]
# (the parentheses and the ? around the host name on the next line were marked in red)
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{QUERY_STRING} (log(in|off)|account|checkout|contact|address|time_out|password)
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]

Okay, I see your point - but I would have forced www for all requests before doing anything else. Sorry, my mind was in that rut. That said, I’d still NOT make the entire {HTTP_HOST} string optional (remove the red bits)!
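
Forcing www before anything else could be sketched like this, assuming the %{HTTPS} variable is available on the server (it reports on or off). The host check is written without the optional group, and the two rules are untested:

```apache
# Secure request on the wrong host: keep https, add www.
RewriteCond %{HTTPS} on
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule .? https://www.example.com%{REQUEST_URI} [L,R=301]

# Plain request on the wrong host: keep http, add www.
RewriteCond %{HTTPS} !on
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule .? http://www.example.com%{REQUEST_URI} [L,R=301]
```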

Regards,

DK