.htaccess rule order and redirect rule

Hi all
I have a couple of .htaccess questions. I have read DK’s guide on rewrite rules many times and it is slowly sinking in, but I would appreciate some advice if possible, as I am still far from an expert.

Is there a preferred order for redirects? I want to a) force HTTPS, b) force www, and c) remove file extensions. In my head that is the order to do them in, but is there a recommended order?
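One common layout, sketched here as an assumption rather than the only correct order: put the external redirects first (HTTPS, then www), and the extension handling after them, so every request has its final scheme and host before the .html logic runs. This assumes `%{HTTPS}` reflects the real connection; behind a proxy or load balancer that terminates TLS it may not, and that setup is not covered here.

```apache
RewriteEngine On

# a) Force HTTPS, keeping the requested host and path.
#    The original query string is passed through automatically.
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

# b) Force www, only when the host does not already start with "www.".
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^ https://www.%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

# c) Extension handling (.html redirect, then internal rewrite) goes below.
```

With this order, a request for http://domain.com/page can take two hops (first to HTTPS, then to www). Merging a) and b) into one rule avoids the extra hop, at the cost of a harder-to-read condition.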

I also need to remove file extensions for .html files. As this is an existing site that I am replacing, I want to redirect the existing .html URLs that search engines have indexed, so customers only ever see the extensionless URL and search engines update their results to use it. I have had some help with this before; here is my code. Are there any glaring errors?

# Redirect to NEW format if page is requested with .html extension
# check there has not been a previous redirect to avoid a redirect loop
RewriteCond %{IS_SUBREQ} false
RewriteRule ^([a-z]+)\.html$ %1 [R=301,L]
# check html file exists and redirect back to usable link
RewriteCond $1\.html -f
RewriteRule ^([a-z]+)$ $1.html [L]

All feedback appreciated. I am trying to build my knowledge of this and have done a lot of reading, but it always helps to get feedback from experts who do this all the time, as the part that stops the extension showing still leaves me a little unsure.

Many Thanks

Matt

Hi,
Just looking at my code again, should:

RewriteRule ^([a-z]+)\.html$ %1 [R=301,L]

actually be:

RewriteRule ^([a-z]+)\.html$ $1 [R=301,L]

Swopping the %1 for $1?

I am continuing to read on this, but any feedback greatly appreciated.

Also, can this be tested locally? I assume it can, as my simpler rule to remove the extension works on my home machine.

Thanks,
Matt

I wouldn’t recommend trial-and-error swapping those.
https://httpd.apache.org/docs/current/rewrite/intro.html#regex

Regex Back-Reference Availability

Note: Condition vs. Rule pattern matches.
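To illustrate the distinction that section draws: `$N` back-references come from the RewriteRule's own pattern, while `%N` back-references come from the most recently matched RewriteCond. The directories and the `page` parameter below are hypothetical, chosen only to show each form.

```apache
RewriteEngine On

# $1 = first group captured by the RewriteRule pattern.
# e.g. /old/about -> /new/about  (hypothetical directories)
RewriteRule ^old/([a-z]+)$ /new/$1 [R=302,L]

# %1 = first group captured by the last matched RewriteCond.
# e.g. /index.php?page=about -> /about  (hypothetical parameter)
# The trailing "?" discards the original query string.
RewriteCond %{QUERY_STRING} ^page=([a-z]+)$
RewriteRule ^index\.php$ /%1? [R=302,L]
```

In the original rule, `^([a-z]+)\.html$` is the RewriteRule pattern, and the only RewriteCond (`%{IS_SUBREQ} false`) captures nothing, so `%1` there would be empty.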

Thank you.

I am trying to learn this. I’ve done lots of reading but clearly need to, and will, do more :)

I suggested that change because I thought RewriteRule needs to use $, but I can see from your example above that it can use both $ and %, so I will go and read that reference as well.

However, the more I think and read about it (and confuse myself), the more I think that change to use $ is correct. Is it?

the line:

RewriteRule ^([a-z]+)\.html$ $1 [R=301,L]

should redirect, for example, www.domain.com/mypage.html to www.domain.com/mypage, because the rule captures the requested URL path and then uses the back-reference captured by the pattern to redirect to it without the .html extension?

Any other hints on my code above would be appreciated. I’m clearly getting a little tied up here!
Cheers

Matt

Hi all

So here is my final code. I have read lots on this but it still won’t work; any other suggestions would be really appreciated. Thanks, Matt

# Redirect to NEW format if page is requested with .html extension
# check there has not been a previous redirect to avoid a redirect loop
RewriteCond %{IS_SUBREQ} false
RewriteRule ^([a-z]+)\.html$ $1 [R=302,L]
# check html file exists and redirect back to usable link
RewriteCond $1\.html -f
RewriteRule ^([a-z]+)$ $1.html [L]

I’m confused, it looks like

a request for something.html rewrites to something
if something.html does not exist
rewrite a request for something to something.html

I’m unable to understand what you are trying to do from those directives.

In my limited experience, the directives are typically like

a request for something rewrites to something.html
(well, usually php not html these days, anyway …)

I have never seen any intermediate “not a file” conditions, but I guess that could work.

I think you may be misunderstanding what rewrites are doing. AFAIK, they allow visitors to request “something” without needing to request something.html, something/index.html or index?page=something, but the more verbose requests still work. In other words, it is more about routing than about what shows in a visitor’s address bar.

For search engines, I think you may be looking for canonical URLs.
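For reference, a canonical URL is usually declared with a `<link rel="canonical">` element in the page’s HTML, but it can also be sent as an HTTP `Link` header. A minimal sketch, assuming mod_headers is enabled; `mypage.html` and the URL are placeholders, not names from this site.

```apache
# Assumes mod_headers is loaded. "mypage.html" and the URL are placeholders.
<Files "mypage.html">
    # Tell crawlers that the extensionless URL is the preferred one.
    Header set Link "<https://www.domain.com/mypage>; rel=\"canonical\""
</Files>
```

Unlike a 301 redirect, this is only a hint: /mypage.html is still served and still shows .html in the address bar.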

This seems to work:

RewriteEngine On

# Redirect to NEW format if page is requested with .html extension
RewriteCond %{THE_REQUEST} " /([a-z]+)\.html "
RewriteRule ^ /%1 [R=302,L]

# check html file exists and redirect back to usable link
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule ^([a-z]+)$ $1.html [L]

The big changes are:

  1. IS_SUBREQ didn’t seem to work like you thought (or were told). After an internal rewrite, IS_SUBREQ would still be false. Instead we look at THE_REQUEST, which is the full HTTP request line sent by the browser, and it doesn’t change even as we rewrite the URL internally.

  2. A leading slash on the redirect target; otherwise, in .htaccess, Apache prefixes the directory’s filesystem path to it.

  3. DOCUMENT_ROOT in the is-a-file test, because it operates on absolute filesystem paths.
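One limitation worth knowing: the condition `" /([a-z]+)\.html "` needs a space straight after `.html`, so a request such as `/mypage.html?ref=abc` would not be redirected. A possible variant, sketched under the same assumptions as the code above (lowercase page names in the document root only), also accepts a query string. mod_rewrite passes the original query string through to the redirect target by default.

```apache
RewriteEngine On

# Match the original request line, e.g. "GET /mypage.html?ref=abc HTTP/1.1".
# [\s?] accepts either the space before "HTTP/1.1" or the start of a query string.
RewriteCond %{THE_REQUEST} \s/([a-z]+)\.html[\s?]
# Redirect to the extensionless URL; any query string is kept.
RewriteRule ^ /%1 [R=302,L]

# Internally serve mypage.html when /mypage is requested and the file exists.
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule ^([a-z]+)$ $1.html [L]
```

Page names containing digits, hyphens or subdirectories would still need a wider character class than `[a-z]`.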


EDIT: Actually…

RewriteCond %{DOCUMENT_ROOT}/$1.html -f

…can be rewritten as…

RewriteCond %{REQUEST_FILENAME}.html -f

Both will produce the same value in this case.

Thanks for the reply.

My aim is to:

Stop extensions showing, for the usual reasons. This part I can do. However, this does not normally stop files from being served when they are requested with the extension.

And because this site is a replacement rather than a new site, Google already has the .html files indexed, so those URLs will keep working.

So my second aim is to redirect every request for a .html file to the extensionless format, so that requests for /page and /page.html are both served with the user seeing /page as the URL.

Hope that makes sense!

Thanks very much for taking the time to look at the code, debug it and offer a solution with an explanation. Very much appreciated, and it will help me learn more.

I will try that tonight and report back here.

Thanks again

Matt

Hi

I tried it but still no joy. The other rules work locally, so could this one not working be down to the Apache config on my machine?
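If it is unclear whether the rules are being applied at all, mod_rewrite’s trace logging can show each matching step. A sketch assuming Apache 2.4, where this is set with `LogLevel`; it belongs in the main server or virtual host config (it is not allowed in .htaccess), and the output goes to the error log. Trace levels are very verbose, so this is for local debugging only.

```apache
# Apache 2.4 server or <VirtualHost> config, not .htaccess.
# Keep other modules quiet, but log mod_rewrite's decisions in detail.
LogLevel alert rewrite:trace3
```

After a restart, each request logs which patterns and conditions were tested and what the URL was rewritten to.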

Apologies, but I’m at the limit of my knowledge now and this is one of the last things on the to-do list!

Cheers

Matt

Hi Jeff

Also, can you explain why this is written as:

RewriteCond %{THE_REQUEST} " /([a-z]+)\.html "

and not

RewriteCond %{THE_REQUEST} ^/([a-z]+)\.html$

I am also trying to understand the use of “…” rather than ^(…)$
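A short illustration of why the condition looks like that, assuming a typical browser request: THE_REQUEST holds the entire first line of the HTTP request, method and protocol included, so a pattern anchored with `^/` can never match it. The spaces in the pattern are why it is wrapped in quotes; `\s` is an unquoted way to write the same thing.

```apache
# For a request of /mypage.html, THE_REQUEST is e.g.:
#   GET /mypage.html HTTP/1.1
# It starts with the method ("GET"), not "/", so ^/([a-z]+)\.html$ fails,
# and it ends with " HTTP/1.1", so a "$" right after ".html" fails too.

# Quoted form: the literal spaces around the path are part of the pattern.
RewriteCond %{THE_REQUEST} " /([a-z]+)\.html "
RewriteRule ^ /%1 [R=302,L]

# Unquoted alternative for the condition above (use one or the other):
# RewriteCond %{THE_REQUEST} \s/([a-z]+)\.html\s
```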

I appreciate all the help. I just feel I’m getting tied up in knots; I probably need to take a day or two away from it!

Regards,
Matt

Hi all

So I’ve got it working locally. Thank you all for the input. For anyone following this, the code above does work as it should.

One major issue was how heavily Chrome appears to cache the redirects (I didn’t expect that), so testing in incognito mode and opening a new window for each change helped get around it.
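Some background that may explain the caching: browsers treat a 301 (permanent) redirect as cacheable by default and can keep it for a long time, so after testing a 301 version of a rule, later edits may appear to have no effect. A common approach, sketched here, is to test with 302, which is not cached by default, and switch to 301 only once the rules are final, since the 301 is what tells search engines the move is permanent.

```apache
# While testing: temporary redirect, not cached by browsers by default.
RewriteCond %{THE_REQUEST} " /([a-z]+)\.html "
RewriteRule ^ /%1 [R=302,L]

# Once everything works, change the flag to R=301 on the live server:
# RewriteRule ^ /%1 [R=301,L]
```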

I’ve learnt a lot as part of the process. I’m still a little confused about when to include or exclude the leading slash, so I will read more on that now.
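A rough rule of thumb for the leading slash, assuming the .htaccess file sits in the document root and no RewriteBase is set. The page names `old-page` and `new-page` are hypothetical.

```apache
RewriteEngine On

# 1) Pattern: in .htaccess the URL path reaches RewriteRule WITHOUT its
#    leading slash, so a request for /old-page is matched as "old-page".
# 2) Redirect target: start it with "/" so it is a URL path from the site
#    root; without the slash, the directory's filesystem path is prefixed.
RewriteRule ^old-page$ /new-page [R=302,L]

# 3) Internal rewrite target: a relative path is resolved against the
#    directory holding this .htaccess file, so no slash is needed.
RewriteRule ^new-page$ new-page.html [L]

# 4) Server variables such as REQUEST_URI and THE_REQUEST do keep the
#    slash (REQUEST_URI is "/new-page"), so conditions on them include it.
```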

Fingers crossed it works on the live server in the same manner. Thanks all

Matt
