Two versions of forcing "www"

Hello,

I’ve got two rewrites that force a “www” in the URL: one that I found while looking around, and one that the sysadmin found on another site of ours. I’d like to know if either has disadvantages compared to the other… one does have the lazy regex, but the other one is little better.

The site currently has some sort of redirect (possibly done with PHP) to remove index.html/index.php, but I don’t really know why, since all the other .php file extensions were left in.

So http://sitename.eu always needs to go to http://www.sitename.eu, and there may not necessarily be a request URI after it.

I believe mine came from askapache, but it looks identical to DK’s:


RewriteEngine on
RewriteCond %{HTTP_HOST} !^www\.sitename\.eu$ [NC]
RewriteRule .? http://www.sitename.eu%{REQUEST_URI} [R=301,L]

.? would possibly match “a” character, of any type. Any junk characters (ones that don’t match any real file names) cause a redirect to the main page (again, I believe via PHP, not Apache). I may be confusing my regexes, because I’m thinking ? refers to only one char: in houses?, just the last s is optional.

The RewriteRule says to match zero or one of anything, then redirect to http://www.sitename.eu with the original {REQUEST_URI} appended.

What if there’s more than one of something to match?
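A quick way to settle the ? question is to try the same tiny patterns in Python’s re module, which behaves the same as Apache’s regex engine for constructs this simple. This is a sketch, not Apache itself: it assumes a RewriteRule pattern without ^ or $ is tested as an unanchored search, and the sample paths are made up.

```python
import re


def is_house_or_houses(word: str) -> bool:
    # '?' makes only the token right before it optional:
    # in 'houses?' that is just the final 's'.
    return re.fullmatch(r"houses?", word) is not None


def dot_question_match(path: str) -> str:
    # Unanchored search, like a RewriteRule pattern without ^ or $.
    # '.?' can always match (at worst the empty string at position 0),
    # so it never stops the rule from firing; with a longer path the
    # search simply takes the first character and ignores the rest.
    m = re.search(r".?", path)
    assert m is not None  # '.?' cannot fail to match
    return m.group(0)


if __name__ == "__main__":
    for word in ["house", "houses", "housess"]:
        print(word, is_house_or_houses(word))
    for path in ["", "index.php", "products/widgets/page.php"]:
        print(repr(path), "->", repr(dot_question_match(path)))
```

So “more than one of something” is not a problem: the pattern matches the first character (or nothing) and the rule still fires, while the redirect target is built from {REQUEST_URI}, not from what .? matched.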

Here’s what he found, which has been working on another site of ours:


RewriteEngine on
RewriteCond %{HTTP_HOST} !^www [NC]
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]

Here the (.*) could be a good thing, since we don’t want to insist on anything after the host… nothing should match nothing, and that should be OK, right? If the match contains a leading /, then $1 would add a second slash after the one already in the substitution, but this doesn’t seem to cause an issue: www.sitename.eu//////filename.php just goes to filename.php (possibly PHP is doing another redirect?), unless filename.php doesn’t exist, in which case it goes to the main page.
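Where a doubled slash comes from is easy to see if the second rule’s substitution is rebuilt as plain string handling. This Python sketch assumes only one thing: the string the pattern is tested against may or may not start with /, depending on the context the rule runs in (in a per-directory .htaccess the directory prefix is stripped first; in the server config the full URL-path, leading / included, is matched). The function takes that string as given.

```python
import re


def second_rule_target(host: str, matched_path: str) -> str:
    # Mimics: RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]
    # matched_path is whatever the RewriteRule pattern is tested against.
    m = re.match(r"^(.*)$", matched_path)
    assert m is not None  # (.*) matches any single-line path, even ""
    # The substitution itself adds a '/', so a leading '/' in $1 doubles it.
    return "http://www." + host + "/" + m.group(1)


if __name__ == "__main__":
    print(second_rule_target("sitename.eu", "filename.php"))   # -> http://www.sitename.eu/filename.php
    print(second_rule_target("sitename.eu", "/filename.php"))  # -> http://www.sitename.eu//filename.php
    print(second_rule_target("sitename.eu", ""))               # -> http://www.sitename.eu/
```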

I want to know if someone with more experience can see any obvious disadvantages of one over the other… am I wrong to have the RewriteCond match a full, particular domain name? Or is that better? Or is it six of one, half a dozen of the other? And what should I insist on, if anything, when everything after the domain name is optional (there are currently no ?query strings, but I don’t know that it will remain so!)?

Again, junk characters and “index.php” or “index.html” both get redirected to the domain name alone.

He’s putting these in .htaccess, and the sites are virtual hosts.

Sp,

Welcome back to the wonderful world of mod_rewrite!

The first bit of code you’ve shown merely checks whether the {HTTP_HOST} was requested as www.sitename.eu or not and, if not, redirects the original request (along with any query string - that’s automatic) to the www’d version of your domain. The . (your “a” character) is only there to match any character which may be present (none is required, because the ? makes THE PRECEDING CHARACTER - or set of characters if in parentheses - optional). In this case, the {REQUEST_URI} merely passes the original request through, so you’re quite correct: PHP is messing around where it shouldn’t be messing around.
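The query-string behaviour can be sketched the same way. This Python function is a simplified model of the first rule under two stated assumptions: %{REQUEST_URI} holds the path only (it always starts with /, and is just / for a bare http://sitename.eu request), and because the substitution contains no ?, mod_rewrite carries the original query string over unchanged.

```python
def first_rule_target(request_uri: str, query_string: str = "") -> str:
    # Mimics: RewriteRule .? http://www.sitename.eu%{REQUEST_URI} [R=301,L]
    # %{REQUEST_URI} is the path only and always begins with '/', so the
    # substitution needs no extra slash between host and path.
    if not request_uri.startswith("/"):
        raise ValueError("REQUEST_URI should start with '/'")
    target = "http://www.sitename.eu" + request_uri
    # No '?' in the substitution, so the original query string is appended.
    if query_string:
        target += "?" + query_string
    return target


if __name__ == "__main__":
    print(first_rule_target("/"))                      # bare domain
    print(first_rule_target("/page.php", "id=3&x=y"))  # query string survives
```

The same pass-through applies to the second rule, whose substitution has no ? either, so neither rule needs to mention query strings for them to survive the redirect.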

RewriteEngine on
RewriteCond %{HTTP_HOST} !^www [NC]
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]

This does EXACTLY the same thing EXCEPT:

  1. It’s only looking to NOT match the leading www in the {HTTP_HOST}

  2. It captures what’s already available in {REQUEST_URI}, but YOU need to be concerned about the difference between Apache 1.x and Apache 2.x, as the EVERYTHING atom will capture Apache 1’s leading /, too!

  3. If you’re using subdomains in your {HTTP_HOST}, those will also be sent along in the redirection.

  4. Using host/$1 MAY give you //path/to/file after the domain (Apache 1.x), and you’ve wasted CPU cycles capturing the {REQUEST_URI} in your EVERYTHING atom when it was already available, i.e., inefficient programming. Looking ahead in your post, your host is using Apache 1.x and that’s the problem you’re having with the multiple /'s.
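Points 1 and 3 show up clearly with a handful of sample Host headers. In this Python sketch, [NC] is modelled as a case-insensitive match, each negated RewriteCond is reduced to “does the rule fire?”, and the hostnames blog.sitename.eu and www2.sitename.eu are hypothetical examples, not hosts from the thread.

```python
import re

# The two RewriteCond patterns from the thread; [NC] becomes re.IGNORECASE.
EXACT_HOST = re.compile(r"^www\.sitename\.eu$", re.IGNORECASE)
WWW_PREFIX = re.compile(r"^www", re.IGNORECASE)


def redirects(cond: re.Pattern, host: str) -> bool:
    # Both conditions are negated with '!', so the rule fires
    # only when the pattern does NOT match the Host header.
    return cond.search(host) is None


HOSTS = ["sitename.eu", "www.sitename.eu", "WWW.SiteName.EU",
         "blog.sitename.eu", "www2.sitename.eu"]

if __name__ == "__main__":
    for host in HOSTS:
        # Rule 1 hard-codes the target host; rule 2 prepends "www." to it.
        rule1 = "www.sitename.eu" if redirects(EXACT_HOST, host) else "-"
        rule2 = "www." + host if redirects(WWW_PREFIX, host) else "-"
        print(f"{host:18} rule 1: {rule1:22} rule 2: {rule2}")
```

With these inputs, blog.sitename.eu goes to www.sitename.eu under the first rule but to www.blog.sitename.eu under the second, and www2.sitename.eu slips past the second rule entirely because it starts with www. The full-domain condition is stricter; which behaviour is preferable depends on whether the site has (or may get) subdomains.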

No, the .? was more of a placeholder for the regex (it doesn’t matter whether there’s anything there or not); the redirection is executed on the basis of the match in the RewriteCond.

IMHO, Apache is a file server and PHP is a file creator. They should be used to their best advantage, i.e., WHY in the world would anyone be using PHP for simple redirects?

Regards,

DK

> Looking ahead in your post, your host is using Apache 1.x and that’s the problem you’re having with the multiple /'s.

It may instead be an incorrect-but-somehow-working regex, because as far as I know (hah) we’re using Apache 2… the servers only got switched to Linux in the last year.
Ah, I just tried to get to a 404 page on another of our sites… damn, he removed the server signature! But I could have sworn I saw earlier that it was 2.2.3 on CentOS.

So anyway, you’re saying we ARE having problems with (.*) also matching /'s, and possibly PHP is covering it up? If things go well, I can bring that up at our next meeting, because once any of our sites start getting busy, they DO care about speed and CPU cycles. Your post will be my defense! Lawlz.

> IMHO, Apache is a file server and PHP is a file creator. They should be used to their best advantage, i.e., WHY in the world would anyone be using PHP for simple redirects?

I’m guessing that when someone started getting all the PHP stuff set up around here, it came with some modules that did that. Although the sysadmin is a Linux guy, he doesn’t know regexes; maybe he’s more of a hardware guy? I’m not sure. He doesn’t know mod_rewrite, and I think our problem is that I (a front-ender; I don’t touch servers!) was assigned to learn how to “make the URLs Google-friendly” rather than the guy running Apache.

> but YOU need to be concerned about the difference between Apache 1.x and Apache 2.x as the EVERYTHING atom will capture Apache 1’s leading /, too!

Yeah, I saw that when I first got the email with regex #2 with (.*), which is what he ended up implementing (I don’t know whether he tried #1 and had problems, or simply wanted to try only the one he knew was working on another site). That’s why I tested the site with lots of ///////. It’s good to know it’s inefficient, but it’s not breaking anything (though I don’t like not knowing WHY… the sysadmin isn’t the one who set up most of the PHP stuff around here).

*edit: ah, the ////// thing is something different. I thought regexes had to deal with that, but my lighttpd server (for my own personal site) and my colleague’s Apache server at home, without any special regexes, both seem to treat /////// as a single slash. So I was wrong in thinking PHP had anything to do with that. And maybe Apache 1 was not so forgiving.

Yep, Apache seems to have some internal mechanism that we can simulate with the following preg_replace:


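// collapse every run of two or more forward slashes into one ($uri is an illustrative name)
$uri = preg_replace("/(\\/){2,}/", "/", $_SERVER['REQUEST_URI']);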

i.e., replace two or more forward slashes with one forward slash

(for example, try www.sitepoint.com////forums, also works …)
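The same slash-collapsing idea as a small Python function, for comparison. One deliberate difference from the preg_replace, and a design choice of this sketch only: PHP’s $_SERVER['REQUEST_URI'] includes the query string, so this version collapses slashes in the path part alone and leaves something like ?next=http://example.com (a made-up parameter) intact.

```python
import re


def collapse_slashes(request_uri: str) -> str:
    # Split off the query string first; partition keeps the '?' in sep.
    path, sep, query = request_uri.partition("?")
    # Replace every run of two or more forward slashes in the path with one.
    return re.sub(r"/{2,}", "/", path) + sep + query


if __name__ == "__main__":
    print(collapse_slashes("////forums"))
    print(collapse_slashes("//////filename.php?next=http://example.com"))
```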

Sp,

I think you’re correct about a new installation using Apache 2. Just use a single-page script with the PHP command “phpinfo();”. That will dump all the info you’ll ever need to know about your server setup; way down on that page, you’ll get the Apache information.

IMHO, using (.*) without a REALLY good reason is just being lazy (that’s why I call it the “lazy regex”: it will match nothing AND everything). All too often, it’s the cause of the problems newbies have, and getting a lot of /'s is merely a signal that this is the case (and that you’re using Apache 1.x).
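The “matches nothing AND everything” point is easy to confirm. This Python sketch compares the EVERYTHING atom with a hypothetical tighter pattern (lowercase page names ending in .php, an assumption for illustration, not the site’s real URL scheme). As a side note on terminology: in regex jargon (.*) is greedy, and the lazy (non-greedy) form is (.*?); greedy_vs_lazy shows the difference.

```python
import re
from typing import Optional, Tuple

# The "EVERYTHING" atom: anchored, yet it accepts any single-line path at all.
EVERYTHING = re.compile(r"^(.*)$")
# Hypothetical tighter pattern: simple lowercase page names ending in .php.
PHP_PAGE = re.compile(r"^([a-z0-9_-]+)\.php$")


def accepts(pattern: re.Pattern, path: str) -> bool:
    return pattern.match(path) is not None


def greedy_vs_lazy(text: str) -> Tuple[Optional[str], Optional[str]]:
    # (.*)/ grabs as much as possible before a '/', (.*?)/ as little as possible.
    greedy = re.match(r"(.*)/", text)
    lazy = re.match(r"(.*?)/", text)
    return (greedy.group(1) if greedy else None,
            lazy.group(1) if lazy else None)


if __name__ == "__main__":
    for path in ["", "contact.php", "%%junk!!", "index.html"]:
        print(repr(path), accepts(EVERYTHING, path), accepts(PHP_PAGE, path))
    print(greedy_vs_lazy("a/b/c"))
```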

Defense? Ha! It’s poor programming to capture junk that you’re not going to do anything with! What defense is there to comment on poor code?

Okay, in your sysadmin’s defense, I tend to fall back on things that I know, too: I’m all too ready to use mod_rewrite rather than mod_alias (Redirect). Same with PHP. The trick is to know when a better tool is available and to use the best tool to solve your problems. It seems that there is always more than one way to do something: the right way and … well, any brute-force way. As long as the job gets done, it’s silly to fix it - unless you need to optimize!

I had a boss who had been a coder for space launch vehicles. He was reputed to have had a boss who insisted on optimized code, so he programmed a loop into his code and went fishing. Reducing the loop cycles improved the code over time, and he received a “well done” after removing the loop. Now, that’s optimizing code!

The guys over in the SEO forum should be able to help you with the innards of search engines, but extensionless URIs seem to be pretty good, too. In fact, one client of mine (http://wilderness-wally.com) uses his article titles as the links for his website (powered by a single script), and that can really help with page recognition.

Thanks to you and ScallioXTX for pointing out that multiple /'s are treated as a single / - I hadn’t known that. Okay, I think that even // is particularly UGLY and wouldn’t tolerate that on my sites - but that’s just the “professionalism” in me.

Yes, Apache 1.x was NOT very forgiving - it would not terminate a loop in mod_rewrite, ergo my loathing of (.*), which is the greatest cause of loopy code!
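Why (.*) and loops go together can be shown with a toy model. This Python sketch does not reproduce Apache 1.x internals; it only imitates a client following the second rule’s 301s, with and without its RewriteCond, and the max_hops cap is a stand-in for whatever eventually gives up. Because ^(.*)$ matches every path, the condition is the only thing that stops the rule from firing again.

```python
import re
from typing import List


def follow_redirects(host: str, use_condition: bool, max_hops: int = 5) -> List[str]:
    # Each loop iteration is one request; hosts records each host visited.
    hosts = [host]
    for _ in range(max_hops):
        current = hosts[-1]
        # RewriteCond %{HTTP_HOST} !^www [NC]: rule skipped if host starts with www.
        if use_condition and re.match(r"www", current, re.IGNORECASE):
            break
        # RewriteRule ^(.*)$ ... matches any path, so the rule fires: 301 to "www." + host.
        hosts.append("www." + current)
    return hosts


if __name__ == "__main__":
    print(follow_redirects("sitename.eu", use_condition=True))
    print(follow_redirects("sitename.eu", use_condition=False))
```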

Regards,

DK

Somewhat off-topic:

[ot]

> Defense? Ha! It’s poor programming to capture junk that you’re not going to do anything with! What defense is there to comment on poor code?

Let’s say I’m the only code Nazi in the house, yet I’m not even a coder, and that’s a bad combination: complaining about code and how things are done when I myself do not know how to do things and cannot (yet) code myself out of an apple pie (I’d like to become a programmer, but I find myself struggling over ridiculously stupid things in JavaScript… we’ll see if this can even work).

I can’t say what PHP should or shouldn’t do here, or what Apache should or shouldn’t do here, because I cannot write PHP and I do not know Apache.

This is not a web company, nor an IT company. It’s an insurance company.

So, defense is necessary if I’m going to proclaim something to the sysadmin (who’s a nice guy) or the bosses. You’re not Rich Bowen (not that they know the name), but you know Apache.[/ot]