MY .htaccess file - Please comment with ideas

Hey Guys,

I am scarily new to .htaccess files but can see the long-term benefits of using them. I am pasting my .htaccess file below, commented with my understanding of what each part does. I don’t want to keep items which are pointless or which won’t work in the long run for general sites. I created a PHP framework which I use for my sites, and this is the .htaccess file I use with it. Please give comments, suggestions or ideas, I will really appreciate it :slight_smile:

Please note that, as with most frameworks, all of my requests should be sent through a single entry point (index.php).


# .htaccess file

# Follow all symbolic links (I'm not quite sure what this does :) )
Options +FollowSymLinks

# Turn the rewrite engine on
RewriteEngine On

# Turn the server signature off (speeds the site up a bit)
ServerSignature Off

# Protect against DOS attacks by limiting file upload size
LimitRequestBody 10240000

# Do not allow access to the .htaccess file
<Files .htaccess>
 order allow,deny
 deny from all
</Files>

# Set this for w3 validation and googlebot to crawl the site 
Allow from w3.org htmlhelp.com
Allow from googlebot.com
Satisfy Any

#set the base Url (I really need help here. Does this set base to where my .htaccess file is located or to my webroot (lets say my project was located under domain.com/project/.htaccess) - Is my rewritebase the project/ folder or the domain.com/ folder?)
RewriteBase sites/opanel.co.za

# Ensure all URL's is processed using the www.
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^ http://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

#dont send the apache error document, but rather send the not found to index.php
ErrorDocument 404 index.php

# Send a forbidden request to some known web spiders
RewriteCond %{HTTP_USER_AGENT} ^Anarchie [OR]
RewriteCond %{HTTP_USER_AGENT} ^ASPSeek [OR]
RewriteCond %{HTTP_USER_AGENT} ^attach [OR]
RewriteCond %{HTTP_USER_AGENT} ^autoemailspider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xenu [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]

# Disable directory / folder listings and access
Options All -Indexes
IndexIgnore *

# Redirecting index.php to root
RewriteCond %{THE_REQUEST} ^.*/index\.php
RewriteRule ^(.*)index\.php$ /$1 [L,R=301]

# Send all requests to the index.php file
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !=/favicon.ico
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteRule ^ index.php [L]


Hi,


# .htaccess file

# Follow all symbolic links (I'm not quite sure what this does :) )
Options +FollowSymLinks

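It tells Apache to follow symbolic links in this directory. mod_rewrite also needs it (or SymLinksIfOwnerMatch) switched on before rewrite rules work in an .htaccess file, so leave it in there :slight_smile: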


# Turn the rewrite engine on
RewriteEngine On

# Turn the server signature off (speeds the site up a bit)
ServerSignature Off

# Protect against DOS attacks by limiting file upload size
LimitRequestBody 10240000

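Perfectly fine! :slight_smile: (ServerSignature Off won’t speed anything up, though. It just hides the server version footer on pages Apache generates itself, like error pages.)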


# Do not allow access to the .htaccess file
<Files .htaccess>
 order allow,deny
 deny from all
</Files>

This should be in the server config! But seeing as this is a generic CMS script I guess it won’t hurt to have it in there.
However, I’d change it to:


<FilesMatch "^\.ht">
    Order allow,deny
    Deny from all
    Satisfy All
</FilesMatch>

to deny access to any file that starts with .ht
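
Note that Order / Deny / Satisfy is the Apache 2.2 access-control syntax. As a rough sketch, assuming your users may be on either 2.2 or 2.4 (where mod_authz_core provides Require), the same block could be written to cover both:

```apache
# Deny web access to any file whose name starts with .ht
<FilesMatch "^\.ht">
    # Apache 2.4 and later (mod_authz_core is present)
    <IfModule mod_authz_core.c>
        Require all denied
    </IfModule>
    # Apache 2.2 fallback (mod_authz_core does not exist there)
    <IfModule !mod_authz_core.c>
        Order allow,deny
        Deny from all
    </IfModule>
</FilesMatch>
```

Which branch is used depends only on which module the server has loaded, so the file itself doesn't need to know the Apache version.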


# Set this for w3 validation and googlebot to crawl the site 
Allow from w3.org htmlhelp.com
Allow from googlebot.com
Satisfy Any

Why is this in there? There is no HTTP Auth set anywhere, so everybody can get access. No need to create an exception.
Also, Allow does accept (partial) hostnames, but then Apache has to do a double reverse DNS lookup on the client’s IP address for each request, so it will be faster if you use IP addresses instead.
So if you leave it in there please try to find out the IP addresses they use and put those in there instead of the hostnames.


#set the base Url (I really need help here. Does this set base to where my .htaccess file is located or to my webroot (lets say my project was located under domain.com/project/.htaccess) - Is my rewritebase the project/ folder or the domain.com/ folder?)
RewriteBase sites/opanel.co.za

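RewriteBase sets the URL path that corresponds to the directory your .htaccess file is in, so it is counted from the web root, not from the filesystem. It is mostly needed to undo the effects of Alias or AliasMatch, where the URL path and the filesystem path don’t line up. Note that it has to be a URL path starting with a slash, so sites/opanel.co.za won’t work as written.
For plain internal rewrites under the document root you can usually do without RewriteBase, but people who use your CMS may use Alias or AliasMatch, in which case you could leave it in as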


RewriteBase /
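
To answer the question in your comment: the value is the URL path of the folder that holds the .htaccess file, as seen from the web root. A small sketch, assuming the example from your comment where the file lives at domain.com/project/.htaccess (the home rule is made up purely for illustration):

```apache
# .htaccess located at domain.com/project/.htaccess
RewriteEngine On

# URL path (always starting with a slash) of the directory this file is in
RewriteBase /project/

# Hypothetical rule: the relative target resolves to /project/index.php
RewriteRule ^home/?$ index.php [L]
```

For an install in the web root the line would simply be RewriteBase / as above.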


# Ensure all URL's is processed using the www.
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^ http://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

#dont send the apache error document, but rather send the not found to index.php
ErrorDocument 404 index.php

:tup:
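
One thing to double check on the ErrorDocument line: as far as I know, Apache only treats the argument as a local URL when it begins with a slash. Anything else that isn't a full URL is sent to the browser as a literal text message. A sketch of the local-URL form, assuming index.php sits in the web root:

```apache
# Leading slash = local URL path, so the 404 is handled internally by index.php
ErrorDocument 404 /index.php
```

For an install in a subfolder the path would need that prefix as well, e.g. /project/index.php.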


# Send a forbidden request to some known web spiders
RewriteCond %{HTTP_USER_AGENT} ^Anarchie [OR]
RewriteCond %{HTTP_USER_AGENT} ^ASPSeek [OR]
RewriteCond %{HTTP_USER_AGENT} ^attach [OR]
RewriteCond %{HTTP_USER_AGENT} ^autoemailspider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xenu [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]

Everything that matches ^Zeus.*Webster also matches ^Zeus. There is no need for the first of the two Zeus conds at the end of the list.
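
If the list keeps growing, the conditions can also be folded into one pattern with alternation. A sketch using only the agents quoted above (whether these are the right bots to block is a separate question):

```apache
# Block requests whose User-Agent starts with any of the listed names
RewriteCond %{HTTP_USER_AGENT} ^(Anarchie|ASPSeek|attach|autoemailspider|Xaldon\ WebSpider|Xenu|Zeus)
# "-" means no substitution; [F] returns 403 Forbidden and implies [L]
RewriteRule ^ - [F]
```

A single condition also avoids the easy mistake of leaving a trailing [OR] on the last line when entries get added or removed.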


# Disable directory / folder listings and access
Options All -Indexes
IndexIgnore *

You have already defined Options at the top. Remove the Options line here and change the one at the top to


Options +FollowSymlinks -Indexes

Also, you have disabled indexing, so telling Apache to ignore all files is redundant; you can remove the IndexIgnore line.


# Redirecting index.php to root
RewriteCond %{THE_REQUEST} ^.*/index\.php
RewriteRule ^(.*)index\.php$ /$1 [L,R=301]

:tup:


# Send all requests to the index.php file
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !=/favicon.ico
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteRule ^ index.php [L]

Why the exceptions for favicon.ico and robots.txt?
If those files exist, they are already skipped by RewriteCond %{REQUEST_FILENAME} !-f. The extra conditions only matter when the files are missing.
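
With those two exceptions dropped, the block comes down to the usual front-controller rule. A sketch, assuming index.php sits next to the .htaccess file:

```apache
# Anything that is not an existing file or directory goes to index.php
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ index.php [L]
```

An existing favicon.ico or robots.txt is served directly because of the !-f test. If either file is missing, the request lands in index.php, which can answer with its own 404.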

Wow a fast and effective reply. Thank you SOOO much.

Just 2 more things if this will help, surrounding rewritebase:

  1. I have no way of knowing where exactly my framework will be located. It could be in the root or in a subfolder of the domain. Would RewriteBase make any difference if I leave it as RewriteBase / and my project is located, let’s say, in the /project/ folder?

  2. I am planning (within the next couple of days) to cross over to a true mod_rewrite naming convention for my sites, so that www.domain.com/about/ goes to about.php and so forth. Does my current implementation prevent me from doing so? I played around with some tutorials and know how to implement that, but I’m just worried that I have something else which may break my idea :slight_smile:

Once again thank you for your prompt and effective response, what a breath of fresh air :slight_smile:

Yes that will make a difference. Just put a disclaimer with it, like “the following line assumes the CMS is installed in the root directory of your site. If you install it somewhere else, update the line to reflect that”.

I don’t see anything in the code that would prevent it.
But I wonder, why would you prefer that approach over the bootstrap (redirect everything to index.php) approach?

Thanks again! Out of curiosity, what happens if I do not specify the RewriteBase?

To answer your question, I just want to start using cleaner urls. So my chain of thought is:

  1. .htaccess file picks up that e.g. www.domain.com/about.php gets called and then 301 the call to make it www.domain.com/about/
  2. The 301 takes place, and now the call is www.domain.com/about/, another rewriterule will then pick up I am using about/ and direct the traffic to index.php
  3. My index.php with my front controller will take the request_uri and strip it from any nasties effectively leaving “about” as the file being called, check if the view file does exist, else output notfound or something in that line.

I really like cleaner URLs and want to venture into that :slight_smile: Any suggestions from your end will really help (as it’s clear you have many more years of development experience than me :wink: )

Thanks again.

PS> I am attempting point number 1 above but it does not redirect properly, any advice?:


# Rewrite .php files to the mod_rewrite naming convention
RewriteCond %{THE_REQUEST} ^(.*)\.php [NC]
RewriteRule ^([a-zA-Z0-9_-]+)\.php $1/ [NC,R=301,L]

Here is a better working example of the code I sent earlier. The only problem left is that it adds /var/www to my redirect (domain.com/var/www/about/) when it should just be domain.com/about/


# Rewrite .php files to the mod_rewrite naming convention
#RewriteCond %{THE_REQUEST} ^(.*)\.php [NC]
RewriteRule ^([a-zA-Z0-9_-]+)\.php $1/ [NC,R=301,L]

# Rewrite all filenames to mod rewrite filenames
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([a-zA-Z0-9_-]+)/? $1.php [NC]

So long as the directory isn’t reached through an Alias or AliasMatch, nothing, at least for internal rewrites. Everything will just work fine if you remove it in that case.

For more information read this section of the Apache documentation.
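
One exception worth knowing about, and probably the cause of the /var/www in your redirect: as far as I understand mod_rewrite, a relative target in .htaccess gets the directory's filesystem path put back in front of it, and with the R flag and no RewriteBase that path is what ends up in the redirect. A sketch of two ways around it, assuming the site lives in the web root:

```apache
# Only act on what the browser actually asked for, so an internal
# rewrite to a .php file does not trigger the redirect again
RewriteCond %{THE_REQUEST} \.php[\s?]
# Option 1: make the redirect target an absolute URL path
RewriteRule ^([a-zA-Z0-9_-]+)\.php$ /$1/ [NC,R=301,L]

# Option 2: keep the relative target and declare the base instead
# RewriteBase /
# RewriteCond %{THE_REQUEST} \.php[\s?]
# RewriteRule ^([a-zA-Z0-9_-]+)\.php$ $1/ [NC,R=301,L]
```

Option 1 has the target hard-coded to the root, so in a subfolder install it would need the folder prefix. Option 2 keeps that knowledge in the single RewriteBase line.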

But if you have a front controller, like you say in 3) , why do you need an about.php at all?

It’s more of a worst-case scenario. Something which is likely to happen is that someone visits my site at www.domain.com/index.php (or the .htaccess file sends the request to index.php). Now my front controller gets index.php, which breaks it, as I am building the controller to only accept inputs like domain.com/index/.

So my general idea was to 1) 301 the user to ensure they’re not using .php conventions (that user which stuffs things up) and then 2) send index/ to index.php which then checks the request_uri and builds the template from there.

A possible drawback I can see here is GET variables, because I will end up having an href like about/?id=123, but I will just create a RewriteRule for when that happens (that’s the norm, right? :slight_smile: )

1a) Why would a user visit about.php ? And if they do, why not show them a 404 instead of guessing what they’re up to.

1b) What if they visit contact.php on a site that doesn’t have a contact page?

2) Do you allow requests for / or does it have to be /index ?
    I’d advise against the latter because it’s not necessary. Just put a default controller in the front controller and use that when no controller is provided in the URL. Then you can just use /

1a) You are right. It would be better to send them to a not found page, but seeing that the site is set up to show a custom 404 within the wrapper of the site, we still need to strip the .php from the filename and make it about/ so that the front controller can pick it up.
1b) The same applies here. It’s normal to assume contact.php will be the contact page. So in essence we want to strip the .php, 301 it to contact/ and, if the file does not exist, display the custom 404. If the file exists, lucky user :slight_smile:

2) I don’t mind if index never shows, so whether it’s domain.com/ or domain.com/index/ I couldn’t really be fazed, but I guess it would be better to have it as domain.com/index/ because there is less chance of a possible flaw later.

I managed to almost get this to work with my code at the top, but running into an issue where it appears to attach the absolute path. Any ideas?

I decided to give up on my existing idea of forcing index.php (or other files) to index/ etc. I would still love to get it right, but I am struggling a little bit too much and don’t see an end result.

My next issue, which I guess is more important, is sending about/ or similar filenames to the index.php file. So when calling domain.com/index/ I want it to display my index file within my front controller, and the same with about/ etc. If the file does not exist it will call a notfound file. This WORKS like a CHARM!

The problem now is passing get variables. I am using request_uri to determine which file gets called. Without any get variables I have had lovely success (worked the first time) but the moment I add a get variable (domain.com/about/23 which directs to index.php?id=23) the get variable gets submitted in an array as one would expect, but the request_uri becomes 23 and not about anymore. I’ll assume this is an incompatibility issue but I have no idea how to resolve this.

Here is my htaccess snippet doing the work:


# Rewrite .php files to the mod_rewrite naming convention
# RewriteCond %{THE_REQUEST} ^(.*)\.php [NC]
# RewriteRule ^([a-zA-Z0-9_-]+)\.php$ $1/ [NC,L,R=301]

# Rewrite all filenames to mod rewrite filenames
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([a-zA-Z0-9_-]+)/([a-zA-Z0-9_-]+)/?$ index.php?id=$2 [NC,L]
RewriteRule ^(.*) index.php [NC]

The code getting the variable in my controller:


private function setRequestedUri() {
	$userURI = basename($_SERVER["REQUEST_URI"]);
	if($userURI == "" || $userURI == "/") {
		$this->requestFileName = "index";
	} else {
		$this->requestFileName = $userURI;
	}
}

Any ideas / advice you can give would greatly be appreciated!

Why don’t you let URL routing be done by PHP completely? So, something along the lines of http://www.phpaddiction.com/tags/axial/url-routing-with-php-part-one/

That’s more powerful than what you’re trying now IMHO.
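
That said, if you want to keep the id rewrite in .htaccess for now, here is a rough sketch. basename() only returns the last path segment, which is why you get 23 for /about/23. Passing the first segment along as its own parameter means the controller no longer has to pick REQUEST_URI apart. The parameter name page is made up for the example, and note that each RewriteCond only applies to the RewriteRule directly after it, so the conditions are repeated:

```apache
RewriteEngine On

# /about/23 -> index.php?page=about&id=23
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([a-zA-Z0-9_-]+)/([a-zA-Z0-9_-]+)/?$ index.php?page=$1&id=$2 [L,QSA]

# /about/ -> index.php?page=about
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([a-zA-Z0-9_-]+)/?$ index.php?page=$1 [L,QSA]
```

In the controller you would then read the hypothetical page parameter from $_GET instead of calling basename() on REQUEST_URI, and fall back to "index" when it is empty. The bare root URL matches neither rule and is left to DirectoryIndex, assuming that includes index.php.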