Cleaning out .htaccess - looking good?

Hi all,

I’m cleaning up my .htaccess file and would like a second opinion on the file below. For example, for the lines marked in red, am I right in thinking that these blocks duplicate each other and that one set (the one at the top) is sufficient on its own?

Any other things you would improve?

I’ve changed the domain names for security purposes. The .htaccess file as it stands below works fine.

Many thanks,

# Rewrite enabled
RewriteEngine On
RewriteRule ^([0-9]+)\\.([0-9]+)\\.([0-9]+)\\.([0-9]+)\\.([0-9]+)\\.([0-9]+)\\.phtml$ index.php?id=$1,$2,$3,$4,$5,$6
RewriteRule ^(.+)\\.phtml$ index.php?$1 [L]

# Deny stealing content (no hot linking)
[COLOR="Red"]RewriteCond %{REQUEST_FILENAME} .*jpg$|.*jpeg$|.*gif$|.*png$|.*ico$|.*js$|.*css$|.*txt$ [NC]
RewriteCond %{HTTP_REFERER} !^$ 
RewriteCond %{HTTP_REFERER} !^http://(.+\\.)?xyz\\.com/ [NC] [OR]
RewriteCond %{HTTP_REFERER} !^http://www.xyz\\.com/.*$ [NC] [OR]
RewriteRule (.*) /nohotlink.php?pic=$1 [R=302,L][/COLOR]

# Improve site page rank
RewriteCond %{HTTP_HOST} ^xyz\\.com
RewriteRule ^(.*)$ http://www.xyz\\.com/$1 [R=permanent]

# Force PHP5
AddType x-mapp-php5 .php

# Media files - 7 days
<FilesMatch "\\.(ico|pdf|flv|jpg|jpeg|png|gif|swf|mp3|mp4)$">
Header set Cache-Control "max-age=302400, must-revalidate, public, no-transform"
</FilesMatch>
 
# HTML etc. files - 2 hours
<FilesMatch "\\.(html|htm|xml|txt|xsl)$">
Header set Cache-Control "max-age=7200, must-revalidate, public, no-transform"
</FilesMatch>
  
# JS/CSS files - 1 day
<FilesMatch "\\.(js|css)$">
Header set Cache-Control "max-age=43200, must-revalidate, public, no-transform"
</FilesMatch>

# PHP etc. dynamic files - disabled
<FilesMatch "\\.(pl|php|[sf]?cgi|spl)$">
Header set Cache-Control: "max-age=0, no-store"
</FilesMatch>

# Needed for php extensionless (clear URL) redirect.
Options -MultiViews

# If index or index.php requested, strip and redirect
RewriteCond %{THE_REQUEST} index(\\.php)?
RewriteRule ^index(\\.php)?$ http://www.xyz.com [R=301,L]

# Pass Through the empty request (to be handled as DirectoryIndex)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\\ /?\\ HTTP
RewriteRule .? - [PT]

# Remove PHP extension from links (internally)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.+)$ /$1.php [L,QSA]

# Remove PHP extension from links (externally)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\\ /([^.]+\\.)+php\\ HTTP
RewriteRule ^(.+)\\.php$ /$1 [R=301,L]

# Obscure 'php.ini' files (where they exist)
RedirectMatch 404 .*php\\.ini$

# Deny access to file
<Files .htaccess>
order allow,deny
deny from all
</Files>

# eliminate Code Red and NIMDA Virus attacks
redirect /scripts http://www.stoptheviruscold.invalid
redirect /MSADC http://www.stoptheviruscold.invalid
redirect /c http://www.stoptheviruscold.invalid
redirect /d http://www.stoptheviruscold.invalid
redirect /_mem_bin http://stoptheviruscold.invalid
redirect /msadc http://stoptheviruscold.invalid
RedirectMatch (.*)\\cmd.exe$ http://stoptheviruscold.invalid$1 

# automatically corect simple speling errors
<IfModule mod_speling.c>
 CheckSpelling Off
</IfModule>

# Directory index page
DirectoryIndex index.php index.html index.shtml index.htm

IndexIgnore *
Options -Indexes

# Error pages handling
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.php) /error/404.php

ErrorDocument 400 /error/400.php
ErrorDocument 401 /error/401.php
ErrorDocument 403 /error/403.php
ErrorDocument 404 /error/404.php
ErrorDocument 500 /error/500.php
ErrorDocument 501 /error/501.php
ErrorDocument 502 /error/502.php

# Performance tweaks
<IfModule mod_deflate.c>
a2enmod deflate
SetOutputFilter DEFLATE
AddOutputFilterByType DEFLATE text/html text/plain text/css text/javascript application/x-httpd-php application/x-httpd-fastphp application/javascript application/json
AddOutputFilterByType DEFLATE text/xml application/xml text/x-component
DeflateCompressionLevel 9
BrowserMatch ^Mozilla/4 gzip-only-text/html
BrowserMatch ^Mozilla/4\\.0[678] no-gzip
BrowserMatch \\bMSIE !no-gzip !gzip-only-text/html
</IfModule>

<ifModule mod_headers.c>
  Header unset ETag
</ifModule>
FileETag None

<IfModule prefork.c>
StartServers 44
MinSpareServers 22
MaxSpareServers 44
ServerLimit 300
MaxClients 200
MaxRequestsPerChild 0
</IfModule>

<IfModule worker.c>
 StartServers 9
 MaxClients 200
 MinSpareThreads 110
 MaxSpareThreads 330
 ThreadsPerChild 110
 MaxRequestsPerChild 0
</IfModule>

Options FollowSymLinks

#Deny hot linking images
[COLOR="red"]# RewriteCond %{HTTP_REFERER} !^$
# RewriteCond %{HTTP_REFERER} !^http://xyz.com/.*$ [NC] [OR]
# RewriteCond %{HTTP_REFERER} !^http://www.xyz.com/.*$ [NC] [OR]
# RewriteRule .*\\.(gif|GIF|jpg|JPG|png|PNG|bmp|BMP|wav|mp3|wmv|avi|mpeg)$ - [F][/COLOR]

# RewriteCond %{HTTP_REFERER} !^http(s)?://([-a-z0-9]+\\.)?xyz.com/ [NC] [OR]
# RewriteCond %{HTTP_REFERER} !^http(s)?://(.+\\.)?xyz.com/.*$ [NC] 
# RewriteCond %{HTTP_REFERER} !^$
# RewriteCond %{HTTP_REFERER} !google. [NC]
# RewriteCond %{HTTP_REFERER} !search?q=cache [NC]
# RewriteCond %{HTTP_REFERER} !msn. [NC]
# RewriteCond %{HTTP_REFERER} !yahoo. [NC]
# RewriteRule .*\\.(gif|GIF|jpg|JPG|png|PNG|bmp|BMP|wav|mp3|wmv|avi|mpeg)$ nolinking.jpe [NC,R]

# RewriteCond %{HTTP_REFERER} !^http://(.+\\.)?xyz\\.co\\.uk/ [NC] [OR]
# RewriteCond %{HTTP_REFERER} !^http://(www\\.)?xyz\\.co\\.uk/.*$ [NC] [OR]
# RewriteCond %{HTTP_REFERER} !^$
# RewriteRule .*\\.(gif|GIF|jpg|JPG|png|PNG|bmp|BMP|wav|mp3|wmv|avi|mpeg)$ nolinking.png

# Deny access by user agents
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Bot\\ mailto:craftbot@yahoo.com [OR] 
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR] 
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Download\\ Demon [OR] 
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR] 
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR] 
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR] 
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Express\\ WebPictures [OR] 
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR] 
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR] 
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR] 
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR] 
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR] 
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR] 
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR] 
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR] 
RewriteCond %{HTTP_USER_AGENT} ^Image\\ Stripper [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Image\\ Sucker [OR] 
RewriteCond %{HTTP_USER_AGENT} Indy\\ Library [NC,OR] 
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Internet\\ Ninja [OR] 
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR] 
RewriteCond %{HTTP_USER_AGENT} ^JOC\\ Web\\ Spider [OR] 
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR] 
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Mass\\ Downloader [OR] 
RewriteCond %{HTTP_USER_AGENT} ^MIDown\\ tool [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Mister\\ PiX [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR] 
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR] 
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR] 
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Net\\ Vampire [OR] 
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Offline\\ Explorer [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Offline\\ Navigator [OR] 
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Papa\\ Foto [OR] 
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR] 
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR] 
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR] 
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR] 
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR] 
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR] 
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR] 
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR] 
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Teleport\\ Pro [OR] 
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Web\\ Image\\ Collector [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Web\\ Sucker [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebGo\\ IS [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Website\\ eXtractor [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Website\\ Quester [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\\ WebSpider [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Zeus 
RewriteRule ^.* - [F]

# NameProtect peddles their “online brand monitoring” to unsuspecting and gullible companies
# looking for people to sue. Despite the claims on their robot information page, they do not
# respect robots.txt; in fact, they spoof their User-Agent in multiple ways to avoid detection.
# I have banned them by User-Agent and IP address.
RewriteCond %{REMOTE_ADDR} ^12\.148\.196\.(12[8-9]|1[3-9][0-9]|2[0-4][0-9]|25[0-5])$ [OR]
RewriteCond %{REMOTE_ADDR} ^12\.148\.209\.(19[2-9]|2[0-4][0-9]|25[0-5])$ [OR]
RewriteCond %{HTTP_USER_AGENT} NPBot[NC]
RewriteRule .* - [F,L]

nsm,

A clean-up is definitely required! With all that in the .htaccess file, you’re slowing your server terribly and, if on a shared server, you should have your account removed (or .htaccess blocked) because it’ll adversely affect everyone!

In general, let me start with my latest “standard rant:”

[standard rant #4][indent]The definition of an idiot is someone who repeatedly does the same thing expecting a different result. Asking Apache to confirm the existence of ANY module with an <IfModule> … </IfModule> wrapper is the same thing in the webmaster world. DON’T BE AN IDIOT! If you don’t know whether a module is enabled, run the test ONCE then REMOVE the wrapper as it is EXTREMELY wasteful of Apache’s resources (and should NEVER be allowed on a shared server).[/indent][/standard rant #4]

The FIRST thing that you’re supposed to learn about mod_rewrite is when NOT to use it! Basically, it’s great for testing and for changing specific settings on a per directory basis but things like ban lists (your {USER_AGENT} list) should go in the server (or vhost) configuration file (where it’s read once and is retained for processing). Of course, if you don’t have access to the server configuration file, then you’re stuck with .htaccess (but ARE adversely impacting everyone on your shared server … and there is NOTHING “fine” about that!).

As a technique, I move all the “core” directives to the top of my .htaccess because they’ll ALWAYS take precedence over mod_rewrite directives. Using that ordering, I’m reminded of the precedence as I scan through the lines. I won’t move things around because you know which are mod_rewrite and which are not.

On to specific comments which I’ll embed in your code:

# Rewrite enabled
RewriteEngine On
RewriteRule ^([0-9]+)\\.([0-9]+)\\.([0-9]+)\\.([0-9]+)\\.([0-9]+)\\.([0-9]+)\\.phtml$ index.php?id=$1,$2,$3,$4,$5,$6
RewriteRule ^(.+)\\.phtml$ index.php?$1 [L]

# Deny stealing content (no hot linking)
[COLOR="Red"]RewriteCond %{REQUEST_FILENAME} .*jpg$|.*jpeg$|.*gif$|.*png$|.*ico$|.*js$|.*css$|.*txt$ [NC]
RewriteCond %{HTTP_REFERER} !^$ 
RewriteCond %{HTTP_REFERER} !^http://(.+\\.)?xyz\\.com/ [NC] [OR]
RewriteCond %{HTTP_REFERER} !^http://www.xyz\\.com/.*$ [NC] [OR]
RewriteRule (.*) /nohotlink.php?pic=$1 [R=302,L][/COLOR]
[indent]The 4th RewriteCond is merely a special case of the 3rd
and can be removed and, because you don't need the subdomain, 
you can remove the start anchor and everything up to the optional 
dot character in front of the domain name and leave it at that.

Don't add an [OR] flag to the 2nd RewriteCond: these are negated 
tests which must ALL be true (the default AND), otherwise requests 
coming from your own pages would be caught, too. Flags also belong 
in ONE bracket, comma-separated, so the stray second [OR] brackets 
should simply be dropped.

While more of an "optimization thing," I'd use the regex of the 
1st RewriteCond to replace the regex of the RewriteRule (because
the RewriteCond statements are only processed IF the
RewriteRule's regex is matched) and remove the first condition.[/indent]
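A rough sketch of what that block could look like after those changes. Two things here are my own assumptions rather than anything from the original file: the referer pattern keeps its start anchor (so a foreign URL that merely contains the domain somewhere does not pass), and the whole path is captured so that nohotlink.php still receives it in pic. The conditions carry no [OR] because negated referer tests all have to hold.

```apache
# File-type test lives in the RewriteRule pattern, so the conditions
# below are only evaluated for these extensions
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://([^/]+\.)?xyz\.com/ [NC]
RewriteRule ^(.+\.(jpe?g|gif|png|ico|js|css|txt))$ /nohotlink.php?pic=$1 [NC,R=302,L]
```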
# Improve site page rank
RewriteCond %{HTTP_HOST} ^xyz\\.com
RewriteRule ^(.*)$ http://www.xyz\\.com/$1 [R=permanent]
[indent]No case flag! Host names are case-insensitive, so the RewriteCond needs [NC].[/indent]
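A minimal sketch of that redirect with the flag added. The L flag is my addition, not something from the original file, and the dot in the target URL is left unescaped because the substitution is not a regex.

```apache
# Send the bare domain to www; host names compare case-insensitively
RewriteCond %{HTTP_HOST} ^xyz\.com [NC]
RewriteRule ^(.*)$ http://www.xyz.com/$1 [R=301,L]
```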
# Force PHP5
AddType x-mapp-php5 .php
[indent]DEFINITELY something for your server configuration file![/indent]
# Media files - 7 days
<FilesMatch "\\.(ico|pdf|flv|jpg|jpeg|png|gif|swf|mp3|mp4)$">
Header set Cache-Control "max-age=302400, must-revalidate, public, no-transform"
</FilesMatch>
[indent]Core[/indent]
# HTML etc. files - 2 hours
<FilesMatch "\\.(html|htm|xml|txt|xsl)$">
Header set Cache-Control "max-age=7200, must-revalidate, public, no-transform"
</FilesMatch>
[indent]Ditto[/indent]
# JS/CSS files - 1 day
<FilesMatch "\\.(js|css)$">
Header set Cache-Control "max-age=43200, must-revalidate, public, no-transform"
</FilesMatch>
[indent]Ditto[/indent]
# PHP etc. dynamic files - disabled
<FilesMatch "\\.(pl|php|[sf]?cgi|spl)$">
Header set Cache-Control: "max-age=0, no-store"
</FilesMatch>
[indent]Ditto[/indent]
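One thing the comments and the values disagree on: 302400 seconds is 3.5 days and 43200 seconds is 12 hours (the 7200 for two hours is right). A sketch with the max-age values recomputed, assuming the comments state the real intent; the other Cache-Control tokens are copied unchanged.

```apache
# Media files - 7 days (7 * 86400 = 604800 seconds)
<FilesMatch "\.(ico|pdf|flv|jpg|jpeg|png|gif|swf|mp3|mp4)$">
Header set Cache-Control "max-age=604800, must-revalidate, public, no-transform"
</FilesMatch>

# JS/CSS files - 1 day (24 * 3600 = 86400 seconds)
<FilesMatch "\.(js|css)$">
Header set Cache-Control "max-age=86400, must-revalidate, public, no-transform"
</FilesMatch>
```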
# Needed for php extensionless (clear URL) redirect.
Options -MultiViews
[indent]That turns MultiViews OFF 
- which, IMHO, is a good thing to do![/indent]
# If index or index.php requested, strip and redirect
RewriteCond %{THE_REQUEST} index(\\.php)?
RewriteRule ^index(\\.php)?$ http://www.xyz.com [R=301,L]
[indent]Extensionless URIs?  Okay, I do that, too, but removing the 
DirectoryIndex is, IMHO, a silly thing to do ... and ADD the trailing / 
on the domain to save more useless cycles by Apache![/indent]
# Pass Through the empty request (to be handled as DirectoryIndex)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\\ /?\\ HTTP
RewriteRule .? - [PT]
[indent]What's the /? supposed to do, make the / optional?  
I believe that the ? is superfluous.[/indent]
# Remove PHP extension from links (internally)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.+)$ /$1.php [L,QSA]
[indent]Actually, it ADDs rather than removes the .php extension.
The QSA flag is not required - the {QUERY_STRING} 
is not impacted by your redirection.[/indent]
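For illustration, the same block without QSA. Because the substitution contains no ? of its own, mod_rewrite leaves the original query string attached, so a request for /page?x=1 (a made-up example) would be served by /page.php with x=1 intact.

```apache
# Internally map /page to /page.php when that script exists
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.+)$ /$1.php [L]
```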
# Remove PHP extension from links (externally)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\\ /([^.]+\\.)+php\\ HTTP
RewriteRule ^(.+)\\.php$ /$1 [R=301,L]

# Obscure 'php.ini' files (where they exist)
RedirectMatch 404 .*php\\.ini$
[indent]With a non-3xx status such as 404, RedirectMatch takes no target 
URL, so this is valid and simply answers with a 404.  Still, it's better 
handled by a <FilesMatch> directive (as you have next with the .htaccess file).[/indent]
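A sketch of the <FilesMatch> alternative, written in the same Apache 2.2 access syntax as the .htaccess block that follows. Note the difference in behaviour: this answers 403 Forbidden rather than 404. On Apache 2.4 without mod_access_compat, the two access lines would be replaced by Require all denied.

```apache
# Refuse direct requests for any php.ini file
<FilesMatch "^php\.ini$">
Order allow,deny
Deny from all
</FilesMatch>
```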
# Deny access to file
<Files .htaccess>
order allow,deny
deny from all
</Files>

# eliminate Code Red and NIMDA Virus attacks
redirect /scripts http://www.stoptheviruscold.invalid
redirect /MSADC http://www.stoptheviruscold.invalid
redirect /c http://www.stoptheviruscold.invalid
redirect /d http://www.stoptheviruscold.invalid
redirect /_mem_bin http://stoptheviruscold.invalid
redirect /msadc http://stoptheviruscold.invalid
RedirectMatch (.*)\\cmd.exe$ http://stoptheviruscold.invalid$1 
[indent]Core[/indent]
# automatically corect simple speling errors
<IfModule mod_speling.c>
 CheckSpelling Off
</IfModule>
[indent]See Standard Rant #4 above.[/indent]
# Directory index page
DirectoryIndex index.php index.html index.shtml index.htm
[indent]Core[/indent]
IndexIgnore *
[indent]Sorry, I have to ignore this one as I just don't recognize it.[/indent]
Options -Indexes
[indent]Combine with the -MultiViews; core[/indent]
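Combined, that could look like the line below. Folding in FollowSymLinks with a + sign is my own suggestion, on the assumption that your host allows it to be set from .htaccess: a bare Options FollowSymLinks (no + or -) replaces the whole option set rather than adjusting it, so keeping everything relative in one place is easier to reason about.

```apache
# One relative Options line instead of three separate directives
Options +FollowSymLinks -Indexes -MultiViews
```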
# Error pages handling
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.php) /error/404.php
[indent]WHY create an atom if you're not going to use it?  
WHY limit this to a file with {something}php (not 
necessarily a dot character as it's NOT escaped)?  
IMHO, if you're this far through your mod_rewrite 
and have a missing file, this regex should be .? 
(to catch everything AND nothing).[/indent]
ErrorDocument 400 /error/400.php
ErrorDocument 401 /error/401.php
ErrorDocument 403 /error/403.php
ErrorDocument 404 /error/404.php
ErrorDocument 500 /error/500.php
ErrorDocument 501 /error/501.php
ErrorDocument 502 /error/502.php
[indent]Core[/indent]
# Performance tweaks
<IfModule mod_deflate.c>
a2enmod deflate
SetOutputFilter DEFLATE
AddOutputFilterByType DEFLATE text/html text/plain text/css text/javascript application/x-httpd-php application/x-httpd-fastphp application/javascript application/json
AddOutputFilterByType DEFLATE text/xml application/xml text/x-component
DeflateCompressionLevel 9
BrowserMatch ^Mozilla/4 gzip-only-text/html
BrowserMatch ^Mozilla/4\\.0[678] no-gzip
BrowserMatch \\bMSIE !no-gzip !gzip-only-text/html
</IfModule>
[indent]Performance tweaks indeed!  This Standard Rant #4 will DEFINITELY tweak the server by ***SLOWING*** it![/indent]
<ifModule mod_headers.c>
  Header unset ETag
</ifModule>
[indent]Ditto[/indent]
FileETag None

<IfModule prefork.c>
StartServers 44
MinSpareServers 22
MaxSpareServers 44
ServerLimit 300
MaxClients 200
MaxRequestsPerChild 0
</IfModule>
[indent]Core and Ditto the Standard Rant #4 comment.[/indent]
<IfModule worker.c>
 StartServers 9
 MaxClients 200
 MinSpareThreads 110
 MaxSpareThreads 330
 ThreadsPerChild 110
 MaxRequestsPerChild 0
</IfModule>
[indent]Ditto again.[/indent]
Options FollowSymLinks
[indent]Core - but necessary for mod_rewrite to work so this is a SERVER configuration file item.

Okay, I've burned out with this review but the rest are merely WTF and INAPPROPRIATE LOCATION for these directives.[/indent]
#Deny hot linking images
[COLOR="red"]# RewriteCond %{HTTP_REFERER} !^$
# RewriteCond %{HTTP_REFERER} !^http://xyz.com/.*$ [NC] [OR]
# RewriteCond %{HTTP_REFERER} !^http://www.xyz.com/.*$ [NC] [OR]
# RewriteRule .*\\.(gif|GIF|jpg|JPG|png|PNG|bmp|BMP|wav|mp3|wmv|avi|mpeg)$ - [F][/COLOR]

# RewriteCond %{HTTP_REFERER} !^http(s)?://([-a-z0-9]+\\.)?xyz.com/ [NC] [OR]
# RewriteCond %{HTTP_REFERER} !^http(s)?://(.+\\.)?xyz.com/.*$ [NC] 
# RewriteCond %{HTTP_REFERER} !^$
# RewriteCond %{HTTP_REFERER} !google. [NC]
# RewriteCond %{HTTP_REFERER} !search?q=cache [NC]
# RewriteCond %{HTTP_REFERER} !msn. [NC]
# RewriteCond %{HTTP_REFERER} !yahoo. [NC]
# RewriteRule .*\\.(gif|GIF|jpg|JPG|png|PNG|bmp|BMP|wav|mp3|wmv|avi|mpeg)$ nolinking.jpe [NC,R]

# RewriteCond %{HTTP_REFERER} !^http://(.+\\.)?xyz\\.co\\.uk/ [NC] [OR]
# RewriteCond %{HTTP_REFERER} !^http://(www\\.)?xyz\\.co\\.uk/.*$ [NC] [OR]
# RewriteCond %{HTTP_REFERER} !^$
# RewriteRule .*\\.(gif|GIF|jpg|JPG|png|PNG|bmp|BMP|wav|mp3|wmv|avi|mpeg)$ nolinking.png

# Deny access by user agents
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Bot\\ mailto:craftbot@yahoo.com [OR] 
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR] 
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Download\\ Demon [OR] 
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR] 
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR] 
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR] 
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Express\\ WebPictures [OR] 
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR] 
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR] 
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR] 
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR] 
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR] 
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR] 
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR] 
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR] 
RewriteCond %{HTTP_USER_AGENT} ^Image\\ Stripper [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Image\\ Sucker [OR] 
RewriteCond %{HTTP_USER_AGENT} Indy\\ Library [NC,OR] 
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Internet\\ Ninja [OR] 
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR] 
RewriteCond %{HTTP_USER_AGENT} ^JOC\\ Web\\ Spider [OR] 
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR] 
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Mass\\ Downloader [OR] 
RewriteCond %{HTTP_USER_AGENT} ^MIDown\\ tool [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Mister\\ PiX [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR] 
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR] 
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR] 
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Net\\ Vampire [OR] 
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Offline\\ Explorer [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Offline\\ Navigator [OR] 
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Papa\\ Foto [OR] 
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR] 
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR] 
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR] 
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR] 
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR] 
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR] 
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR] 
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR] 
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Teleport\\ Pro [OR] 
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Web\\ Image\\ Collector [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Web\\ Sucker [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebGo\\ IS [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Website\\ eXtractor [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Website\\ Quester [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\\ WebSpider [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Zeus 
RewriteRule ^.* - [F]

# NameProtect peddles their “online brand monitoring” to unsuspecting and gullible companies
# looking for people to sue. Despite the claims on their robot information page, they do not
# respect robots.txt; in fact, they spoof their User-Agent in multiple ways to avoid detection.
# I have banned them by User-Agent and IP address.
RewriteCond %{REMOTE_ADDR} ^12\.148\.196\.(12[8-9]|1[3-9][0-9]|2[0-4][0-9]|25[0-5])$ [OR]
RewriteCond %{REMOTE_ADDR} ^12\.148\.209\.(19[2-9]|2[0-4][0-9]|25[0-5])$ [OR]
RewriteCond %{HTTP_USER_AGENT} NPBot[NC]
RewriteRule .* - [F,L]

Questions on the comments? Yeah, yeah, I’m sure you don’t have access to the server configuration file but PLEASE work with your host’s staff as this kind of .htaccess is ridiculous and adversely impacts the server you’re assigned to (at least it does if you have any kind of traffic at all).

Regards,

DK

With regards to your question about hotlinking you have these two blocks:


RewriteCond %{REQUEST_FILENAME} .*jpg$|.*jpeg$|.*gif$|.*png$|.*ico$|.*js$|.*css$|.*txt$ [NC]
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(.+\\.)?xyz\\.com/ [NC] [OR]
RewriteCond %{HTTP_REFERER} !^http://www.xyz\\.com/.*$ [NC] [OR]
RewriteRule (.*) /nohotlink.php?pic=$1 [R=302,L]

and


RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://xyz.com/.*$ [NC] [OR]
RewriteCond %{HTTP_REFERER} !^http://www.xyz.com/.*$ [NC] [OR]
RewriteRule .*\.(gif|GIF|jpg|JPG|png|PNG|bmp|BMP|wav|mp3|wmv|avi|mpeg)$ - [F]

Yes, those are basically the same. The difference is that the blocks match different file types.
For example, the first one blocks .txt files while the second doesn’t.
On the other hand, the second one blocks .mp3 files, which the first one doesn’t.
So, make up your mind on which files you want to block and insert them in the second block (that’s slightly better than the first one because it doesn’t contain all those greedy .* patterns; what made you think you needed those in the first place?). Don’t list every case an extension can be written in (gif, GIF, Gif, etc.); just use lowercase and put [NC] at the end of the rule. There already is an [F] at the end, so make that [F,NC].

Also,


# Deny access by user agents
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Bot\\ mailto:craftbot@yahoo.com [OR] 
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR] 
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Download\\ Demon [OR] 
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR] 
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR] 
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR] 
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Express\\ WebPictures [OR] 
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR] 
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR] 
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR] 
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR] 
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR] 
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR] 
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR] 
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR] 
RewriteCond %{HTTP_USER_AGENT} ^Image\\ Stripper [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Image\\ Sucker [OR] 
RewriteCond %{HTTP_USER_AGENT} Indy\\ Library [NC,OR] 
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Internet\\ Ninja [OR] 
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR] 
RewriteCond %{HTTP_USER_AGENT} ^JOC\\ Web\\ Spider [OR] 
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR] 
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Mass\\ Downloader [OR] 
RewriteCond %{HTTP_USER_AGENT} ^MIDown\\ tool [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Mister\\ PiX [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR] 
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR] 
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR] 
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Net\\ Vampire [OR] 
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Offline\\ Explorer [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Offline\\ Navigator [OR] 
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Papa\\ Foto [OR] 
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR] 
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR] 
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR] 
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR] 
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR] 
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR] 
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR] 
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR] 
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Teleport\\ Pro [OR] 
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Web\\ Image\\ Collector [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Web\\ Sucker [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebGo\\ IS [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Website\\ eXtractor [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Website\\ Quester [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\\ WebSpider [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Zeus 
RewriteRule ^.* - [F]

Can be replaced with:


# deny access by user agents
RewriteCond ^(blackwidow|bot\\ mailto:craftbot@yahoo.com|chinaclaw|custo|disco|download\\ demon|ecatch|eirgrabber|emailsiphon|emailwolf|express\\ webpictures|extractorpro|eyenetie|flashget|getright|getweb!|go!zilla|go-ahead-got-it|grabnet|grafula|hmview|rewritecond %{http_user_agent} httrack|image\\ stripper|image\\ sucker|rewritecond %{http_user_agent} indy\\ library|interget|internet\\ ninja|jetcar|joc\\ web\\ spider|larbin|leechftp|mass\\ downloader|midown\\ tool|mister\\ pix|navroad|nearsite|netants|netspider|net\\ vampire|netzip|octopus|offline\\ explorer|offline\\ navigator|pagegrabber|papa\\ foto|pavuk|pcbrowser|realdownload|reget|sitesnagger|smartdownload|superbot|superhttp|surfbot|takeout|teleport\\ pro|voideye|web\\ image\\ collector|web\\ sucker|webauto|webcopier|webfetch|webgo\\ is|webleacher|webreaper|websauger|website\\ extractor|website\\ quester|webstripper|webwhacker|webzip|wget|widow|wwwoffle|xaldon\\ webspider|zeus) [NC]
RewriteRule . - [F]

See what I did there?

To be brutally honest, if I were you I’d throw this whole monster out the window, think about what you really want the .htaccess to do, and write only that. And stop copying random snippets you find on the internet and putting them in there (the <IfModule prefork.c></IfModule> and <IfModule worker.c></IfModule> sections are proof of that, since those directives are [I]only[/I] allowed in the server config and won’t do anything in a .htaccess file!)

Wow, so much to improve, I wonder how it worked all this time?

Server speed was actually very good, and the hosting people even saw the htaccess file a few times but never complained.

I’m going through the comments and trying to come up with a new, leaner version. So far so good, except for an issue here or there, e.g.:

# deny access by user agents

RewriteCond ^(blackwidow|bot\\ mailto:craftbot@yahoo.com|chinaclaw|custo|disco|download\\ demon|ecatch|eirgrabber|emailsiphon|emailwolf|express\\ webpictures|extractorpro|eyenetie|flashget|getright|getweb!|go!zilla|go-ahead-got-it|grabnet|grafula|hmview|rewritecond %{http_user_agent} httrack|image\\ stripper|image\\ sucker|rewritecond %{http_user_agent} indy\\ library|interget|internet\\ ninja|jetcar|joc\\ web\\ spider|larbin|leechftp|mass\\ downloader|midown\\ tool|mister\\ pix|navroad|nearsite|netants|netspider|net\\ vampire|netzip|octopus|offline\\ explorer|offline\\ navigator|pagegrabber|papa\\ foto|pavuk|pcbrowser|realdownload|reget|sitesnagger|smartdownload|superbot|superhttp|surfbot|takeout|teleport\\ pro|voideye|web\\ image\\ collector|web\\ sucker|webauto|webcopier|webfetch|webgo\\ is|webleacher|webreaper|websauger|website\\ extractor|website\\ quester|webstripper|webwhacker|webzip|wget|widow|wwwoffle|xaldon\\ webspider|zeus) [NC]
RewriteRule . - [F]

That causes an internal server error. It’s not the spaces (I tried fixing those), so it seems likely to be a \ or a /. Time will tell.

Also, on the “deny stealing content” issue, I believe I have improved the code that blocks hot linking to media files on the website, but I’m not entirely sure whether it will also keep those files out of search engine results (robots.txt folder exclusion aside). In other words, is the code below still necessary?

Deny showing images in search engine results (in addition to robots.txt)

RewriteCond %{HTTP_REFERER} !^http(s)?://([-a-z0-9]+\\.)?xyz.com/ [NC] [OR]
RewriteCond %{HTTP_REFERER} !^http(s)?://(.+\\.)?xyz.com/.*$ [NC] 
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !google. [NC]
RewriteCond %{HTTP_REFERER} !search?q=cache [NC]
RewriteCond %{HTTP_REFERER} !msn. [NC]
RewriteCond %{HTTP_REFERER} !yahoo. [NC]
RewriteRule .*\\.(gif|jpg|png|bmp|wav|mp3|wmv|avi|mpeg)$ - [F,NC]

Thanks,

It’s probably the two occurrences of “rewritecond %{http_user_agent}” I left in there :blush: Remove those and you should be fine :slight_smile:
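
For illustration, here is a shortened sketch of how the condition might look once the stray fragments are gone. It assumes the intent is to test the User-Agent header, so the test string %{HTTP_USER_AGENT} is written out explicitly, and the list is trimmed to a handful of the names from the thread purely to keep the example readable.

```apache
# Sketch only: shortened list, assuming the test string is the User-Agent header.
# RewriteCond takes a test string, then a pattern, then optional flags.
# Spaces inside the pattern must be escaped with a backslash, otherwise
# mod_rewrite treats them as argument separators.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^(blackwidow|download\ demon|httrack|indy\ library|webzip|wget|zeus) [NC]
RewriteRule . - [F]
```

An unescaped space in the middle of the pattern, such as the one in a leftover "rewritecond %{http_user_agent}", splits the line into extra arguments, which is the kind of thing that can produce a 500 error.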

No, I’m pretty sure search engines will still be able to find and index those files. All the code does is prevent hotlinking.
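
As a rough sketch of what hotlink protection on its own looks like (example.com is a placeholder, as elsewhere in the thread): consecutive RewriteCond lines are ANDed by default, so the rule below fires only when the referer is non-empty and is not the site itself. Crawlers usually send no referer at all, so nothing here would stop them fetching the files.

```apache
# Sketch only: example.com is a placeholder domain.
RewriteEngine On
# Let requests with no referer through (direct visits, many crawlers and proxies).
RewriteCond %{HTTP_REFERER} !^$
# Let the site's own pages through (http or https, with or without a subdomain).
RewriteCond %{HTTP_REFERER} !^https?://([-a-z0-9]+\.)?example\.com/ [NC]
# Any other referer asking for an image gets 403 Forbidden.
RewriteRule \.(gif|jpe?g|png|bmp)$ - [F,NC]
```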

As for the extensionless links, it just looks professional without the file extension. It’s also somewhat safer as a hacker first needs to find out what type of web language the site is written in.

I’ll see what my host says about moving the code into the vhosts config file.

Yes, they still have PHP4 as default. Not sure what will happen once they roll-out PHP6.

So, you still concur that the 301s should be left in a single .htaccess file? I.e. there is no better way of doing it (speed-wise)?

Thanks,

Okay, so I believe I have taken most of your comments on board, and I’ve come up with the leaner version of the .htaccess file below.

Checked both on localhost and online, and as far as I can tell all is well. On localhost I have to remove the lines in red, otherwise Apache (as configured in WAMP) throws an internal server error. Online, meanwhile, I need to force PHP5 processing.

If the site was slow as you guys reckon (even though the server didn’t sweat and there is traffic btw) then give it a few weeks and the site’s bounce rate should go down, simply because it should load faster for many people.

# Enable rewrite
RewriteEngine On
RewriteRule ^([0-9]+)\\.([0-9]+)\\.([0-9]+)\\.([0-9]+)\\.([0-9]+)\\.([0-9]+)\\.phtml$ index.php?id=$1,$2,$3,$4,$5,$6
RewriteRule ^(.+)\\.phtml$ index.php?$1 [L]

[COLOR="Red"]# Force PHP5
AddType x-mapp-php5 .php

# Deny hot linking images
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://example.com/.*$ [NC] [OR]
RewriteCond %{HTTP_REFERER} !^http://www.example.com/.*$ [NC] [OR]
RewriteRule .*\\.(gif|jpg|png|bmp|wav|wmv|avi|mpeg)$ - [F,NC]

# Deny showing images in search engine results (in addition to robots.txt)
RewriteCond %{HTTP_REFERER} !^http(s)?://([-a-z0-9]+\\.)?example.com/ [NC] [OR]
RewriteCond %{HTTP_REFERER} !^http(s)?://(.+\\.)?example.com/.*$ [NC] 
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !google. [NC]
RewriteCond %{HTTP_REFERER} !search?q=cache [NC]
RewriteCond %{HTTP_REFERER} !msn. [NC]
RewriteCond %{HTTP_REFERER} !yahoo. [NC]
RewriteRule .*\\.(gif|jpg|png|bmp|wav|mp3|wmv|avi|mpeg)$ - [F,NC]

# Do not automatically corect simple speling errors within links performance tweak
CheckSpelling Off

# Do not generate ETag headers (performance tweak)
FileETag None

# Improve site page rank
RewriteCond %{HTTP_HOST} ^example\\.com [NC]
RewriteRule ^(.*)$ http://www.example\\.com/$1 [R=permanent]

# Media files - 7 days
<FilesMatch "\\.(ico|pdf|flv|jpg|jpeg|png|gif|swf|mp3|mp4)$">
Header set Cache-Control "max-age=302400, must-revalidate, public, no-transform"
</FilesMatch>
 
# HTML etc. files - 2 hours
<FilesMatch "\\.(html|htm|xml|txt|xsl)$">
Header set Cache-Control "max-age=7200, must-revalidate, public, no-transform"
</FilesMatch>
  
# JS/CSS files - 1 day
<FilesMatch "\\.(js|css)$">
Header set Cache-Control "max-age=43200, must-revalidate, public, no-transform"
</FilesMatch>

# PHP etc. dynamic files - disabled
<FilesMatch "\\.(pl|php|[sf]?cgi|spl)$">
Header set Cache-Control: "max-age=0, no-store"
</FilesMatch>

# If index or index.php requested, strip and redirect
RewriteCond %{THE_REQUEST} index(\\.php)?
RewriteRule ^index(\\.php)?$ http://www.example.com/ [R=301,L][/COLOR]

#  Extensionless (clear URL) links / redirect and hide directory listings / allow mod_rewrite to work
Options -MultiViews -Indexes +FollowSymLinks

# Hide files in directory listing
IndexIgnore *

# Pass Through the empty request (to be handled as DirectoryIndex)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\\ /?\\ HTTP
RewriteRule .? - [PT]

# Remove PHP extension from links (internally)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.+)$ /$1.php [L]

# Remove PHP extension from links (externally)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\\ /([^.]+\\.)+php\\ HTTP
RewriteRule ^(.+)\\.php$ /$1 [R=301,L]

# Deny access to .htaccess and .ini files
<FilesMatch "\\.(htaccess|ini)$">
order allow,deny
deny from all
</FilesMatch>

# Index page order
DirectoryIndex index.php index.html index.shtml index.htm

# Error pages handling
ErrorDocument 400 /error/400.php
ErrorDocument 401 /error/401.php
ErrorDocument 403 /error/403.php
ErrorDocument 404 /error/404.php
ErrorDocument 500 /error/500.php
ErrorDocument 501 /error/501.php
ErrorDocument 502 /error/502.php

# Eliminate Code Red and NIMDA Virus attacks
redirect /scripts http://www.stoptheviruscold.invalid
redirect /MSADC http://www.stoptheviruscold.invalid
redirect /c http://www.stoptheviruscold.invalid
redirect /d http://www.stoptheviruscold.invalid
redirect /_mem_bin http://stoptheviruscold.invalid
redirect /msadc http://stoptheviruscold.invalid
RedirectMatch (.*)\\cmd.exe$ http://stoptheviruscold.invalid$1 

# Deny access by user agents
RewriteCond %{HTTP_USER_AGENT} ^(blackwidow|bot\\ mailto:craftbot@yahoo.com|chinaclaw|custo|disco|download\\ demon|ecatch|eirgrabber|emailsiphon|emailwolf|express\\ webpictures|extractorpro|eyenetie|flashget|getright|getweb!|go!zilla|go-ahead-got-it|grabnet|grafula|hmview|httrack|image\\ stripper|image\\ sucker|indy\\ library|interget|internet\\ ninja|jetcar|joc\\ web\\ spider|larbin|leechftp|mass\\ downloader|midown\\ tool|mister\\ pix|navroad|nearsite|netants|netspider|net\\ vampire|netzip|octopus|offline\\ explorer|offline\\ navigator|pagegrabber|papa\\ foto|pavuk|pcbrowser|realdownload|reget|sitesnagger|smartdownload|superbot|superhttp|surfbot|takeout|teleport\\ pro|voideye|web\\ image\\ collector|web\\ sucker|webauto|webcopier|webfetch|webgo\\ is|webleacher|webreaper|websauger|website\\ extractor|website\\ quester|webstripper|webwhacker|webzip|wget|widow|wwwoffle|xaldon\\ webspider|zeus) [NC]
RewriteRule . - [F]

# NameProtect peddles their “online brand monitoring” to unsuspecting and gullible companies
# looking for people to sue. Despite the claims on their robot information page, they do not
# respect robots.txt; in fact, they spoof their User-Agent in multiple ways to avoid detection.
# I have banned them by User-Agent and IP address.
RewriteCond %{REMOTE_ADDR} ^12\\.148\\.196\\.(12[8-9]|1[3-9][0-9]|2[0-4][0-9]|25[0-5])$ [OR]
RewriteCond %{REMOTE_ADDR} ^12\\.148\\.209\\.(19[2-9]|2[0-4][0-9]|25[0-5])$ [OR]
RewriteCond %{HTTP_USER_AGENT} NPBot [NC]
RewriteRule .* - [F,L]

See, that does look a lot better than what you started with doesn’t it?

One last remark is that IndexIgnore * is redundant because you also have Options -Indexes in there.

Other than that, looks like it’s good to go :tup:

nsm,

Comments embedded (again):

# Enable rewrite
RewriteEngine On
RewriteRule ^([0-9]+)\\.([0-9]+)\\.([0-9]+)\\.([0-9]+)\\.([0-9]+)\\.([0-9]+)\\.phtml$ index.php?id=$1,$2,$3,$4,$5,$6
RewriteRule ^(.+)\\.phtml$ index.php?$1 [L]

# Force PHP5
AddType x-mapp-php5 .php
[indent]This should NOT be in .htaccess, it should be in httpd.conf for your server.  I suspect that this is a MAPP (canned program)-induced problem.[/indent]
# Deny hot linking images
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\\.)?example.com/.*$ [NC] [OR]
[COLOR="Gray"]RewriteCond %{HTTP_REFERER} !^http://www.example.com/.*$ [NC] [OR][/COLOR]
RewriteRule .*\\.(gif|jpg|png|bmp|wav|wmv|avi|mpeg)$ - [F,NC]

# Deny showing images in search engine results (in addition to robots.txt)
[COLOR="Gray"]RewriteCond %{HTTP_REFERER} !^http(s)?://([-a-z0-9]+\\.)?example.com/ [NC] [OR][/COLOR]
RewriteCond %{HTTP_REFERER} !^http(s)?://(.+\\.)?example.com/.*$ [NC] 
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !google. [NC]
RewriteCond %{HTTP_REFERER} !search?q=cache [NC]
RewriteCond %{HTTP_REFERER} !msn. [NC]
RewriteCond %{HTTP_REFERER} !yahoo. [NC]
RewriteRule .[SIZE="6"]+[/SIZE]\\.(gif|jpg|png|bmp|wav|mp3|wmv|avi|mpeg)$ - [F,NC]

# Do not automatically corect simple speling errors within links performance tweak
CheckSpelling Off

# Do not generate ETag headers (performance tweak)
FileETag None

# Improve site page rank
RewriteCond %{HTTP_HOST} ^example\\.com [NC]
[COLOR="Gray"]RewriteRule ^(.*)$ http://www.example\\.com/$1 [R=permanent][/COLOR]
RewriteRule .? http://www.example\\.com%{REQUEST_URI} [R=301]

# Media files - 7 days
<FilesMatch "\\.(ico|pdf|flv|jpg|jpeg|png|gif|swf|mp3|mp4)$">
Header set Cache-Control "max-age=302400, must-revalidate, public, no-transform"
</FilesMatch>
 
# HTML etc. files - 2 hours
<FilesMatch "\\.(html|htm|xml|txt|xsl)$">
Header set Cache-Control "max-age=7200, must-revalidate, public, no-transform"
</FilesMatch>
  
# JS/CSS files - 1 day
<FilesMatch "\\.(js|css)$">
Header set Cache-Control "max-age=43200, must-revalidate, public, no-transform"
</FilesMatch>

# PHP etc. dynamic files - disabled
<FilesMatch "\\.(pl|php|[sf]?cgi|spl)$">
Header set Cache-Control: "max-age=0, no-store"
</FilesMatch>

# If index or index.php requested, strip and redirect
RewriteCond %{THE_REQUEST} index(\\.php)?
RewriteRule ^index(\\.php)?$ http://www.example.com/ [R=301,L]
[indent]ARGH![/indent]

#  Extensionless (clear URL) links / redirect and hide directory listings / allow mod_rewrite to work
Options -MultiViews -Indexes +FollowSymLinks

# Hide files in directory listing
[COLOR="Gray"]IndexIgnore *[/COLOR]

# Pass Through the empty request (to be handled as DirectoryIndex)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\\ /[COLOR="Gray"]?[/COLOR]\\ HTTP
RewriteRule .? - [PT]

# [SIZE="4"]ADD[/SIZE] PHP extension [SIZE="4"]TO[/SIZE] links (internally)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
[COLOR="Gray"]RewriteRule ^(.+)$ /$1.php [L][/COLOR]
RewriteRule ^([a-z]+)$ [COLOR="Gray"]/[/COLOR]$1.php [L]

# Remove PHP extension from links (externally)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\\ /([^.]+\\.)+php\\ HTTP
RewriteRule ^(.+)\\.php$ /$1 [R=301,L]

# Deny access to .htaccess and .ini files
<FilesMatch "\\.(htaccess|ini)$">
order allow,deny
deny from all
</FilesMatch>

# Index page order
DirectoryIndex index.php index.html index.shtml index.htm

# Error pages handling
ErrorDocument 400 /error/400.php
ErrorDocument 401 /error/401.php
ErrorDocument 403 /error/403.php
ErrorDocument 404 /error/404.php
ErrorDocument 500 /error/500.php
ErrorDocument 501 /error/501.php
ErrorDocument 502 /error/502.php

# Eliminate Code Red and NIMDA Virus attacks
redirect /scripts http://www.stoptheviruscold.invalid
redirect /MSADC http://www.stoptheviruscold.invalid
redirect /c http://www.stoptheviruscold.invalid
redirect /d http://www.stoptheviruscold.invalid
redirect /_mem_bin http://stoptheviruscold.invalid
redirect /msadc http://stoptheviruscold.invalid
RedirectMatch (.*)\\cmd.exe$ http://stoptheviruscold.invalid$1 

# Deny access by user agents
RewriteCond %{HTTP_USER_AGENT} ^(blackwidow|bot\\ mailto:craftbot@yahoo.com|chinaclaw|custo|disco|download\\ demon|ecatch|eirgrabber|emailsiphon|emailwolf|express\\ webpictures|extractorpro|eyenetie|flashget|getright|getweb!|go!zilla|go-ahead-got-it|grabnet|grafula|hmview|httrack|image\\ stripper|image\\ sucker|indy\\ library|interget|internet\\ ninja|jetcar|joc\\ web\\ spider|larbin|leechftp|mass\\ downloader|midown\\ tool|mister\\ pix|navroad|nearsite|netants|netspider|net\\ vampire|netzip|octopus|offline\\ explorer|offline\\ navigator|pagegrabber|papa\\ foto|pavuk|pcbrowser|realdownload|reget|sitesnagger|smartdownload|superbot|superhttp|surfbot|takeout|teleport\\ pro|voideye|web\\ image\\ collector|web\\ sucker|webauto|webcopier|webfetch|webgo\\ is|webleacher|webreaper|websauger|website\\ extractor|website\\ quester|webstripper|webwhacker|webzip|wget|widow|wwwoffle|xaldon\\ webspider|zeus) [NC]
RewriteRule . - [F]

# NameProtect peddles their “online brand monitoring” to unsuspecting and gullible companies
# looking for people to sue. Despite the claims on their robot information page, they do not
# respect robots.txt; in fact, they spoof their User-Agent in multiple ways to avoid detection.
# I have banned them by User-Agent and IP address.
RewriteCond %{REMOTE_ADDR} ^12\\.148\\.196\\.(12[8-9]|1[3-9][0-9]|2[0-4][0-9]|25[0-5])$ [OR]
RewriteCond %{REMOTE_ADDR} ^12\\.148\\.209\\.(19[2-9]|2[0-4][0-9]|25[0-5])$ [OR]
RewriteCond %{HTTP_USER_AGENT} NPBot [NC]
RewriteRule .* - [F,L]

IMHO, still a lot of extraneous stuff in there making this .htaccess far too large to load, parse and execute repeatedly (loopy code requires multiple passes) FOR EVERY REQUEST. IMHO, VERY BAD FOR A WEBMASTER TO BE DOING.
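
A smaller point in the caching section, separate from the size issue: the comments and the max-age values do not line up. 302400 seconds is 3.5 days, and 43200 seconds is 12 hours. If the comments express the intent (an assumption; the shorter values may be what was actually wanted), those two blocks could be sketched as:

```apache
# Sketch only: assumes the durations named in the comments are the intended ones.
# Requires mod_headers.

# Media files - 7 days (7 * 86400 = 604800 seconds)
<FilesMatch "\.(ico|pdf|flv|jpg|jpeg|png|gif|swf|mp3|mp4)$">
Header set Cache-Control "max-age=604800, must-revalidate, public, no-transform"
</FilesMatch>

# JS/CSS files - 1 day (86400 seconds)
<FilesMatch "\.(js|css)$">
Header set Cache-Control "max-age=86400, must-revalidate, public, no-transform"
</FilesMatch>
```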

Regards,

DK

Not necessarily. Some hosts support both PHP4 and PHP5 and have set .php to the PHP4 MIME type by default, and allow you to switch to PHP5 by using this directive (thus overriding the default MIME type defined in httpd.conf for PHP4). Sounds to me like the OP is on one of these hosts.

Thanks for your comments. Force PHP5 has to be in there because of my host, they’ve kept PHP4 alive.

As for the other multiple loops: what makes you say “ARGH” about the /index to domain name redirect? What’s so costly (processing-wise) there? The extensionless redirects seem more costly. These have to be expanded in the way they are for my host to work. In fact, I believe you helped form the code in another one of my forum threads (I would have to check, but I’m pretty sure).

I get the impression you would throw a lot into the httpd.conf. Given I don’t have access to it, is it really such foul play to put what I do into the .htaccess? There are a lot of 301s I’ve not pasted that are also in the .htaccess file; will this make you jump out and say I should put a separate .htaccess in each folder a redirect is called from? :slight_smile:

Thanks,

nsm,

You NEED to force PHP5? In that case, it’s time to go looking for a new host (or get your host to join the 21st Century). Good reason, though, for your code.

I’m NOT a fan of removing the DirectoryIndex from the URI only to make Apache add it back when it finds the file part of the path/to/file empty. That’s loopy, IMHO, but you have done it correctly. Same goes for removing the file extension of php files only to add it back in a hidden manner. These aversions are because YOU control the links which should be written without the DirectoryIndex and without the .php file extension. If others want to force seeing them (or are using old links), “why bother” is the question I’d have to ask. Oh, yeah, I’ve answered this many times but still help members who have their hearts set on doing what I consider “silly things.” It’s just my personal preference, nothing personal intended.
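
For readers following along, here is a sketch of the index-stripping pattern being discussed. The condition tests %{THE_REQUEST} because that variable holds the original request line sent by the browser, so Apache's own internal lookup of index.php for "/" does not retrigger the redirect. The anchored pattern is a tightened variant rather than the exact line posted above, and it assumes Apache 2.x (PCRE), where \s is available.

```apache
# Sketch only: example.com is a placeholder domain.
RewriteEngine On
# THE_REQUEST looks like: GET /index.php HTTP/1.1
# Match only when the client itself asked for /index or /index.php.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\s/index(\.php)?[?\s]
RewriteRule ^index(\.php)?$ http://www.example.com/ [R=301,L]
```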

With all the things you have put in your .htaccess, you’d increase server performance if your host would move your code to your httpd-vhosts.conf (because they should NOT move it to the httpd.conf as that would affect everyone on the shared server). They should JUMP at the chance to do that once your code is “clean” (and you’re getting there). The reason to move as much as possible to the server (or vhost) configuration file is that those are only read once. EVERY .htaccess in the path must be found, read, parsed and the code run through (in a far less efficient manner, too) for EVERY request (yes, even for image, css, js, ALL requests!). Obviously, using .htaccess is far more effort for the server.
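
A minimal sketch of what that move might look like, assuming the host is willing to edit the vhost file. The paths and the choice of rules are hypothetical and only meant to show the shape:

```apache
# Sketch only: paths and names are hypothetical. This belongs in the host's
# vhost configuration file, not in .htaccess.
<VirtualHost *:80>
    ServerName www.example.com
    ServerAlias example.com
    DocumentRoot /var/www/example

    <Directory /var/www/example>
        # Stop Apache looking for .htaccess files under this directory at all.
        AllowOverride None
        Options -MultiViews -Indexes +FollowSymLinks

        # Rules placed here are parsed at server start, not on every request.
        RewriteEngine On
        RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
        RewriteRule .? http://www.example.com%{REQUEST_URI} [R=301,L]
    </Directory>
</VirtualHost>
```

Keeping the rules inside a <Directory> block means the patterns can stay in the same per-directory form they had in .htaccess (no leading slash on the matched path).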

I hope that answers your questions (about my comments).

Regards,

DK

nsm,

I agree that extensionless links look better but is it worth it to STRIP the extension to do it? After all, YOU control how your links are created. Okay, relatively minor point but it does get a bit worse for those on hosts who force Apache to display the DirectoryIndex filenames.

Your host SHOULD appreciate the opportunity to move code to your vhosts config file. If not, that’s another reason to find a good host (I thought PHP4 had been deprecated at least a year ago).

Actually, EVERYTHING should be in the vhosts config file - so long as you’re not changing it frequently. Redirects are much faster because they’re part of Apache’s core but, if you have no better way to give Apache these directives, use .htaccess - but know it’s a webmaster’s “last resort” (except for testing).
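
As a sketch of the plain redirects referred to here, using mod_alias directives. The old and new paths are made up for illustration and are not taken from the OP's list of 301s:

```apache
# Sketch only: the old and new paths are hypothetical.
# mod_alias directives need no RewriteEngine and no regular expression
# for a simple one-to-one move.
Redirect 301 /old-page.php http://www.example.com/new-page
Redirect 301 /old-folder http://www.example.com/new-folder

# RedirectMatch covers the cases where a pattern is needed.
RedirectMatch 301 ^/articles/([0-9]+)\.phtml$ http://www.example.com/articles/$1
```

mod_alias and mod_rewrite are processed independently of each other, so if both are in play it is worth testing that a request does not pick up two redirects in a row.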

Regards,

DK

Thanks again.

Yes, you’re right, it would be best just to use extensionless file names from the get go instead of forcing them at runtime, except that it’s not something I envisioned when the site launched. Going back and deleting .php from each link is likely to open up linking problems, given the number of pages and internal links.

Apart from that I like to see the file extension when I view a list of files in a folder. Perhaps I’m oldschool, nonetheless habits aren’t that easy to change.

nsm,

Are all the pages hardcoded? When I write my websites, I will put the navigation in a script to be included when and where I want it on the displayed page. That means I only have ONE script to update. A nice multifile search and replace also works wonders.

As for seeing the .php extension, all your scripts SHOULD have that extension! DO NOT REMOVE THE EXTENSION FROM THE FILES - ONLY FROM THE LINKS TO THOSE FILES! That way, you can rely on mod_rewrite to request the file with a .php extension (after checking %{REQUEST_FILENAME}.php -f, of course).
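
A minimal sketch of that internal rewrite, assuming the .htaccess sits in the document root and the extensionless links contain no dots:

```apache
# Sketch only: assumes the .htaccess sits in the document root.
RewriteEngine On
# Skip anything that already exists as a file or directory.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Only rewrite when a matching script with a .php extension is on disk.
RewriteCond %{REQUEST_FILENAME}.php -f
# Internal rewrite: the browser keeps showing /about, Apache serves /about.php.
RewriteRule ^([^.]+)$ /$1.php [L]
```

The files keep their .php extension on disk; only the links lose it.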

If you’re using PHP, that is one old habit (pure .html files) you’ve already broken. Use PHP to your advantage and it will go a long way to “standardize” your web pages and make your life much easier.

Regards,

DK

The navigation menu is within its own file, which gets added to each page via a PHP include_once(). The same goes for a bunch of other functionality. The main links within the site are stored in variables which also get added via include_once(). For these I just change the variable value and all the links change automatically. It’s the in-body <a>'s I’m more concerned about, because there are quite a lot of them. They’re part of my on-page SEO efforts (i.e. once you get the page rank within your site, keep it within your site by nofollowing external links, but do use dofollow links internally within your site with the right anchor text to boost your rankings in SERPs). As far as I’m aware it’s one of the most overlooked aspects by people working on a site’s SEO, which is why they’re there. I could do a “replace all” in Notepad++ and it “should work”, but I wouldn’t be sure without extensive localhost testing. Given the number of pages and thus links, this won’t be a 5 minute job, and I’m bound to miss or overlook something simply due to human nature. Maybe I’m fretting too much; however, I tend to have a pragmatic mindset, i.e. I think about the consequences first and foremost.

Thanks again.

nsm,

It seems as if you NEED to use “Loopy Code” for those internal redirects - assuming you’re concerned about them (I use the philosophy that “if it ain’t broke, (I) don’t fix it”). What you’re after is found in my signature’s tutorial as “Redirect To the New Format” (http://datakoncepts.com/seo#example-12).
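
For context, a rough sketch of that kind of old-to-new redirect. It is not copied from the linked tutorial; it is one common way of sending old .php links to their extensionless form, and it assumes Apache 2.x (PCRE) and a .htaccess in the document root:

```apache
# Sketch only: not taken from the linked tutorial.
RewriteEngine On
# THE_REQUEST looks like: GET /about.php HTTP/1.1
# Testing it (rather than the current URL) means the internal rewrite that
# adds .php back does not trigger this redirect again.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\s/([^?\s]+)\.php[?\s]
RewriteRule ^(.+)\.php$ /$1 [R=301,L]
```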

Regards,

DK