Hi Contrid,
The most effective way to block spammers and bad bots is to use a firewall that supports intrusion detection, DoS attack protection and host filtering.
However, you may not have access to such a beast, so here are some things you can do in your .htaccess file instead.
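If you already know which addresses a bad bot comes from, Apache 2.4's Require directive can do basic host filtering straight from .htaccess. This is a minimal sketch. It assumes Apache 2.4 and that AllowOverride permits authorization directives (AuthConfig). The address ranges are placeholder documentation ranges, not real offenders. On Apache 2.2 the older Order/Allow/Deny directives do the same job.
Code:
<RequireAll>
    # Let everyone in by default...
    Require all granted
    # ...except these hypothetical addresses (replace them with the real offenders)
    Require not ip 192.0.2.0/24
    Require not ip 198.51.100.17
</RequireAll>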
You can build a list of bad referrers (the HTTP_REFERER header), one condition per domain, like this:
Code:
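# .htaccess comments start with #; /* */ is not valid here and causes a 500 error
RewriteEngine On
# Check if the referer is one of the bad domains
# (every condition except the last one needs the OR flag)
RewriteCond %{HTTP_REFERER} baddomain1\.com [OR,NC]
RewriteCond %{HTTP_REFERER} baddomain2\.com [NC]
# Block them with 403 Forbidden
RewriteRule .? - [F,L]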
Filtering traffic from certain countries is covered nicely here: http://www.sitepoint.com/forums/show...ht=bad+traffic
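As a rough sketch, if your host happens to have the third-party mod_geoip module installed and enabled, country blocking can be done with mod_rewrite. This assumes mod_geoip is exposing the GEOIP_COUNTRY_CODE environment variable. XX and YY are placeholder country codes, so substitute the ISO codes you actually want to block.
Code:
# Assumes mod_geoip is installed and enabled (e.g. GeoIPEnable On in the server config)
RewriteEngine On
# Hypothetical placeholder codes: replace XX|YY with the real ones
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^(XX|YY)$
RewriteRule .? - [F]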
You can also block bad robots while letting through the ones you want. This is hit or miss, since not every bot gets caught by these patterns, but it can help:
Code:
# If the request is for robots.txt, skip the next rule so any bot can read it
RewriteRule ^robots\.txt - [S=1]
# Block an empty user agent or any agent containing spider, crawl or bot...
RewriteCond %{HTTP_USER_AGENT} ^$ [OR,NC]
RewriteCond %{HTTP_USER_AGENT} spider [OR,NC]
RewriteCond %{HTTP_USER_AGENT} crawl [OR,NC]
RewriteCond %{HTTP_USER_AGENT} bot [NC]
# ...unless it is one of the search engines you want to keep
RewriteCond %{HTTP_USER_AGENT} !(googlebot|bingbot|msnbot) [NC]
RewriteRule .? - [F]
A great article with good advice can be found here: http://en.linuxreviews.org/HOWTO_sto...sing_.htaccess
Since your original question was whether to use a 410 or a 301, go with a 410; eventually the search engines will learn to stop indexing and crawling this page.
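To actually serve the 410, either of the options below should work in .htaccess (pick one). This sketch assumes the .htaccess file sits in the document root. old-page.html is a hypothetical stand-in for the page you are retiring.
Code:
# Option 1: mod_rewrite's G flag returns 410 Gone
RewriteEngine On
RewriteRule ^old-page\.html$ - [G]

# Option 2: mod_alias does the same without mod_rewrite
Redirect gone /old-page.html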
Hope this helps,
Steve