.htaccess Modification to Block User Agent?

From viewing my November stats, there’s a user agent – libwww-perl/5* – that accounts for exorbitant bandwidth usage on my website on four separate days in November.

Because I’m not really a coder, the more I read, the more confused I am on exactly what code I should add to my .htaccess file in order to block that user agent.

  1. Could someone “in the know” show me the exact line(s) of code that I should insert into my .htaccess file in order to prevent the libwww-perl/5* user agent from accessing my site?

  2. And does it matter where that new code is positioned, in relation to existing lines of code in the .htaccess file?

This is probably very simple, but it’s just not my forte. :confused:

Thanks so much.

Deb Phillips

You are right to be concerned about messing with your htaccess file. A simple mistake can cause problems including bringing your site down.
Number one rule, always save a back-up copy.
You should do a search for the user-agent string to be certain you really want to block it. Some legitimate user-agents (w3c, about, etc.) include “libwww-perl” in them, so make sure your user-agent string is as specific as possible. Also realize that the user-agent string can be spoofed, so this technique is not 100% foolproof. But it should help some.
I would try adding something like this to your existing .htaccess file
(depending on whether RewriteEngine is already on, and where you want to send them)

   #optional comment to remind you what you're doing
   RewriteEngine on
   RewriteCond %{HTTP_USER_AGENT} ^First_string_here [OR]
   RewriteCond %{HTTP_USER_AGENT} ^Another_string_here [OR]
   RewriteCond %{HTTP_USER_AGENT} ^Last_string_here
   RewriteRule ^(.*)$ http://your.domain.com/custom_error_page.html

Also keep in mind that the .htaccess file is processed for every HTTP request, so you don’t want to let it get too big.
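
For the libwww-perl case from the original question, the pattern above could be filled in roughly as follows. This is only a sketch: `your.domain.com` and `custom_error_page.html` are the placeholder names from the example, and the second RewriteCond is an added assumption, there to keep a blocked client that requests the error page itself from being redirected to it over and over.

```apache
# Block clients whose User-Agent starts with "libwww-perl"
RewriteEngine on
# [NC] makes the match case-insensitive
RewriteCond %{HTTP_USER_AGENT} ^libwww-perl [NC]
# Assumption: skip the error page itself so the redirect cannot loop
RewriteCond %{REQUEST_URI} !^/custom_error_page\.html$
# Send everything else from that client to the error page
RewriteRule .* http://your.domain.com/custom_error_page.html [R=302,L]
```

If there is no need to send the client anywhere, replacing the last line with `RewriteRule .* - [F]` should return a plain 403 Forbidden instead of a redirect, and the second RewriteCond can then be dropped.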

Thank you for taking time to respond to my inquiry, Mittineague – and for the recommendations and precautions. Please forgive my ignorance, but I need to ask a few more questions. Again, I’m not a coder :slight_smile: (which will be very obvious, I’m sure)! Based on your example code:

1) I’m not sure what RewriteEngine does, but it’s currently not mentioned in my .htaccess file, and I presume that means it’s not ON. Is that correct? Is there a reason why I might not want to use it? All that’s currently in my .htaccess file is as follows:

Options -Indexes
php_value error_reporting 7
php_flag display_errors off
ErrorDocument 404 /notfound.php


2) Do the variable “^…string…” references at the end of the three “RewriteCond…” lines of code in your example represent where I would insert the user agent name?

3) Am I correct in assuming that, if I’m only wanting to block one user agent, the following would work?

#optional comment to remind me what I'm doing
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^UserAgentNameHere
RewriteRule ^(.*)$ http://my.domain.com/custom_error_page.html

4) Does it matter where this new code is located in the .htaccess file, in relation to the existing code in the file?

5) Someone suggested I use the following code. Is there a drawback to it, or are both routes equally viable?

RewriteEngine on
Set EnvIf User-Agent ^nameofbot bad_bot=1
deny from env=bad_bot

Sorry to draw this out so. Once again, thank you, Mittineague.

1: It may or may not be on, depending on your httpd.conf settings. (Apache reads the httpd.conf file when it starts, but many site owners can’t edit it, e.g. on a shared host.) So declaring “RewriteEngine on” in .htaccess turns it on if it’s off.
2: Yes, that’s what I meant. Replace “string_here” with the user-agent you want to block.
3: Yes, you only need [OR] if there’s more than one.
4: I don’t see any conflict with putting it after the existing lines. But if your htaccess grows in complexity, the sequence of lines may then become important.
5:

Set EnvIf User-Agent ^nameofbot bad_bot=1
deny from env=bad_bot

This technique uses a different Apache module. RewriteCond and RewriteRule use mod_rewrite, and SetEnvIf (no space) uses mod_setenvif. As long as you have that module, it is an alternative. I can’t say I know what difference there would be performance-wise, but using SetEnvIf will give the user a “403 Forbidden” response, while using Rewrite allows you to send them somewhere.
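
With the directive spelled as one word, the SetEnvIf approach might look like the sketch below. It assumes the older Order/Allow/Deny access directives that go with the `deny from` line quoted above. `bad_bot` is just an arbitrary variable name, and the new lines are assumed to go after whatever is already in the file.

```apache
# Existing directives (Options, php_value, ErrorDocument, etc.) stay as they are above.

# Flag any request whose User-Agent starts with "libwww-perl" (case-insensitive)
SetEnvIfNoCase User-Agent "^libwww-perl" bad_bot

# Allow everyone, then refuse flagged requests with 403 Forbidden
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

“RewriteEngine on” is not needed for this version, since none of these directives belong to mod_rewrite. On Apache 2.4 and later, access control is normally written with Require directives rather than Order/Allow/Deny, so the last three lines would need adapting there.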

I appreciate your help so much, Mittineague. I’ll see what I can do now on this issue. My sincerest thanks…Deb