Results 1 to 3 of 3
  1. #1
    SitePoint Member
    Join Date
    May 2009
    0 Post(s)
    0 Thread(s)

    Blocking Bots via htaccess Question

    Hi, I have two questions (please don't laugh if they seem very basic). They're about alphabetical order in the .htaccess file and the difference between using ^ and leaving it out when blocking bots/user agents.

    What is the difference between: RewriteCond %{HTTP_USER_AGENT} ^Zeus [OR] and just RewriteCond %{HTTP_USER_AGENT} Zeus?

    Will the second one be more thorough at blocking a bot by name? What do the ^ and the [OR] mean, and are they necessary?

    Question 2: I often read that these blocks should be listed in alphabetical order within the .htaccess file. Is this necessary (will it cause problems if they're not in order)?

    For example, these are some of the bots I'm blocking (the last 4 aren't in alphabetical order; will this cause a problem for the site if they're not listed in precise alphabetical order?):

    RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Xenu [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Xara [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Y!TunnelPro [OR]
    RewriteCond %{HTTP_USER_AGENT} ^YahooYSMcm [OR]
    RewriteCond %{HTTP_USER_AGENT} ^YandexBot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Zade [OR]
    RewriteCond %{HTTP_USER_AGENT} ^ZBot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Zeus [OR]
    RewriteCond %{HTTP_USER_AGENT} ^zerxbot
    RewriteCond %{HTTP_USER_AGENT} MJ12bot
    RewriteCond %{HTTP_USER_AGENT} Linguee
    RewriteCond %{HTTP_USER_AGENT} SolomonoBot
    RewriteCond %{HTTP_USER_AGENT} Lightspeedsystems

    Note that the last 4 aren't in order. Thanks to everyone who can help, but if at all possible, please answer here rather than redirecting me to another site with several pages I have to scrub through to find the answer. I'm hoping one of the gurus here already knows the answers to these.

  2. #2
    Jeff Mott (SitePoint Wizard)
    Join Date
    Jul 2009
    19 Post(s)
    1 Thread(s)
    The ^ character means match the beginning of the string, so with ^Zeus, "ZeusBot" would match, but "Yada ZeusBot" would not (because "Zeus" isn't at the start of the string). Whether or not it's better depends on what you're trying to match. Without ^ you'll match more strings, which might mean it's more thorough, or it might mean you'll get false positives.
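
    To make the difference concrete, here is a minimal sketch of the two variants. The RewriteEngine and RewriteRule lines are assumptions, since the original post only shows the conditions; "RewriteRule ^ - [F]" simply answers any matching request with 403 Forbidden.

    ```apache
    RewriteEngine On

    # Variant A (anchored): blocks only when the User-Agent string STARTS with "Zeus",
    # e.g. "ZeusBot" is blocked, "Yada ZeusBot" is not.
    RewriteCond %{HTTP_USER_AGENT} ^Zeus
    RewriteRule ^ - [F]

    # Variant B (unanchored): blocks when "Zeus" appears ANYWHERE in the User-Agent,
    # e.g. both "ZeusBot" and "Yada ZeusBot" are blocked.
    # Assumption: you would normally keep only one of the two variants.
    RewriteCond %{HTTP_USER_AGENT} Zeus
    RewriteRule ^ - [F]
    ```

    Since Variant B matches everything Variant A does, keeping both is redundant; which one to keep depends on how much of the User-Agent string you know in advance.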

    Rewrite conditions are implicitly ANDed together. The [OR] changes that so that they're ORed together.

    I can't think of any reason why listing the bots alphabetically would make any difference, except to you, to make it easier for you to scan the list and find a name.
    "First make it work. Then make it better."

  3. #3
    dklynn (Certified Ethical Hacker)
    Join Date
    Feb 2002
    19 Post(s)
    3 Thread(s)
    A quick translation of Jeff's note:

    Originally posted by Jeff Mott:
    Rewrite conditions are implicitly ANDed together. The [OR] changes that so that they're ORed together.

    Because you dropped the [OR] flags starting with the ^zerxbot test, any request that matches one of the earlier conditions must also match MJ12bot, Linguee, SolomonoBot AND Lightspeedsystems. Obviously, that will never be the case, so add the [OR] flag to every RewriteCond statement up to, but not including, the last one.
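
    As a rough sketch, the tail of the list with the missing flags restored would look something like this. The closing RewriteRule is an assumption (the original post doesn't show it); here it just returns 403 Forbidden for any match.

    ```apache
    # ... earlier conditions unchanged, each ending in [OR] ...
    RewriteCond %{HTTP_USER_AGENT} ^Zeus [OR]
    RewriteCond %{HTTP_USER_AGENT} ^zerxbot [OR]
    RewriteCond %{HTTP_USER_AGENT} MJ12bot [OR]
    RewriteCond %{HTTP_USER_AGENT} Linguee [OR]
    RewriteCond %{HTTP_USER_AGENT} SolomonoBot [OR]
    # Last condition: no [OR], because nothing follows it in the chain
    RewriteCond %{HTTP_USER_AGENT} Lightspeedsystems
    # Assumed rule: deny the request when any condition above matched
    RewriteRule ^ - [F]
    ```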

    I'll also add that capitalization is critical UNLESS you use the No Case flag [NC] on each of your RewriteCond statements. Anyone smart enough to use one of these bots is probably smart enough to change the capitalization of the bot's name, possibly using CaMeL CaSe, but [OR,NC] will take care of that for you.
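
    For illustration only, a short chain with case-insensitive matching might look like this. Only two of the bot names are shown to keep it brief, and the RewriteRule line is an assumption, as the original post doesn't include one.

    ```apache
    # [NC] makes the pattern case-insensitive, so "zeus", "ZEUS" and "ZeUs" all match.
    # Flags are comma-separated inside one pair of brackets.
    RewriteCond %{HTTP_USER_AGENT} ^Zeus [NC,OR]
    # Last condition in the chain: [NC] only, no [OR]
    RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC]
    # Assumed rule: respond with 403 Forbidden when either condition matched
    RewriteRule ^ - [F]
    ```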


