  1. #1
    SitePoint Member
    Join Date
    May 2009
    Posts
    4

    How to block Unknown robot (identified by 'bot*')

    These eat up most of the bandwidth on my site. I also have problems with site speed at times and with content stealers (scrapers). How exactly do I write this in htaccess to block all of these: Unknown robot (identified by 'bot*')?

  2. #2
    ParkinT (Avid Logophile)
    Join Date
    May 2006
    Location
    Central Florida
    Posts
    2,332
    Can you share a portion of your logs in order for us to get a better understanding of the problem?
    Don't be yourself. Be someone a little nicer. -Mignon McLaughlin, journalist and author (1913-1983)

  3. #3
    SitePoint Member
    Join Date
    May 2009
    Posts
    4
    What portion of the logs do you need?

  4. #4
    dklynn (Certified Ethical Hacker)
    Join Date
    Feb 2002
    Location
    Auckland
    Posts
    14,653
    Dave,

    You might benefit from reading the mod_rewrite tutorial linked in my signature, as it contains explanations and sample code. It's helped many members and should help you, too. It has examples just for this situation. THEN, if you have questions, please come back and I'll help you get the code exactly right.

    Regards,

    DK
    David K. Lynn - Data Koncepts is a long-time WebHostingBuzz (US/UK)
    Client and (unpaid) WHB Ambassador
    mod_rewrite Tutorial Article (setup, config, test & write
    mod_rewrite regex w/sample code) and Code Generator

  5. #5
    SitePoint Member
    Join Date
    May 2009
    Posts
    4
    That page seemed to be more about SEO and URL redirects, and I had trouble finding exactly how to block them in robots.txt and htaccess. Does anyone know a simple way to block Unknown robot (identified by 'bot*') by sitemap and htaccess?

  6. #6
    dklynn (Certified Ethical Hacker)
    Join Date
    Feb 2002
    Location
    Auckland
    Posts
    14,653
    Dave,

    Sorry, I thought I had an example of how not to abuse a server with the typically long list of bots to reject (with sample code). My error: mea culpa.

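    The %{HTTP_USER_AGENT} variable (the value of the User-Agent request header) is notoriously unreliable because the client supplies it and can spoof it, but you can use a RewriteCond to test it for bot and then fail any matching request with the [F] flag in the subsequent RewriteRule.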
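
    A minimal sketch of that approach for .htaccess, assuming mod_rewrite is enabled and that the 'bot*' label in the stats report means any User-Agent containing the string "bot". That pattern would also catch well-behaved crawlers such as Googlebot and bingbot, so the second condition (an illustrative assumption, adjust the names to taste) exempts them:

    ```apache
    RewriteEngine On
    # Match any User-Agent containing "bot", case-insensitively
    # (assumed meaning of the 'bot*' label)
    RewriteCond %{HTTP_USER_AGENT} bot [NC]
    # Illustrative exemption for crawlers you still want indexing the site
    RewriteCond %{HTTP_USER_AGENT} !(googlebot|bingbot) [NC]
    # Leave the URL unchanged (-) and answer with 403 Forbidden
    RewriteRule ^ - [F]
    ```

    Since the User-Agent header is easy to fake, this only stops bots that identify themselves honestly; a scraper posing as a browser will pass straight through.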

    Regards,

    DK

  7. #7
    Jeff Mott (SitePoint Wizard)
    Join Date
    Jul 2009
    Posts
    1,269
    Fortunately the Apache documentation covers this exact scenario.

    http://httpd.apache.org/docs/2.4/rew...king-of-robots
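
    Besides the mod_rewrite rule, that section of the Apache 2.4 documentation also describes an alternative using mod_setenvif together with access control. A rough sketch adapted for .htaccess, assuming Apache 2.4 with mod_setenvif and mod_authz_core loaded and an AllowOverride setting that permits both directives; the variable name block_bot and the "bot" pattern are placeholders, not taken from the documentation:

    ```apache
    # Flag any request whose User-Agent contains "bot" (case-insensitive)
    SetEnvIfNoCase User-Agent "bot" block_bot

    # Allow everyone except requests carrying the flag
    <RequireAll>
        Require all granted
        Require not env block_bot
    </RequireAll>
    ```

    As with the rewrite rule, a pattern this broad would also turn away search engine crawlers, so it is probably worth narrowing it to the specific user agents that show up in the logs.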
    "First make it work. Then make it better."

