Thread: Bad Bots Code for Keeping Bad Bots Out Not Working

  1. #76
    SitePoint Evangelist
    Join Date
    Jan 2010
    Posts
    512
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Dave,
    At that time we were using the \*bot and bot\* method and checking the raw access logs to see why the awstats counts for these bots kept rising. Why would the awstats counts for bot* and *bot keep rising while the other bot counts have stopped? The count for bot* as of Oct 19, 1:32 pm is 695+119, which is higher than all the previous counts:

    236+25, 305+31, 323+34, 363+40, 379+43, 394+50, 513+69, 655+108

    So the reason I thought bot* and *bot weren't being blocked was that the awstats counts for bot* and *bot kept rising while the counts for the other bad bots stopped rising. If they were being blocked with that method, why did the awstats counts for bot* and *bot continue to rise?



    Thanks,

    Chris

  2. #77
    SitePoint Evangelist
    Join Date
    Jan 2010
    Posts
    512
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    -continued-
    Dave,
    Real world experience:
    I'm pretty sure my webhost's awstats are behind a couple of days.
    20 Oct 2010 07:24am: 726+135

    Thanks,

    Chris

  3. #78
    dklynn (Certified Ethical Hacker)
    Join Date
    Feb 2002
    Location
    Auckland
    Posts
    14,315
    Mentioned
    15 Post(s)
    Tagged
    2 Thread(s)
    Chris,

    I've NEVER seen an AWStats report saying 726+135 (or anything like that). WHAT DO THE RAW LOGS SAY ABOUT *bot and bot*? Not there, eh? Working, isn't it?

    Regards,

    DK
    David K. Lynn - Data Koncepts is a long-time WebHostingBuzz (US/UK)
    Client and (unpaid) WHB Ambassador
    Updated mod_rewrite Tutorial Article (setup, config, test & write
    mod_rewrite regex w/sample code) and Code Generator

  4. #79
    SitePoint Evangelist
    Join Date
    Jan 2010
    Posts
    512
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Dave,
    Can you give me an email address to send you a screenshot of my awstats? Those numbers are a copy and paste. The numbers after the + are the robot's hits on robots.txt.

    Thanks,

    Chris

  5. #80
    SitePoint Evangelist
    Join Date
    Jan 2010
    Posts
    512
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    -continued-
    Dave,
    The count is now rising more slowly, which suggests we might be looking at the delay I was talking about: the lag between when the code is put in and when you see changes in awstats.
    A copy and paste:
    Unknown robot (identified by 'bot*') 733+137 6.44 MB 21 Oct 2010 - 10:32

    Chris

  6. #81
    SitePoint Evangelist
    Join Date
    Jan 2010
    Posts
    512
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    -continued-
    Dave,

    It just doesn't appear to be working
    "Unknown robot (identified by 'bot*') 770+151 6.74 MB 22 Oct 2010 - 06:00"

    How can the bot be stopped yet the awstats counts for the bot* continue to rise?

    Thanks,

    Chris

  7. #82
    ScallioXTX (SitePoint Award Recipient)
    Do. Or do not. There is no try
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    8,347
    Mentioned
    87 Post(s)
    Tagged
    2 Thread(s)
    I can think of two possible answers:

    1) It really isn't working, because bot* is AWStats's way of saying "I don't know what this bot is, but its UA string starts with 'bot'", i.e. the * is a wildcard.

    2) It actually is working, but AWStats is incorrectly counting 403 Forbidden responses as successful hits.

    Take a look at the RAW output of the http access log to figure out which of the two above is the case.
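
One way to run that check is sketched below. It assumes the access log uses Apache's "combined" format, and the sample lines (addresses, timestamps, sizes, and the name ExampleBot) are made up for illustration. The function name `bot_status_counts` is hypothetical. It tallies the HTTP status codes returned to any User-Agent containing "bot".

```python
import re
from collections import Counter

# Apache "combined" log format:
# host ident user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def bot_status_counts(lines):
    """Count status codes for requests whose User-Agent contains 'bot'."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and "bot" in match.group("agent").lower():
            counts[match.group("status")] += 1
    return counts

# Made-up sample lines; in practice, pass an open log file instead.
sample = [
    '192.0.2.1 - - [22/Oct/2010:06:00:00 -0600] "GET / HTTP/1.1" 403 202 "-" "ExampleBot/1.0"',
    '192.0.2.2 - - [22/Oct/2010:06:00:05 -0600] "GET /robots.txt HTTP/1.1" 200 68 "-" '
    '"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '192.0.2.3 - - [22/Oct/2010:06:00:09 -0600] "GET /index.html HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (Windows NT 6.1) Firefox/3.6"',
]

counts = bot_status_counts(sample)
print(counts)
```

If nearly all of the "bot" requests show 403, the block is working and the stats tool is counting refused requests. If they show 200, the requests are getting through.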
    Rémon - Hosting Advisor

    Minimal Bookmarks Tree
    My Google Chrome extension: browsing bookmarks made easy

  8. #83
    SitePoint Evangelist
    Join Date
    Jan 2010
    Posts
    512
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Scallio,
    I'm definitely going to do that.

    First, though, I'm trying a mix. Part of Dave's way: change the last line to
    RewriteRule .? - [F,L]
    And part of the original way: change the bot line to
    RewriteCond %{HTTP_USER_AGENT} ^bot*$



    Thanks for the help,


    Chris

  9. #84
    SitePoint Evangelist
    Join Date
    Jan 2010
    Posts
    512
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Scallio,
    RewriteCond %{HTTP_USER_AGENT} ^bot*$
    didn't work:
    Unknown robot (identified by 'bot*') 923+213 7.75 MB 26 Oct 2010 - 07:12
    I'm going to try
    RewriteCond %{HTTP_USER_AGENT} ^bot$

    Chris
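
A quick way to see what those two patterns match is to try them against a few User-Agent strings. The sketch below uses Python's `re` module, which treats these simple patterns the same way as the PCRE engine behind mod_rewrite. The User-Agent list is illustrative and not taken from the thread's logs, and ExampleBot is a made-up name.

```python
import re

# Illustrative User-Agent strings (assumptions, not from the actual logs).
user_agents = [
    "bot",
    "bo",
    "bottt",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
    "ExampleBot/1.0",
]

patterns = {
    # "bo" followed by zero or more "t", and nothing else on the line
    "^bot*$": re.compile(r"^bot*$"),
    # exactly the three characters "bot"
    "^bot$": re.compile(r"^bot$"),
    # "bot" anywhere, ignoring case (comparable to: RewriteCond ... bot [NC])
    "bot (NC)": re.compile(r"bot", re.IGNORECASE),
}

def matching_agents(pattern):
    """Return the sample User-Agents that the compiled pattern matches."""
    return [ua for ua in user_agents if pattern.search(ua)]

results = {name: matching_agents(p) for name, p in patterns.items()}
for name, matched in results.items():
    print(name, "->", matched)
```

Neither anchored pattern matches a real crawler string such as the bingbot one, because `^` and `$` require the whole User-Agent to be "bot" (or "bo", "bott", and so on). The unanchored version does match, but it would also refuse wanted crawlers such as Googlebot and bingbot.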

  10. #85
    dklynn (Certified Ethical Hacker)
    Join Date
    Feb 2002
    Location
    Auckland
    Posts
    14,315
    Mentioned
    15 Post(s)
    Tagged
    2 Thread(s)
    Chris,

    Have you forgotten about escaping the * character in your regex already?

    Regards,

    DK

  11. #86
    ScallioXTX (SitePoint Award Recipient)
    Do. Or do not. There is no try
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    8,347
    Mentioned
    87 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by dklynn View Post
    Chris,

    Have you forgotten about escaping the * character in your regex already?

    Regards,

    DK
    David, I think Chris is trying this:

    Quote Originally Posted by ScallioXTX View Post
    I can think of two possible answers:

    1) It really isn't working, because bot* is AWStats's way of saying "I don't know what this bot is, but its UA string starts with 'bot'", i.e. the * is a wildcard.

    2) It actually is working, but AWStats is incorrectly counting 403 Forbidden responses as successful hits.

    Take a look at the RAW output of the http access log to figure out which of the two above is the case.

  12. #87
    SitePoint Evangelist
    Join Date
    Jan 2010
    Posts
    512
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Dave,
    Your behavior has cost you your credibility with me. I won't have a response to anything you say.

    Chris

  13. #88
    ScallioXTX (SitePoint Award Recipient)
    Do. Or do not. There is no try
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    8,347
    Mentioned
    87 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by Chris77 View Post
    Dave,
    Your behavior has cost you your credibility with me. I won't have a response to anything you say.

    Chris
    Huh!?

  14. #89
    dklynn (Certified Ethical Hacker)
    Join Date
    Feb 2002
    Location
    Auckland
    Posts
    14,315
    Mentioned
    15 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by ScallioXTX View Post
    Huh!?
    Ditto.

    Regards,

    DK

  15. #90
    SitePoint Evangelist
    Join Date
    Jan 2010
    Posts
    512
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Scallio,
    Here's the official and informative word on bot* from both my webhost and Sourceforge (the author of awstats).

    ---------

    From Bluehost:
    "bot*" is a default naming convention used in awstats. Unknown bots are generally lumped together under bot* or *bot. Since awstats does not have the Bing search engine bot recognized, it's identifying it with the "bot*" syntax.

    Our Awstats program is currently the latest version available:

    AWStats 6.9 released (Sun, 28 Dec 2008 15:17:19 GMT)

    Notice how this release was done 2 years ago? Bing wasn't around then. Unfortunately, the awstats company has not produced another up-to-date version of the program for 2 years.


    From Sourceforge:
    "bot*" means any user agent that contains "bot" and that is not in the AWStats robots list. This is a way to pick up rare or new robots. Some of them are nice, others are rogue.

    -------

    So the bot* total is the sum of visits from bingbot and other bots (good and bad, if bad bots are visiting). That means bot* is not the name of a bot but a variable, and it won't appear in the raw access log. Whenever awstats sees a bot that's not in the "awstats robots list", it adds 1 to the variable bot* (or *bot).
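
The catch-all behaviour described above can be sketched roughly as follows. This is not AWStats's actual code (AWStats is written in Perl and its matching rules are more detailed). The `KNOWN_ROBOTS` table and the `classify` function are hypothetical stand-ins for "the AWStats robots list" and its lookup, and the sketch uses a single bucket for simplicity.

```python
from collections import Counter

# Hypothetical stand-in for the robots list: lowercase token -> display name.
KNOWN_ROBOTS = {
    "googlebot": "Googlebot",
    "slurp": "Yahoo Slurp",
    "msnbot": "MSNBot",
}

UNKNOWN_BUCKET = "Unknown robot (identified by 'bot*')"

def classify(user_agent):
    """Return a robot name, the catch-all bucket, or None for non-robots."""
    ua = user_agent.lower()
    for token, name in KNOWN_ROBOTS.items():
        if token in ua:
            return name
    if "bot" in ua:
        # Not in the list, but the UA contains "bot": lump it into the bucket.
        return UNKNOWN_BUCKET
    return None

sample_agents = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
    "ExampleBot/1.0",  # made-up name
    "Mozilla/5.0 (Windows NT 6.1) Firefox/3.6",
]

tally = Counter(name for name in map(classify, sample_agents) if name)
print(tally)
```

Under these assumptions, every visit from bingbot raises the bot* count whether or not any rewrite rule exists for a bot literally named "bot*".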

    Chris

  16. #91
    ScallioXTX (SitePoint Award Recipient)
    Do. Or do not. There is no try
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    8,347
    Mentioned
    87 Post(s)
    Tagged
    2 Thread(s)
    Ah, so I was right! Ha! :dance:

    Anyway, sounds like it's time for you to search for this "awstats robots list" they're talking about and update it. Or search for it on the web; maybe someone else already created an updated list.

  17. #92
    SitePoint Evangelist
    Join Date
    Jan 2010
    Posts
    512
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Scallio,
    Yeah, you are:
    "1) It really isn't working because bot* is AWStats's way of saying 'I don't know what this bot is...'"

    I'll look for that list. I asked Sourceforge another question. I'll let you know what they say.

    Chris

  18. #93
    SitePoint Member
    Join Date
    May 2008
    Posts
    14
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    AWStats isn't really the best software to identify bots. Also, a lot of bad bots "pretend" to be Google, Yahoo, Bing, etc., and cannot always be easily identified.
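
One common way to catch a client that only pretends to be a major crawler is a forward-confirmed reverse DNS check: look up the hostname for the IP, confirm it falls under a domain the search engine uses, then resolve that hostname back and confirm it returns the same IP. The sketch below is a minimal illustration. The domain suffixes are assumptions that should be checked against each engine's current documentation, the function name is hypothetical, and the resolvers are injectable so the example runs without network access.

```python
import socket

# Assumed crawler domains; verify against each search engine's documentation.
TRUSTED_SUFFIXES = (
    ".googlebot.com",
    ".google.com",
    ".search.msn.com",
    ".crawl.yahoo.net",
)

def is_verified_crawler(ip, reverse=None, forward=None):
    """Forward-confirmed reverse DNS check for an IPv4 address."""
    # Default resolvers use the standard library; tests can inject fakes.
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda name: socket.gethostbyname_ex(name)[2])
    try:
        hostname = reverse(ip)
        if not hostname.lower().endswith(TRUSTED_SUFFIXES):
            return False
        # The hostname must resolve back to the address we started from.
        return ip in forward(hostname)
    except OSError:  # covers socket.herror and socket.gaierror
        return False

# Offline demonstration with fake resolvers (addresses are illustrative).
genuine = is_verified_crawler(
    "66.249.66.1",
    reverse=lambda addr: "crawl-66-249-66-1.googlebot.com",
    forward=lambda name: ["66.249.66.1"],
)
spoofed = is_verified_crawler(
    "203.0.113.9",
    reverse=lambda addr: "host.example.net",
    forward=lambda name: ["203.0.113.9"],
)
print(genuine, spoofed)
```

DNS lookups are slow, so a check like this is usually run offline against logged IPs or cached, not performed on every request.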
    Block Proxies, Bots, Scrapers, and more with BlockScript.

  19. #94
    SitePoint Evangelist
    Join Date
    Jan 2010
    Posts
    512
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    IProx,
    What stats programs do you like?

    Chris

  20. #95
    SitePoint Member
    Join Date
    May 2008
    Posts
    14
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Chris77 View Post
    IProx,
    What stats programs do you like?

    Chris
    Cacti is a great tool for monitoring bandwidth and throughput. Things like Google Analytics are good for some sites but aren't great tools for identifying bots either. In the past, the only way I was able to identify bots was by looking directly at my server logs, and that still involved a lot of research. I now use BlockScript on high-traffic sites to weed out bots, hosting providers, and proxy servers.

  21. #96
    SitePoint Wizard
    Join Date
    Jun 2005
    Posts
    1,399
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Quote Originally Posted by Chris77 View Post
    IProx,
    What stats programs do you like?

    Chris
    Chris, how did you get along with this in the end?

    Dez

  22. #97
    SitePoint Wizard
    Join Date
    Jun 2005
    Posts
    1,399
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Rather than trying to work out who to block, maybe it's best just to allow certain bots and disallow any others that haven't been specifically allowed? Which bots would generally be considered as legit please? Google, MSN, Yahoo, any others?
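
The allow-list idea can be sketched as below. The token lists are assumptions chosen for illustration: the allowed tokens cover the Google, MSN, and Yahoo crawlers named in the question plus bingbot, which comes up earlier in the thread. The names `ALLOWED_BOT_TOKENS`, `BOT_HINTS`, and `decide` are hypothetical, and ExampleBot is a made-up name. Because a User-Agent header is easy to forge, this only filters clients that identify themselves honestly.

```python
# Crawlers that are explicitly allowed (lowercase substrings of the UA).
ALLOWED_BOT_TOKENS = ("googlebot", "msnbot", "bingbot", "slurp")

# Substrings suggesting the client is an automated crawler at all.
BOT_HINTS = ("bot", "crawl", "spider", "slurp")

def decide(user_agent):
    """Return 'allow' or 'block' for a User-Agent string."""
    ua = user_agent.lower()
    if not any(hint in ua for hint in BOT_HINTS):
        return "allow"  # does not look like a crawler; treat as a browser
    if any(token in ua for token in ALLOWED_BOT_TOKENS):
        return "allow"  # crawler on the allow-list
    return "block"      # self-identified crawler that was not allowed

decisions = {
    ua: decide(ua)
    for ua in (
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
        "ExampleBot/1.0",
        "Mozilla/5.0 (Windows NT 6.1) Firefox/3.6",
    )
}
for ua, verdict in decisions.items():
    print(verdict, "<-", ua)
```

The trade-off is that any new, legitimate crawler stays blocked until someone adds it to the list.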

  23. #98
    SitePoint Evangelist
    Join Date
    Jan 2010
    Posts
    512
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Dez,
    This thread is being continued at
    http://www.sitepoint.com/forums/show...=1#post4778782

    Chris

  24. #99
    SitePoint Wizard
    Join Date
    Jun 2005
    Posts
    1,399
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Thanks Chris - be over there in a moment
