  1. #1
    SitePoint Member
    Join Date
    Sep 2010
    Posts
    2

    Question: identifying rogue bots

    I have heard that malicious bots can often be identified by the fact that their requests usually do not contain an HTTP_ACCEPT_LANGUAGE header.

    Does anyone know how true this is?

    And if true, how reliable is it? I want to die() when there is no HTTP_ACCEPT_LANGUAGE header in an effort to kill malicious bots.

    Will I lose anything if I do, such as good bots like Google and Yahoo? Will I lose real people?
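
    For reference, the check being asked about could look roughly like the sketch below. It assumes a CGI/WSGI-style dict of request variables where the header appears under the key HTTP_ACCEPT_LANGUAGE (the same name PHP exposes in $_SERVER). It is written in Python purely for illustration, and the function names are hypothetical. It only flags the request rather than calling die(), since how reliable the signal is remains the open question here.

```python
def missing_accept_language(environ):
    """Return True when the request carries no usable Accept-Language header.

    `environ` is assumed to be a CGI/WSGI-style dict of request variables.
    """
    value = environ.get("HTTP_ACCEPT_LANGUAGE", "")
    return value.strip() == ""


def classify_request(environ):
    """Hypothetical policy: flag the request instead of killing it outright."""
    if missing_accept_language(environ):
        return "suspect"  # e.g. log it, or demand an extra check on form posts
    return "ok"


if __name__ == "__main__":
    print(classify_request({"HTTP_ACCEPT_LANGUAGE": "en-GB,en;q=0.8"}))
    print(classify_request({"HTTP_USER_AGENT": "SomeBot/1.0"}))
```

    Logging the "suspect" requests for a while before blocking anything would show whether real visitors or wanted crawlers fall into that bucket.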

  2. #2
    SitePoint Evangelist elgumbo
    Join Date
    Nov 2002
    Location
    North West, UK
    Posts
    545
    I don't know about the HTTP_ACCEPT_LANGUAGE theory, but I used to use the robots.txt method to find them.

    Add a Disallow line to robots.txt for a file you do not use on your site. Grab the details of any bot that accesses that page.
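
    A minimal sketch of that robots.txt trap, under some assumptions: the trap URL, the in-memory list, and the function name are all hypothetical, and a real site would write hits to a log or database instead. Python is used only for illustration.

```python
# Hypothetical URL that nothing on the site links to.
TRAP_PATH = "/trap/do-not-visit.html"

# The robots.txt content that tells well-behaved crawlers to stay away.
ROBOTS_TXT = "User-agent: *\nDisallow: " + TRAP_PATH + "\n"

# In-memory store for the sketch only.
trap_hits = []


def record_if_trap(path, remote_addr, user_agent):
    """Record the client's details when it requests the disallowed path."""
    if path != TRAP_PATH:
        return False
    trap_hits.append({"ip": remote_addr, "user_agent": user_agent})
    return True


if __name__ == "__main__":
    print(ROBOTS_TXT)
    record_if_trap(TRAP_PATH, "203.0.113.7", "ExampleBot/0.1")
    print(trap_hits)
```

    Any client that ends up in trap_hits either ignored robots.txt or read it and went to the disallowed file anyway.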

  3. #3
    SitePoint Member
    Join Date
    Sep 2010
    Posts
    2
    Actually, I am dealing with form submission software that scans the internet for forms, stores the form vars, and then submits crap, in my case every 3 minutes. Since this is form submission software, I don't think robots.txt can do anything for me. The bot has already come by.

    It uses random proxies, so I can't block by IP. It sends a user agent, but it is likely faked and made to appear common.

    So that is why I am investigating the accept language thing I have heard about.

  4. #4
    secure webapps for all Aleksejs
    Join Date
    Apr 2008
    Location
    Riga, Latvia
    Posts
    755
    Do you have any kind of CSRF protection in place?
    Also, some bots are smarter and some are dumber. If you use JavaScript on your page anyway, you can also have a hidden field set to a value that depends on the digest plus the entered data, something like:
    md5(CSRFdigest+formfield1+formfield2);
    And check on the server side that correct values are present in both the CSRFdigest field and the JavaScript-computed field.
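
    The server-side half of that idea might look something like the sketch below. The field names (csrf_digest, field1, field2, js_check) and the function names are assumptions made for illustration, and Python stands in for whatever the site actually runs. md5 is used only because it is the function named above; here it serves as a speed bump for dumb bots, not as a security primitive.

```python
import hashlib
import hmac


def _same(a, b):
    """Compare two strings as bytes without leaking timing information."""
    return hmac.compare_digest(a.encode("utf-8"), b.encode("utf-8"))


def expected_js_field(csrf_digest, field1, field2):
    """Recompute md5(CSRFdigest + formfield1 + formfield2) as the page script would."""
    data = (csrf_digest + field1 + field2).encode("utf-8")
    return hashlib.md5(data).hexdigest()


def submission_looks_genuine(form, session_digest):
    """Check both the CSRF digest field and the JavaScript-computed field."""
    # The digest must match the one issued with the form for this session.
    if not _same(form.get("csrf_digest", ""), session_digest):
        return False
    expected = expected_js_field(
        session_digest, form.get("field1", ""), form.get("field2", "")
    )
    # A client that never ran the page's JavaScript will not have this value.
    return _same(form.get("js_check", ""), expected)
```

    A bot that replays stored form vars fails the first check once the digest changes per session, and one that posts without running the script fails the second.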

  5. #5
    secure webapps for all Aleksejs
    Join Date
    Apr 2008
    Location
    Riga, Latvia
    Posts
    755

  6. #6
    SitePoint Member
    Join Date
    May 2008
    Posts
    14
    It is surprising how many bots are out there. Some bots pretend to be a valid search engine like Google and sneak into your website without you realizing it. Not all bots obey robots.txt either.

  7. #7
    Programming Team
    Mittineague
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    17,152
    Originally posted by IProx:
    It is surprising how many bots are out there. Some bots pretend to be a valid search engine like Google and sneak into your website without you realizing it. Not all bots obey robots.txt either.
    Not only that, but as elgumbo mentioned, some use robots.txt to find out where you don't want them to go and then go there. Set up a "honey pot" and you'll catch some.

    Don't think of the robots.txt file as a security measure by any means.

  8. #8
    Community Advisor

    Join Date
    Nov 2006
    Location
    UK
    Posts
    2,551
    If you're worried about bots, it's worth looking at the code used by the well-known WordPress plugin 'Bad Behavior' (the code can also be used outside the plugin):
    http://www.bad-behavior.ioerror.us/

