  1. #1
    SitePoint Evangelist
    Join Date
    May 2003
    Posts
    592

    Do the web server logs and stats show abuse?

    Hi,

    As webmaster of a new site, I check the stats every few days. Since I am still making a fair number of changes and testing, my own IP address is always at the top. I was very surprised to see one IP address (220.240.235.13) use over 16 MB in one day: 487 hits, 487 pages. The web server logs are filled with this IP address, and the client identifies itself with the user agent "RPT-HTTPClient/0.3-3".

    I have contacted the ISP, and will supply all the details soon. My concern is: what was this person doing? Even if it was a 'legit' crawler or spider, why 16 MB? The whole site is less than 3 MB.

    There are 635 GET requests in 24 minutes, which is how it added up to over 16 MB. Can I tell from the web server logs whether they were attempting anything they shouldn't (apart from using WAY too much bandwidth for a spider session)?

    Having a quick look through, the status codes are all either 200 or 302, but does an entry like this:

    Code:
     220.240.235.13 - - [07/Mar/2004:01:48:48 -0500] "GET /product_info.php?products_id=50&language=es HTTP/1.1" 200 31777 "-" "RPT-HTTPClient/0.3-3"
    indicate that the user agent is attempting to read, or has been able to read, the PHP file? The file in the above example is only 12,873 bytes, but no doubt any images, etc. would add up. I can't understand why they have spidered/crawled through, changed the language settings, and done everything possible, even changing the sort sequence on some pages. It seems a VERY deep crawl to me.
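
    For anyone reading an entry like the one above, here is a minimal Python sketch that splits it into named fields. It assumes the server writes the common Apache-style "combined" log format, which the thread does not confirm; the pattern and the function name `parse_log_line` are illustrative, not part of any library.

```python
import re

# Pattern for the Apache-style "combined" log format. That the server uses
# this format is an assumption based on the shape of the quoted entry.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return the fields of one combined-format log line as a dict, or None."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # A size of "-" means no response body was sent; treat it as 0 bytes.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


# The entry quoted in the post above.
entry = parse_log_line(
    '220.240.235.13 - - [07/Mar/2004:01:48:48 -0500] '
    '"GET /product_info.php?products_id=50&language=es HTTP/1.1" '
    '200 31777 "-" "RPT-HTTPClient/0.3-3"'
)
print(entry["status"], entry["size"], entry["agent"])
```

    Under that assumption, 200 is the HTTP status (the request succeeded) and 31777 is the number of bytes sent in the response to that one request. For a PHP page that is normally the generated HTML rather than the source file, which would explain why it differs from the 12,873-byte file on disk. Images are requested separately and get their own log lines.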

    If it is a single person (probably, because the host name is "dsl-13.235.240.220.dsl.comindico.com.au") and not a connection shared by many, then I'll simply ban the IP.

    Is there any legal action that can be taken against this person or persons?

    Thanks,

    Peter

  2. #2
    Idler. Chazzy
    Join Date
    Jan 2004
    Location
    UK
    Posts
    336
    Not sure. What's your site about?

  3. #3
    SitePoint Zealot
    Join Date
    Feb 2004
    Location
    Durham, NE England, UK
    Posts
    127
    Erm, I hate to say this, but it looks remarkably like this person is trying to 'rip' your entire site. By that I mean he's using a special program (like http://www.httrack.com) to download everything on the site: images, pages, the lot. And there's nothing I know of that you can do to stop it.

    EDIT - I have just checked this and I was wrong. I've found the information below for you though; it appears to be a search engine spider.

    USER COMMENTS FOR RPT-HTTPClient/0.3-3
    videokef 2003-03-27 08:45
    Search Engine Spider
    Here is what it is:

    Search Engine Spider Identification. RPT-HTTPClient/0.3-3 Nameprotect.com

    It is also used by NYC Road Runner

    --------------------------------------------------------------------------------

    Type: Search Engine
    Behaviour: Naughty
    Exclusion Protocol: No
    Supports NoIndex: No
    There ya go.
    www.dmpdesign.net - cheap web design for everyone
    Visit one of my websites - Click Here
    contact me on msn messenger - steve1030_uk@hotmail.com
    or yahoo messenger - djsteve1030

  4. #4
    SitePoint Evangelist
    Join Date
    May 2003
    Posts
    592
    Hi,

    Quote Originally Posted by DJ_Steve
    Erm, I hate to say this, but it looks remarkably like this person is trying to 'rip' your entire site. By that I mean he's using a special program (like http://www.httrack.com) to download everything on the site: images, pages, the lot. And there's nothing I know of that you can do to stop it.
    Hmm, I _think_ I saw some PHP code some time back that checked the user agent and disallowed access to the website, but I'm not too sure. For now, the IP address is banned; it seems to be an ADSL connection, so I assume it is one person/one connection.
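
    The user-agent check mentioned above can be sketched roughly as follows. This is not the PHP code being remembered; it is a small Python illustration of the same idea, and the blocklist, `is_blocked_agent`, and `response_status_for` are all hypothetical names. Only the "RPT-HTTPClient" entry comes from this thread.

```python
# Hypothetical blocklist of substrings to look for in the User-Agent header.
# Only "RPT-HTTPClient" comes from this thread; "HTTrack" is an illustrative
# extra entry for the site-ripping tool mentioned earlier.
BLOCKED_AGENT_SUBSTRINGS = ["RPT-HTTPClient", "HTTrack"]


def is_blocked_agent(user_agent, blocked=BLOCKED_AGENT_SUBSTRINGS):
    """Return True if the User-Agent string contains a blocked substring."""
    if not user_agent:
        return False
    lowered = user_agent.lower()
    # Case-insensitive substring match, so version numbers do not matter.
    return any(token.lower() in lowered for token in blocked)


def response_status_for(user_agent):
    """Pick an HTTP status: 403 (Forbidden) for blocked agents, else 200."""
    return 403 if is_blocked_agent(user_agent) else 200


print(response_status_for("RPT-HTTPClient/0.3-3"))
print(response_status_for("Mozilla/4.0 (compatible; MSIE 6.0)"))
```

    Bear in mind that the User-Agent header is supplied by the client and is easy to change, so a check like this only stops tools that identify themselves honestly. An IP ban and a user-agent check cover different cases and can be used together.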

    Quote Originally Posted by DJ_Steve
    EDIT - I have just checked this and I was wrong. I've found the information below for you though; it appears to be a search engine spider:

    USER COMMENTS FOR RPT-HTTPClient/0.3-3
    videokef 2003-03-27 08:45
    Search Engine Spider
    Here is what it is:

    Search Engine Spider Identification. RPT-HTTPClient/0.3-3 Nameprotect.com

    It is also used by NYC Road Runner

    --------------------------------------------------------------------------------

    Type: Search Engine
    Behaviour: Naughty
    Exclusion Protocol: No
    Supports NoIndex: No
    I wonder what "Behaviour: Naughty" means? Also, it doesn't support 'noindex'. If it is a spider, why the 16 MB? There were lots of entries in the logs for this IP's session trying to access the login script and other files, which looks like a deep-level spider: every conceivable page, and every option on each form. It was not a normal spider of the site; I have done test spiders, and the resulting list would not amount to more than 100K, if that.
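
    One way to see where the bandwidth went is to total requests, bytes, status codes, and requested scripts per client. The following Python sketch does that, again assuming Apache-style combined log lines; `summarize` is a hypothetical helper, and only the first sample line is taken from this thread (the other two are made up for illustration).

```python
import re
from collections import Counter, defaultdict

# Minimal pattern for the leading fields of a combined-format log line
# (the log format itself is an assumption).
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def new_stats():
    """Empty per-client record."""
    return {"requests": 0, "bytes": 0, "statuses": Counter(), "scripts": Counter()}


def summarize(lines):
    """Total requests, bytes, status codes, and scripts per client host."""
    summary = defaultdict(new_stats)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not look like log entries
        stats = summary[match.group("host")]
        stats["requests"] += 1
        size = match.group("size")
        stats["bytes"] += 0 if size == "-" else int(size)
        stats["statuses"][int(match.group("status"))] += 1
        # Count by script name only, ignoring the query string.
        stats["scripts"][match.group("path").split("?", 1)[0]] += 1
    return dict(summary)


sample = [
    # Entry quoted earlier in this thread.
    '220.240.235.13 - - [07/Mar/2004:01:48:48 -0500] "GET /product_info.php?products_id=50&language=es HTTP/1.1" 200 31777 "-" "RPT-HTTPClient/0.3-3"',
    # Made-up entries, for illustration only.
    '220.240.235.13 - - [07/Mar/2004:01:48:50 -0500] "GET /login.php HTTP/1.1" 302 0 "-" "RPT-HTTPClient/0.3-3"',
    '10.0.0.1 - - [07/Mar/2004:01:49:00 -0500] "GET /index.php HTTP/1.1" 200 1000 "-" "Mozilla/4.0"',
]

for host, stats in summarize(sample).items():
    print(host, stats["requests"], stats["bytes"], dict(stats["statuses"]))
```

    On the numbers reported above, 16 MB over 635 requests works out to roughly 25 KB per request, which is in line with the 31,777-byte response in the quoted entry. A crawler that follows every language and sort-order link fetches the same dynamic pages many times with different parameters, so the total transferred can exceed the size of the files on disk.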

    Thanks for your help,

    Peter

  5. #5
    SitePoint Zealot
    Join Date
    Feb 2004
    Location
    Durham, NE England, UK
    Posts
    127
    Quote Originally Posted by jehoshua
    Hi,

    Hmm, I _think_ I saw some PHP code some time back that checked the user agent and disallowed access to the website, but I'm not too sure. For now, the IP address is banned; it seems to be an ADSL connection, so I assume it is one person/one connection.

    I wonder what "Behaviour: Naughty" means? Also, it doesn't support 'noindex'. If it is a spider, why the 16 MB? There were lots of entries in the logs for this IP's session trying to access the login script and other files, which looks like a deep-level spider: every conceivable page, and every option on each form. It was not a normal spider of the site; I have done test spiders, and the resulting list would not amount to more than 100K, if that.

    Thanks for your help,

    Peter
    Spiders are very smart; take a look at the attached picture to see what I mean (taken from a different website).
    Attached Images

  6. #6
    SitePoint Evangelist
    Join Date
    May 2003
    Posts
    592
    Hi,

    Quote Originally Posted by DJ_Steve
    Spiders are very smart; take a look at the attached picture to see what I mean (taken from a different website).
    Hmm, pretty tricky, but maybe the software/script that displays what users are doing is wrong.

    Found one interesting thread about banning user agents, especially ones that grab your entire website.

    http://www.sitepoint.com/forums/showthread.php?t=107942

    Peter

  7. #7
    SitePoint Evangelist
    Join Date
    May 2003
    Posts
    592
    Hi "DJ_Steve",

    Quote Originally Posted by DJ_Steve
    I have just checked this and I was wrong. I've found the information below for you though; it appears to be a search engine spider.
    Can you please advise where you got this information from? I have found some references to spider names, but if there is an exhaustive list somewhere, it would help me know who is spidering and who is trying to grab the whole site.

    Thanks,

    Peter

