  1. #1
    SitePoint Wizard wide's Avatar
    Join Date
    Apr 2004
    Location
    Denmark
    Posts
    1,215

    How to protect against "website grabbing/download"?

    Hi all,

    How do I protect my website from being grabbed/downloaded? I think Mozilla supports this function.



    /Kenneth
    ...

  2. #2
    SitePoint Enthusiast geebee2's Avatar
    Join Date
    Mar 2004
    Location
    Gloucester UK
    Posts
    57
    Quote Originally Posted by wide
    Hi all,

    How do I protect my website from being grabbed/downloaded? I think Mozilla supports this function.

    /Kenneth

    Not quite sure what you mean. When people view your website, they are grabbing/downloading it!

    There is nothing to stop someone taking a copy of your site, if that is what you mean.

    Obscure JavaScript and Flash can make it harder to spider all the required URLs, but beyond that, you cannot stop it without also preventing ordinary users from accessing your site.
    George

    http://qaaz.com

    For low cost database-driven web-sites

  3. #3
    SitePoint Wizard wide's Avatar
    Join Date
    Apr 2004
    Location
    Denmark
    Posts
    1,215
    No, that is not what I mean. There are programs that can download/grab a COMPLETE website (all pages, including images, etc.). I know there is a way to prevent it (using .htaccess), but I don't remember where I read it.
    ...

  4. #4
    SitePoint Author
    wwb_99's Avatar
    Join Date
    May 2003
    Location
    Washington, DC
    Posts
    10,629
    No, not really. To view the site, a visitor essentially needs to be able to download the entire thing. You could ban any website-grabbing software that declares itself in the browser type (User-Agent) header. Then again, most of it can masquerade as Internet Explorer. If it is so important that it cannot be shared, it probably should not be on the public internet, no?
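As a rough illustration of the "ban by browser type header" idea, here is a minimal Python sketch. The blocklist entries and the function name `is_blocked_agent` are assumptions made up for this example, not anything from the thread, and as noted above a grabber that fakes its User-Agent would slip straight through.

```python
# Hypothetical blocklist: substrings that some site-grabbing tools
# include in their User-Agent header. Illustrative only, not exhaustive.
BLOCKED_AGENT_SUBSTRINGS = ("httrack", "wget", "webzip", "teleport")


def is_blocked_agent(user_agent):
    """Return True if the User-Agent header matches the blocklist."""
    if not user_agent:
        # Assumption for this sketch: a missing header is let through.
        return False
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_AGENT_SUBSTRINGS)
```

A server would call something like this once per request and answer with a 403 when it returns True.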

    WWB

  5. #5
    SitePoint Wizard wide's Avatar
    Join Date
    Apr 2004
    Location
    Denmark
    Posts
    1,215
    "If it is so important that it cannot be shared, it probably should not be on the public internet, no?"

    That is not the issue ... I have over 10,000 pictures posted on at least as many pages. If someone grabs the entire site, it will cost a lot of bandwidth ... and I don't want some moron to copy my precious content so easily.

    I know some people here at SPF use .htaccess to prevent this; I just can't find the threads.
    ...

  6. #6
    SitePoint Enthusiast geebee2's Avatar
    Join Date
    Mar 2004
    Location
    Gloucester UK
    Posts
    57
    Quote Originally Posted by wide
    "If it is so important that it cannot be shared, it probably should not be on the public internet, no?"

    That is not the issue ... I have over 10,000 pictures posted on at least as many pages. If someone grabs the entire site, it will cost a lot of bandwidth ... and I don't want some moron to copy my precious content so easily.

    I know some people here at SPF use .htaccess to prevent this; I just can't find the threads.

    First of all, you probably want to set up robots.txt so that images are out of bounds; that will prevent Google from indexing your pictures, for instance.

    That may be enough.
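To make the robots.txt suggestion concrete, here is a small Python sketch that parses an example robots.txt and checks what a well-behaved crawler would be allowed to fetch. The `/images/` directory and the helper name `may_fetch` are assumptions for illustration; your pictures may live somewhere else. Keep in mind that robots.txt is only a request, so it affects crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt that puts a hypothetical /images/ directory
# out of bounds for every crawler that honors the file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(agent, path):
    """Return True if the rules above allow `agent` to fetch `path`."""
    return parser.can_fetch(agent, path)
```

For instance, `may_fetch("Googlebot", "/images/cat.jpg")` is refused under these rules, while `may_fetch("Googlebot", "/index.html")` is allowed.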

    Then you could use server-side software to limit the amount of bandwidth you will serve to a single IP address. Of course, a devious grabber could use multiple addresses, but that's unlikely.

    Of course it would also stop legitimate users from browsing the whole site.
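A per-IP bandwidth cap along these lines could be sketched roughly as follows in Python. The class name `BandwidthLimiter`, the sliding-window approach, and the limits in the usage note are all assumptions for illustration; real server-side software may work quite differently.

```python
import time
from collections import defaultdict, deque


class BandwidthLimiter:
    """Track bytes served per IP address within a sliding time window."""

    def __init__(self, max_bytes, window_seconds, clock=time.monotonic):
        self.max_bytes = max_bytes
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._log = defaultdict(deque)   # ip -> deque of (timestamp, bytes)
        self._totals = defaultdict(int)  # ip -> bytes inside the window

    def _expire(self, ip, now):
        """Drop log entries that have fallen out of the window."""
        log = self._log[ip]
        while log and now - log[0][0] >= self.window_seconds:
            _, size = log.popleft()
            self._totals[ip] -= size

    def allow(self, ip, response_bytes):
        """Return True and record the transfer if it fits the budget."""
        now = self.clock()
        self._expire(ip, now)
        if self._totals[ip] + response_bytes > self.max_bytes:
            return False
        self._log[ip].append((now, response_bytes))
        self._totals[ip] += response_bytes
        return True
```

Usage might look like `limiter = BandwidthLimiter(max_bytes=50_000_000, window_seconds=3600)` followed by `limiter.allow(client_ip, size)` before each response is sent. As the post says, anyone who legitimately browses a lot would hit the same cap.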
    George

    http://qaaz.com

    For low cost database-driven web-sites

  7. #7
    SitePoint Wizard wide's Avatar
    Join Date
    Apr 2004
    Location
    Denmark
    Posts
    1,215
    "First of all, you probably want to set up robots.txt so that images are out of bounds; that will prevent Google from indexing your pictures, for instance."

    - Actually, I want Google to index my pictures; I optimized for that :P

    "Then you could use server-side software to limit the amount of bandwidth you will serve to a single IP address. Of course, a devious grabber could use multiple addresses, but that's unlikely."

    - That would be a possibility, but then I would get a problem with search engine bots.

    Thanks for your suggestions, but I know there is another way (something about adding 50-100 lines of code to .htaccess). I will keep searching.
    ...

  8. #8
    SitePoint Enthusiast mrobinson's Avatar
    Join Date
    Aug 2004
    Location
    New York, NY, USA
    Posts
    50
    Hi Kenneth,

    Putting exclusions in your robots.txt is a good step, although I suspect that site-ripping software wouldn't pay much attention to that.

    What you are looking for sounds like a bot trap.

    Basically, place a link on your pages (one that is hidden from legitimate users). Site ripping software will follow every link that it finds to download the content. If someone/something follows your hidden link then they're probably up to no good (and you can opt to block them).
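The logic of such a trap can be sketched in a few lines of Python. The `BotTrap` class and the `/trap/` path are hypothetical names chosen for this example. A real setup would presumably also list the trap path in robots.txt so that well-behaved crawlers never request it, and would need to store the blocked addresses somewhere persistent.

```python
class BotTrap:
    """Block any client that requests the hidden trap URL."""

    def __init__(self, trap_path):
        self.trap_path = trap_path  # hidden link target, e.g. "/trap/"
        self.blocked_ips = set()    # in-memory only in this sketch

    def handle(self, ip, path):
        """Return the HTTP status code to send for this request."""
        if ip in self.blocked_ips:
            return 403
        if path == self.trap_path:
            # Only a client following every link should end up here.
            self.blocked_ips.add(ip)
            return 403
        return 200
```

Once a client has touched the trap, every later request from the same address gets a 403, while other visitors are unaffected.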

    A useful site I found a while back was: How to build a bot trap

    Does this jog any memories?

    Regards,
    Mark

  9. #9
    SitePoint Wizard wide's Avatar
    Join Date
    Apr 2004
    Location
    Denmark
    Posts
    1,215
    Great solution, mrobinson.

    I will add it later this week.
    ...

