Hi all,
How do I protect my website from being grabbed/downloaded? I think Mozilla supports this function.
/Kenneth
...
Not sure quite what you mean. When people view your website they are grabbing/downloading it!
There is nothing to stop someone taking a copy of your site, if that is what you mean.
Obscure JavaScript and Flash can make it more difficult to spider all the required URLs, but other than that, you cannot stop it without preventing users in general from accessing your site.





No, no, that is not what I mean. You can use programs to download/grab a COMPLETE website (all pages, including images etc.). I know there is a way to prevent it (by using .htaccess), but I don't remember where I read it.
...


No, not really. For someone to view the site, one essentially needs to be able to download the entire thing. One could ban any website-grabbing software that declares itself in the User-Agent (browser type) header. Then again, most of it allows one to masquerade as Internet Explorer. If it is so important that it cannot be shared, it probably should not be on the public internet, no?
WWB
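For what it's worth, the User-Agent check described above could look roughly like this. It is only a sketch: the function name and the list of substrings are my own assumptions for illustration, and as noted it does nothing against a grabber that pretends to be a normal browser.

```python
# Hypothetical blocklist: substrings that some site-grabbing tools put in
# their User-Agent header by default. Adjust it to what shows up in your logs.
BLOCKED_AGENT_SUBSTRINGS = ("httrack", "wget", "webzip", "teleport")


def is_blocked_agent(user_agent):
    """Return True if the User-Agent header matches a listed grabber."""
    if not user_agent:
        # No header at all: this sketch lets the request through.
        # A stricter policy could choose to block it instead.
        return False
    ua = user_agent.lower()
    return any(name in ua for name in BLOCKED_AGENT_SUBSTRINGS)
```

You would call this from whatever handles your requests and answer with a 403 when it returns True.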





"If it is so important that it cannot be shared, it probably should not be on the public internet, no?"
That is not the issue ... I have 10,000+ pictures posted on at least as many pages. If someone grabs the entire site it will cost a lot of bandwidth... and I don't want some moron to copy my precious content so easily.
I know some people here at SPF use .htaccess to prevent this; I just can't find the threads.
...
First of all you probably want to set up robots.txt so that images are out of bounds; that will prevent Google from indexing your pictures, for instance.
That may be enough.
Then you could use server-side software to limit the amount of bandwidth you will serve to a single IP address. Of course a devious grabber could use multiple addresses, but that's unlikely.
Of course it would also stop legitimate users from browsing the whole site.
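A rough sketch of the per-IP bandwidth limit mentioned above, assuming a simple fixed time window. The class name, the byte budget and the window length are all made up for illustration; real server-side software would do this differently.

```python
import time


class BandwidthLimiter:
    """Track bytes served per IP address within a fixed time window."""

    def __init__(self, max_bytes, window_seconds, clock=time.monotonic):
        self.max_bytes = max_bytes            # budget per IP per window
        self.window_seconds = window_seconds  # length of one window
        self.clock = clock                    # injectable for testing
        self._usage = {}                      # ip -> (window_start, bytes_served)

    def allow(self, ip, response_bytes):
        """Return True and record the bytes if the IP is still under budget."""
        now = self.clock()
        start, used = self._usage.get(ip, (now, 0))
        if now - start >= self.window_seconds:
            start, used = now, 0  # window expired, start a fresh one
        if used + response_bytes > self.max_bytes:
            self._usage[ip] = (start, used)
            return False
        self._usage[ip] = (start, used + response_bytes)
        return True
```

For example, `BandwidthLimiter(max_bytes=50_000_000, window_seconds=3600)` would cap each address at about 50 MB per hour. Pick the numbers so that a normal visitor never hits the limit.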





Originally posted by geebee2: "First of all you probably want to set up robots.txt so that images are out of bounds, that will prevent google indexing your pictures, for instance."
- Actually, I want Google to index my pictures; I optimized for that :P
"Then you could use server-side software to limit the amount of bandwidth you will serve to a single IP address. Of course a devious grabber could use multiple addresses, but that's unlikely."
- That would be a possibility, but then I would get a problem with search engine bots.
Thanks for your suggestions, but I know there is another way (something about putting 50-100 lines of code into .htaccess). I will keep searching.
...
Hi Kenneth,
Putting exclusions in your robots.txt is a good step, although I suspect that site ripping software wouldn't pay much attention to that.
What you are looking for sounds like a bot trap.
Basically, place a link on your pages (one that is hidden from legitimate users). Site ripping software will follow every link that it finds to download the content. If someone/something follows your hidden link then they're probably up to no good (and you can opt to block them).
A useful site I found a while back was: How to build a bot trap
Does this jog any memories?
Regards,
Mark
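To make the bot trap idea above a bit more concrete, here is a minimal sketch. The `/trap/` path and the `BotTrap` class are hypothetical names used only for illustration, and the block list lives in memory only. Since Google should still index the site, the trap URL would presumably also be disallowed in robots.txt so that well-behaved crawlers never follow it.

```python
TRAP_PATH = "/trap/"  # hypothetical URL: linked invisibly from your pages


class BotTrap:
    """Block any IP address that has requested the hidden trap URL."""

    def __init__(self, trap_path=TRAP_PATH):
        self.trap_path = trap_path
        self.blocked_ips = set()

    def check(self, ip, path):
        """Return True if the request should be served, False if blocked."""
        if path.startswith(self.trap_path):
            # Only something that follows every link ends up here.
            self.blocked_ips.add(ip)
        return ip not in self.blocked_ips
```

Every request goes through `check()`; once an address has touched the trap, all its later requests are refused.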





Great solution, mrobinson.
I will add it later this week.
...