  1. #1
    Ex-SitePointer silver trophy
    iFroggy's Avatar
    Join Date
    Oct 2000
    Location
    Harbinger, NC, U.S.A.
    Posts
    4,126
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)

    Harvesting Content from My Site (Legal Side)

    Hello,

    From my thread on the technical aspects:

    I run the first site for phpBB hacks and the largest source of phpBB related downloads on the internet. There is a site out there that is using a script to harvest data directly from our database. Author name, version number, the .zip file, our description, the title, everything. Just taking it. No ethics, no morals, forget it. He's a scumbag. He's banned from my community.

    Now, much of the content, I can't lay a claim to legally. The only possible thing I can is descriptions. In a lot of cases, descriptions are similar to what authors provide, with some help from myself. A capital letter here, a period there, etc. But, a lot of the time they are a hybrid of what the author submits and what I write. In those cases, a lot of the time they are from the .zip file, so they are similarly GPL. But some of the time, they are not and some of the time I write the whole thing. In those cases, those descriptions are copyrighted to me. His host, however, has acted very unprofessionally - sharing my entire e-mails with him and telling him he's got "nothing to worry about."

    ... We added a new download last night and he ran his script today and now has it. I have worked for years to build my database on quality and on author permission and he is completely violating that work and everything that we stand for.
    I'm looking for more of the legal side of things here. Any input would be great. Basically, the content isn't copyrighted to us except for the descriptions that I wrote, which are on my site only. But there are not loads of those. So, that is kind of weak legally, even though the law is still on my side. He's accessing my server, taking my bandwidth, etc.

    I'm a little tired right now as it's late so I'm sure I'm leaving stuff out. Any questions, please ask.

    Thanks.

  2. #2
    Life is short. Be happy today! silver trophybronze trophy Sagewing's Avatar
    Join Date
    Apr 2003
    Location
    Denver, Phang-Nga, Thailand
    Posts
    4,379
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Here's the question my attorney would ask, I bet:

    "Are there any MATERIAL damages?"

    Where does the revenue from the site come from? I have used phpbbhacks many times (great site, btw) and I think it's reasonable to assume that if a site interferes with the uniqueness/reputation of your site by stealing content (even if it's just the non-GPL'd description, etc.), that would interfere with future revenue in a real way.

    If you tried to sue him for damages, it would probably be a nightmare. However, making a valid suit against someone with the intent of stopping them from doing something is much easier.

    So, any material damages??
    The fewer our wants, the nearer we resemble the gods. Socrates

    SAGEWING LLC - QUALITY WEB AND MOBILE APPS. PREMIUM OUTSOURCING SERVICES.
    Twitter | LinkedIn | Facebook | Google+

  3. #3
    Ex-SitePointer silver trophy
    iFroggy's Avatar
    Join Date
    Oct 2000
    Location
    Harbinger, NC, U.S.A.
    Posts
    4,126
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Thanks for the reply.

    In the way you say it, I guess there could be. I mean, we're all about quality, etc. (Thanks for the kind words). I spend time ensuring that what authors submit to me for a description is edited for consistency (consistent spelling, grammar, etc.). But, I didn't write it. I'm just an editor, you could say. And a lot of the time I beef it up. And then some of the time there is no description or a weak one and I completely write it. Am I losing money from it? Eh, I don't know.

    I don't really think it's worth suits, lawyers, etc., though. It bugs me as it's our work.

    I mean, we've spent more than 4 years here. We've built our database slowly. 1 by 1. We have author permission for ALL of our downloads. We don't have to get it, but we want it because we believe it's right. If we don't have author permission for it to be in our database, we do not want it in our database. And this person is simply using a script to take advantage of the hard work of 4+ years and really just claim it as their own. It's despicable. At the same time, legally, we can't really claim that much that I can see. Even if they are using us like this.

  4. #4
    SitePoint Wizard
    Join Date
    Jul 2004
    Location
    Minneapolis, MN
    Posts
    1,924
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    If you haven't already contacted the guy I'd start with that. Legally I don't think that it would be worth it.

    Here is what I would do: work against his little script. I think the most valuable part of your website is going to be the .zip files, and you can easily set up something that would make it hard to retrieve them unless you do it manually. Place all the zip files (in some nice folder structure) in a directory outside of the web root. Then, when someone downloads one, have the script temporarily write it into a directory with a random name. Then delete it after it has downloaded.

    I've seen similar systems work wonders in keeping files available only to paying customers, and I'm sure it could work in your case for limiting downloads to (say) registered members.
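
    A minimal sketch of the staging idea described above, in Python purely for illustration (the site itself presumably runs PHP). The function names `stage_download` and `cleanup_download` and the directory layout are assumptions, not anything from the thread:

```python
import secrets
import shutil
from pathlib import Path


def stage_download(archive_name: str, store_dir: Path, public_dir: Path) -> Path:
    """Copy one archive from the private store into a fresh, randomly named public folder."""
    # Reject names containing path separators so a request cannot escape the store.
    if Path(archive_name).name != archive_name:
        raise ValueError("archive name must not contain path separators")
    source = store_dir / archive_name
    if not source.is_file():
        raise FileNotFoundError(archive_name)
    # 16 random bytes give a 32-character hex folder name that a script cannot predict.
    staged_dir = public_dir / secrets.token_hex(16)
    staged_dir.mkdir(parents=True)
    return Path(shutil.copy2(source, staged_dir))


def cleanup_download(staged_file: Path) -> None:
    """Remove the temporary folder once the download has finished."""
    shutil.rmtree(staged_file.parent, ignore_errors=True)
```

    The download script would call `stage_download`, send the visitor to the returned path, and call `cleanup_download` afterwards. On its own this only hides the permanent file locations; a scraper that requests the download page itself still gets the file, so it works best combined with a check on who is asking.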

  5. #5
    Ex-SitePointer silver trophy
    iFroggy's Avatar
    Join Date
    Oct 2000
    Location
    Harbinger, NC, U.S.A.
    Posts
    4,126
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Thanks for that.

    Let's just say that contacting the guy is not an option. He knows it's wrong, he knows I've been talking to his host, his ISP, etc. He's an idiot. You can't talk to people like that.

    We don't really have a registration system in place for downloads, so I'm not sure that that would work. But, I'll keep that in mind/maybe look into it. Thanks.

  6. #6
    SitePoint Wizard Crowe's Avatar
    Join Date
    Nov 2001
    Location
    Huntsville
    Posts
    1,117
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    You should also figure out how his script identifies itself (IP, etc.) and use .htaccess to ban the thing entirely. Or, get devious: if you detect that it's his script, serve up empty zip files and lorem ipsum text for the descriptions.
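
    The "devious" option could look roughly like the sketch below. It is written in Python for illustration only; the blocked IP (taken from a documentation address range), the user-agent fragment, and all function names are hypothetical placeholders rather than details from the thread:

```python
import io
import zipfile
from ipaddress import ip_address

# Hypothetical values for illustration; real ones would come from the access logs.
BLOCKED_IPS = {ip_address("203.0.113.7")}
BLOCKED_AGENT_FRAGMENTS = ("examplescraper",)

PLACEHOLDER_TEXT = "Lorem ipsum dolor sit amet."


def is_scraper(remote_ip: str, user_agent: str) -> bool:
    """True when the request matches a known scraper IP or user-agent fragment."""
    agent = user_agent.lower()
    return ip_address(remote_ip) in BLOCKED_IPS or any(
        fragment in agent for fragment in BLOCKED_AGENT_FRAGMENTS
    )


def empty_zip() -> bytes:
    """Build a valid zip archive that contains no files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w"):
        pass
    return buffer.getvalue()


def build_response(remote_ip: str, user_agent: str, description: str, archive: bytes):
    """Return the real content for visitors and decoy content for the scraper."""
    if is_scraper(remote_ip, user_agent):
        return PLACEHOLDER_TEXT, empty_zip()
    return description, archive
```

    Serving a decoy instead of an error has one practical advantage: the scraper's owner may not notice for a while that anything changed.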
    Chrispian H. Burks
    Nothing To Say

  7. #7
    Ex-SitePointer silver trophy
    iFroggy's Avatar
    Join Date
    Oct 2000
    Location
    Harbinger, NC, U.S.A.
    Posts
    4,126
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    I found an IP - I think the IP. But, I also think he'll just change it. That devious thing would be cool if I knew how.

  8. #8
    SitePoint Wizard Crowe's Avatar
    Join Date
    Nov 2001
    Location
    Huntsville
    Posts
    1,117
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by iFroggy
    I found an IP - I think the IP. But, I also think he'll just change it. That devious thing would be cool if I knew how.
    Use .htaccess to deny his IP address from your site. Also, his script should send some sort of header identifying itself. Deny that too. Make him change it so much that he can't scrape your site without manually doing it.

    You've got my email - send me everything you know about how this script accesses the site: a couple of lines from your logs with his access in them. Let me see what we can come up with!
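
    Pulling the IP and identifying header out of the logs could be sketched as follows. This assumes the server writes the common Apache "combined" log format, which the thread does not confirm; the regular expression and the `summarize` helper are illustrative only:

```python
import re
from collections import Counter

# Matches the Apache "combined" log format (an assumption about the server's setup).
COMBINED_LOG = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(lines):
    """Count requests per (IP, user agent) pair, busiest clients first."""
    clients = Counter()
    for line in lines:
        match = COMBINED_LOG.match(line)
        if match:  # skip lines that are not in the expected format
            clients[(match["ip"], match["agent"])] += 1
    return clients.most_common()
```

    A pair that shows up far more often than any normal visitor, or with an unusual user agent, is a reasonable candidate for the deny rules.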

  9. #9
    l 0 l silver trophybronze trophy lo0ol's Avatar
    Join Date
    Aug 2002
    Location
    Palo Alto
    Posts
    5,329
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by iFroggy
    I mean, we've spent more than 4 years here. We've built our database slowly. 1 by 1. We have author permission for ALL of our downloads. We don't have to get it, but we want it because we believe it's right. If we don't have author permission for it to be in our database, we do not want it in our database. And this person is simply using a script to take advantage of the hard work of 4+ years and really just claim it as their own. It's despicable. At the same time, legally, we can't really claim that much that I can see. Even if they are using us like this.
    I know exactly how you feel (except in my case with my site it's three and a half years or so). It's a tricky position. Morally it's obviously scum... but legally it might be a different thing. I'm guessing something like this would hold up in court, but it's hard to say.

  10. #10
    Ex-SitePointer silver trophy
    iFroggy's Avatar
    Join Date
    Oct 2000
    Location
    Harbinger, NC, U.S.A.
    Posts
    4,126
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Thanks Chris. I'm actually not sure about this IP now. But, if I have something, I'll send it. Appreciate it.

  11. #11
    SitePoint Enthusiast Mall23's Avatar
    Join Date
    Oct 2005
    Posts
    26
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    There was a similar case with a company (can't remember the name) vs. eBay. The company was spidering eBay's content and plugging it into their site. eBay won -- but only because they were able to show that those eBay customers had essentially purchased a plot of "land" on eBay, and the other company was manipulating (I don't know of a better term) that customer's purchased plot w/o their (the customer's) consent.

    So if you have a site w/ content where customers do not have to pay, it may not stand up in court. The customer freely added publicly available content that is not copyrighted.

    It is LAME and SCUMMY -- I totally agree.

    If this were to happen to me, I would block IPs as mentioned before and, if you can program, change the code that is displayed on the site. The reason is that this scummy person is using a program to gather the data, parse it, and dump it into a database. That is VERY easy to do on a cookie-cutter site. For example, to steal all the messages on sitepoint, a developer would just have to know how to parse a single message page to get ALL the data. Simple.

    So if you can program, add crap random code into your pages and/or create routines to slightly modify the order of HTML tags in the code and/or slightly modify the display of the pages -- in a random method. For example, you might have a bold link with the BOLD tag first and the A HREF second. HTML allows you to swap those tags, and/or replace the BOLD with a STYLE attribute in the A HREF. Or, you can add random garbage inside less-than and greater-than characters. (Obfuscation won't work for this because even obfuscated code is standard throughout a site.)

    If the thief ignores all HTML tags, they'll use headers and titles as markers. So in the sitepoint example, there is "Join Date:" on the thread list. A developer would just look for that. It might get a bit annoying, but you could randomly change that as well: "Join Date:", "Join-Date", "--Join Date--", etc...this will also help.

    These things should help make your site unreadable by a parsing program -- or REALLY tough to parse (nothing is a perfect solution) -- making it not worth the programmer's time to steal content from your site.
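
    The randomization described above can be sketched as follows, in Python for illustration. The helper names `render_link` and `render_label` are hypothetical; the three link variants and the three label spellings are the ones suggested in the post:

```python
import random
from html import escape


def render_link(url: str, text: str, rng: random.Random) -> str:
    """Render a bold link using one of several equivalent-looking markup variants."""
    href = escape(url, quote=True)
    label = escape(text)
    variants = (
        f'<b><a href="{href}">{label}</a></b>',  # bold outside the link
        f'<a href="{href}"><b>{label}</b></a>',  # bold inside the link
        f'<a href="{href}" style="font-weight: bold">{label}</a>',  # style attribute
    )
    return rng.choice(variants)


def render_label(rng: random.Random) -> str:
    """Vary the text marker a parser might otherwise search for."""
    return rng.choice(("Join Date:", "Join-Date", "--Join Date--"))
```

    All three link variants look the same in a browser, but a scraper that matches one fixed pattern will miss the others. A parser that strips tags and reads only the text would still need the label variation to be thrown off.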

    Hope this helps!!

  12. #12
    Ex-SitePointer silver trophy
    iFroggy's Avatar
    Join Date
    Oct 2000
    Location
    Harbinger, NC, U.S.A.
    Posts
    4,126
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Thanks for the replies.

    I wouldn't know how to randomize code, etc. At the same time, I'm not interested in doing things that take away from the quality of the site, either. Like changing headers as you mentioned.

    I have blocked the IPs in question. But doing a wild card isn't feasible because it is a big ISP.

  13. #13
    Life is short. Be happy today! silver trophybronze trophy Sagewing's Avatar
    Join Date
    Apr 2003
    Location
    Denver, Phang-Nga, Thailand
    Posts
    4,379
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Another thought: One of the things that makes phpbbhacks.com great is that it is an established and loyal community. Why not release information about this problem to your userbase and make sure that everyone in the phpbb community knows that the scumbag is stealing your content?

    I'm not saying you should bash him in an announcement (no pissing contests), but some kind of an announcement might be helpful. Your users are loyal, and they should know that the 'other site' is not legit.

    I know that for me personally (and I am a user of phpbbhacks for a while now) if I knew the URL to the scumbag's site, I'd keep in mind never to support it beyond a single curiosity visit. I think a lot of users would be loyal, especially since you are supporting an open source type of crowd.

    You worked hard to develop the reputation, use it!

    Disclaimer: I'm not saying this is the way to go, it was 'just a thought' !

    You have such a huge advantage as an established player in the field, you should leverage that to keep the other site from gaining more success than they deserve.

  14. #14
    King of Paralysis by Analysis bronze trophy
    Join Date
    Jul 2004
    Location
    Ottawa, Canada
    Posts
    5,840
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Why don't you just implement one of those CAPTCHA items where you have to enter the text in the picture?

    That is a small price to pay to get valuable free content and stops spiders.

  15. #15
    Life is short. Be happy today! silver trophybronze trophy Sagewing's Avatar
    Join Date
    Apr 2003
    Location
    Denver, Phang-Nga, Thailand
    Posts
    4,379
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by tke71709
    Why don't you just implement one of those CAPTCHA items where you have to enter the text in the picture?

    That is a small price to pay to get valuable free content and stops spiders.
    Not a bad idea!

  16. #16
    Ex-SitePointer silver trophy
    iFroggy's Avatar
    Join Date
    Oct 2000
    Location
    Harbinger, NC, U.S.A.
    Posts
    4,126
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Thanks for the replies.

    dhecker,

    We have done it on a limited basis - I've contacted a bunch of authors and informed them that he is using their downloads in this fashion.

    I am not going to make it a public issue with my users just because I don't think the benefit is there for us. We're a "big fish" (if you will) and he is a "small fish". Big fishes shouldn't draw attention to small fishes, in my opinion. Thousands of people visit us everyday. He'd be lucky if 200 visit his site. The day I mention his site publicly is the biggest traffic day he's ever had.

    We've dealt with a lot of idiots at phpBBHacks.com, but we try very hard to do so privately. We move in silence. I don't want it to affect my users. If they were to come to me and ask me, I'd give it to them straight, though.

    tke,

    We're thinking/have been thinking about that. But, for every download? It would be too much of a hassle for our users. What we were thinking was to just implement it on attempts from his ISP - we get a bunch of users from his ISP, but at least it wouldn't be all of our users, just a very small percentage. That may be worth it - but I am not even sure how we'd implement that, anyway.
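
    Deciding which visitors get the challenge could be sketched like this, in Python for illustration. The network ranges below are placeholders from documentation address space, not the real ISP's ranges, and `needs_captcha` is a hypothetical helper name:

```python
from ipaddress import ip_address, ip_network

# Hypothetical ranges; the real list would be the ISP's published address blocks.
CHALLENGED_NETWORKS = (
    ip_network("198.51.100.0/24"),
    ip_network("203.0.113.0/25"),
)


def needs_captcha(remote_ip: str) -> bool:
    """True when the visitor's address falls inside one of the challenged ranges."""
    address = ip_address(remote_ip)
    return any(address in network for network in CHALLENGED_NETWORKS)
```

    The download script would show the image challenge only when `needs_captcha` returns true, so visitors from every other network download as before.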

  17. #17
    Life is short. Be happy today! silver trophybronze trophy Sagewing's Avatar
    Join Date
    Apr 2003
    Location
    Denver, Phang-Nga, Thailand
    Posts
    4,379
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Reading back, I'm glad I put my 'disclaimer' on that post. I think that your 'quieter' approach is probably the more dignified and effective way to go

  18. #18
    King of Paralysis by Analysis bronze trophy
    Join Date
    Jul 2004
    Location
    Ottawa, Canada
    Posts
    5,840
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by iFroggy
    tke,

    We're thinking/have been thinking about that. But, for every download? It would be too much of a hassle for our users. What we were thinking was to just implement it on attempts from his ISP - we get a bunch of users from his ISP, but at least it wouldn't be all of our users, just a very small percentage. That may be worth it - but I am not even sure how we'd implement that, anyway.
    It all depends on how bright and determined he is. He can always use random proxy servers to harvest your content if you shut out only his ISP.

    Give it a try and see what happens. Otherwise you're stuck either letting him have the content (and risking duplicate-content penalties), filing a DMCA complaint with Google to get him out of the SERPs, or implementing the captcha solution for all your users. Another thing you can do is have your lawyer contact his host with a long, detailed letter outlining their requirements under the DMCA. You've stated that some of your content is your own writing, so he can't use it, and they are liable once you have contacted them.

    You might get yourself a hosting company out of this when all is said and done!

  19. #19
    Ex-SitePointer silver trophy
    iFroggy's Avatar
    Join Date
    Oct 2000
    Location
    Harbinger, NC, U.S.A.
    Posts
    4,126
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Yeah. Unfortunately, it's not worth it for me to hire an attorney, at this time. Yeah, he'd probably just use some other proxy, I imagine. Thanks for the ideas, though, will keep them in mind.

  20. #20
    SitePoint Addict
    Join Date
    Sep 2005
    Posts
    229
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    As far as I am aware, you have database copyrights. Just like someone cannot copy the phone book and resell it. Qwest may not own the numbers, but they own the set of numbers.

  21. #21
    Ex-SitePointer silver trophy
    iFroggy's Avatar
    Join Date
    Oct 2000
    Location
    Harbinger, NC, U.S.A.
    Posts
    4,126
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Thanks for that. Any sort of information online I can read up on relating to it?

  22. #22
    SitePoint Member
    Join Date
    Nov 2005
    Location
    New York
    Posts
    17
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Re:

    If he uses a script, can you not block his IP on the server side? It should not be difficult to figure out which server it is, since he is making constant connections.
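
    Spotting a client that makes "constant connections" could be sketched as a sliding-window count, shown here in Python for illustration. The function name, the thresholds, and the input shape (time-sorted pairs of timestamp in seconds and IP) are all assumptions:

```python
from collections import defaultdict, deque


def find_heavy_clients(requests, max_requests: int, window_seconds: float) -> set:
    """Flag IPs that exceed max_requests within any window of window_seconds.

    requests: iterable of (timestamp_seconds, ip) pairs, sorted by timestamp.
    """
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    flagged = set()
    for timestamp, ip in requests:
        hits = recent[ip]
        hits.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while timestamp - hits[0] > window_seconds:
            hits.popleft()
        if len(hits) > max_requests:
            flagged.add(ip)
    return flagged
```

    This only helps while the script runs from a small set of addresses; it will not catch someone copying entries by hand or rotating through proxies.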

    Just a thought.

  23. #23
    Ex-SitePointer silver trophy
    iFroggy's Avatar
    Join Date
    Oct 2000
    Location
    Harbinger, NC, U.S.A.
    Posts
    4,126
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    He might be doing it manually now, i.e., he took our whole database, cleaned up the duplicates and then started doing it manually as part of a normal routine, but that's just a guess. However, his IPs change, he doesn't appear to be doing it from the server, etc. (which leads me to the guess above). He has our data now and I've tried a few things to no avail, so either he's doing it manually or... it'll be hard to block. Thanks for the reply.

