  1. #1
    Ex-SitePointer silver trophy
    Patrick's Avatar
    Join Date
    Oct 2000
    Location
    Harbinger, NC, U.S.A.
    Posts
    4,126
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)

    Harvesting Content From My Site

    Hello,

    I'm dealing with a bit of an issue and would like some input.

    I run the first site for phpBB hacks and the largest source of phpBB-related downloads on the internet. There is a site out there that is using a script to harvest data directly from our database: author name, version number, the .zip file, our description, the title, everything. Just taking it. No ethics, no morals, forget it. He's a scumbag. He's banned from my community.

    Now, much of the content I can't lay a claim to legally. The only thing I can possibly claim is the descriptions. In a lot of cases, descriptions are similar to what authors provide, with some help from me: a capital letter here, a period there, etc. But a lot of the time they are a hybrid of what the author submits and what I write. In those cases, a lot of the time they come from the .zip file, so they are similarly GPL. But some of the time they are not, and some of the time I write the whole thing. In those cases, the descriptions are copyrighted to me. His host, however, has acted very unprofessionally, sharing my entire e-mails with him and telling him he's got "nothing to worry about."

    I am working on it from that side, but I wanted to ask if anyone had any advice for blocking this data mining. Is there anything I can do? It's very lame. We added a new download last night, he ran his script today, and now he has it. I have worked for years to build my database on quality and on author permission, and he is completely violating that work and everything that we stand for.

    Thank you.

  2. #2
    ☆★☆★ silver trophy vgarcia's Avatar
    Join Date
    Jan 2002
    Location
    in transition
    Posts
    21,235
    Mentioned
    1 Post(s)
    Tagged
    1 Thread(s)
    Is there anything particularly unique about this script? Host names, user agent strings, something like that? That would be the easiest way to block.
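
    For illustration, here is a minimal sketch of that kind of check in Python (the site itself runs phpBB, so the idea would need porting). The user agent fragments and the address range are made-up placeholders, not values taken from this thread, and `is_blocked` is a hypothetical helper name:

```python
import ipaddress

# Hypothetical values for illustration only; replace them with whatever
# actually shows up in your own access logs.
BLOCKED_AGENT_FRAGMENTS = ["examplebot", "python-urllib"]
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]  # documentation range


def is_blocked(remote_addr: str, user_agent: str) -> bool:
    """Return True if the request matches a blocked user agent or network."""
    agent = (user_agent or "").lower()
    # Case-insensitive substring match on the User-Agent header.
    if any(fragment in agent for fragment in BLOCKED_AGENT_FRAGMENTS):
        return True
    try:
        address = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Unparseable address: do not block on that basis alone.
        return False
    return any(address in network for network in BLOCKED_NETWORKS)
```

    Keep in mind that a user agent string is trivial to change and a script can move to another address, so a rule like this probably only helps until the other side notices.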

  3. #3
    SitePoint Wizard gold trophysilver trophybronze trophy dc dalton's Avatar
    Join Date
    Nov 2004
    Location
    Right behind you, watching, always watching.
    Posts
    5,431
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Wow, that seriously sucks. Man, people like that need a good beating!

    Couldn't you encrypt the DBs so his spidering would return useless data (without the key, that is)?

    I'm sure you might be able to go after him in some legal way too; I'm just not 100% sure right this second.

  4. #4
    $this->toCD-R(LP); vinyl-junkie's Avatar
    Join Date
    Dec 2003
    Location
    Federal Way, Washington (USA)
    Posts
    1,524
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I don't have any experience with problems like yours, but I did a little Googling and found this. It looks to me like it's exactly what you're looking for. Hope it helps.
    Music Around The World - Collecting tips, trade
    and want lists, album reviews, & more
    Showcase your music collection on the Web

  5. #5
    Ex-SitePointer silver trophy
    Patrick's Avatar
    Join Date
    Oct 2000
    Location
    Harbinger, NC, U.S.A.
    Posts
    4,126
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Thanks for the replies.

    Vinnie,

    Not that I know of. Then again, I don't know how to find out.
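
    One way to find out might be to look at the web server's access log and see which clients request the most pages. A rough Python sketch follows, assuming the server writes the common "combined" log format; the regular expression and the `top_clients` helper are assumptions for illustration, not something from this thread:

```python
import re
from collections import Counter

# Matches the widely used "combined" access log layout. This is an
# assumption about the server's log format; adjust it if yours differs.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def top_clients(lines, limit=10):
    """Count requests per (IP, user agent) pair and return the busiest pairs."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # silently skip lines that do not fit the expected layout
            counts[(match.group("ip"), match.group("agent"))] += 1
    return counts.most_common(limit)
```

    Feeding it the open log file, e.g. `top_clients(open(path))` with `path` pointing at your own log, should list the heaviest clients first. A script that walks every download page would likely stand out there, and its address and user agent are the details Vinnie is asking about.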

    dc dalton,

    Yeah, exactly. You don't know the half of it with this guy. If it weren't for working online, I don't know if I would have known that people like this existed.

    How could I do that? I think he's taking the data from the HTML pages (i.e. the database driven pages). He's not accessing the DB directly.

    vinyl-junkie,

    I'll look into that. Worst designed website I've seen in a while, but I'll take a look.

    Thanks guys.

  6. #6
    SitePoint Wizard gold trophysilver trophybronze trophy dc dalton's Avatar
    Join Date
    Nov 2004
    Location
    Right behind you, watching, always watching.
    Posts
    5,431
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by iFroggy
    dc dalton,

    Yeah, exactly. You don't know the half of it with this guy. If it weren't for working online, I don't know if I would have known that people like this existed.

    How could I do that? I think he's taking the data from the HTML pages (i.e. the database driven pages). He's not accessing the DB directly.
    I'll have to look into that; I thought he was hitting your DBs... Those hokey HTML encryptors are junk.

