  1. #1
    SitePoint Addict
    Join Date
    Oct 2010
    Posts
    306

    UpDowner Scraping Site

    Has anyone else had this site start scraping theirs?

    It seems like a bit of an issue: it is scraping, linking to, and taking content.

    Most of my site is now linked via updowner.com.

    Surely Google should know about this and deindex the site? I did a quick search and it still comes up in the results.

    Does anyone have any advice on how to block bots that I don't want coming in, via robots.txt or .htaccess?

  2. #2
    SitePoint Enthusiast
    Join Date
    Jan 2010
    Posts
    84
    I don't have vast knowledge of scraping, but I have looked into it. If you have RSS feeds set up, it's very easy for scrapers to republish your content automatically through those feeds. There are also specialist programs that scrape pages directly, without using RSS. I'm not aware of any way to block these, apart from disabling the RSS feed.

    What I have seen is that most websites built from scraped content are very poor quality and heavily focused on advertising.

    For SEO purposes, Google generally tries to identify the site where content first appeared and credit that one, so there is usually little to worry about if others are ripping content from your site, as long as your content is unique.

    Make sure that pages likely to be scraped contain references back to your site. That way you get free advertising and extra PR. See scraping as a pat on the back, i.e. a sign that you are providing great content that others want to promote.

  3. #3
    TechnoBear (Life is not a malfunction)
    Join Date
    Jun 2011
    Location
    Argyll, Scotland
    Posts
    6,177
    Originally posted by dariussutherland:
    "Does anyone have any advice on how to block bots that I don't want coming in, via robots.txt or .htaccess?"

    robots.txt is useless for blocking unwanted bots: bots don't have to obey it, and bad bots are the least likely to take any notice of it. If you know the user-agent string or IP address of the particular bot, you can block it via .htaccess.
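
    As a rough illustration of the .htaccess approach, a minimal sketch might look like the following. It assumes Apache 2.4 with mod_rewrite enabled and .htaccess overrides allowed by the host. The user-agent string ExampleBadBot and the address 203.0.113.42 are placeholders (the address is from a range reserved for documentation), not values known to belong to any real scraper.

    ```apache
    # --- Block by user-agent (requires mod_rewrite) ---
    RewriteEngine On
    # "ExampleBadBot" is a hypothetical name; replace it with the string
    # that actually appears in your access logs. [NC] = case-insensitive.
    RewriteCond %{HTTP_USER_AGENT} "ExampleBadBot" [NC]
    # Leave the URL unchanged ("-") and answer with 403 Forbidden.
    RewriteRule ^ - [F]

    # --- Block by IP address (Apache 2.4 syntax) ---
    <RequireAll>
        # Allow everyone...
        Require all granted
        # ...except this placeholder address.
        Require not ip 203.0.113.42
    </RequireAll>
    ```

    Bear in mind that a user-agent string is trivial to fake and scrapers often rotate IP addresses, so rules like these only stop bots that keep identifying themselves the same way.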

  4. #4
    SitePoint Addict
    Join Date
    Oct 2010
    Posts
    306
    Thanks guys.

    I think I will set up one of those bot traps.

    One other question: what size or line count should an .htaccess file be kept to so that it doesn't slow the site down?

