Thread: What is duplicate content?

  1. #1
    Serial Publisher silver trophy aspen's Avatar
    Join Date
    Aug 1999
    Location
    East Lansing, MI USA
    Posts
    12,936
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    What is duplicate content?

    Lately it seems there has been an increase in datafeed driven/affiliate content sites out there. I myself have made quite a few. I have also seen the issue of what exactly is duplicate content discussed a few times recently.

    We all know Google says that duplicate content is a "don't" and as such you risk being banned or penalized for doing it. But what exactly is duplicate content? It isn't just affiliate datafeed sites, such as those using Amazon AWS, that have duplicate content. People often create sites using feeds from Wikipedia and DMOZ; is this duplicate content? You could find a press release from TiVo on thousands of news, financial, or electronics websites. Is that duplicate content? What about game cheat sites that all list the same cheats?

    I think we can all agree that when a single individual or business owns two websites with the exact same content, it is spam. But what about the thousands of websites owned by different people that all use the same content? Amazon AWS (Amazon Web Services) sites are not unique; they only offer affiliate content, and thus it'd seem Google would like to get rid of them in favor of listings for Amazon.com. In this situation it is easy to figure out who should get listed because there is a parent company everyone is affiliated with.

    What about game cheat sites though? If you wanted to get rid of all the duplicate content, how do you decide which one stays? DMOZ editors have faced this issue for a long time. You have two sites with the same content; which one is listed? My solution when I was an editor was to list them both. The reason is that one site might be down when a user tries to visit it, so a certain amount of redundancy makes the directory more useful.

    New datafeed-enabled affiliate programs show up every day, as do new datafeed-driven websites. Eventually there will be too many and search engines will have to do something, but what? There will be too many for manual review, and any automatic system could hurt other sites with duplicate content, such as news sites and game cheat sites. You might be able to write an algorithm that detects most Amazon AWS sites, but what about the thousands of other affiliate programs out there? And even then you're still just getting most of the websites. People will find a way around any filters.
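
    For anyone wondering what an automatic check might even look at, here is a minimal sketch, assuming word shingles compared with Jaccard similarity. That is a common textbook approach to near-duplicate text, not anything Google has confirmed using, and the function names and the 0.8 threshold are made up for illustration.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short text: treat the whole thing as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two shingle sets: |A & B| / |A | B|."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty pages count as identical
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(a: str, b: str, threshold: float = 0.8, k: int = 3) -> bool:
    """Flag two pages as near-duplicates; the threshold is an arbitrary assumption."""
    return jaccard(a, b, k) >= threshold
```

    Even this toy version shows the problem: a press release reprinted on a news site and a datafeed clone both score high, so the score alone can't tell which duplicates deserve to stay listed.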
    Last edited by aspen; Sep 18, 2004 at 07:53.
    Chris Beasley - I publish content and ecommerce sites.
    Featured Article: Free Comprehensive SEO Guide
    My Guide to Building a Successful Website
    My Blog|My Webmaster Forums

  2. #2
    runat="server" Golgotha's Avatar
    Join Date
    Nov 2001
    Location
    Colorado
    Posts
    2,085
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Yeah, it's quite the quandary they have. With RSS feeds and blogs spreading like wildfire, you can see why they feel they have to do something, but what is a good question.

    TemplateMonster will certainly take a hit...

    "People will find a way around any filters." Yep, you hit the nail on the head there...

  3. #3
    runat="server" Golgotha's Avatar
    Join Date
    Nov 2001
    Location
    Colorado
    Posts
    2,085
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    It's not like television sees this as a problem... You get home, flip on CNN and get the latest news scoop... Then later you're on MSNBC only to get the same scoop... followed up by your local news giving the same scoop...

    Maybe Google should start their own TV channel?
    Hey, if Mark Cuban can, why not Google?

  4. #4
    SitePoint Guru SimonMc's Avatar
    Join Date
    Apr 2002
    Location
    The Office
    Posts
    616
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Golgotha
    It's not like television sees this as a problem... You get home, flip on CNN and get the latest news scoop... Then later you're on MSNBC only to get the same scoop... followed up by your local news giving the same scoop...

    Maybe Google should start their own TV channel?
    Hey, if Mark Cuban can, why not Google?
    I think that Google has appointed itself the thought control of the internet. This is spam, that is spam, etc. This is right, this is wrong, etc. Many of these practices that Google bans were around before the major search engines. It is Google's job, as they say, to provide relevant search results. If I do a search for a book, then I would expect Google to return the most relevant result for that. However, if I am interested in the subject of the book as well, I would not mind Google pulling up an affiliate site about the subject. How can Google tell what I am thinking?

    I think that major affiliate programs when integrated as part of a larger topic add value. When an affiliate site is just an affiliate site it adds nothing.

    If Google can look at the bigger picture, then I think that well-integrated affiliate sites will be OK. It is the ones that rely entirely on AWS or feeds and have nothing of their own to offer that will suffer.

    Just my opinion.

    Simon

  5. #5
    What a twist! Kings's Avatar
    Join Date
    Jul 2002
    Location
    The Netherlands
    Posts
    954
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I run a lyrics website, with approx. 30,000 lyrics at the moment. I'm fairly sure that most of my lyrics are also available on other websites. So, does that mean I've got duplicate content or not? It's nothing from a datafeed, or affiliate, but nonetheless still 'duplicate'.

    I have another website with a lot of Wikipedia articles. Duplicate content or not? Other articles on the website are unique. So how should Google handle this? Drop the whole website, or only drop the duplicate content?

    This isn't going to be easy for Google if they're seriously looking at cracking down on duplicate content. Who decides what duplicate content is, and better yet, why should certain types of duplicate content not be indexed?
    Dennis Pallett - NoCertainty - My Personal Weblog
    The Web Network: ASPit | PHPit | WebDev-Articles
    Blogs: TalkFones | Holidayzer | PHPit Blog

  6. #6
    SitePoint Wizard davidjmedlock's Avatar
    Join Date
    Dec 2002
    Location
    Nashville, TN USA
    Posts
    1,688
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Personally, I tend to find it quite frustrating when I'm searching the web for a product review or a product and all I get is Amazon and its clones... Maybe Amazon didn't have any reviews for the product, or maybe I'm looking for a different perspective from what I get there.

    But it's not just AWS sites. It's SE spam in general. I was looking for financial/accounting software the other day and typed it into Google and came up with tons of results, all of which looked like spam to me.

    I think this has resulted in part from webmasters looking at Google (or any Search Engine) as an advertising medium rather than a tool for users to find legitimate information. Yes, Google is great for advertising, and no, you shouldn't stop trying to get to the top for your search results. But, SE spamming has made Google essentially worthless to me in many cases...

  7. #7
    SitePoint Addict
    Join Date
    Dec 2001
    Location
    Wisconsin, USA
    Posts
    326
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I don't see a problem with affiliate sites being listed. As long as the main site is listed higher up, that's totally fair.
    5:4 Automated Traffic Exchange for Content Websites
    http://www.FunPageExchange.com/webmaster.php

  8. #8
    SitePoint Addict jtresidder's Avatar
    Join Date
    Nov 2003
    Location
    Southampton, UK
    Posts
    345
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by JonPKibble
    I don't see a problem with affiliate sites being listed. As long as the main site is listed higher up, that's totally fair.
    I wouldn't even want the parent site to be automatically higher up: If an affiliate site has put extra content around the affiliate links to draw in viewers, it could well fit my search better and save me some filtering. It could be easier to educate Joe Public about refining searches than to remove this kind of duplication.

  9. #9
    SitePoint Wizard Anat's Avatar
    Join Date
    Oct 2000
    Posts
    1,281
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Good question Chris.

    To add to it: what about writers simply publishing their unique original article on more than one website? I have professional writers contributing articles to TheCatSite.com on a non-exclusive basis. They can and do publish the very same article on their own website or elsewhere. Some of the articles have been printed in magazines and may appear in the online version of the magazines as well. That certainly is duplicate content, but I don't think anyone should be punished for it.

    Question is - does Google in fact penalize anyone for the duplicate content?
    My Web Publishing Blog: B6S.net - I dofollow but don't spam!
    Follow me on Twitter
    My favorite content writer:
    Steve Snedeker

  10. #10
    Non-Member
    Join Date
    Nov 2002
    Location
    Earth
    Posts
    1,107
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Seems like if you're an ecommerce site, setting up an affiliate program and datafeed would be vastly more worthwhile than making your own multiple (spam) sites.

  11. #11
    Technical Director at StuckOn JakeCop's Avatar
    Join Date
    Apr 2004
    Location
    Cheshire, UK
    Posts
    765
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by csn
    Seems like if you're an ecommerce site, setting up an affiliate program and datafeed would be vastly more worthwhile than making your own multiple (spam) sites.
    More work though isn't it? Most of these spam sites are only in it for the short term, quick profit gains.

  12. #12
    SitePoint Guru momos's Avatar
    Join Date
    Apr 2004
    Location
    Belgium
    Posts
    919
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Also, where does duplication begin?

    I once got links to a search engine in Google...

  13. #13
    SitePoint Addict StephenBauer's Avatar
    Join Date
    Apr 2004
    Location
    USA
    Posts
    263
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Makes you wonder if an unbiased, non-pay-per-submit directory that is scrutinized like DMOZ may be more useful someday than a large SE like Google.

    Quote Originally Posted by davidjmedlock
    Personally, I tend to find it quite frustrating when I'm searching the web for a product review or a product and all I get is Amazon and its clones... Maybe Amazon didn't have any reviews for the product, or maybe I'm looking for a different perspective from what I get there.

    But it's not just AWS sites. It's SE spam in general. I was looking for financial/accounting software the other day and typed it into Google and came up with tons of results, all of which looked like spam to me.

    I think this has resulted in part from webmasters looking at Google (or any Search Engine) as an advertising medium rather than a tool for users to find legitimate information. Yes, Google is great for advertising, and no, you shouldn't stop trying to get to the top for your search results. But, SE spamming has made Google essentially worthless to me in many cases...
    Last edited by StephenBauer; May 26, 2004 at 11:00.

  14. #14
    SitePoint Addict
    Join Date
    Dec 2001
    Location
    Wisconsin, USA
    Posts
    326
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    If you think about it though, aren't department stores basically affiliates for the products they distribute, in one way or another?

    You will find the same CD's at WalMart, Kmart, FYE, Sam Goody, etc.... you'll find the same food items at most groceries, etc... but does it make the stores any less valid?
    5:4 Automated Traffic Exchange for Content Websites
    http://www.FunPageExchange.com/webmaster.php

  15. #15
    SitePoint Enthusiast
    Join Date
    Feb 2002
    Posts
    35
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Recently it has been very difficult to find reviews or comments on products. All you get in Google is SE spam. And that's the real problem: finding relevant content. One way is to avoid duplicates.

    It's not a very good method, but what's the alternative? A much more complex discrimination algorithm that borders on artificial intelligence?

    Maybe search engines will be the force that drives the research for better and more intelligent algorithms.

  16. #16
    Aussie Icon ozgression's Avatar
    Join Date
    Jul 2002
    Location
    Australia
    Posts
    839
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I, too, am concerned about the growing number of datafeed-fed "clones" taking up search results.

    I don't think that duplicate content is necessarily a bad thing (think of all the press releases and news articles that can be found on many sites). I think the issue is just spam search results in general.

    In reality, Amazon's product page should be listed before that of the "clone site".

  17. #17
    SitePoint Zealot
    Join Date
    Mar 2003
    Location
    Dublin, Ireland
    Posts
    121
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    This really is a very interesting issue - I am hoping to recreate certain sections of the amazon.com web site on my own web site over the Summer - all things being well - using their datafeed.

    I have to say I appreciate Google's quandary.

    However, why should any original content on my site be penalised as a result?

    For example, say half my site is original and half is duplicate stuff. Is it fair that the whole site gets penalised by Google?

    Not sure on that one myself and it will be interesting to see how it pans out.

    Also I am actually changing the domain name on my web site which I am of course entitled to do. However in the change-over process, there will be two effectively duplicate sites.

    The mind boggles! Once you start down this road it is hard to imagine where it will end!!

    I note that SitePoint also allows datafeeds of the forums. I can't see why web designers etc. who contribute to SitePoint can't claim some benefit without running the risk of being penalised in Google.

    Also, how is Google going to monitor this? I note a larger and larger number of merchants at cj.com are offering datafeeds.

    So will all datafeeds be equal, but some datafeeds be more equal than others?

    ...!!....
    domainer111's coalbucket.com - welcome to web one and a bit!

  18. #18
    SitePoint Guru momos's Avatar
    Join Date
    Apr 2004
    Location
    Belgium
    Posts
    919
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by JonPKibble
    If you think about it though, aren't department stores basically affiliates for the products they distribute, in one way or another?

    You will find the same CD's at WalMart, Kmart, FYE, Sam Goody, etc.... you'll find the same food items at most groceries, etc... but does it make the stores any less valid?
    Well, but shouldn't there be stores with only fresh organic food and items that don't harm nature? In other words, shouldn't there be a light version of, e.g., Google that only picks the best parts?

  19. #19
    SitePoint Guru momos's Avatar
    Join Date
    Apr 2004
    Location
    Belgium
    Posts
    919
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by domainer111
    Also I am actually changing the domain name on my web site which I am of course entitled to do. However in the change-over process, there will be two effectively duplicate sites.
    You shouldn't duplicate your site; just link your old site through to your new site.
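
    One way to do that "link through" is a permanent (HTTP 301) redirect from every old URL to the same path on the new domain. This is a minimal sketch using Python's standard library, assuming that is what is wanted here; the domain `new-domain.example`, the port, and the helper names are placeholders, and on a real host you would more likely set this up in the web server configuration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical new domain; replace with the real one.
NEW_BASE = "https://new-domain.example"


def redirect_target(path: str, new_base: str = NEW_BASE) -> str:
    """Map a path (plus query string) on the old site to the same path on the new one."""
    if not path.startswith("/"):
        path = "/" + path
    return new_base.rstrip("/") + path


class RedirectHandler(BaseHTTPRequestHandler):
    """Answer every GET/HEAD request on the old domain with a 301 to the new domain."""

    def do_GET(self):
        self.send_response(301)  # 301 Moved Permanently
        self.send_header("Location", redirect_target(self.path))
        self.end_headers()

    do_HEAD = do_GET


def make_server(port: int = 8080) -> HTTPServer:
    """Build (but do not start) a server that redirects everything."""
    return HTTPServer(("", port), RedirectHandler)
```

    Calling `make_server().serve_forever()` on the old host would start it. The old domain then serves no content of its own, so there are never two live copies of the site during the change-over.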

  20. #20
    Bananas contain Zinc fonzerelli_79's Avatar
    Join Date
    Oct 2001
    Location
    Scotland
    Posts
    816
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by momos
    You shouldn't duplicate your site, just link your old site through to your new site
    Yeah, but what if the sites are the same except for the URL?
    It wouldn't be in your best interests to link then, would it?

  21. #21
    SitePoint Enthusiast Crooner's Avatar
    Join Date
    Apr 2001
    Location
    Chambersburg, PA
    Posts
    33
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I think we give Google credit for being smarter than they actually are. I haven't seen any evidence that they can even determine what is duplicate content, much less penalize any sites for it. I see keyword spamming, hidden links and many other so-called 'forbidden' techniques used on a number of first-page Google searches.

    I've used Google since the days when only webmasters and college geeks knew about and used it. In the last year and a half the search quality has decreased to the point of uselessness. On a typical search I see 4-5 other search engines come up on the first page. Where's the search quality in that? If I want to use another search engine I'll go there first. If Google really cared about search quality they could surely block these listings.
    Dean-Martin.com Managed IT Solutions
    eLiveHost.com Premium Hosting
    4StateShopper.com Community Forums

  22. #22
    SitePoint Zealot moagw's Avatar
    Join Date
    Nov 2003
    Location
    Kentucky, USA
    Posts
    188
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    If Google doesn't reduce the SE spam by whatever means it has, it will stop being the driving force behind web searches. If that happens, it won't matter where you end up in Google's search results, because the masses will have moved on to Yahoo or whatever. I don't think it is good or bad, just a necessary thing for a company that did well as one of the big players; now that they stand out so much, they have much more to lose. They have to be "leaner" in search results or none of us would use them anymore. But I think that has been touched on before in this thread (Google's ineffectiveness).

  23. #23
    SitePoint Guru
    Join Date
    May 2004
    Location
    santa rosa, ca
    Posts
    969
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    When I want to ask a question or find information about a product, I usually add the word forum to the end of my search. This way I can post my question on a forum and get the information I need.
    nondenominational, noncommercial, nonprofit,
    listener-supported, 24-hour, Christian ministry:
    Listen Live Online

  24. #24
    Makin' It Happen bgray's Avatar
    Join Date
    Jan 2003
    Location
    Texas
    Posts
    504
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I've asked myself this question many times.

    If you think about it, Google using DMOZ as their directory seems to be the essence of duplicate content.
    I run an insurance company directory at InsuranceCompanies.net

  25. #25
    SitePoint Evangelist micmol's Avatar
    Join Date
    Apr 2002
    Location
    melbourne australia
    Posts
    488
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Crooner wrote: In the last year and a half the search quality has decreased to the point of uselessness.
    I'm with you ... and this is a major concern.
