Old May 6, 2004, 07:56   #1
aspen
Serial Publisher
 
Join Date: Aug 1999
Location: East Lansing, MI USA
Posts: 13,283
What is duplicate content?

Lately it seems there has been an increase in datafeed driven/affiliate content sites out there. I myself have made quite a few. I have also seen the issue of what exactly is duplicate content discussed a few times recently.

We all know Google says that duplicate content is a "don't" and that you risk being banned or penalized for it. But what exactly is duplicate content? It isn't just affiliate datafeed sites, such as those using Amazon AWS, that have duplicate content. People often create sites using feeds from Wikipedia and DMOZ. Is this duplicate content? You could find a press release from Tivo on thousands of news, financial, or electronics websites. Is that duplicate content? What about game cheat sites that all list the same cheats?

I think we can all agree that when a single individual or business owns two websites with the exact same content, it is spam. But what about the thousands of websites owned by different people that all use the same content? Amazon AWS (Amazon Web Services) sites are not unique, they only offer affiliate content, and thus it'd seem Google would like to get rid of them in favor of listings for Amazon.com. In this situation it is easy to figure out who should get listed because there is a parent company everyone is affiliated with.

What about game cheat sites though? If you wanted to get rid of all the duplicate content, how do you decide which one stays? DMOZ editors have faced this issue for a long time. You have two sites with the same content; which one is listed? My solution when I was an editor was to list them both. The reason is that one site might be down when a user tries to visit it, so a certain amount of redundancy makes the directory more useful.

New datafeed enabled affiliate programs show up every day, as do new datafeed driven websites. Eventually there will be too many, search engines will have to do something, but what? There will be too many for manual review, and any automatic system could hurt other sites with duplicate content such as news sites and game cheat sites, etc. You might be able to write an algorithm that detects most Amazon AWS sites, but what about the thousands of other affiliate programs out there? And even then you're still just getting most of the websites. People will find a way around any filters.
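To make the "automatic system" idea concrete, here is a minimal sketch of one common way to score how similar two pages are: split each page into overlapping word sequences (shingles) and compare the sets with Jaccard similarity. This is only an illustration under stated assumptions, not a description of what Google or any search engine actually does. The function names and the shingle size `k=3` are hypothetical choices made for the example.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(page_a, page_b, k=3):
    """Score two page texts between 0.0 (nothing shared) and 1.0 (same shingles)."""
    return jaccard(shingles(page_a, k), shingles(page_b, k))
```

A score like this also shows why a blunt filter could hurt legitimate sites: a news site reprinting a press release and a datafeed clone would both score close to 1.0 against the original, so the threshold alone cannot tell them apart.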

Last edited by aspen; Sep 18, 2004 at 08:53.
Old May 6, 2004, 08:16   #2
Golgotha
runat="server"
 
Join Date: Nov 2001
Location: Colorado
Posts: 2,113
Yeah, it's quite the quandary they have. With RSS feeds and blogs spreading like wildfire you can see why they feel they have to do something, but what is a good question.

TemplateMonster will certainly take a hit...

"People will find a way around any filters." Yep, you hit the nail on the head there...
Old May 6, 2004, 08:21   #3
Golgotha
runat="server"
 
Join Date: Nov 2001
Location: Colorado
Posts: 2,113
It's not like television sees this as a problem...You get home, flip on CNN and get the latest news scoop...Then later you're on MSNBC only to get the same scoop...followed up by your local news giving the same scoop...

Maybe Google should start their own TV channel?
Hey, if Mark Cuban can, why not Google?
Old May 6, 2004, 08:48   #4
SimonMc
SitePoint Guru
 
Join Date: Apr 2002
Location: The Office
Posts: 675
Quote:
Originally Posted by Golgotha
It's not like television sees this as a problem...You get home, flip on CNN and get the latest news scoop...Then later you're on MSNBC only to get the same scoop...followed up by your local news giving the same scoop...

Maybe, Google should start their own TV channel?
Hey, if Mark Cuban can, why not Google?
I think that Google has appointed itself the thought control of the internet. This is spam...that is spam...etc. This is right...this is wrong...etc. Many of these practices that Google bans were around before the major search engines. It is Google's job, as they say, to provide relevant search results. If I do a search for a book...then I would expect Google to return the most relevant result for that. However...if I am also interested in the subject of the book, I would not mind Google pulling up an affiliate site about the subject. How can Google tell what I am thinking?

I think that major affiliate programs when integrated as part of a larger topic add value. When an affiliate site is just an affiliate site it adds nothing.

If Google can look at the bigger picture then I think that well integrated affiliate sites will be OK. It is the ones that rely entirely on AWS or feeds and have nothing of their own to offer that will suffer.

Just my opinion.

Simon
Old May 6, 2004, 09:07   #5
Kings
What a twist!
 
Join Date: Jul 2002
Location: The Netherlands
Posts: 1,031
I run a lyrics website, with approx. 30,000 lyrics at the moment. I'm fairly sure that most of my lyrics are also available on other websites. So, does that mean I've got duplicate content or not? It's nothing from a datafeed, or affiliate, but nonetheless still 'duplicate'.

I have another website with a lot of Wikipedia articles. Duplicate content or not? Other articles on the website are unique. So how should Google handle this? Drop the whole website, or only drop the duplicate content?

This isn't going to be easy for Google if they're seriously looking at cracking down on duplicate content. Who decides what duplicate content is, and better yet, why should certain types of duplicate content not be indexed?
Old May 6, 2004, 09:30   #6
davidjmedlock
SitePoint Wizard
 
Join Date: Dec 2002
Location: Nashville, TN USA
Posts: 2,039
Personally, I tend to find it quite frustrating when I'm searching the web for a product review or a product and all I get is Amazon and its clones... Maybe Amazon didn't have any reviews for the product, or maybe I'm looking for a different perspective from what I get there.

But it's not just AWS sites. It's SE spam in general. I was looking for financial/accounting software the other day and typed it into Google and came up with tons of results, all of which looked like spam to me.

I think this has resulted in part from webmasters looking at Google (or any Search Engine) as an advertising medium rather than a tool for users to find legitimate information. Yes, Google is great for advertising, and no, you shouldn't stop trying to get to the top for your search results. But, SE spamming has made Google essentially worthless to me in many cases...
Old May 6, 2004, 19:57   #7
JonPKibble
SitePoint Addict
 
Join Date: Dec 2001
Location: Wisconsin, USA
Posts: 329
I don't see a problem with affiliate sites being listed. As long as the main site is listed higher up, that's totally fair.
Old May 6, 2004, 23:51   #8
jtresidder
SitePoint Addict
 
Join Date: Nov 2003
Location: Southampton, UK
Posts: 371
Quote:
Originally Posted by JonPKibble
I don't see a problem with affiliate sites being listed. As long as the main site is listed higher up, that's totally fair.
I wouldn't even want the parent site to be automatically higher up: If an affiliate site has put extra content around the affiliate links to draw in viewers, it could well fit my search better and save me some filtering. It could be easier to educate Joe Public about refining searches than to remove this kind of duplication.
Old May 7, 2004, 01:12   #9
Anat
SitePoint Wizard
 
Join Date: Oct 2000
Posts: 1,709
Good question, Chris.

To add to it: what about writers simply publishing their unique original article on more than one website? I have professional writers contributing articles to TheCatSite.com on a non-exclusive basis. They can and do publish the very same article on their own website or elsewhere. Some of the articles have been printed in magazines and may appear in the online version of the magazines as well. That certainly is duplicate content, but I don't think anyone should be punished for it.

Question is - does Google in fact penalize anyone for the duplicate content?
Old May 7, 2004, 01:39   #10
csn
Non-Member
 
Join Date: Nov 2002
Location: Earth
Posts: 1,107
Seems like if you're an ecommerce site, setting up an affiliate program and datafeed would be vastly more worthwhile than making your own multiple (spam) sites.
Old May 7, 2004, 02:01   #11
JakeCop
Impregnator of women
 
Join Date: Apr 2004
Location: Manchester, UK
Posts: 749
Quote:
Originally Posted by csn
Seems like if you're an ecommerce site, setting up an affiliate program and datafeed would be vastly more worthwhile than making your own multiple (spam) sites.
More work though isn't it? Most of these spam sites are only in it for the short term, quick profit gains.
Old May 7, 2004, 03:32   #12
momos
SitePoint Guru
 
Join Date: Apr 2004
Location: Belgium
Posts: 919
Also, where does duplication begin?

I once got links to a search engine in Google's results...
Old May 7, 2004, 10:17   #13
StephenBauer
SitePoint Addict
 
Join Date: Apr 2004
Location: USA
Posts: 266
Makes you wonder if an unbiased, non-pay-per-submit directory that is scrutinized like DMOZ may be more useful someday than a large SE like Google.

Quote:
Originally Posted by davidjmedlock
Personally, I tend to find it quite frustrating when I'm searching the web for a product review or a product and all I get is Amazon and its clones... Maybe Amazon didn't have any reviews for the product, or maybe I'm looking for a different perspective from what I get there.

But it's not just AWS sites. It's SE spam in general. I was looking for financial/accounting software the other day and typed it into Google and came up with tons of results, all of which looked like spam to me.

I think this has resulted in part from webmasters looking at Google (or any Search Engine) as an advertising medium rather than a tool for users to find legitimate information. Yes, Google is great for advertising, and no, you shouldn't stop trying to get to the top for your search results. But, SE spamming has made Google essentially worthless to me in many cases...

Last edited by StephenBauer; May 26, 2004 at 12:00.
Old May 7, 2004, 10:20   #14
JonPKibble
SitePoint Addict
 
Join Date: Dec 2001
Location: Wisconsin, USA
Posts: 329
If you think about it though, aren't department stores basically affiliates for the products they distribute, in one way or another?

You will find the same CDs at WalMart, Kmart, FYE, Sam Goody, etc.... you'll find the same food items at most groceries, etc... but does it make the stores any less valid?
Old May 7, 2004, 11:37   #15
cuau
SitePoint Enthusiast
 
Join Date: Feb 2002
Posts: 35
Recently it has been very difficult to find reviews or comments on products. All you get in Google is SE spam. And that's the real problem: finding relevant content. One way is avoiding duplicates.

It's not a very good method, but what's the alternative? A much more complex discrimination algorithm that borders on artificial intelligence?

Maybe search engines will be the force that drives the research for better and more intelligent algorithms.
Old May 8, 2004, 19:57   #16
ozgression
Aussie Icon
 
Join Date: Jul 2002
Location: Australia
Posts: 1,079
I, too, am concerned about the growing number of datafeed-fed "clones" taking up search results.

I don't think that duplicate content is necessarily a bad thing (think of all the press releases and news articles that can be found on many sites). I think the issue is just SPAM search results in general.

In reality, Amazon's product page should be listed before that of the "clone site".
Old May 10, 2004, 10:05   #17
domainer111
SitePoint Zealot
 
Join Date: Mar 2003
Location: Dublin, Ireland
Posts: 133
This really is a very interesting issue - I am hoping to recreate certain sections of the amazon.com web site on my own web site over the Summer - all things being well - using their datafeed.

I have to say I appreciate Google's quandary.

However, why should any original content on my site be penalised as a result?

For example...say half my site is original and half is duplicate stuff. Is it fair that the whole site gets penalised by Google?

Not sure on that one myself and it will be interesting to see how it pans out.

Also I am actually changing the domain name on my web site which I am of course entitled to do. However in the change-over process, there will be two effectively duplicate sites.

The mind boggles! Once you start down this road it is hard to imagine where it will end!!

I note that SitePoint also allows datafeeds of the forums. I can't see why web designers etc. that contribute to SitePoint can't claim some benefit without running the risk of being penalised in Google.

Also, how is Google going to monitor this!! I note a larger and larger number of merchants at cj.com are offering datafeeds.

So will all datafeeds be equal, but some datafeeds be more equal than others?

...!!....
Old May 10, 2004, 23:48   #18
momos
SitePoint Guru
 
Join Date: Apr 2004
Location: Belgium
Posts: 919
Quote:
Originally Posted by JonPKibble
If you think about it though, aren't department stores basically affiliates for the products they distribute, in one way or another?

You will find the same CDs at WalMart, Kmart, FYE, Sam Goody, etc.... you'll find the same food items at most groceries, etc... but does it make the stores any less valid?
Well, but shouldn't there be stores with only organic fresh food, and items that don't harm nature? In other words, shouldn't there be a light version of e.g. Google that only picks the best parts?
Old May 10, 2004, 23:52   #19
momos
SitePoint Guru
 
Join Date: Apr 2004
Location: Belgium
Posts: 919
Quote:
Originally Posted by domainer111
Also I am actually changing the domain name on my web site which I am of course entitled to do. However in the change-over process, there will be two effectively duplicate sites.
You shouldn't duplicate your site; just link your old site through to your new site.
Old May 11, 2004, 15:46   #20
fonzerelli_79
Bananas contain Zinc
 
Join Date: Oct 2001
Location: Scotland
Posts: 1,175
Quote:
Originally Posted by momos
You shouldn't duplicate your site, just link your old site through to your new site
Yeah, but what if the sites are the same except for the URL?
It wouldn't be in your best interests to link them then, would it?
Old May 19, 2004, 14:18   #21
Crooner
SitePoint Enthusiast
 
Join Date: Apr 2001
Location: Chambersburg, PA
Posts: 37
I think we give Google credit for being smarter than they actually are. I haven't seen any evidence that they can even determine what is duplicate content, much less penalize any sites for it. I see keyword spamming, hidden links and many other so-called 'forbidden' techniques used on a number of first page Google searches.

I've used Google since the days when only webmasters and college geeks knew about and used it. In the last year and a half the search quality has decreased to the point of uselessness. On a typical search I see 4-5 other search engines come up on the first page. Where's the search quality in that? If I want to use another search engine I'll go there first. If Google really cared about search quality they could surely block these listings.
Old May 19, 2004, 14:52   #22
moagw
SitePoint Zealot
 
Join Date: Nov 2003
Location: Kentucky, USA
Posts: 191
If Google doesn't reduce its SE spam by whatever means it has, it will stop being the driving force behind web searches. If that happens, it won't matter where you end up in Google's search results, because the masses will have moved on to Yahoo or whatever. I don't think it is good or bad, just a necessary thing for a company that did well as one of the big players; now that they stand out so much, they have much more to lose. They have to be "leaner" in search results or none of us would use them anymore... but I think that has been touched on before in this thread (Google's ineffectiveness).
Old Jul 16, 2004, 16:51   #23
SantaRosaDesign
SitePoint Wizard
 
Join Date: May 2004
Location: santa rosa, ca
Posts: 1,055
When I want to ask a question or find information about a product, I usually add the word forum to the end of my search. This way I can post my question on a forum and get the information I need.
Old Sep 24, 2004, 15:05   #24
bgray
Makin' It Happen
 
Join Date: Jan 2003
Location: Texas
Posts: 608
I've asked myself this question many times.

If you think about it, Google using DMOZ as their directory seems to be the essence of duplicate content.
Old Sep 26, 2004, 22:57   #25
micmol
SitePoint Guru
 
Join Date: Apr 2002
Location: melbourne australia
Posts: 669
Quote:
Crooner wrote: In the last year and a half the search quality has decreased to the point of uselessness.
I'm with you ... and this is a major concern.
Copyright 1998-2009, SitePoint Pty Ltd. All Rights Reserved