May 6, 2004, 07:56   #1
aspen
Join Date: Aug 1999
Location: East Lansing, MI USA
Posts: 13,285
What is duplicate content?

Lately there seems to have been an increase in datafeed-driven and affiliate content sites out there. I have made quite a few myself. I have also seen the question of what exactly counts as duplicate content discussed a few times recently.

We all know Google says that duplicate content is a "don't", and that you risk being banned or penalized for it. But what exactly is duplicate content? It isn't just affiliate datafeed sites, such as those using Amazon AWS, that have duplicate content. People often create sites using feeds from Wikipedia and DMOZ. Is this duplicate content? You could find a press release from TiVo on thousands of news, financial, or electronics websites. Is that duplicate content? What about game cheat sites that all list the same cheats?

I think we can all agree that when a single individual or business owns two websites with the exact same content, it is spam. But what about the thousands of websites owned by different people that all use the same content? Amazon AWS (Amazon Web Services) sites are not unique. They only offer affiliate content, so it would seem Google would like to get rid of them in favor of listings for Amazon.com. In this situation it is easy to figure out who should get listed, because there is a parent company that everyone is an affiliate of.

What about game cheat sites, though? If you wanted to get rid of all the duplicate content, how would you decide which site stays? DMOZ editors have faced this issue for a long time. You have two sites with the same content, so which one gets listed? My solution when I was an editor was to list them both. One site might be down when a user tries to visit it, so a certain amount of redundancy makes the directory more useful.

New datafeed-enabled affiliate programs show up every day, as do new datafeed-driven websites. Eventually there will be too many, and search engines will have to do something. But what? There will be too many for manual review, and any automatic system could hurt other sites with duplicate content, such as news sites and game cheat sites. You might be able to write an algorithm that detects most Amazon AWS sites, but what about the thousands of other affiliate programs out there? And even then you would only be catching most of the websites. People will find a way around any filter.
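
To make the filtering problem a bit more concrete, here is a minimal sketch of one common way to score how much text two pages share: break each page into overlapping word "shingles" and compare the sets with Jaccard similarity. This is only an illustration, not a description of how Google or any other search engine works. The function names, the shingle size of 3, and the sample strings are all my own assumptions.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text.

    k=3 is an arbitrary choice for illustration.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Text shorter than one window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def similarity(text_a, text_b, k=3):
    """Score from 0.0 (no shared shingles) to 1.0 (identical shingle sets)."""
    return jaccard(shingles(text_a, k), shingles(text_b, k))

if __name__ == "__main__":
    # Hypothetical page texts, made up for this example.
    page_a = "press release about the new recorder today"
    page_b = "press release about the new recorder yesterday"
    print(similarity(page_a, page_b))
```

A high score only tells you that two pages share text. It says nothing about which page should stay listed, and a news site reprinting a press release or a game cheat site would score just as high as a datafeed site, which is exactly the problem described above.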

Last edited by aspen; Sep 18, 2004 at 08:53.