Google and Duplicate Content

My question is how to avoid being penalized by Google for “duplicate content”.

This relates to my Subsection page which lists hundreds of “article summaries”.

To make things more manageable, I added Sorting and Pagination, and so now things look like this…


www.debbie.com/finance/economy/?sortname=by-date&sortdir=desc&page=1
www.debbie.com/finance/economy/?sortname=by-date&sortdir=desc&page=2
www.debbie.com/finance/economy/?sortname=by-date&sortdir=desc&page=3
www.debbie.com/finance/economy/?sortname=by-date&sortdir=desc&page=4
www.debbie.com/finance/economy/?sortname=by-date&sortdir=desc&page=5

To address the “duplicate content” issue, I have taken these steps so far…

Step #1: Changed the URL from a directory structure style…


www.debbie.com/finance/economy/by-date/desc/5

to a URL with a Query String…


www.debbie.com/finance/economy/?sortname=by-date&sortdir=desc&page=5

Step #2: Used PHP to dynamically add rel="prev" and rel="next" link tags to the <head> of each page…


	<!-- Page Relationships -->
	<link rel='prev' href='http://local.debbie/finance/economy/?sortname=by-date&sortdir=desc&page=2'>
	<link rel='next' href='http://local.debbie/finance/economy/?sortname=by-date&sortdir=desc&page=4'>
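
For anyone following along, here is a rough sketch of how the prev/next tags in Step #2 could be generated. It is written in Python rather than the PHP I actually use, and the names `page_url`, `pagination_links`, and `last_page` are made up for illustration only.

```python
from html import escape
from urllib.parse import urlencode

BASE_URL = "http://www.debbie.com/finance/economy/"


def page_url(sortname, sortdir, page):
    """Build the query-string URL for one page of one sort order."""
    query = urlencode({"sortname": sortname, "sortdir": sortdir, "page": page})
    return f"{BASE_URL}?{query}"


def pagination_links(sortname, sortdir, page, last_page):
    """Return the rel=prev / rel=next link tags for the given page.

    The first page gets no rel=prev and the last page gets no rel=next.
    """
    tags = []
    if page > 1:
        href = escape(page_url(sortname, sortdir, page - 1))
        tags.append(f'<link rel="prev" href="{href}">')
    if page < last_page:
        href = escape(page_url(sortname, sortdir, page + 1))
        tags.append(f'<link rel="next" href="{href}">')
    return tags


# Example: page 3 of 5, sorted by date, descending
for tag in pagination_links("by-date", "desc", 3, 5):
    print(tag)
```

The `escape` call writes each `&` in the href as `&amp;`, which is the strictly valid form inside an HTML attribute.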

So far, so good… :slight_smile:

However, where I am confused is this…

What needs to be done so that Google doesn’t penalize me when a page is similar (or the same) because of Sorting?

Originally, I was going to try and implement rel=canonical, but things get rather tricky when you consider these pages also include pagination!

According to Google’s 5 common mistakes with rel=canonical, you should NOT point rel=canonical at the first page of a paginated series from the other pages in that series.

As far as I can tell, using rel=canonical would cancel out using rel=“prev” and rel=“next” in my particular situation.

The best idea that I can come up with is to NOT use rel=canonical, but instead use the URL Parameters setting in Google Webmaster Tools to tell Googlebot to ignore the sortname and sortdir parameters.

What do you think?

Sincerely,

Debbie

Google is smart enough to see these sorts of patterns. There’s a good video that explains this at http://www.youtube.com/watch?v=mQZY7EmjbMA

Interesting video, but doesn’t address my issue…

Currently, I am using rel=“prev” and rel=“next” to tell Google to group my 5 paginated pages together. However, there is still the issue of how to tell Google about the Sorting aspect…

As I see it, I have four groups when it comes to Sorting…

Group #1: By Date, Descending


www.debbie.com/finance/economy/?sortname=by-date&sortdir=desc&page=1
www.debbie.com/finance/economy/?sortname=by-date&sortdir=desc&page=2
www.debbie.com/finance/economy/?sortname=by-date&sortdir=desc&page=3
www.debbie.com/finance/economy/?sortname=by-date&sortdir=desc&page=4
www.debbie.com/finance/economy/?sortname=by-date&sortdir=desc&page=5

Group #2: By Date, Ascending


www.debbie.com/finance/economy/?sortname=by-date&sortdir=asc&page=1
www.debbie.com/finance/economy/?sortname=by-date&sortdir=asc&page=2
www.debbie.com/finance/economy/?sortname=by-date&sortdir=asc&page=3
www.debbie.com/finance/economy/?sortname=by-date&sortdir=asc&page=4
www.debbie.com/finance/economy/?sortname=by-date&sortdir=asc&page=5

Group #3: By Title, Descending


www.debbie.com/finance/economy/?sortname=by-title&sortdir=desc&page=1
www.debbie.com/finance/economy/?sortname=by-title&sortdir=desc&page=2
www.debbie.com/finance/economy/?sortname=by-title&sortdir=desc&page=3
www.debbie.com/finance/economy/?sortname=by-title&sortdir=desc&page=4
www.debbie.com/finance/economy/?sortname=by-title&sortdir=desc&page=5

Group #4: By Title, Ascending


www.debbie.com/finance/economy/?sortname=by-title&sortdir=asc&page=1
www.debbie.com/finance/economy/?sortname=by-title&sortdir=asc&page=2
www.debbie.com/finance/economy/?sortname=by-title&sortdir=asc&page=3
www.debbie.com/finance/economy/?sortname=by-title&sortdir=asc&page=4
www.debbie.com/finance/economy/?sortname=by-title&sortdir=asc&page=5

Let’s say that I want the “default view” to be By Date, Descending.

If I add rel=“canonical” to Group #2, #3, and #4 then would that help or hurt me in Google’s eyes??


<link rel="canonical" href="http://www.debbie.com/finance/economy/?sortname=by-date&sortdir=desc&page=1" />
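
To make the question concrete, here is a small Python sketch of one possible variant: it resets the sort parameters to the default view but keeps the page number, instead of pointing every page at page=1. Whether Google treats differently sorted pages as equivalent is exactly what I am unsure about, so this is an assumption, not a recommendation. The names `DEFAULT_SORT` and `canonical_url` are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Assumption: "By Date, Descending" is the default view.
DEFAULT_SORT = {"sortname": "by-date", "sortdir": "desc"}


def canonical_url(url):
    """Map any sorted/paginated URL to the default sort, same page number."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    # Fall back to page 1 when the URL carries no page parameter.
    page = params.get("page", ["1"])[0]
    query = urlencode({**DEFAULT_SORT, "page": page})
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def canonical_tag(url):
    """Render the rel=canonical link tag for a given request URL."""
    href = canonical_url(url).replace("&", "&amp;")
    return f'<link rel="canonical" href="{href}" />'


print(canonical_tag(
    "http://www.debbie.com/finance/economy/?sortname=by-title&sortdir=asc&page=3"
))
```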

:-/

Anyone seen Matt Cutts around?! :lol:

Sincerely,

Debbie

Hi Debbie.

Do you have a website and is Google Webmaster Tools actually complaining? If not then I would wait and only act if and when warnings appear.

In your scenario, it looks as though Google will flag duplicate <title> tags and duplicate <meta content="…"> values as warnings.

I would be tempted to use <META NAME="ROBOTS" CONTENT="NOINDEX"> because the relevant pages have little “link-juice” and are just lists of links to your articles.
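
A minimal Python sketch of that idea, assuming the "relevant pages" are the non-default sort orders and that By Date, Descending stays indexable. Both the assumption and the `robots_meta` helper name are mine, for illustration only.

```python
# Assumption: only the default view (by-date, desc) should be indexed.
DEFAULT_VIEW = ("by-date", "desc")


def robots_meta(sortname, sortdir):
    """Return a noindex meta tag for non-default sorts, else an empty string."""
    if (sortname, sortdir) == DEFAULT_VIEW:
        return ""
    return '<meta name="robots" content="noindex">'


# The default view emits nothing; every other sort order emits the tag.
print(repr(robots_meta("by-date", "desc")))
print(repr(robots_meta("by-title", "asc")))
```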

All your articles should be included in your “sitemap.xml” which is essential not only for Google but also for other search engines.

Just my two cents.

As far as I know, rel=canonical works great, but not with duplicate content. I had a bug with Yoast a few weeks ago. As for your strategy, have you got all your pages indexed? I am curious how this really turned out. Please keep us posted.

Sincerely,
Matt:)

To avoid these duplicate content issues, I suggest you use the canonical tag. For the other URLs that result in duplicate content, you can either block them using robots.txt or add a noindex, nofollow tag.

Or, by using the Sitelinks section in Google Webmaster Tools, you can de-index the URLs that create the duplicate content issue.

Why don’t you block the URL variations created by the filters? I use robots.txt to disallow all of these URLs with this line: Disallow: /*?
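
To see what that rule would catch, here is a small Python sketch that mimics the wildcard matching Google documents for robots.txt (`*` matches any run of characters, a trailing `$` anchors the end). It is a hand-rolled approximation, not Googlebot's actual parser, and the helper names are made up.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then let each '*' match any characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path, pattern="/*?"):
    """True if the path (including any query string) matches the pattern."""
    return robots_pattern_to_regex(pattern).match(path) is not None


for path in [
    "/finance/economy/",
    "/finance/economy/?sortname=by-date&sortdir=desc&page=2",
    "/finance/economy/?sortname=by-title&sortdir=asc&page=1",
]:
    print(path, "->", "blocked" if is_blocked(path) else "allowed")
```

Note that this pattern matches every URL containing a query string, so with the URLs in this thread it would also block the default By Date, Descending pages, not just the alternate sort orders.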