I’ve been following our Google stats in Webmaster Tools, specifically the HTML Suggestions section, and found that we have over 1k duplicate title tag pages. This is an e-commerce site built in Ruby with dynamic pages. Most of the errors are two URLs for the same content, one with a trailing slash and one without. Other examples include product pages where the category path contains two words - Google is showing one version with a space and a second version with no space, an issue we fixed a few months ago.
The strange thing is, I’m not sure how Google is finding these pages, since we can’t find them by clicking through the site. We made sure the code only generates the version without the trailing slash, both on the site and in the sitemap XML file, and we’re also using canonical tags as a safety measure. We obviously don’t want to get penalized for duplicate content, but even with all of our efforts to remedy the problem, these pages are still showing up in the suggestions list. The count fluctuates up and down but hovers around 900 pages or so.
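For reference, here’s roughly what our trailing-slash and canonical handling looks like. This is a simplified sketch, assuming a Rack/Rails-style app; the class and file names are just illustrative, not our actual code:

```ruby
require 'rack'

# Illustrative middleware: any request path ending in "/" (other than the root)
# gets a 301 to the slash-less URL, so crawlers only ever see one version.
class TrailingSlashRedirect
  def initialize(app)
    @app = app
  end

  def call(env)
    req  = Rack::Request.new(env)
    path = req.path

    if path != '/' && path.end_with?('/')
      target = req.base_url + path.chomp('/')
      target += "?#{req.query_string}" unless req.query_string.empty?
      # Permanent redirect so Google consolidates signals on the slash-less URL
      return [301, { 'Location' => target, 'Content-Type' => 'text/html' }, []]
    end

    @app.call(env)
  end
end

# In the layout (Rails-style ERB), every page also emits a canonical tag
# pointing at the slash-less URL:
#
#   <link rel="canonical" href="<%= request.base_url + request.path.chomp('/') %>">
```

As far as we can tell, this setup means the duplicate URLs should never be linked anywhere on the site.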
Will this hurt our ranking even though we’re using canonical tags? How can Google find those pages if we can’t find them on the site by browsing? Could those pages be from past crawls several months ago, before we fixed the problem? Is it possible Google just hasn’t gone through and flushed them from its index yet?