Robots.txt: disallow all subfolders with the same name, regardless of the parent directory

Hi There,

I have a review script that is producing duplicate content; the reviews appear on the product pages, but also in a separate folder within each product directory, e.g.:

site.com/category1/product1/
site.com/category1/product1/review
site.com/category2/product2/
site.com/category2/product2/review

What I’d like to tell the robots.txt file to do is to not crawl any of these review folders. I can’t seem to find any guidance on how to do this, not even within Google’s guidelines.

Any suggestions would be most welcome

Thanks,
Dan


User-agent: *
Disallow: /category1/product1/

When you add a folder/path to a Disallow directive, (legitimate) crawlers treat it as a prefix match, so everything in its subfolders is excluded as well.

In the above example, nothing inside the product1 folder (including the product1 folder itself) is crawled. If you don’t want anything in the “category1” folder crawled, then do this instead:


User-agent: *
Disallow: /category1/

And with additional folders:


User-agent: *
Disallow: /category1/
Disallow: /category2/
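
If you only want to block the review folders themselves (so the product and category pages stay crawlable), the major crawlers, Googlebot and Bingbot included, also support * wildcards in Disallow rules. This is just a sketch of that approach; smaller crawlers may ignore the wildcard:

User-agent: *
Disallow: /*/review

Because the rule is still a prefix match, this would also block paths such as /category1/product1/reviews. Google additionally supports a $ end anchor (Disallow: /*/review$) if you need to match the /review folder exactly.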