Sanity check on a disallow statement please

I’ve been asked to look at a duplicate URL issue in Google’s index. I’m told the duplicates are caused by URLs with query strings which, on investigation, relate to an edit mode of our CMS. In other words, there’s no reason for any link carrying this query string to be public (though some may have slipped out through a copy/paste error - someone copying a link while in edit mode and pasting it somewhere that has gone live).
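If it helps anyone picture the scale of it: given a plain-text export of URLs (from a crawl or a sitemap dump), a rough Python sketch like the one below would flag anything carrying the parameter. The filename is just a placeholder for whatever export is to hand.

from urllib.parse import urlsplit, parse_qs

# "url_export.txt" is a placeholder: one URL per line, from a crawl or sitemap dump.
with open("url_export.txt") as handle:
    for line in handle:
        url = line.strip()
        if not url:
            continue
        # Flag any URL whose query string carries the edit-mode parameter.
        if "wbc_purpose" in parse_qs(urlsplit(url).query):
            print(url)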

I’ve asked for a scan of our links using Policy Tester (an IBM testing tool, which I believe is a revamp of Watchfire), but nothing is coming up as the source. So I’ve looked at adding a Disallow rule to the robots.txt.

I’m very wary, as this is not my area of expertise at all and I do not want to take a risk.

Will this do the job of blocking all URLs, regardless of path, whose query string starts with wbc_purpose:

Disallow: /*?wbc_purpose=

Thoughts?

Much appreciated

Theoretically, yes, that should work, but Google removed their robots.txt documentation on this some months ago. You may want to consider using Google Webmaster Tools to handle the disallowing of parameters instead.

Some further reading on robots.txt doesn’t indicate that the asterisk is treated as a wildcard in the Disallow field. That doesn’t mean it won’t work; it just means I couldn’t find any documentation on it. I believe Google Webmaster Tools also lets you test your robots.txt file, so you may want to try the asterisk as a wildcard in there and see what the analysis tells you.
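If you want a rough offline check before touching the live file, the short Python sketch below approximates Google-style matching (asterisk as a wildcard, dollar sign as an end anchor, otherwise matching from the start of the path). To be clear, this is my own approximation of the matching rules, not an official tester, and the test URLs are made up:

import re
from urllib.parse import urlsplit

def rule_matches(pattern, url):
    # Rough approximation of Google-style robots.txt matching:
    # '*' matches any run of characters, '$' anchors the end,
    # and the pattern is matched against path plus query from the start.
    parts = urlsplit(url)
    target = (parts.path or "/") + ("?" + parts.query if parts.query else "")
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = "^" + "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    if anchored:
        regex += "$"
    return re.search(regex, target) is not None

rule = "/*?wbc_purpose="  # the proposed Disallow value

# Made-up URLs purely for illustration
tests = [
    "http://www.example.com/somepage?wbc_purpose=basic",
    "http://www.example.com/a/deep/path?wbc_purpose=update&x=1",
    "http://www.example.com/somepage",
    "http://www.example.com/somepage?other=1&wbc_purpose=basic",
]

for url in tests:
    print(("blocked" if rule_matches(rule, url) else "allowed") + "  " + url)

Two things the made-up URLs highlight, if I’m reading the matching rules right: the rule only catches URLs where wbc_purpose is the first parameter, so something like ?other=1&wbc_purpose= would slip through (a second rule such as Disallow: /*&wbc_purpose= would cover that), and the Disallow line needs to sit under a User-agent group (e.g. User-agent: *) rather than on its own.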

http://www.robotstxt.org/robotstxt.html