Google’s Hidden Protocol

By Dan Thies | Search Engine Marketing

Google’s URL removal page contains a handy piece of information that is missing from their webmaster info pages, where it belongs.

Google supports the use of “wildcards” in robots.txt files. Wildcards aren’t part of the original 1994 robots.txt protocol and, as far as I know, are not supported by other search engines. To make this work, you need to add a separate section for Googlebot to your robots.txt file. An example:

User-agent: Googlebot
Disallow: /*sort=

This would stop Googlebot from reading any URL that included the string “sort=” no matter where that string occurs in the URL.

So if you have a shopping cart, and use a variable called “sort” in some URLs, you can stop Googlebot from reading the sorted (but basically duplicate) content that your site produces for users.
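To get a feel for which URLs a rule like this would catch, here is a minimal Python sketch. It assumes that “*” stands for any run of characters and that the rule is compared against the path and query string starting from the first character. The function name google_style_match and the sample paths are made up for illustration; this is a rough approximation of the behavior described above, not Googlebot’s actual matching code.

```python
import re


def google_style_match(pattern: str, url_path: str) -> bool:
    """Return True if url_path is covered by a Disallow pattern that uses '*'.

    Hypothetical helper: assumes '*' means "any run of characters" and that
    the pattern is matched from the start of the path (prefix match).
    """
    # Escape the literal pieces and join them with '.*' where each '*' was.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    # re.match anchors at the start only, which gives prefix-style matching.
    return re.match(regex, url_path) is not None


if __name__ == "__main__":
    rule = "/*sort="
    # Made-up paths of the kind a shopping cart might produce.
    samples = [
        "/shop/widgets?sort=price",
        "/shop/widgets",
        "/cart.php?cat=3&sort=name",
        "/sorting-guide.html",
    ]
    for path in samples:
        verdict = "blocked" if google_style_match(rule, path) else "allowed"
        print(f"{path}: {verdict}")
```

Under these assumptions, the two paths containing “sort=” come back as blocked, while the plain category page and the page that merely contains the word “sorting” stay allowed.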

Every search engine should support this. It would make real life a lot easier for folks with dynamic sites, and artificial life a lot easier for spiders.

Written By:

Dan Thies

Dan has been helping his clients (and friends) promote their Websites, through his company SEO Research Labs, since 1996. Dan is the author of SitePoint's The Search Engine Marketing Kit and also maintains Key Words: SitePoint's Search Engine Marketing blog.


{ 9 comments }

rapidvectorseo November 14, 2008 at 9:54 pm

I did this once, and I got an error for it when I submitted a page removal request.

Anonymously January 26, 2007 at 5:13 am

That helps a lot. Thanks for the help!

maxy22 December 6, 2005 at 12:33 pm

If any URL on my site contains the word ‘calender’ and I don’t want Google to index it, then I will just add

User-agent: Googlebot
Disallow: /*calender

to my robots file,

and it does not matter which directory the URL containing the word ‘calender’ is under; it might even be my cgi-bin directory.

Right?

Octal October 24, 2005 at 7:22 am

I’ve been using wildcards in robots.txt for…well ever. I had no idea it wasn’t part of the original protocol and I certainly had no idea it was only Google that supports it. Thanks for the info.

DanThies October 23, 2005 at 8:34 pm

It could be. Picture a category with one product – no matter how you sort it, it’s the same page. Even with a bunch of items, do you really need the search engines to have every possible order?

Pura Vida October 23, 2005 at 7:17 pm

Is this a must-add for users running sites like osCommerce? Is the sort feature a real cause of duplicate content?

ronald_poi October 22, 2005 at 7:29 pm

useful information. thanks!

Ogito October 22, 2005 at 4:52 pm

Thanks Dan

peach October 21, 2005 at 3:55 pm

great find! thanx
