Google’s Hidden Protocol

by Dan Thies

Google’s URL removal page contains a handy piece of information that isn’t on their webmaster info pages, where it belongs.

Google supports the use of “wildcards” in robots.txt files. This isn’t part of the original 1994 robots.txt protocol and, as far as I know, it isn’t supported by other search engines. Because other crawlers may not understand the wildcard, you need to put the rule in a separate section for Googlebot in your robots.txt file. For example:

User-agent: Googlebot
Disallow: /*sort=

This stops Googlebot from reading any URL that contains the string “sort=”, no matter where in the URL that string occurs.

So if you have a shopping cart, and use a variable called “sort” in some URLs, you can stop Googlebot from reading the sorted (but basically duplicate) content that your site produces for users.
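
To see which URLs a wildcard rule like the one above would catch, here is a minimal Python sketch of the matching as I understand it: each “*” stands for any run of characters, and the rule is compared against the start of the URL path. This is an illustration under those assumptions, not Google’s actual implementation; the function name rule_matches and the sample URLs are made up for the example.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a wildcard Disallow pattern matches a URL path.

    Hypothetical helper: assumes "*" means "any run of characters"
    and that the rule is anchored at the start of the path.
    """
    # Escape the literal pieces and join them with ".*" for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    # re.match anchors at the beginning, so this behaves as a prefix match.
    return re.match(regex, path) is not None


# Made-up shopping cart URLs, checked against the rule from the post.
sample_paths = [
    "/products?sort=price",
    "/shop/list.php?cat=3&sort=name",
    "/products?page=2",
]
for path in sample_paths:
    status = "blocked" if rule_matches("/*sort=", path) else "allowed"
    print(path, status)
```

Running a handful of your own URLs through a check like this before you edit robots.txt is a cheap way to confirm that the rule blocks the sorted pages and nothing else.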

Every search engine should support this. It would make real life a lot easier for folks with dynamic sites, and artificial life a lot easier for spiders.


