Robots.txt Syntax Explanation

I’m not sure if this is widely supported, but I saw these lines in a robots.txt file and am unsure what they mean. Can someone explain?

/folder///?r=
/folder///*.html?r=

That looks like an attempt to use wildcard exclusions in their robots.txt.

I’m not sure whether that particular format would work, but here’s some more information:

Pattern matching

Googlebot (but not all search engines) respects some pattern matching.

* To match a sequence of characters, use an asterisk (*). For instance, to block access to all subdirectories that begin with private:

  User-agent: Googlebot
  Disallow: /private*/

* To block access to all URLs that include a question mark (?) (more specifically, any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string):

  User-agent: Googlebot
  Disallow: /*?

* To specify matching the end of a URL, use $. For instance, to block any URLs that end with .xls:

  User-agent: Googlebot 
  Disallow: /*.xls$

  You can use this pattern matching in combination with the Allow directive. For instance, if a ? indicates a session ID, you may want to exclude all URLs that contain them to ensure Googlebot doesn't crawl duplicate pages. But URLs that end with a ? may be the version of the page that you do want included. For this situation, you can set your robots.txt file as follows:

  User-agent: *
  Allow: /*?$
  Disallow: /*?

  The Disallow: /*? directive will block any URL that includes a ? (more specifically, it will block any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string).

  The Allow: /*?$ directive will allow any URL that ends in a ? (more specifically, it will allow any URL that begins with your domain name, followed by a string, followed by a ?, with no characters after the ?).

From http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
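
To make the pattern-matching rules above concrete, here is a minimal Python sketch that emulates that style of matching by translating a path pattern's * and trailing $ into a regular expression, then applying longest-match precedence with Allow winning ties (the behavior described in the session-ID example). The function names and sample URLs are my own for illustration; this is a simplified model, not Google's actual matcher.

  import re

  def pattern_to_regex(pattern):
      # '*' matches any run of characters; a trailing '$' anchors
      # the pattern at the end of the URL path.
      anchored = pattern.endswith("$")
      if anchored:
          pattern = pattern[:-1]
      regex = re.escape(pattern).replace(r"\*", ".*")
      return re.compile(regex + ("$" if anchored else ""))

  def is_allowed(path, allows, disallows):
      # The longest matching rule wins; on a tie, Allow beats Disallow.
      def longest_hit(patterns):
          hits = [len(p) for p in patterns if pattern_to_regex(p).match(path)]
          return max(hits, default=-1)
      return longest_hit(allows) >= longest_hit(disallows)

  # The session-ID example from above: allow URLs that end in '?',
  # block URLs that contain a '?' anywhere else.
  allows = ["/*?$"]
  disallows = ["/*?"]

  for path in ["/page?", "/page?sessionid=123", "/page.html"]:
      print(path, "->", "allowed" if is_allowed(path, allows, disallows) else "blocked")

Running this prints "allowed" for /page? and /page.html and "blocked" for /page?sessionid=123, which matches the behavior the quoted documentation describes.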

In a robots.txt file, the / and * symbols are used to restrict crawling of sensitive parts of a site, such as admin pages, user account details, image folders, and other important directories. You can also list specific URL paths to block. The basic syntax looks like this:

  User-agent: *
  Disallow: /directory-or-path/
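
For instance, a rule set that blocks those kinds of directories for all crawlers might look like this (the directory names here are placeholders; use the actual paths on your site):

  User-agent: *
  Disallow: /admin/
  Disallow: /account/
  Disallow: /images/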

You can test your robots.txt file in Google Webmaster Tools to make sure it’s working correctly.
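
If you also want a quick local sanity check, Python's standard-library urllib.robotparser can evaluate rules. Note that it implements only the basic robots.txt standard and, as far as I know, does not understand the * and $ extensions discussed above, so it is only useful for plain prefix rules like the ones below (the paths are placeholders):

  from urllib import robotparser

  rp = robotparser.RobotFileParser()
  # Feed the rules in directly instead of fetching them from a live site.
  rp.parse([
      "User-agent: *",
      "Disallow: /admin/",
  ])

  print(rp.can_fetch("*", "http://example.com/admin/login"))  # False
  print(rp.can_fetch("*", "http://example.com/index.html"))   # True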