This dynamic site doesn’t have a spiderable search interface, but users link to their pages, which live at the subdirectory level, so these should be getting picked up by search engines, correct?
AFAIK some search bots don’t support “Allow”, only “Disallow”.
So if that’s true, those bots will see the “disallow all”, never see the “allow”, and therefore won’t crawl those pages (if they bother with robots.txt in the first place; I’ve had bots that either don’t read it or ignore it, and I’ve heard some may even use it to find what you don’t want them to see).
So it may be better to write the file with only “Disallow” rules.
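For example, a Disallow-only version of that kind of policy might look like this (the directory names here are just placeholders for whatever actually needs blocking):

User-agent: *
# No Allow lines; anything not matched by a Disallow is crawlable by default
Disallow: /cgi-bin/
Disallow: /private/
Disallow: /tmp/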
Rebirth Studios: If you are having problems, check the robots.txt specification. Google and the others follow the guidelines fairly strictly, so you should be OK.
This is currently a bit awkward, as there is no “Allow” field. The easy way is to put all files to be disallowed into a separate directory, say “stuff”, and leave the one file in the level above this directory.
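A sketch of that layout, borrowing the spec’s placeholder home directory /~joe/:

User-agent: *
Disallow: /~joe/stuff/

Everything under /~joe/stuff/ is blocked, while the one file left in the level above (say /~joe/index.html) stays crawlable without needing an “Allow” line at all.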
OK, let me clear up some of the misconceptions in this thread:
Yes, some search engines don’t support the “Allow:” directive… however Google, Bing, Yahoo and most of the big names do, so you can be confident that 99% of people will be able to find pages exposed through it. The major players all support it, so it’s not worth getting petty about; it’s perfectly legitimate to use. The same goes for the other non-standard extensions listed on Wikipedia and in the second-generation spec I posted.
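As a sketch of the pattern being discussed — the /members/ path is only a placeholder — blocking the whole site except one subdirectory looks like this:

User-agent: *
# Allow is listed first: Google and Bing pick the most specific match regardless of order,
# but older first-match parsers that do support Allow then see it before the blanket Disallow
Allow: /members/
Disallow: /

Under Google’s longest-match rule the Allow wins for anything under /members/, and everything else falls to the Disallow: / line.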
As for yours, Rebirth Studios, it should work as you posted it. The Allow directive explicitly tells spiders that the path you specified is open to crawling.