Greetings,
I would like to block my entire site except one single file. How can I do this?
Thanks
Possibly something like this:
User-agent: *
Allow: /MyPublicFile.htm
Disallow: /
But to be sure, it might be worth using one of the many free robots.txt generators that are available. A Google search should throw up a few.
Mike
Unfortunately, Mikl’s solution won’t work, as “Allow” is not recognised in a robots.txt file. You can find full information here: http://www.robotstxt.org/robotstxt.html
but the relevant bit is
To exclude all files except one
This is currently a bit awkward, as there is no “Allow” field. The easy way is to put all files to be disallowed into a separate directory, say “stuff”, and leave the one file in the level above this directory:
User-agent: *
Disallow: /~joe/stuff/
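If you want to sanity-check that scheme before relying on it, Python’s standard-library robots.txt parser (`urllib.robotparser`) can tell you what a conforming robot would do. This is just a quick sketch; the file names (`MyPublicFile.htm`, `secret.htm`) are made up for illustration:

```python
from urllib import robotparser

# TechnoBear's scheme: everything to be blocked lives under /stuff/,
# and the one public file sits in the level above it.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /~joe/stuff/",
])

# The public file is outside /stuff/, so it is fetchable.
print(rp.can_fetch("*", "/~joe/MyPublicFile.htm"))  # True

# Anything inside /stuff/ is blocked.
print(rp.can_fetch("*", "/~joe/stuff/secret.htm"))  # False
```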
Thanks for correcting my post, TechnoBear. I understood that Google and a few other leading spiders did understand Allow, but I wasn’t sure. Your solution is obviously better.
Mike
Thanks for the response. Unfortunately, I would like to block the whole root directory, something like this:
User-agent: *
Disallow: /
Allow: /MyPublicFile.htm
Is there any other way to do this?
Thanks
Kind regards
@Mikl says that “Allow” is understood by Google, although it’s not part of the general standard. If you want to take a chance that other search engines will also understand it, then you can go ahead and try it. Personally, I stick with the conventions. If you do decide to try it, you might do better to use Mikl’s approach and put the “Allow” field first; otherwise, a user-agent may just read Disallow: / , which effectively disallows everything, and move on.
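You can see that ordering point in action with Python’s `urllib.robotparser`, which (like many simpler parsers) applies rules first-match in file order. Google is documented to use the most specific matching rule instead, so this sketch only illustrates how an order-sensitive parser behaves:

```python
from urllib import robotparser

# Allow listed first: a first-match parser lets the one file through.
allow_first = robotparser.RobotFileParser()
allow_first.parse([
    "User-agent: *",
    "Allow: /MyPublicFile.htm",
    "Disallow: /",
])

# Disallow: / listed first: the same parser never reaches the Allow line.
disallow_first = robotparser.RobotFileParser()
disallow_first.parse([
    "User-agent: *",
    "Disallow: /",
    "Allow: /MyPublicFile.htm",
])

print(allow_first.can_fetch("*", "/MyPublicFile.htm"))     # True
print(disallow_first.can_fetch("*", "/MyPublicFile.htm"))  # False
print(allow_first.can_fetch("*", "/anything-else.htm"))    # False
```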
I found the information about Allow: here:
Block or remove pages using a robots.txt file
But that reference is specific to Google. I think it would be better to go with TechnoBear’s original suggestion, which should work with all well-behaved robots.
Another possibility would be to not use robots.txt at all, but to place the following meta tag in the <head> of every page except the one you want to allow:
<meta name="robots" content="noindex">
If your pages are based on a template, that would be fairly easy to achieve (just insert the above line in the template), and it should be safer as it doesn’t rely on any bot-specific syntax.
Mike
Use robots.txt as shown above. Out of curiosity, why are you blocking your entire site other than one remaining page?