  1. #1
    SitePoint Addict

    robots.txt - Block all pages except one

    Greetings,

    I would like to block my entire site except one single file. How can I do this?

    Thanks

  2. #2
    SitePoint Mentor
    Mikl
    Possibly something like this:


    Code:
    User-agent: *
    Allow: /MyPublicFile.htm
    Disallow: /
    But to be sure, it might be worth using one of the many free robots.txt generators that are available. A Google search should throw up a few.
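One way to sanity-check a file like this without a generator is Python's standard `urllib.robotparser` module. It applies the first rule that matches a URL, as the original robots.txt convention does. The sketch below parses Mikl's file locally. The bot name `ExampleBot` and the host `example.com` are placeholders, and real crawlers may use their own matching logic.

```python
from urllib.robotparser import RobotFileParser

# Mikl's suggested robots.txt, with the Allow line placed first.
ROBOTS_TXT = """\
User-agent: *
Allow: /MyPublicFile.htm
Disallow: /
"""

def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text locally, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# ExampleBot and example.com are hypothetical, for illustration only.
for path in ("/MyPublicFile.htm", "/index.htm", "/private/page.htm"):
    allowed = parser.can_fetch("ExampleBot", "https://example.com" + path)
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
# /MyPublicFile.htm: allowed
# /index.htm: blocked
# /private/page.htm: blocked
```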

    Mike

  3. #3
    Life is not a malfunction
    TechnoBear
    Unfortunately, Mikl's solution may not work with every robot, as "Allow" is not part of the original robots.txt standard. You can find full information here:
    but the relevant bit is:
    To exclude all files except one
    This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "stuff", and leave the one file in the level above this directory:
    Code:
    User-agent: *
    Disallow: /~joe/stuff/
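The directory approach needs no "Allow" line at all. Anything under the disallowed directory is blocked, and everything else stays crawlable. The sketch below checks this with Python's standard `urllib.robotparser`. The host `example.com`, the file names and the bot name `ExampleBot` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The robotstxt.org-style file: only the "stuff" directory is disallowed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /~joe/stuff/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs: the one public file sits one level above "stuff".
public_url = "https://example.com/~joe/index.html"
hidden_url = "https://example.com/~joe/stuff/report.html"

print(parser.can_fetch("ExampleBot", public_url))  # True: outside /~joe/stuff/
print(parser.can_fetch("ExampleBot", hidden_url))  # False: inside /~joe/stuff/
```

With this layout, every file you want hidden has to be moved into the disallowed directory, because anything left outside it is allowed by default.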

  4. #4
    SitePoint Mentor
    Mikl
    Thanks for correcting my post, TechnoBear. My understanding was that Google and a few other leading spiders do recognise Allow, but I wasn't sure. Your solution is obviously better.

    Mike

  5. #5
    SitePoint Addict
    Thanks for the response. Unfortunately, I need to block the root directory, something like this:

    User-agent: *
    Disallow: /
    Allow: /MyPublicFile.htm

    Is there any other way to do this?

    Thanks
    Kind regards

  6. #6
    Life is not a malfunction
    TechnoBear
    @Mikl says that "Allow" is understood by Google, although it's not part of the original standard. If you want to take a chance that other search engines will also understand it, you can go ahead and try it. Personally, I stick with the conventions. If you do decide to try it, you might be better off following Mikl's approach and putting the "Allow" field first. Otherwise, a robot that applies the first rule matching a URL will read Disallow: / , which blocks everything, and stop there.
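The ordering point can be seen with Python's standard `urllib.robotparser`, which applies the first matching rule within a group. The sketch below compares both orderings for the public file. `ExampleBot` and `example.com` are placeholders. Google documents a different behaviour: it uses the most specific (longest) matching rule. For Googlebot, therefore, both orderings should allow the file.

```python
from urllib.robotparser import RobotFileParser

ALLOW_FIRST = """\
User-agent: *
Allow: /MyPublicFile.htm
Disallow: /
"""

DISALLOW_FIRST = """\
User-agent: *
Disallow: /
Allow: /MyPublicFile.htm
"""

def public_file_allowed(robots_txt: str) -> bool:
    """Report whether a first-match parser lets a bot fetch the public file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # ExampleBot and example.com are hypothetical placeholders.
    return parser.can_fetch("ExampleBot", "https://example.com/MyPublicFile.htm")

print("Allow first:   ", public_file_allowed(ALLOW_FIRST))     # True
print("Disallow first:", public_file_allowed(DISALLOW_FIRST))  # False
```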

  7. #7
    SitePoint Mentor
    Mikl
    I found the information about Allow: here:

    Block or remove pages using a robots.txt file

    But that reference is specific to Google. I think it would be better to go with TechnoBear's original suggestion, which should work with all well-behaved robots.

    Another possibility would be to not use robots.txt at all, but to place the following meta tag in the <head> of every page except the one you want to allow:

    Code:
    <meta name="robots" content="noindex">
    If your pages are based on a template, that would be fairly easy to achieve (just insert the above line in the template), and it should be safer, as it doesn't rely on any bot-specific syntax.
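Here is a rough sketch of the template approach, assuming the pages are rendered by a Python function. The `render_head` helper and the `PUBLIC_PAGE` constant are hypothetical and not part of any particular framework. The function emits the noindex tag on every page except the one that should stay indexable.

```python
from html import escape

# Hypothetical path of the one page that search engines may index.
PUBLIC_PAGE = "/MyPublicFile.htm"

def render_head(page_path: str, title: str) -> str:
    """Build a <head> element; every page except PUBLIC_PAGE gets noindex."""
    lines = ["<head>", f"  <title>{escape(title)}</title>"]
    if page_path != PUBLIC_PAGE:
        # Ask compliant search engines not to index this page.
        lines.append('  <meta name="robots" content="noindex">')
    lines.append("</head>")
    return "\n".join(lines)

print(render_head("/MyPublicFile.htm", "Public page"))
print(render_head("/members/area.htm", "Members only"))
```

One caveat: noindex asks search engines not to index a page, but they still have to crawl the page to see the tag. Pages carrying it should therefore not also be blocked in robots.txt.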

    Mike

  8. #8
    SitePoint Member
    Use the robots.txt shown above. Out of curiosity, why are you blocking your entire site except for one page?

