Hi forum,
I've got a question, but it should be a piece of cake.
I just want to confirm with someone: how do I block all search engines from a testing subdomain?
Let's say my domain is mydomain.com
My subdomain is test.mydomain.com
So, in my robots.txt file…
Should it be like this:
Agent: *
Disallow:/test/
Is the above the right htaccess command and should this be placed in the main public_html folder?
My subdomain redirects to a subdomain, not a subdirectory… it redirects to http://test.mydomain.com
Thanks for any clarifications
Yes, your subdomain points to a folder in the main site's document_root folder:
public_html/test/
So, in your public_html folder, create a robots.txt that includes:
User-agent: *
Disallow: /test/
Or, to disallow one particular agent only:
User-Agent: Googlebot
Disallow: /test/
Here is a good reference on robots.txt that you can bookmark.
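One way to sanity-check agent-specific rules like these is Python's standard-library urllib.robotparser. This is a minimal sketch, assuming the Googlebot-only rules above sit in mydomain.com's robots.txt; the page URLs and the helper name `allowed` are placeholders. The rules are matched against paths on the host that serves the file, so this check says nothing about test.mydomain.com.

```python
from urllib.robotparser import RobotFileParser

# Rules from the answer above: only Googlebot is told to stay out of /test/.
ROBOTS_TXT = """\
User-Agent: Googlebot
Disallow: /test/
"""

def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network fetch
    return parser.can_fetch(agent, url)

if __name__ == "__main__":
    # Paths are checked as they would be on the host serving this file (mydomain.com).
    for agent in ("Googlebot", "Bingbot"):
        for url in ("http://mydomain.com/test/page.html",
                    "http://mydomain.com/index.html"):
            print(f"{agent:10} {url}: {allowed(ROBOTS_TXT, agent, url)}")
```

Googlebot should be refused only the /test/ URL; an agent with no matching group (Bingbot here) falls through to "allowed", because the file has no `User-agent: *` group.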
aaronjj
November 25, 2007, 4:45pm
Each subdomain should have its own robots.txt file in its own root directory. To block an entire subdomain you would use
User-Agent: *
Disallow: /
No .htaccess involved, just a plain-text file named robots.txt.
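To confirm the catch-all rule blocks every path for every crawler, the standard-library parser can check the subdomain's file text directly. A minimal sketch, assuming the file is served from test.mydomain.com's own document root (placeholder host from the question); the agent names, URLs, and the helper `blocks_everything` are illustrative.

```python
from typing import Sequence
from urllib.robotparser import RobotFileParser

# Contents of test.mydomain.com's own robots.txt.
SUBDOMAIN_ROBOTS = """\
User-Agent: *
Disallow: /
"""

def blocks_everything(robots_txt: str, agents: Sequence[str], urls: Sequence[str]) -> bool:
    """True if none of `agents` may fetch any of `urls` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not any(parser.can_fetch(a, u) for a in agents for u in urls)

if __name__ == "__main__":
    agents = ["Googlebot", "Bingbot", "SomeOtherBot/1.0"]
    urls = ["http://test.mydomain.com/", "http://test.mydomain.com/shop/item?id=3"]
    print(blocks_everything(SUBDOMAIN_ROBOTS, agents, urls))  # True
```

`Disallow: /` matches every path because every URL path starts with `/`. Remember that robots.txt is a request to well-behaved crawlers, not access control.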
Thanks!
I wonder why I stayed away for so long… you SitePoint guys are so helpful.
akritic
November 25, 2007, 7:31pm
Welcome to SitePoint, Datalife! Thanks for posting this thread; I learned something from it too. I should probably do the same for the multiple subdomains on my sites, though I've never bothered to create a robots.txt file because the instructions on Google didn't seem very clear to me. Perhaps a second look is in order.
See my post above, and check out the link. It is about the clearest set of instructions on robots.txt files that I have found.
aaronjj
November 25, 2007, 8:25pm
IMO there’s no need for a robots.txt if you don’t want to block anything. A lot of people throw up empty ones so they don’t get a bunch of 404s in their logs.
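An empty robots.txt parses to no rules at all, so nothing is blocked; the explicit "block nothing" form (`User-agent: *` followed by an empty `Disallow:`) behaves the same. A minimal check with Python's urllib.robotparser, using placeholder URLs and a hypothetical helper `can_fetch`:

```python
from urllib.robotparser import RobotFileParser

def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

EMPTY = ""                                 # a zero-byte robots.txt
ALLOW_ALL = "User-agent: *\nDisallow:\n"   # explicit "block nothing"

if __name__ == "__main__":
    for name, text in (("empty", EMPTY), ("allow-all", ALLOW_ALL)):
        # Both should print True: no rule restricts any path.
        print(name, can_fetch(text, "Googlebot", "http://mydomain.com/any/page.html"))
```

Serving either file just turns crawlers' robots.txt requests into 200 responses instead of 404s in the logs.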
aaronjj
November 25, 2007, 8:29pm
Your post is inaccurate. You can't block crawling of a subdomain by putting instructions in a robots.txt file in the top domain's root.
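The underlying reason is that a crawler works out where robots.txt lives from the scheme and host of each URL it wants to fetch, always at the path /robots.txt, and never consults a parent domain's file. A minimal sketch of that lookup; the helper name `robots_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would consult for `page_url`.

    Keeps only the scheme and network location (host plus any port);
    the path is always /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

if __name__ == "__main__":
    print(robots_url("http://mydomain.com/test/page.html"))  # http://mydomain.com/robots.txt
    print(robots_url("http://test.mydomain.com/page.html"))  # http://test.mydomain.com/robots.txt
```

So a `Disallow: /test/` in mydomain.com's file only covers URLs like http://mydomain.com/test/…, not pages requested as http://test.mydomain.com/.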
akritic
November 25, 2007, 9:06pm
Ah! Missed that the first time… thanks. That should take some of the stress out of creating one; at first glance it looks very straightforward.