I’m thinking I’ll just password protect the root domain, and that would prevent Google from crawling. But is that the best way? And would it have any negative impact on future crawls or SEO? Thanks!
Try disallowing the site in robots.txt.
And a meta tag with noindex, nofollow.
I’m not sure of the exact syntax, so search for examples.
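From memory, the meta tag goes in each page’s head and should look something like this:

<meta name="robots" content="noindex, nofollow">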
As John says, robots.txt is the easiest way to do this:
User-agent: *
Disallow: /
That’s all you need to exclude (reputable) bots from the entire site. If your only concern is to prevent indexing before the site is ready, that will do the trick. Since search engine bots look for a robots.txt file every time they visit, once you’ve completed testing you can delete those lines and your site will be crawled and indexed as normal.
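For comparison, the permissive version you’d switch to at launch is just an empty Disallow (or no robots.txt file at all):

User-agent: *
Disallow: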
See also http://www.robotstxt.org/robotstxt.html for more details.
Yeah, that’s a good one too! But they’re not obligated to follow the rule. I know Google does, but I don’t like the “not obligated” part.
All the reputable ones follow the rules. If your only concern is to ensure it doesn’t end up on Google, Yahoo!, Bing or any of the major search engines before it’s ready, then robots.txt is the easiest way to go. Yes, it might be indexed by a couple of small search engines which almost nobody uses, but does that matter?
I’ve never found any problem with this approach.
Robots.txt works – all of our QA sites are publicly exposed for various reasons. The proxy we have sitting in front of them magically replaces robots.txt with our special restricted version, and they don’t appear on Google anymore.
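The post doesn’t say which proxy is in use, but as a sketch, an nginx front end could do that swap with something like this (the location block and the exact Disallow body here are assumptions, not the poster’s actual setup):

# Serve a restrictive robots.txt directly instead of proxying it upstream
location = /robots.txt {
    default_type text/plain;
    return 200 "User-agent: *\nDisallow: /\n";
}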
Using password protection across the board also works, but that falls down on user management and some technical angles – like accessing web services.
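For anyone weighing the password option from the original question, HTTP basic auth at the proxy is roughly this (nginx again; the hostname, paths, and realm name are illustrative). The web-service pain is that every API client then needs the credentials too:

# Assumes a credentials file created beforehand, e.g.:
#   htpasswd -c /etc/nginx/.htpasswd staginguser
server {
    listen 80;
    server_name staging.example.com;

    auth_basic "Staging";                        # realm shown in the login prompt
    auth_basic_user_file /etc/nginx/.htpasswd;   # user/password store

    location / {
        proxy_pass http://127.0.0.1:8080;        # the actual QA app
    }
}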