If I’m developing a site on a live server (and using the full domain), what is the best way to prevent access from everyone, including robots, until the site is ready?
I know about adding to the robots.txt:
User-agent: *
Disallow: /
and I know about adding to the .htaccess:
DirectoryIndex index.htm
(where the index file is a simple splash page, i.e. an "under construction" notice)
You could always password-protect it with .htaccess. I generally don't worry too much about it: if it's not indexed, and you're not linking to it from an indexed page, no one will find it anyway.
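A minimal sketch of that, assuming the password file lives at /home/example/.htpasswd (the path and realm name here are just placeholders, and the file should sit outside the web root):

# .htaccess — ask for a username/password before serving anything
AuthType Basic
AuthName "Development site"
# placeholder path; keep the password file outside the document root
AuthUserFile /home/example/.htpasswd
Require valid-user

You'd create the password file with something like htpasswd -c /home/example/.htpasswd yourname, then add further users without the -c flag.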
I wouldn't count on robots.txt to do it. Some bots might even use it as a list of places to go.
True, it would help with the bots that bother to read it and play by the rules, but for the rest it's tempting bait. I have a few "honeypot" entries in my robots.txt and they get crawled quite regularly despite the Disallow.
I guess it depends on how sensitive the files are, but password protection would probably be good enough. You could also add a rewrite rule that turns away every remote address except yours, if you will always be connecting from the same IP(s).
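Something roughly like this in .htaccess, where the IP and the holding page are placeholders for your own values:

# .htaccess — send every visitor except your own IP to a holding page
RewriteEngine On
RewriteCond %{REMOTE_ADDR} !^203\.0\.113\.10$
RewriteCond %{REQUEST_URI} !^/under-construction\.html$
RewriteRule ^ /under-construction.html [R=302,L]

The second RewriteCond just keeps the holding page itself reachable so outside visitors don't end up in a redirect loop.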
If they're extremely sensitive, you could keep them outside the document root and serve them through a script only, but that's more involved and probably overkill, seeing as you plan to host them in a publicly accessible area eventually anyway.
The .htaccess password protection is the quickest and easiest way; otherwise you can open httpd.conf and allow only YOUR IP address to access that specific area.
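For the httpd.conf route, a sketch using Apache 2.4 syntax (the directory path and IP are placeholders):

# httpd.conf — only your IP may reach the development site's directory
<Directory "/var/www/example">
    Require ip 203.0.113.10
</Directory>

On Apache 2.2 the equivalent would be the older Order Deny,Allow / Deny from all / Allow from 203.0.113.10 directives inside the same Directory block.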