Test site being indexed

Hi,

I have been building a site for a client for some months now, and it hasn’t launched yet. As it was a new domain with no links to it (or any other reference to it anywhere), I didn’t bother password-protecting the site. I recently noticed, though, that it had been indexed by Google (and others, I think), so I added a ‘noindex’ tag and a robots.txt file.
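
A quick way to sanity-check what a robots.txt file tells crawlers is Python's standard-library parser. The sketch below is only illustrative: it assumes a blanket disallow rule and a made-up example.com URL, neither of which is taken from the real site.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents (the actual file is not shown in this post).
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_blocked(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given robots.txt asks this crawler not to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URL, used only to exercise the rules above.
    for agent in ("Googlebot", "ia_archiver"):
        print(agent, is_blocked(agent, "http://example.com/admin/update.php"))
```

Two caveats apply. robots.txt is only a request, so it keeps out well-behaved crawlers but does nothing to stop anyone fetching a page. Also, a crawler that is blocked by robots.txt never fetches the page, so it cannot see a ‘noindex’ tag on it.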

However, I also have an admin area which is password-protected, but within it is a database update script which isn’t, as it will need to be run via cron jobs or similar. I didn’t really think this would be a problem as there are no links to it anywhere, so no-one would ever know that it existed, or so I thought.

I get emailed the results of the script and as I’m still testing and debugging, it first dumps the SERVER vars. I got an email this morning containing the following:

 [HTTP_FROM] => crawler@alexa.com
 ...
 [HTTP_USER_AGENT] => ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com)

So, two questions:

  1. How did Google, etc. find the site in the first place?
  2. How did Alexa find my update script (especially given that there are no links to it)?

Thanks,

Martin.

Well, I got to the bottom of it: it’s the Alexa part of the SearchStatus Firefox extension. That apparently tells Alexa which pages I’m viewing, and like a gossipy old woman, Alexa tells Google.
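
Since an unlinked URL clearly isn't a secret, the update script probably wants some protection that a cron job can still get past. One common approach is a shared-secret token sent with the request. Here is a minimal sketch in Python (not the site's own code); `CRON_TOKEN`, `is_authorised` and `handle_update_request` are hypothetical names made up for illustration.

```python
import hmac

# Hypothetical secret: in practice it would live outside the web root
# (for example in an environment variable) and never appear in any link.
CRON_TOKEN = "change-me-long-random-string"


def is_authorised(supplied_token, expected_token=CRON_TOKEN):
    """Return True only when the caller supplied the expected token."""
    if not supplied_token:
        return False
    # compare_digest avoids leaking, via timing, how much of the token matched.
    return hmac.compare_digest(supplied_token.encode(), expected_token.encode())


def handle_update_request(query_params):
    """Return an (HTTP status, message) pair for a request to the update script."""
    if not is_authorised(query_params.get("token")):
        return 403, "Forbidden"
    # ... the real database update would run here ...
    return 200, "Update run"
```

The cron job would then request the script with the token attached, while a crawler that only knows the bare URL gets a 403 and nothing runs. Running the script from the command line instead of over HTTP would avoid exposing it at all.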