How to Prevent Search Bots from Crawling a Subdomain When Using Multiple server_name Entries

Basically, I am declaring two server_name entries (www and another subdomain) under the same server{} block.

I did this because there are so many rewrites that I don’t want to copy and paste them all just to create another server{} listener, let alone edit the rewrites twice whenever they change.

But the catch is that I can’t specify a different root folder for the second server_name, so I can’t place a robots.txt file for that subdomain. Again, the subdomain has exactly the same content as the primary domain.

I need to find a way to ensure that search engines don’t crawl it and think they have discovered duplicate content (and so on).

All feedback appreciated.
Ryan

Not sure I fully understand the problem, but:

If you can’t use robots.txt to prevent crawling/indexing, you could try adding a robots meta tag to the pages you don’t want crawled.
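
If editing the pages themselves isn’t practical (for example because both hostnames serve the same files), a related option is to send the same hint as an HTTP header instead: nginx can attach an X-Robots-Tag response header only for the hostname you don’t want indexed, keyed on $host with a map. Untested sketch; the hostnames below are just placeholders:

map $host $robots_tag {
    default          "";
    sub.example.com  "noindex, nofollow";
}

server {
    listen 80;
    server_name www.example.com sub.example.com;

    # add_header omits the header when the value is an empty string,
    # so only sub.example.com gets X-Robots-Tag: noindex, nofollow.
    add_header X-Robots-Tag $robots_tag;

    # existing rewrites and page rules stay as they are
}

The map block belongs in the http {} context, outside the server {} block, and keep in mind that add_header set at the server level is not inherited into any location that declares its own add_header.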

Well, because of the multiple server_name entries, they are the exact same pages loaded from the same location. So if I updated a file to add that robots meta tag, it would appear on the primary site (which I want to continue indexing) as well as on the origin.

So,

server {
    listen 80;
    server_name www.mydomain.com cache.mydomain.com;
    # tons of rewrite and page rules follow
}

So nginx is supplying the exact same files from the same folder with the same rewrites, so the subdomain is an exact clone of the live site. If I were to modify a file directly, like the home page, it would appear modified on both the primary site and the “cache” subdomain.

Cheers!
Ryan
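
Given that layout, one way to special-case robots.txt for the second hostname without duplicating the server block would be an exact-match location that only short-circuits for the cache host (a plain return inside if is one of the uses of if that is considered safe). This is an untested sketch; it assumes the existing root directive already serves a normal robots.txt for www:

server {
    listen 80;
    server_name www.mydomain.com cache.mydomain.com;

    # Requests for robots.txt on the cache hostname get a "disallow all"
    # body generated by nginx itself; requests on www fall through to the
    # static robots.txt under the inherited root as before.
    location = /robots.txt {
        if ($host = cache.mydomain.com) {
            return 200 "User-agent: *\nDisallow: /\n";
        }
    }

    # tons of rewrite and page rules follow
}

Because the rewrites stay in the one shared block, nothing has to be maintained twice; only the robots.txt response differs per host.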
