How to Prevent Search Bots from Crawling a Subdomain When Using Multiple server_name Entries

Basically, I am declaring two server_name entries (www and another subdomain) under the same server{} block.

I did this because there are so many rewrites that I don’t want to copy and paste them all just to create another server{} listener, let alone edit the rewrites twice whenever they change.

But the catch is that I can’t specify a different root folder for the second server_name, so I can’t place a robots.txt file for that subdomain. Again, the subdomain has exactly the same content as the primary domain.

I need to find a way to ensure that search engines don’t crawl it and think they have discovered duplicate content (and so on).

All feedback appreciated.
Ryan

Not sure I fully understand the problem, but:

If you can’t use robots.txt to prevent crawling/indexing, you could try adding a robots meta tag to the pages you don’t want crawled.
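
If editing the pages themselves isn’t practical (for example because both hostnames serve the same files), a related option is to send the same hint as an HTTP header instead: nginx can attach an X-Robots-Tag response header only for the hostname you don’t want indexed, keyed on $host with a map. Untested sketch; the hostnames below are just placeholders:

map $host $robots_tag {
    default          "";
    sub.example.com  "noindex, nofollow";
}

server {
    listen 80;
    server_name www.example.com sub.example.com;

    # add_header omits the header when the value is an empty string,
    # so only sub.example.com gets X-Robots-Tag: noindex, nofollow.
    add_header X-Robots-Tag $robots_tag;

    # existing rewrites and page rules stay as they are
}

The map block belongs in the http {} context, outside the server {} block, and keep in mind that add_header set at the server level is not inherited into any location that declares its own add_header.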

Well, because of the multiple server_name entries, they are the exact same pages loaded from the same location. So if I updated a file to add that robots meta tag, it would appear on the primary site (which I want to continue indexing) as well as on the origin.

So,

server {
    listen 80;
    server_name www.mydomain.com cache.mydomain.com;
    # tons of rewrite and page rules follow
}

So nginx is supplying the exact same files from the same folder with the same rewrites, so the subdomain is an exact clone of the live site. If I were to modify a file directly, like the home page, it would appear modified on both the primary site and the “cache” subdomain.

Cheers!
Ryan
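
Given that layout, one way to special-case robots.txt for the second hostname without duplicating the server block would be an exact-match location that only short-circuits for the cache host (a plain return inside if is one of the uses of if that is considered safe). This is an untested sketch; it assumes the existing root directive already serves a normal robots.txt for www:

server {
    listen 80;
    server_name www.mydomain.com cache.mydomain.com;

    # Requests for robots.txt on the cache hostname get a "disallow all"
    # body generated by nginx itself; requests on www fall through to the
    # static robots.txt under the inherited root as before.
    location = /robots.txt {
        if ($host = cache.mydomain.com) {
            return 200 "User-agent: *\nDisallow: /\n";
        }
    }

    # tons of rewrite and page rules follow
}

Because the rewrites stay in the one shared block, nothing has to be maintained twice; only the robots.txt response differs per host.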
