So, I am part of a forum that uses the IP.Board software. The entire forum is password protected; in other words, you cannot enter without a login.
While logged in, I’ve noticed that Google and Bing both appear to be viewing certain threads, even though the entire forum is protected by username/password authentication.
How can this be?
Perhaps the site whitelists certain bots and grants them access.
Hmm. I was thinking that. So you could just change your browser’s user-agent string to Googlebot and you’re in.
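To illustrate why a user-agent check on its own is weak, here is a minimal Python sketch showing that any HTTP client can claim to be Googlebot. The URL is a hypothetical placeholder, and the user-agent string is assumed to match the general form Google's crawler sends. The code only builds the request; it does not send it.

```python
import urllib.request

# A user-agent string of the kind Googlebot sends; any client can copy it.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def build_spoofed_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a request that claims to come from Googlebot."""
    return urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})

# forum.example.com is a placeholder, not a real forum.
req = build_spoofed_request("https://forum.example.com/topic/123")
print(req.get_header("User-agent"))
```

If the forum only looks at this header, such a request would be treated like the real crawler, which is why sites that allow bots in usually check the IP address as well.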
Another possibility is that the bots are requesting those pages directly but can’t see the content. In other words, the view is registered, but the bot doesn’t see what logged-in users see. You can verify this by searching for those pages on both engines and seeing what comes up. If it’s just an index of what non-logged-in users see, I wouldn’t worry. If they show content that is supposed to be password protected, I’d recommend using robots.txt and other methods such as the robots meta tag.
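As a rough illustration of the robots.txt suggestion, the sketch below uses Python's standard `urllib.robotparser` to check what a well-behaved crawler would conclude from a given robots.txt. The file contents and the `/forum/` path are assumptions made up for this example, not the forum's real configuration. Note that robots.txt is only a request to crawlers and does not protect content; the per-page equivalent is a tag such as `<meta name="robots" content="noindex">`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to stay out of /forum/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/
"""

def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Placeholder URLs on a made-up domain.
print(is_crawl_allowed(ROBOTS_TXT, "Googlebot", "https://forum.example.com/forum/topic/123"))
print(is_crawl_allowed(ROBOTS_TXT, "Googlebot", "https://forum.example.com/about"))
```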
How can they even know what the threads are or the titles of them? However, when viewing the “Who’s Online” list of users, Bing is viewing a thread. How can they even view the thread to begin with, they have no username/password login? There is no list of threads until after you login to the forum.
It is possible for the server to check the user agent or IP of a visitor (though these things can be spoofed).
For example, if you know a certain IP belongs to BingBot, you could grant it access without login, if you wanted the content indexed.
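One common way to decide whether an IP really belongs to a search engine is a reverse DNS lookup followed by a forward lookup of the returned hostname. The sketch below is an assumption-laden illustration of that idea: the domain suffixes are the ones I believe Google and Bing document for their crawlers, and the lookup functions can be swapped out so the example runs without network access. The IP and hostname in the demo are made up.

```python
import socket

# Hostname suffixes assumed to be used by Google's and Bing's crawlers.
CRAWLER_DOMAINS = (".googlebot.com", ".google.com", ".search.msn.com")

def is_verified_crawler(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-resolve ip, check the domain, then confirm the name resolves back to ip."""
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or socket.gethostbyname
    try:
        host = reverse_lookup(ip)
        if not host.endswith(CRAWLER_DOMAINS):
            return False
        # The forward lookup guards against someone faking their reverse DNS record.
        return forward_lookup(host) == ip
    except OSError:  # socket.herror and socket.gaierror are subclasses of OSError
        return False

# Offline demo with stand-in resolvers; the address and hostname are invented.
fake_reverse = lambda addr: "crawl-192-0-2-10.googlebot.com"
fake_forward = lambda host: "192.0.2.10"
print(is_verified_crawler("192.0.2.10", fake_reverse, fake_forward))
```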
To be clear, are you asking out of curiosity, or is this a problem to be solved?
Curiosity, first. But if it’s a problem, I will need to solve it.
I guess my question is this: how can a spider (Bing, Google) view a thread without login credentials? If you do not have login credentials, you cannot enter the forum to view the threads.
It’s quite easy for the login script to make exceptions for certain user agents and/or certain IP addresses to bypass the login.
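A minimal sketch of what such an exception might look like, written in Python purely for illustration. This is not IP.Board's actual code; the function name, the allowlist, and the IP ranges (reserved documentation ranges, not real crawler addresses) are all hypothetical.

```python
import ipaddress

# Hypothetical allowlist: user-agent token -> networks the bot is expected to come from.
# 192.0.2.0/24 and 198.51.100.0/24 are documentation ranges, not real crawler IPs.
CRAWLER_ALLOWLIST = {
    "googlebot": [ipaddress.ip_network("192.0.2.0/24")],
    "bingbot": [ipaddress.ip_network("198.51.100.0/24")],
}

def may_skip_login(user_agent: str, remote_ip: str) -> bool:
    """Return True if the request matches an allowlisted crawler by user agent AND IP."""
    ua = user_agent.lower()
    addr = ipaddress.ip_address(remote_ip)
    for token, networks in CRAWLER_ALLOWLIST.items():
        if token in ua and any(addr in net for net in networks):
            return True
    return False

bing_ua = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
print(may_skip_login(bing_ua, "198.51.100.7"))   # matching user agent and range
print(may_skip_login(bing_ua, "203.0.113.5"))    # right user agent, wrong range
```

Requiring both the user agent and the IP range to match means that simply changing a browser's user-agent string is not enough to get past the check.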
Is it possible that somebody has linked to a thread from another site, and the bots have followed that link?
Try doing a site:yourdomain search to see what - if anything - has been indexed.
Google and other search engines index anything they can find. If your website doesn’t use a robots.txt file and the noindex meta tag, I’m pretty sure that if the bot lands on a page that doesn’t return a 404 status, it’ll index that page regardless of whether the content is relevant. I’m not a crawl bot expert, but that’s what I assume is going on. I personally put a robots.txt on all my domains, and whenever I search for my website on Google, nothing related to it appears. You could do the same if you aren’t too concerned about SEO.