So, I am part of a forum that uses the IP.Board software. The entire forum is password protected; in other words, you cannot enter without a login.
I’ve noticed while I’m in the forum that Google and Bing both appear to be viewing certain threads, even though the entire forum is protected by username/password authentication.
Another thing that could be happening is that the bots are going directly to that page but can’t see the content. In other words, the view is registered, but the bot won’t see what logged-in users see. You can try to verify this by searching for those pages on those two engines and seeing what comes up. If it’s just an index of what non-logged-in users see, then I wouldn’t worry. If they show content that is supposed to be password protected, I’d recommend using robots.txt and other methods such as the noindex meta tag.
How can they even know what the threads are, or their titles? Yet when I view the “Who’s Online” list of users, Bing is shown viewing a thread. How can they view the thread to begin with when they have no username/password login? There is no list of threads until after you log in to the forum.
It is possible for the server to check the user agent or IP of a visitor (though these things can be spoofed).
For example, if you know a certain IP belongs to BingBot, you could grant it access without login, if you wanted the content indexed.
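To make the idea concrete, here is a rough sketch of that kind of check in Python. It is not how IP.Board does it; the names `TRUSTED_CRAWLER_NETWORKS`, `CRAWLER_UA_TOKENS` and `is_trusted_crawler` are made up for illustration, and the IP range is a documentation-only placeholder, not a real Bing or Google range.

```python
import ipaddress

# Hypothetical allowlist. 192.0.2.0/24 is a documentation-only range,
# standing in for whatever ranges the crawler's operator actually publishes.
TRUSTED_CRAWLER_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]

# Substrings that these crawlers include in their User-Agent header.
CRAWLER_UA_TOKENS = ("bingbot", "googlebot")


def claims_to_be_crawler(user_agent: str) -> bool:
    """True if the User-Agent header names a known crawler (easily spoofed)."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_UA_TOKENS)


def is_trusted_crawler(user_agent: str, remote_ip: str) -> bool:
    """True only if the User-Agent claims to be a crawler AND the
    connecting IP falls inside one of the allowlisted networks."""
    if not claims_to_be_crawler(user_agent):
        return False
    address = ipaddress.ip_address(remote_ip)
    return any(address in network for network in TRUSTED_CRAWLER_NETWORKS)
```

Checking the user agent alone would let anyone in who copies the header, which is why the sketch also requires the IP to match.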
To be clear, are you asking out of curiosity, or is this a problem to be solved?
Curiosity, first. But if it’s a problem, I will need to solve it.
I guess my question is this: how can a spider (Bing, Google) view a thread without login credentials? If you do not have login credentials, you cannot enter the forum to view the threads.
Google and other search engines index anything they can find. If your website doesn’t use the robots.txt file and the noindex meta tag, I’m pretty sure that if the bot lands on a page that doesn’t return a 404 status, it’ll index that page regardless of whether the content is relevant. I’m not a crawl bot expert, but that’s what I am assuming is going on. I personally slap noindex and robots.txt on all my domains, and whenever I search for my website on Google, it never appears, nor does anything related to it. I suppose you could do that too if you aren’t too concerned about SEO.
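If you want to see what a given robots.txt actually tells a crawler, Python’s standard library can evaluate the rules for you. This is only a sketch: the `ROBOTS_TXT` contents, the `crawler_may_fetch` helper and the `forum.example` URL are assumptions for illustration, not anything taken from the forum in question.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt that asks every crawler to stay out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

For example, `crawler_may_fetch(ROBOTS_TXT, "bingbot", "https://forum.example/topic/1")` returns `False` with the rules above. Keep in mind that robots.txt is only a request that well-behaved crawlers honor, and that a crawler blocked by robots.txt never fetches the page, so it cannot read a `<meta name="robots" content="noindex">` tag on it.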