When limiting user access to some pages by using PHP and MySQL (as you can do in forums like vBulletin or in CMSs like PHPNuke), search engines will always be excluded.
Is there some way to allow special domain names to access the database?
In my case I have a CMS. To get access to the PDFs and tutorials on the server you need to log in. I would still like Google to index those PDFs, so that people can find the documents on Google but need to register to download them.
If a login is required, then the Googlebot spider can't reach those files by following links.
I don't know what happens if you put the link there but don't give access... maybe it will index the link but with an access-denied text... and that's bad.
OK, is there maybe a way to grant access to specific IPs using .htaccess, so that I can put a hidden link to the files directory that is only accessible from those IPs?
Try checking the user agent of each visitor. If it contains "googlebot", don't require a login; otherwise, show them the login box. (This may leave your protected content vulnerable to user agent spoofing, though. I'm not sure if you can spoof as Googlebot...)
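A minimal PHP sketch of that user agent check, for illustration only. The function names, the `user_id` session key and the `/login.php` URL are assumptions, not part of any particular CMS, and the check trusts whatever the client sends in its User-Agent header.

```php
<?php
// Hypothetical helper: true when the User-Agent header *claims* to be Googlebot.
// The header is client-supplied, so this is a claim, not proof.
function claimsToBeGooglebot(?string $userAgent): bool
{
    return $userAgent !== null && stripos($userAgent, 'googlebot') !== false;
}

// Hypothetical gate: a login is needed unless the visitor is already
// logged in or presents a Googlebot user agent.
function needsLogin(bool $loggedIn, ?string $userAgent): bool
{
    return !$loggedIn && !claimsToBeGooglebot($userAgent);
}

// Possible use at the top of a protected page (session key and URL are assumed):
// session_start();
// $loggedIn = isset($_SESSION['user_id']);
// if (needsLogin($loggedIn, $_SERVER['HTTP_USER_AGENT'] ?? null)) {
//     header('Location: /login.php');
//     exit;
// }
```

Keeping the decision in a small function that takes the header as an argument makes it easy to try out with different user agent strings.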
Of course you can. It's quite easy to append a string (for example "googlebot") to your user agent string, which makes selecting users by their user agent string VERY dangerous.
I do this myself sometimes when checking competitors' sites for potential doorway pages.
The IP approach may be better, but IP spoofing can still happen.
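One way to make an IP-based check harder to fake than a plain user agent match is the reverse-then-forward DNS lookup that Google describes for verifying Googlebot. Nobody in this thread proposed it, so treat the following as a sketch. The function name and the injectable resolver parameters are assumptions added so the logic can be tried without network access. `gethostbyaddr()` and `gethostbynamel()` are standard PHP functions, and `gethostbynamel()` only returns IPv4 addresses.

```php
<?php
// Hypothetical helper: checks that $ip reverse-resolves to a googlebot.com or
// google.com host AND that this host resolves forward to the same IP.
// $reverse and $forward default to PHP's DNS functions; they are parameters
// only so that stub resolvers can be passed in for testing.
function isVerifiedGooglebot(string $ip, ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    // gethostbyaddr() returns the unmodified IP when the lookup fails.
    $host = $reverse($ip);
    if (!is_string($host) || $host === $ip) {
        return false;
    }

    // The host name must end in .googlebot.com or .google.com.
    if (!preg_match('/\.(googlebot|google)\.com\z/i', $host)) {
        return false;
    }

    // Forward lookup must lead back to the original IP (IPv4 only here).
    $addresses = $forward($host);
    return is_array($addresses) && in_array($ip, $addresses, true);
}

// Possible use: isVerifiedGooglebot($_SERVER['REMOTE_ADDR'])
```

Each call performs DNS lookups, so on a real site the result would probably need to be cached per IP.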
Anyway, I don't really see the point in having protected pages indexed, since users who find the indexed pages in Google won't be able to access them, and since protected data is generally user-specific.
I'm aware of the security problem, but since everybody can register, everybody will be able to download. So if someone (a smart programmer or hacker) wants to write a workaround and download the stuff that way... OK, if they have nothing better to spend their time on.
The only reason we want users to log in is to identify them, get in contact, allow them to upload and know who uploaded what, and to remind those who always just download to upload a few things too.
Letting Google crawl the site already allows unregistered users to read the content of the protected area, since Google converts PDFs to text, but it's not as nicely formatted.