

  1. #1
    SitePoint Addict
    Join Date
    Feb 2003
    Location
    Berlin
    Posts
    370

    Let Google crawl password-restricted pages

    When you restrict access to some pages with PHP and MySQL (as forums like vBulletin or CMSs like PHPNuke do), search engines are always locked out.
    Is there some way to let specific domain names access the database?

    In my case I have a CMS. To access the PDFs and tutorials on the server you need to log in. I would still like Google to index those PDFs, so that people can find the documents on Google but still have to register to download them. Is that possible?

    Any hint?

    Thanks
    Flözen

  2. #2
    SitePoint Addict Ramiro S
    Join Date
    May 2003
    Posts
    321
    If a login is required, then Googlebot (or any other spider) can't reach those files by following links.

    I don't know what would happen if you publish the link but then deny access... maybe Google will index the link with an "access denied" text... and that's bad.
    Quasar - Web Development - Free Avatars

  3. #3
    SitePoint Addict
    Join Date
    Feb 2003
    Location
    Berlin
    Posts
    370
    OK, is there maybe a way to grant access to specific IPs with .htaccess, so that I can put a hidden link to the files' directory that is only accessible from those IPs?
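    A rough sketch of the IP-allowlist idea, written as an application-level check (Python) rather than an .htaccess rule. The network ranges below are placeholders taken from the RFC 5737 documentation blocks, not real crawler ranges; `ALLOWED_NETWORKS` and `ip_is_allowed` are hypothetical names for illustration only.

```python
import ipaddress

# Hypothetical allowlist. These are RFC 5737 documentation ranges used as
# placeholders; replace them with the networks you actually trust.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def ip_is_allowed(remote_addr: str) -> bool:
    """Return True if remote_addr lies inside one of ALLOWED_NETWORKS."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # A malformed address is never allowed.
        return False
    # IPv6 addresses never match IPv4 networks (membership returns False).
    return any(addr in net for net in ALLOWED_NETWORKS)


print(ip_is_allowed("192.0.2.17"))   # True
print(ip_is_allowed("203.0.113.5"))  # False
```

    The visitor address would come from the web server (for example `REMOTE_ADDR`). As the thread notes further down, an IP check is only as trustworthy as the address the server sees.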

  4. #4
    SitePoint Wizard davidjmedlock
    Join Date
    Dec 2002
    Location
    Nashville, TN USA
    Posts
    1,688
    Try checking the user agent of each visitor. If it contains "googlebot", don't require a login; otherwise, show the login box. (This may leave your protected content vulnerable to user-agent spoofing, though. I'm not sure whether you can spoof being Googlebot...)
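    A minimal sketch of the user-agent check suggested above, in Python. The function names are hypothetical. Keep in mind that the User-Agent header is entirely client-controlled, so this test alone is trivially bypassed (which is exactly the concern raised in the next reply).

```python
from typing import Optional


def claims_googlebot(user_agent: Optional[str]) -> bool:
    # Case-insensitive substring match on a header the client fully controls.
    return user_agent is not None and "googlebot" in user_agent.lower()


def requires_login(user_agent: Optional[str], logged_in: bool) -> bool:
    """Show the login box unless the visitor is logged in or claims to be Googlebot."""
    if logged_in:
        return False
    return not claims_googlebot(user_agent)


ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(requires_login(ua, logged_in=False))  # False: no login box
```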

  5. #5
    SitePoint Guru quenting
    Join Date
    Dec 2002
    Location
    Switzerland
    Posts
    735
    Of course you can. It's quite easy to append a string (for example "googlebot") to your user agent, which makes selecting users by their user-agent string VERY dangerous.

    I do this myself sometimes when checking competitors' sites for potential doorway pages.


    The IP approach may be better, but IP spoofing can still happen.
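    One way to make the IP approach sturdier, not mentioned in the thread, is forward-confirmed reverse DNS, which Google itself recommends for verifying Googlebot: look up the hostname for the visitor's IP, check that it ends in googlebot.com or google.com, then resolve that hostname forward and confirm it maps back to the same IP. The second step matters because anyone who controls the reverse DNS of their own IP block can claim a googlebot.com name. The sketch below uses Python's `socket` module; the resolvers are injectable parameters (an assumption made only so the logic can be tested offline), and `socket.gethostbyname_ex` covers IPv4 only.

```python
import socket
from typing import Callable, List, Tuple

# Same shape as socket.gethostbyaddr / socket.gethostbyname_ex results:
# (hostname, aliaslist, ipaddrlist).
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]

# Hostname suffixes accepted as Google crawler hosts.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(
    remote_addr: str,
    reverse_lookup: Resolver = socket.gethostbyaddr,
    forward_lookup: Resolver = socket.gethostbyname_ex,
) -> bool:
    """Forward-confirmed reverse DNS check for an IPv4 visitor address."""
    try:
        hostname, _, _ = reverse_lookup(remote_addr)
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith(TRUSTED_SUFFIXES):
        return False
    try:
        _, _, addresses = forward_lookup(hostname)
    except OSError:
        return False
    # The forward lookup must point back at the original address.
    return remote_addr in addresses
```

    DNS lookups are slow, so in practice you would cache the result per IP rather than resolving on every request.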

    Anyway, I don't really see the point of having protected pages indexed: users who find those pages in Google won't be able to open them, and protected data is generally user-specific.

    Quentin

  6. #6
    SitePoint Addict
    Join Date
    Feb 2003
    Location
    Berlin
    Posts
    370
    I'm aware of the security problem, but since everybody can register, everybody will be able to download anyway. So if someone (a smart programmer or hacker) wants to write a workaround and download the files that way... fine, if they have nothing better to spend their time on.

    The only reason we want users to log in is to identify them, get in contact, let them upload, know who uploaded what, and remind those who only ever download to upload a few things too.

    Letting Google crawl the site already lets unregistered users read the content of the protected area, since Google converts PDFs to text, though not as nicely formatted.

    Flözen

