  1. #1
    SitePoint Member
    Join Date
    Jun 2005
    Posts
    4

    Giving a spider "logged in" access

    Howdy,

    I have built a site that has a huge amount of information on wines in South Africa. The problem is that users need to register to view the content.

    How can I give the various spiders "logged in" status so that they can read this information?

    Is it better to redirect the spiders to an unprotected page and the users to the protected one?

    The site is programmed in ASP.

  2. #2
    SitePoint Addict
    Join Date
    Dec 2004
    Location
    Charlotte
    Posts
    247
    The unprotected way would be better. I don't know how you would go about doing it the other way.

  3. #3
    I am obstructing justice. fatnewt
    Join Date
    Jul 2002
    Location
    Ottawa, Canada
    Posts
    1,766
    In theory, you could check the user agent and have your authentication scripts log known spiders in automatically.
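    In classic ASP that might look something like the sketch below. The Session("LoggedIn") flag and login.asp page are only assumptions about how your existing authentication check works, and the user agent strings are just the obvious ones:

    Code:
    <%
    ' Sketch only: treat requests from known crawler user agents as logged in.
    ' Session("LoggedIn") and login.asp are placeholders for your real auth check.
    Dim ua, isSpider
    ua = LCase(Request.ServerVariables("HTTP_USER_AGENT"))
    isSpider = (InStr(ua, "googlebot") > 0) Or _
               (InStr(ua, "slurp") > 0) Or _
               (InStr(ua, "msnbot") > 0)

    If isSpider Then
        Session("LoggedIn") = True          ' let the spider straight through
    ElseIf Session("LoggedIn") <> True Then
        Response.Redirect "login.asp"       ' everyone else still has to log in
    End If
    %>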

    You might get in trouble for it, though: showing the spider something different from what you show the user can result in penalties or removal from the index.

    How important is it that people register to view this? If you're considering letting spiders in, then you're greatly reducing the site's security - why not give read access to everyone?
    Colin Temple [twitter: @cailean]
    Web Analyst at Napkyn


  4. #4
    Application Developer shabbirbhimani
    Join Date
    Apr 2004
    Location
    India
    Posts
    2,272
    I see www.experts-exchange.com does the same thing. If you want to see the replies to a question you need to log in, but the Google cache always shows the answers, and I don't think they are doing badly with the search engines, though I may be wrong.

  5. #5
    Jeremy Maddock WealthStream
    Join Date
    Dec 2004
    Location
    Victoria, Canada
    Posts
    2,422
    My advice would be to make the first couple of paragraphs of each article readable by anyone, then make your visitors register to read the rest of the information. That way, spiders could crawl the first part of each article (which would be full of keywords), and the user would be given more motivation to register, in order to finish reading the article.
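    As a rough classic ASP sketch of that idea; articleText, Session("LoggedIn") and register.asp are all placeholders, not the OP's actual code:

    Code:
    <%
    ' Sketch: anonymous visitors (and spiders) see only the opening of the entry.
    ' articleText is assumed to hold the full text, loaded earlier from the database.
    Const TEASER_LENGTH = 500   ' roughly the first couple of paragraphs

    If Session("LoggedIn") = True Then
        Response.Write articleText
    Else
        Response.Write Left(articleText, TEASER_LENGTH) & "..."
        Response.Write "<p><a href=""register.asp"">Register to read the rest</a></p>"
    End If
    %>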
    -- Jeremy Maddock
    SEOMix.com - Search Engine Optimization Tips
    My Blog - Business, tech, and politics from a webmaster's perspective

  6. #6
    SitePoint Member
    Join Date
    Jun 2005
    Posts
    4
    The site is based on a wine guide published here in South Africa. The latest info needs to be paid for and previous years' editions are free (but you still need to register). The free registration is there to collect emails, keep track of our user base, etc.

    If I can possibly help it, I don't want to go through all the trouble of putting the spider code in all the pages and then still have to copy the site to a new directory. The pages will appear exactly the same to the spider as to the user; the spider just won't have to log in.

    Obviously not having this information available to the spiders means a large amount of info disappears.

    Check out the site at www.platteronline.com maybe that will help explain.

  7. #7
    doing my best to help c2uk
    Join Date
    May 2005
    Location
    Cardiff
    Posts
    1,832
    Quote Originally Posted by fatnewt
    In theory, you could check the user agent and have your authentication scripts log known spiders in automatically.
    I wouldn't base it solely on the user agent. There are Firefox extensions that can change your user agent, so all of a sudden you, as an ordinary user, appear to the website as Googlebot and can see exactly what Google sees. Here's one:

    User Agent Switcher

    It's easy to use: all you need to do is find the user agent string you need on the Internet and type it in.

  8. #8
    SitePoint Addict Antonbomb22
    Join Date
    Apr 2004
    Location
    NJ
    Posts
    321
    This will get you in trouble with the search engines; it's considered cloaking and/or "doorway pages". Google says:
    Avoid "doorway" pages created just for search engines, or other "cookie cutter" approaches such as affiliate programs with little or no original content.
    I'm unsure of the correct terminology, but what it's defined as isn't important: employing such a tactic will get you in trouble if they find you out. Search engines hate sites that show them different content just because they are search engines, the reason being that it doesn't benefit the end user who searched a query and found your page, only to discover that the desired information isn't available unless you're a search engine. All I am saying is think twice before doing this.

  9. #9
    I am obstructing justice. fatnewt
    Join Date
    Jul 2002
    Location
    Ottawa, Canada
    Posts
    1,766
    Quote Originally Posted by c2uk
    I wouldn't base it solely on the user agent.
    True story. It would be a good idea to also include some feature sniffing to pick up a real browser, and perhaps a filter to make sure the IP address is in Google's known ranges (if they have any anymore).
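    A minimal sketch of that kind of IP filter in classic ASP, purely in theory; the prefix list is a placeholder and would have to be checked against whatever ranges the crawler actually uses:

    Code:
    <%
    ' Sketch: only trust a spider user agent when the request also comes from
    ' an IP prefix believed to belong to the crawler. The prefix below is an
    ' example only, not an authoritative list.
    Dim remoteIp, prefixes, i, trustedIp
    remoteIp = Request.ServerVariables("REMOTE_ADDR")
    prefixes = Array("66.249.")
    trustedIp = False
    For i = 0 To UBound(prefixes)
        If Left(remoteIp, Len(prefixes(i))) = prefixes(i) Then trustedIp = True
    Next
    ' Combine trustedIp with the user agent check before auto-logging anything in.
    %>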

    That wouldn't stop someone from reading it all in Google's cache.

    But that said, I wouldn't do this anyways. I was just suggesting that it could be done.
    Colin Temple [twitter: @cailean]
    Web Analyst at Napkyn


  10. #10
    SitePoint Addict Antonbomb22
    Join Date
    Apr 2004
    Location
    NJ
    Posts
    321
    Quote Originally Posted by fatnewt
    True story. It would be a good idea to also include some feature sniffing to pick up a real browser, and perhaps a filter to make sure the IP address is in Google's known ranges (if they have any anymore).

    That wouldn't stop someone from reading it all in Google's cache.

    But that said, I wouldn't do this anyways. I was just suggesting that it could be done.
    Well, if we're on the topic of theory, then to add to yours: you can set it so the page is not cached but still indexed, so other people can't read the content from the cache. Please, anyone reading this: this is all theory, and if you're caught it will get you de-indexed or banned from the search engines.
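    For what it's worth, the usual way to do that is a robots meta tag in the head of the protected pages; "noarchive" keeps the cached copy out of the results while still letting the page be indexed:

    Code:
    <meta name="robots" content="index, follow, noarchive">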

  11. #11
    SitePoint Wizard someonewhois
    Join Date
    Jan 2002
    Location
    Canada
    Posts
    6,364
    Hypothetically you could do it by IP address, and then tell Google not to cache the pages, but to index them. That's the safest way.

