  1. #1
    Non-Member
    Join Date
    Jan 2006
    Posts
    26

    How to stop bots crawling a section of a page?

    I don't want search engine bots to crawl the content of one section of a page. How can I do that?

    I can stop crawlers (like Googlebot) from crawling pages or directories using robots.txt, but how do I stop them from crawling only part of a page?

    For example, I don't want Googlebot to crawl the header and the footer of my site. Is there any way to achieve this? If not in PHP, then can I do it with JavaScript?

    Cheers

  2. #2
    SitePoint Wizard stereofrog's Avatar
    Join Date
    Apr 2004
    Location
    germany
    Posts
    4,324
    Yes, generate those sections dynamically with javascript.

  3. #3
    Non-Member
    Join Date
    Jan 2006
    Posts
    26
    Quote Originally Posted by stereofrog View Post
    Yes, generate those sections dynamically with javascript.
    Hmm, actually that was supposed to be the last resort. Is anything else possible?

  4. #4
    SitePoint Wizard stereofrog's Avatar
    Join Date
    Apr 2004
    Location
    germany
    Posts
    4,324
    You can also cloak parts of the page with PHP by checking the HTTP_USER_AGENT request field.

  5. #5
    Non-Member
    Join Date
    Jan 2006
    Posts
    26
    Quote Originally Posted by stereofrog View Post
    You can also cloak parts of the page with PHP by checking the HTTP_USER_AGENT request field.
    SF, can you tell me more about it? I don't quite follow.

    Regards

  6. #6
    SitePoint Wizard stereofrog's Avatar
    Join Date
    Apr 2004
    Location
    germany
    Posts
    4,324
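    Example:

    PHP Code:
    // Only print this when the User-Agent does not look like a known crawler.
    // The "i" flag makes the match case-insensitive; ?? '' covers a missing header.
    if (!preg_match("~google|yahoo|msnbot~i", $_SERVER['HTTP_USER_AGENT'] ?? ''))
        echo "robots won't see this";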

  7. #7
    SitePoint Wizard lorenw's Avatar
    Join Date
    Feb 2005
    Location
    was rainy Oregon now sunny Florida
    Posts
    1,098
    @stereofrog

    Very devious, I like it! I use it in a hit-logger script to separate the bots, but will there be a problem with Google thinking it is a form of (reverse) cloaking? I have heard that Google sends out two bots, one of which looks like a random legitimate user, and then compares the page results. You may get dinged for using this.

    just a thought.

  8. #8
    SitePoint Addict
    Join Date
    Apr 2005
    Posts
    287
    Quote Originally Posted by lorenw View Post
    @stereofrog

    Very devious, I like it! I use it in a hit-logger script to separate the bots, but will there be a problem with Google thinking it is a form of (reverse) cloaking? I have heard that Google sends out two bots, one of which looks like a random legitimate user, and then compares the page results. You may get dinged for using this.

    just a thought.
    The New York Times uses some cloaking techniques, allowing Google, Yahoo, and other search engines to crawl its content while users have to register or pay to see it.
    How does that make your feel?

  9. #9
    SitePoint Wizard lorenw's Avatar
    Join Date
    Feb 2005
    Location
    was rainy Oregon now sunny Florida
    Posts
    1,098
    I have also seen this in Google results when googling PHP stuff: sometimes the cached page works and sometimes not. Hmm, it makes me wonder whether I could freely browse their site if I spoofed my user agent in Firefox. It's late here, but maybe tomorrow? This is getting even more devious, lol.
    What I lack in acuracy I make up for in misteaks

  10. #10
    SitePoint Addict
    Join Date
    Apr 2005
    Posts
    287
    It works on some sites, but websites are getting smarter: they now check the IP address of the bot rather than the user agent.
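
    For anyone wondering what an IP-based check might look like, here is a minimal sketch, not what any particular site actually runs. It assumes the commonly described approach of a reverse DNS lookup confirmed by a forward lookup. The function name isVerifiedCrawler, the suffix list, and the two injectable resolver parameters are made up for illustration; gethostbyaddr() and gethostbynamel() are PHP built-ins, and gethostbynamel() only returns IPv4 addresses.

```php
<?php
/**
 * Hypothetical helper: decide whether $ip belongs to a crawler by doing a
 * reverse DNS lookup and confirming it with a forward lookup.
 * $allowedSuffixes is an assumed list; check each search engine's own docs.
 * The resolver callables default to PHP's built-in DNS functions and exist
 * only so the logic can be exercised without network access.
 */
function isVerifiedCrawler(
    string $ip,
    array $allowedSuffixes = ['.googlebot.com', '.google.com'],
    ?callable $reverse = null,
    ?callable $forward = null
): bool {
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    // Step 1: IP -> host name. gethostbyaddr() returns the unmodified IP
    // when the lookup fails, and false on malformed input.
    $host = $reverse($ip);
    if (!is_string($host) || $host === $ip) {
        return false;
    }
    $host = strtolower(rtrim($host, '.'));

    // Step 2: the host name must end in one of the expected domains.
    $suffixOk = false;
    foreach ($allowedSuffixes as $suffix) {
        if (substr($host, -strlen($suffix)) === $suffix) {
            $suffixOk = true;
            break;
        }
    }
    if (!$suffixOk) {
        return false;
    }

    // Step 3: host name -> IPs. Whoever controls an IP range can publish
    // any reverse record for it, so the forward lookup must lead back to
    // the same address.
    $ips = $forward($host);
    return is_array($ips) && in_array($ip, $ips, true);
}
```

    A hit logger could call isVerifiedCrawler($_SERVER['REMOTE_ADDR'] ?? '') to separate real crawlers from visitors who merely spoof the user agent. DNS lookups are slow, so in practice the result would probably need caching.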
    How does that make your feel?

  11. #11
    Dan Grossman's Avatar
    Join Date
    Aug 2000
    Location
    Philadelphia, PA
    Posts
    20,580
    Serving different content to the spider than your regular visitors is a violation of Google's webmaster guidelines. Weigh the consequences when you consider cloaking.

  12. #12
    SitePoint Wizard cranial-bore's Avatar
    Join Date
    Jan 2002
    Location
    Australia
    Posts
    2,634
    Yeah, potentially you could be dropped from the Google index altogether, which would have the same effect as using robots.txt in the first place...

