  1. #1
    Non-Member
    Join Date
    Jun 2013
    Posts
    4

    robots.txt should not be used to ban large portions of your site...

    robots.txt should not be used to ban large portions of your site. If you ban significant portions of your site, search engine spiders may mark your site as "forbidden" in general and simply stop spidering it as often.

  2. #2
    SitePoint Guru Webinsane's Avatar
    Join Date
    Oct 2005
    Location
    Montenegro
    Posts
    897
    This is the first time I've encountered such a theory. In general, search engines push toward a cleaner internet: they want you to organize your website so they can index less and provide more. It's all about conserving resources. So I doubt this theory has real facts behind it.

  3. #3
    Mouse catcher silver trophy
    Stevie D's Avatar
    Join Date
    Mar 2006
    Location
    Yorkshire, UK
    Posts
    5,822
    Quote Originally Posted by Webinsane View Post
    This is the first time I've encountered such a theory. In general, search engines push toward a cleaner internet: they want you to organize your website so they can index less and provide more. It's all about conserving resources. So I doubt this theory has real facts behind it.
    I agree with Jack. Let Google have free rein to crawl all over your site and index as much or as little of it as it wants. Yes, by all means use robots.txt and rel=nofollow to block off areas of the site that won't make sense to search engines or as landing pages, but apart from that, let Google decide how it wants to deal with your site, and you are likely to do better than if you try to dictate or second-guess too much.

  4. #4
    Barefoot on the Moon! silver trophy
    Force Flow's Avatar
    Join Date
    Jul 2003
    Location
    Northeastern USA
    Posts
    4,516
    Quote Originally Posted by Jack John View Post
    robots.txt should not be used to ban large portions of your site. If you ban significant portions of your site, search engine spiders may mark your site as "forbidden" in general and simply stop spidering it as often.
    This is a misconception. Search engines will continue to crawl your site--just not the pages/paths you disallow.

  5. #5
    Mouse catcher silver trophy
    Stevie D's Avatar
    Join Date
    Mar 2006
    Location
    Yorkshire, UK
    Posts
    5,822
    Quote Originally Posted by Force Flow View Post
    This is a misconception. Search engines will continue to crawl your site--just not the pages/paths you disallow.
    The problem is that you might inadvertently close off a route they were using to get to pages that you want them to crawl. The more restrictions you put into place as to where they can and can't go, the bigger the risk that they won't be able to quickly and effectively crawl the areas of the site that you want them to.
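    That risk can be sketched with a toy crawl (the link graph and paths below are made up): if the only route to some pages runs through a disallowed hub, a rule-respecting crawler never discovers them.

```python
# Toy illustration: a crawler that respects a Disallow rule never
# discovers pages reachable only through a blocked hub page.
# The link graph and paths here are invented for the example.
from collections import deque

links = {
    "/": ["/about", "/search"],
    "/about": [],
    "/search": ["/article-1", "/article-2"],  # only route to the articles
    "/article-1": [],
    "/article-2": [],
}
disallowed = {"/search"}

def crawl(start):
    """Breadth-first crawl that skips disallowed paths entirely."""
    seen, queue = set(), deque([start])
    while queue:
        page = queue.popleft()
        if page in seen or page in disallowed:
            continue
        seen.add(page)
        queue.extend(links[page])
    return seen

print(sorted(crawl("/")))  # ['/', '/about'] -- the articles are never found
```

    Blocking the hub didn't just remove one page from the index; it silently cut off everything linked only from it.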

  6. #6
    SitePoint Member
    Join Date
    Jan 2013
    Location
    Ahmedabad, India
    Posts
    9
    It's up to you whether you want to use robots.txt to disallow search engines from crawling a major part of your website. If you use robots.txt to disallow a large number of pages/directories, it will slow down the crawler. You can also use meta tags to block the Google crawler from indexing your website.
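    For reference, the meta-tag approach mentioned above goes in each page's <head>. Unlike a robots.txt Disallow, which stops the crawler from fetching the page at all, a noindex directive lets the crawler fetch the page and follow its links while keeping it out of the index:

```html
<!-- In the <head> of a page you don't want indexed -->
<meta name="robots" content="noindex, follow">
```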

  7. #7
    SitePoint Guru Webinsane's Avatar
    Join Date
    Oct 2005
    Location
    Montenegro
    Posts
    897
    Quote Originally Posted by Stevie D View Post
    I agree with Jack. Let Google have free rein to crawl all over your site and index as much or as little of your site as it wants to.

    I would have to disagree. Let's assume you run complex software where you want to block various sections that carry no particular weight for any content. This way you save your resources as well.
    For example, this is SitePoint's robots.txt:


    User-agent: *
    Disallow: /search
    Disallow: /member
    Disallow: /private
    Disallow: /sendmessage
    Disallow: /report
    Disallow: /postings
    Disallow: /editpost
    Disallow: /newreply
    Disallow: /showpost
    Disallow: /online

    User-agent: BoardTracker
    Disallow: /
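    As a rough sketch of how a compliant crawler interprets rules like these, Python's urllib.robotparser can be fed the rule text directly (the URLs below are made up, and only a couple of the Disallow lines are reproduced):

```python
# Check how a rules file like the one above is read by a compliant
# crawler, using the standard library's robots.txt parser.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /search
Disallow: /member

User-agent: BoardTracker
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# An ordinary crawler may fetch regular threads but not /search or /member.
print(parser.can_fetch("Googlebot", "https://example.com/forums/thread-1"))  # True
print(parser.can_fetch("Googlebot", "https://example.com/search?q=robots"))  # False
# BoardTracker is banned from the whole site.
print(parser.can_fetch("BoardTracker", "https://example.com/forums/thread-1"))  # False
```

    So the site stays fully crawlable for normal search engines; only the utility paths (and one specific bot) are excluded.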

  8. #8
    Mouse catcher silver trophy
    Stevie D's Avatar
    Join Date
    Mar 2006
    Location
    Yorkshire, UK
    Posts
    5,822
    Quote Originally Posted by Webinsane View Post
    I would have to disagree. Let's assume you run complex software where you want to block various sections that carry no particular weight for any content. This way you save your resources as well.
    That's what I meant by "by all means use robots.txt and rel=nofollow to block off areas of the site that won't make sense to search engines" ... areas of the site that you can only use when logged in don't make sense to search engines, so they don't need access to them.

  9. #9
    SitePoint Enthusiast
    Join Date
    Jan 2013
    Location
    USA
    Posts
    87
    Well, my experience is different. I have blocked 75% of my website's links: I have four languages, and only the English version is ready. I just can't allow Google to crawl the other three language sections, because if it does, the rank of my English site drops, since I am using Google Translate for most of those pages.

    Google, and above all its bot, still needs to improve, so we can't trust it and have to make sure it doesn't see anything it doesn't understand.

