Thread: About robots.txt

  1. #1 certify

    About robots.txt

    I don't really understand robots.txt.

    What does it actually do, and will it help me get better rankings in search engines?


  2. #2 icehousedesigns

  3. #3 mmj
    Well, actually, mine did

    I found that each time Google indexed my site, it would pick up maybe 100 or so pages, but they'd all be meaningless, boring pages from my forums and so on, since they were generated dynamically. They'd all be nearly identical.

    What I did was carefully configure robots.txt so that only the pages with actual content were allowed to be indexed. That way, when Google visited my site and spidered another one or two hundred pages, it knew what not to fetch.

    All my pages are, of course, dynamic, but I can exclude them based on their base URL or filename, so it only took a few lines in robots.txt.
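
    Something along these lines (the paths here are made-up examples, not my actual layout):

        # robots.txt - let spiders index only the real content pages
        User-agent: *
        # keep spiders out of the dynamically generated forum scripts
        Disallow: /forums/showthread.php
        Disallow: /forums/search.php
        # and out of anything under /cgi-bin/
        Disallow: /cgi-bin/

    Since each Disallow rule matches by URL prefix, a line like the first one also covers every query-string variation of that script.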

    It seems to have worked.

    This is also more effective than a META robots tag in the page itself, because the spider still has to download the page before it can see the tag, and that download counts as a hit.
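
    For comparison, the META version looks like this, inside each page's <head>:

        <!-- the spider only sees this after it has already fetched the page -->
        <meta name="robots" content="noindex">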

    So using robots.txt reduces the number of unneeded hits by spiders, letting them get more of the relevant pages each visit.

    This works on the assumption that the spider gives up crawling your site once it has found a certain number of new pages, to prevent endless spidering of dynamic content. I think this is how Google works.

  4. #4 icehousedesigns
    Good point, mmj. Some spiders *will* give up after so many crawls, and they may not snag all of your pages. I haven't had that problem yet, but I'm ready.

  5. #5 mmj
    Yes. Of course, mine was a special case.

    I agree that, normally, the purpose of robots.txt is not to help search engine performance.

