Thread: robots.txt

  1. #1
    Destiny Manager Plebius's Avatar
    Join Date
    Nov 1999
    Posts
    682
    See http://www.google.com/bot.html#norobots
    and http://info.webcrawler.com/mak/proje.../norobots.html



    ------------------
    Martin Kretzmann
    Plebius Press - A progressive perspective ... "Insert favorite quote here"
    We have hosted and perl scripts too!

  2. #2
    SitePoint Enthusiast sawz's Avatar
    Join Date
    Aug 1999
    Posts
    76
    I recently started using 404 error software, and I am getting notified that someone or something is requesting this document (robots.txt), which doesn't exist.

    Can someone tell me what this is, and can I use it to my advantage?

  3. #3
    SitePoint Zealot freejavahelp's Avatar
    Join Date
    Jul 2000
    Posts
    176
    sawz

    I am getting the exact same problem. There have been requests for robots.txt on my site, but I don't have a robots.txt!

    Maybe it has something to do with the search engine spiders and bots.

    Let me know if you find anything out!

    Thanks

    Jim
    http://www.freejavahelp.com
    Making Java REALLY Easy
    Tutorials, Forums, and Articles

  4. #4
    SitePoint Zealot Jason_Therrien's Avatar
    Join Date
    Jul 2000
    Location
    Sunny Cleveland
    Posts
    167
    Robots.txt is a way of letting a webmaster know that a "spider" has been to your site. This means that a search engine has sent out "feelers" to your site and collected data off of it.

    This is a good thing if you want search engines indexing your site! You may also notice different robots from different sites. This is also good if you want to be included on different sites.

    Jason
    www.SmartWebBusiness.com
    Where "smart" businesses learn about the Web.

  5. #5
    Your Lord and Master, Foamy gold trophy Hierophant's Avatar
    Join Date
    Aug 1999
    Location
    Lancaster, Ca. USA
    Posts
    12,305
    Originally posted by Jason_Therrien
    Robots.txt is a way of letting a webmaster know that a "spider" has been to your site. This means that a search engine has sent out "feelers" to your site and collected data off of it.

    This is a good thing if you want search engines indexing your site! You may also notice different robots from different sites. This is also good if you want to be included on different sites.

    Actually, the robots.txt file tells the spiders which parts of your site they may index. You can block off certain files or directories, and so steer the robots toward the pages you do want indexed, using directives in this file.
    Wayne Luke
    ------------


  6. #6
    ********* Addict
    Join Date
    Feb 2000
    Location
    NE FL, USA
    Posts
    301
    How Wayne?
    Brain Bucket Magazine - Biker News, Views, and Event Coverage.

  7. #7
    Your Lord and Master, Foamy gold trophy Hierophant's Avatar
    Join Date
    Aug 1999
    Location
    Lancaster, Ca. USA
    Posts
    12,305
    The best way to find out would be to review the two documents linked in the first post.
    Wayne Luke
    ------------


  8. #8
    SitePoint Enthusiast sawz's Avatar
    Join Date
    Aug 1999
    Posts
    76
    I'm glad my post brought some educated responses, but I am still a little confused about this robots.txt.
    At present I am sending all robots away, only because I read somewhere that they enter your site and suck up the bandwidth; perhaps I misunderstood. plebius.org has an interesting file for that, but SitePoint has none.

    So should I do the robots.txt or not? Perhaps there is an idiot's guide to this, because what I have read so far has confused me.

    Thanks for your participation on this topic.


  9. #9
    Your Lord and Master, Foamy gold trophy Hierophant's Avatar
    Join Date
    Aug 1999
    Location
    Lancaster, Ca. USA
    Posts
    12,305
    If you don't allow the robots to index your site, you won't be listed in any search engines. A well-behaved robot shouldn't use much more bandwidth than a normal user browsing the same pages.
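
To make the difference between sending all robots away and letting them in concrete, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name allowed and the agent name ExampleBot are made up for illustration, and real spiders may interpret rules somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# "Disallow: /" sends every robot away from the whole site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# An empty Disallow value blocks nothing, so everything may be indexed.
ALLOW_ALL = "User-agent: *\nDisallow:\n"


def allowed(rules, path, agent="ExampleBot"):
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())  # parse rules from memory, no network access
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    print(allowed(BLOCK_ALL, "/index.html"))  # blocked for every robot
    print(allowed(ALLOW_ALL, "/index.html"))  # open to every robot
```

With the first set of rules a robot that honors robots.txt would skip the site entirely, which is why the site would not show up in search results.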
    Wayne Luke
    ------------


  10. #10
    SitePoint Enthusiast sawz's Avatar
    Join Date
    Aug 1999
    Posts
    76
    Well, in that case I just removed the file; search engines are our friends.

    Is not having a robots.txt OK, or is there an advantage to having one?

  11. #11
    Your Lord and Master, Foamy gold trophy Hierophant's Avatar
    Join Date
    Aug 1999
    Location
    Lancaster, Ca. USA
    Posts
    12,305
    If properly done, you can use a robots.txt file (robots only look for it in the top directory of each host) to guide the spiders into indexing only the pages and subsites you want within your site. This gives you the upper hand in figuring out where the major entry points are and in determining how different sections get indexed via keywords. It also prevents highly dynamic or temporary pages from being indexed, which would otherwise lead to frustrating 404 errors on your site.
    Wayne Luke
    ------------


  12. #12
    SitePoint Zealot Website Rob's Avatar
    Join Date
    Aug 2000
    Location
    Alberta, Canada
    Posts
    113
    To help get you started, here is an example of a robots.txt file:


    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /logs/
    Disallow: /public_ftp/
    Disallow: /stats/
    Disallow: /clients/
    Disallow: /test/
    Disallow: /download/
    Disallow: /images/


    Copy and paste the above (add or delete directories as you wish), then save it as "robots.txt" and upload it to your top directory. This is the directory with the index.html (or first page) for your site.

    This tells all robots which directories not to enter. Any directory not listed is OK for them to index, including any files within it. Some site grabber programs also respect the robots.txt file; Web Reaper and Site Snagger are two that come to mind, but there are lots out there.
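
If you want to check rules like these before uploading them, one option is Python's standard urllib.robotparser module. The sketch below feeds it the example rules from this post; the helper name is_allowed, the agent name ExampleBot, and the sample paths are assumptions made for illustration, and individual spiders may not match this parser exactly.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this post, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /logs/
Disallow: /public_ftp/
Disallow: /stats/
Disallow: /clients/
Disallow: /test/
Disallow: /download/
Disallow: /images/
"""


def is_allowed(path, agent="ExampleBot"):
    """Return True if `agent` may fetch `path` under the rules above."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse from memory, no network access
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical paths: one unlisted page and two inside listed directories.
    for path in ("/index.html", "/cgi-bin/script.pl", "/images/logo.gif"):
        print(path, is_allowed(path))
```

Paths inside a listed directory come back as not allowed, while anything unlisted, such as the front page, stays open to robots.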

