Thread: robots.txt
-
Aug 22, 2000, 22:06 #1
See http://www.google.com/bot.html#norobots
and http://info.webcrawler.com/mak/proje.../norobots.html
------------------
Martin Kretzmann
Plebius Press - A progressive perspective ... "Insert favorite quote here"
We have hosted and perl scripts too!
-
Aug 23, 2000, 03:30 #2
I recently started using 404-error software, and I am getting notified that someone or something is requesting a document (robots.txt) that doesn't exist.
Can someone tell me what this is, and can I use it to my advantage?
-
Aug 28, 2000, 20:58 #3
sawz
I am getting the exact same thing: there have been requests for robots.txt on my site, but I don't have a robots.txt file.
Maybe it has something to do with the search engine spiders and bots.
Let me know if you find anything out!
Thanks
Jim
-
Sep 5, 2000, 11:22 #4
Requests for robots.txt are a sign that a "spider" has been to your site: robots.txt is the file a search engine's crawler asks for first, to find out which parts of the site it is allowed to index. Seeing those requests means a search engine has sent out "feelers" to your site and collected data off of it.
This is a good thing if you want search engines indexing your site! You may also notice different robots from different search engines, which is also good if you want to be listed in different places.
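Those 404 notifications are just that first fetch failing: the crawler asks for /robots.txt before crawling anything else, and since the file doesn't exist the server answers 404. As a rough sketch (hostname, timestamp, and byte count are made up for illustration), such a hit in a typical access log looks something like:

```
crawler1.googlebot.com - - [05/Sep/2000:11:22:03 -0500] "GET /robots.txt HTTP/1.0" 404 285
```

A 404 on robots.txt is harmless; most spiders treat a missing file as "no restrictions" and go on to crawl the whole site.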
-
Sep 5, 2000, 12:08 #5
Originally posted by Jason_Therrien
Robots.txt is a way of letting a webmaster know that a "spider" has been to your site. This means that a search engine has sent out "feelers" to your site and collected data off of it.
This is a good thing if you want search engines indexing your site! You may also notice different robots from different sites. This is also good if you want to be included on different sites.
-
Sep 5, 2000, 15:13 #6
How Wayne?
Brain Bucket Magazine - Biker News, Views, and Event Coverage.
-
Sep 5, 2000, 15:33 #7
Originally posted by LuZeR
See http://www.google.com/bot.html#norobots
and http://info.webcrawler.com/mak/proje.../norobots.html
-
Sep 10, 2000, 06:55 #8
I'm glad my post brought some educated responses; I am still a little confused about this robots.txt.
At present I am sending all robots away, only because I read somewhere that they enter your site and suck up the bandwidth; perhaps I misunderstood. plebius.org has an interesting file for that, but SitePoint has none.
So should I do the robots.txt or not? Perhaps there is an idiot's guide to this, because what I have read so far has confused me.
Thanks for your participation in this topic.
-
Sep 10, 2000, 09:11 #9
-
Sep 10, 2000, 11:28 #10
Well, in that case I just removed the file; search engines are our friend.
Is not having a robots.txt OK, or is there an advantage to having one?
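Not having one is fine, but if you want to be explicit about welcoming every robot, a minimal robots.txt with an empty Disallow rule does the job (leaving the file out entirely has the same effect):

```
User-agent: *
Disallow:
```

An empty Disallow value means nothing is off-limits, so all spiders may index the whole site, and the 404 entries for robots.txt disappear from your logs.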
-
Sep 10, 2000, 16:50 #11
If properly done, you can use a robots.txt file to guide the spiders into indexing only the pages and subsites you want within your site. This gives you an upper hand in figuring out where the major entry points are, determining how different sections get indexed via keywords, and preventing highly dynamic or temporary pages from being indexed, which would otherwise lead to frustrating 404 errors on your site.
-
Sep 11, 2000, 00:41 #12
- Join Date
- Aug 2000
- Location
- Alberta, Canada
- Posts
- 113
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
To help get you started, here is an example of a robots.txt file:
User-agent: *
Disallow: /cgi-bin/
Disallow: /logs/
Disallow: /public_ftp/
Disallow: /stats/
Disallow: /clients/
Disallow: /test/
Disallow: /download/
Disallow: /images/
Copy and paste the above (adding or deleting directories as you wish), then save it as "robots.txt" and upload it to your top-level directory; that is the directory holding the index.html (or first page) of your site.
This tells "all" robots which directories not to enter. Any directory not listed is OK for them to index, including any files within those directories. The robots.txt file also helps with some site-grabber programs; Web Reaper and Site Snagger are two that come to mind, but there are lots out there.
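If you have Python handy, you can sanity-check a robots.txt like the one above before uploading it. This is just a sketch using the standard library's parser for the robots exclusion standard; the file contents and sample paths below are illustrative:

```python
# Check a robots.txt against some sample paths using Python's
# standard-library parser for the robots exclusion standard.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# can_fetch(agent, path) answers: may this robot fetch this URL?
for path in ("/index.html", "/cgi-bin/form.pl", "/images/logo.gif"):
    allowed = parser.can_fetch("*", path)
    print(path, "->", "allowed" if allowed else "disallowed")
```

Any path that starts with a Disallowed directory should come back disallowed; everything else is fair game for the spiders.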