  1. #1
    SitePoint Addict gthorley's Avatar
    Join Date
    Oct 2000
    Location
    Canada
    Posts
    392
    The following was taken from my log file:

    216.35.116.91 - - [09/Apr/2001:06:29:55 -0400] "GET /robots.txt HTTP/1.0" 404 204 "-" "Mozilla/3.0 (Slurp/si; slurp@inktomi.com; http://www.inktomi.com/slurp.html)"

    216.35.116.91 - - [09/Apr/2001:06:29:55 -0400] "GET / HTTP/1.0" 200 11406 "-" "Mozilla/3.0 (Slurp/si; slurp@inktomi.com; http://www.inktomi.com/slurp.html)"

    216.35.116.93 - - [09/Apr/2001:06:50:38 -0400] "GET /robots.txt HTTP/1.0" 404 204 "-" "Mozilla/3.0 (Slurp/si; slurp@inktomi.com; http://www.inktomi.com/slurp.html)"

    216.35.116.93 - - [09/Apr/2001:06:50:38 -0400] "GET / HTTP/1.0" 200 11406 "-" "Mozilla/3.0 (Slurp/si; slurp@inktomi.com; http://www.inktomi.com/slurp.html)"


    Can someone tell me exactly what happened here? I searched and I do not have a robots.txt file. Should I create one?

    I notice that Inktomi appears to have tried to spider the site twice, minutes apart. Why? Did it fail to get anything the first time?

    Is it normal for the log to read this way? Should I be doing something to help Inktomi out?

  2. #2
    One website at a time mmj's Avatar
    Join Date
    Feb 2001
    Location
    Melbourne Australia
    Posts
    6,282
    You do not need a robots.txt file.

    To stop search engines from accessing certain areas of your site, you can create a robots.txt file in the root of your site that tells them which directories not to crawl.

    That's why the spiders always check for a robots.txt file.
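
    As a rough sketch of what such a file can look like and how a well-behaved spider reads it, the snippet below feeds some robots.txt rules to Python's standard urllib.robotparser module. The directory names /cgi-bin/ and /private/ are made up for illustration, and "Slurp" is used as the agent name only because it appears in the log above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: keep all robots out of two
# made-up directories and leave the rest of the site open.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="Slurp"):
    """Return True if the rules above let `agent` fetch `path`."""
    return parser.can_fetch(agent, path)


print(allowed("/"))
print(allowed("/private/notes.html"))
```

    With these rules the home page stays open to the spider while anything under /private/ is off limits. If nothing on the site needs hiding, having no robots.txt at all has the same effect as allowing everything.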

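    It appears that the Inktomi spider visited your site twice in a 21-minute period. Each time it asked for robots.txt first and got a 404 because the file does not exist, then it successfully retrieved your entire "/" URI (your home page), as the 200 status shows.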

    This doesn't say whether or not the spider liked what it saw - it just indicates that it saw something.
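
    To check the gap between the two visits, here is a small sketch that subtracts the two timestamps copied from the log. The format string is an assumption based on how those timestamps look; note that %b matches month names such as "Apr" only in an English locale.

```python
from datetime import datetime

# Timestamp layout as it appears between the square brackets in the log.
LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

first_visit = datetime.strptime("09/Apr/2001:06:29:55 -0400", LOG_TIME_FORMAT)
second_visit = datetime.strptime("09/Apr/2001:06:50:38 -0400", LOG_TIME_FORMAT)

# Subtracting two datetimes gives a timedelta.
gap = second_visit - first_visit
minutes, seconds = divmod(int(gap.total_seconds()), 60)

print(f"{minutes} min {seconds} s")  # prints "20 min 43 s"
```

    That works out to just under 21 minutes between the two crawls.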
    [mmj] My magic jigsaw
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    The Bit Depth Blog Twitter Contact me
    Neon Javascript Framework Jokes Android stuff

  3. #3
    SitePoint Addict gthorley's Avatar
    Thanks MMJ I had forgotten about this post.

    Quote: It appears that the inktomi spider visited your site twice in a 21 minute period, and each time it successfully got your entire "/" uri (your home page).

    Can you explain the "/" URI? I don't see that in the log.

    It would seem that if it only fetched the home page, then it didn't index any other pages. Is there a trick to getting spiders to go deeper?

  4. #4
    One website at a time mmj's Avatar
    The HTTP request is the part of the log line that starts with GET. Right after GET is the URI that was requested. If the URI is just "/", it means the root of your site.
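
    To make the pieces of a log line easier to see, here is a minimal sketch that splits the first line from the log into named fields. It assumes the log is in the common Apache "combined" layout, which is what these lines look like; the regular expression and the parse_line helper are illustrative, not part of any server tool.

```python
import re

# First line from the log, copied as-is.
LOG_LINE = (
    '216.35.116.91 - - [09/Apr/2001:06:29:55 -0400] '
    '"GET /robots.txt HTTP/1.0" 404 204 "-" '
    '"Mozilla/3.0 (Slurp/si; slurp@inktomi.com; '
    'http://www.inktomi.com/slurp.html)"'
)

# Assumed layout: host, two unused fields, [time], "request",
# status, size, "referer", "user agent".
PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = PATTERN.match(line)
    return match.groupdict() if match else None


fields = parse_line(LOG_LINE)
print(fields["method"], fields["uri"], fields["status"])
```

    For this line the method is GET, the URI is /robots.txt and the status is 404 (not found). The lines that request "/" carry status 200, which means the page was delivered.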

    In my experience, the robots only fetch the root of my site on the first crawl. It takes a couple more weeks before they use this information to revisit the site and crawl deeper. I haven't had much experience, so I don't know if this is always the case.

    The search engine guide http://www.searchengineguide.org/ has some info on which spiders will crawl deeper into your site.

    To speed things up, you should submit 3 to 5 pages of your site to the engine, so that the first crawl can at least get more than just one page.

    At the moment I cannot remember which engines use Inktomi. Netscape does, and MSN, I think. It's not one of the top 5, though.
    Last edited by mmj; Apr 18, 2001 at 10:20.

