Thread: robots.txt

  1. #1
    SitePoint Enthusiast
    Join Date
    Mar 2006
    Posts
    66

    robots.txt

    not showing up...

    I have written and uploaded a robots.txt file.

    However, whenever I look at domain.com/robots.txt or in Google Webmaster Tools, I see a different file from the one I wrote.

    I see:

    User-agent: *
    Disallow:

    But my actual robots file is very different.

    My host works like this: I have a folder for each domain, and under each domain there are two folders, HTML and CGI-Bin.

    I have tried it in the root folder and the HTML folder and the file I see at domain.com/robots.txt never changes.

    I am guessing I am doing something really silly, but I have no idea.

    Any help would be appreciated.

  2. #2
    Mittineague (Programming Team)
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    17,036

    robots.txt

    Robots look for the robots.txt file in the "document root" of a domain. So if each subdomain is seen as a separate domain by a robot, then each one will need its own robots.txt file.
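
As a rough illustration of the "document root" point, here is a minimal Python sketch of how a crawler would typically derive the robots.txt location from any page URL: it keeps the scheme and host and replaces the path with /robots.txt. The function name `robots_url` is made up for this example, and the URLs are placeholders in the style of the thread.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would request for this page.

    Only the scheme and host are kept; path, query and fragment are dropped,
    so every host (including each subdomain) gets its own robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Placeholder URLs, not real sites.
    print(robots_url("http://www.domain.com/blog/some-post?x=1"))
    print(robots_url("http://sub.domain.com/a/b"))
```

A file stored anywhere other than the folder the web server maps to "/" for that host would not be found at this URL.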

  3. #3
    SitePoint Enthusiast
    Join Date
    Mar 2006
    Posts
    66
    I really don't understand.

    When I login to my host via FTP I see...

    backups
    domains
    data
    stats
    users
    logs
    etc etc etc

    I then click on "domains" and a list of all my sites appears.

    When I click on the domain of one of my sites...

    I see

    cgi-bin
    html


    I have tried putting it in the same folder as these two, so that it looks like this:

    cgi-bin [folder]
    html [folder]
    robots.txt

    And I have tried putting it in the html folder.

    But every time, the robots.txt file you see by going to domain.com/robots.txt or the robots.txt that the webmaster tools show you is the same. And it is not the one I have uploaded.

    If it makes any difference, it is a WordPress site and I have a .htaccess file with some rules in it.

  4. #4
    Dan Schulz (In memoriam)
    Join Date
    May 2006
    Location
    Aurora, Illinois
    Posts
    15,478
    It should go in your html [ folder ] directory then.

  5. #5
    Mittineague (Programming Team)
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    17,036

    robots.txt

    I have not tried the plugin myself, but maybe you could use the "KB Robots.txt" plugin (note the limitations) http://wordpress.org/extend/plugins/kb-robotstxt/

  6. #6
    SitePoint Enthusiast
    Join Date
    Mar 2006
    Posts
    66
    That is what is so weird. I have it in the HTML folder, and when I type in www.domain.com/robots.txt I get

    User-agent: *
    Disallow:

    When I should get:

    User-agent: *
    Disallow: /cgi-bin
    Disallow: /wp-admin
    Disallow: /wp-includes
    Disallow: /wp-content/plugins
    Disallow: /wp-content/cache
    Disallow: /wp-content/themes
    Disallow: /trackback
    Disallow: /comments
    Disallow: /category/*/*
    Disallow: */trackback
    Disallow: */comments
    Disallow: */tag
    Disallow: /*?*
    Disallow: /*?
    Disallow: /feed/
    Disallow: */feed
    Disallow: */starbucks/
    Disallow: /starbucks/
    Allow: /wp-content/uploads

    # Google Image
    User-agent: Googlebot-Image
    Disallow:
    Allow: /*

    # Google AdSense
    User-agent: Mediapartners-Google*
    Disallow:
    Allow: /*

    # Internet Archiver Wayback Machine
    User-agent: ia_archiver
    Disallow: /

    # digg mirror
    User-agent: duggmirror
    Disallow: /

    # Does anyone care I love Google Apache htaccess

    Sitemap: http://www.la.cityzine.com/sitemap.xml
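
To see what the difference between the two files means in practice, here is a small sketch using Python's standard `urllib.robotparser`. It compares the file that is being served with a shortened copy of the intended one. Only a few plain prefix rules are included, on the assumption that the standard-library parser does simple prefix matching and would not interpret the wildcard lines the way Google does. The helper name `allowed` is made up for this example.

```python
from urllib.robotparser import RobotFileParser

# What the server currently returns over HTTP.
SERVED = """\
User-agent: *
Disallow:
"""

# A shortened copy of the intended file (plain prefix rules only).
INTENDED = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
"""


def allowed(robots_txt: str, path: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for name, text in (("served", SERVED), ("intended", INTENDED)):
        print(name, "allows /wp-admin/:", allowed(text, "/wp-admin/"))
```

An empty `Disallow:` line blocks nothing, so the served file lets robots crawl every path that the intended file was meant to keep them out of.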

  7. #7
    Mittineague (Programming Team)
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    17,036

    robots.txt

    When you navigate to the robots.txt file using FTP (not by HTTP request) do you still see your version of the file? Might there be a plugin over-writing it? Do you have any htaccess mod-rewrite/mod-alias directives that might be in play?
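
One way to make the FTP versus HTTP comparison concrete is a short Python sketch like the one below. It assumes you have downloaded your copy to a local file (the name `robots.txt` and the URL are placeholders) and it ignores line-ending and blank-line differences so that only real content changes count as a mismatch. The function names are made up for this example.

```python
import urllib.request


def normalize(text: str) -> list:
    """Drop blank lines and trailing whitespace so CRLF/LF differences don't matter."""
    return [line.rstrip() for line in text.splitlines() if line.strip()]


def same_robots(uploaded: str, served: str) -> bool:
    """True if the uploaded file and the served response have the same rules."""
    return normalize(uploaded) == normalize(served)


def fetch(url: str) -> str:
    """Download the robots.txt that the web server actually returns."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def check(local_path: str, url: str) -> bool:
    """Compare a local copy (for example, one downloaded via FTP) with the live URL."""
    with open(local_path, encoding="utf-8") as fh:
        return same_robots(fh.read(), fetch(url))


# Example call (placeholder path and URL, needs network access):
# print(check("robots.txt", "http://www.domain.com/robots.txt"))
```

If this reports a mismatch, the response is likely coming from somewhere other than the uploaded file, which is where the plugin and rewrite questions come in.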

  8. #8
    SitePoint Enthusiast
    Join Date
    Mar 2006
    Posts
    66
    Originally posted by Mittineague:
    "When you navigate to the robots.txt file using FTP (not by HTTP request) do you still see your version of the file?"

    Absolutely, and that is why I am so confused. I log in via FTP, read the text file, and it looks just as it should, with the full information. Via HTTP, I am back to the file with no rules in it.

    "Might there be a plugin over-writing it? Do you have any htaccess mod-rewrite/mod-alias directives that might be in play?"

    I cannot think of anything that would do that.

    Thinking about it, even when I delete the robots.txt file via FTP, it is still there via HTTP, with the contents:
    User-agent: *
    Disallow:

