One type does not fit all, I realize that (there are many variations), but how can you be sure the robots.txt you currently use is the right one? Can I list mine here to show? Curious...
Sure! Go ahead.
For this URL, I have it as:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Well, I just learned two new things.
One, I wondered if “Allow” had become a real thing yet - it has.
Second, Google has a robots.txt tester.
It doesn’t test whether the file is doing what it “should do”, but it does test for
syntax warnings and logic errors
…
if the URL you entered is blocked from Google web crawlers.
In other words, is it doing what you think you’re telling it to do.
Of course, keep in mind that robots.txt is a “suggestion”, not a set of “rules”, and some search bots may ignore it or even use it to find paths.
I’m not sure why you Allow admin-ajax.php; it resulted in “0” when I tried it.
And I think you should consider adding a Disallow for the wp-includes and wp-content folders, though it is probably not that big of a deal anyway, as there is nothing in those folders for the bots to see unless the PHP engine goes down.
The possible exception is wherever all your images are kept; you should probably Allow that. You could even submit an images sitemap for them, though it looks like you don’t have any sitemap.xml file yet.
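To make the images sitemap idea a bit more concrete, here is a rough sketch of what such a file contains, built with Python's standard library. It assumes Google's image sitemap extension namespace; the example.com URLs and the build_image_sitemap helper are made up for illustration, and on a WordPress site a plugin may well generate the real file for you.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(pages):
    """Build sitemap XML from {page URL: [image URLs shown on that page]}."""
    # Register prefixes so the output uses <loc> and <image:loc>
    # instead of auto-generated ns0/ns1 prefixes.
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical page and image URLs, for illustration only.
xml_text = build_image_sitemap({
    "https://example.com/gallery/": [
        "https://example.com/wp-content/uploads/a.jpg",
        "https://example.com/wp-content/uploads/b.jpg",
    ],
})
print(xml_text)
```

Each page gets one url entry, with one image:image child per image on that page. Note that the image URLs only help if robots.txt does not block the folder they live in.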
I realise this is not an answer to your question but if you are looking to secure your wp-admin area, a more secure way is as follows:
If you are the only person who needs to log into your admin area and if you have a static IP address, you can deny access to the wp-admin folder to everyone but yourself via an .htaccess file.
# Block access to wp-admin.
# Apache 2.2 syntax; on Apache 2.4+ the equivalent is "Require ip x.x.x.x".
order deny,allow
allow from x.x.x.x
deny from all
Where x.x.x.x is your IP address.
gandalf, I do want to allow my designer occasional access if a problem occurs with any of the specialized PHP scripting he configured within this site. So although I am the primary user of the site (I write blog posts, interlink specific pages and so forth), I won’t deny access to the admin area. As for the “Allow admin-ajax.php”, my designer added that. He has been working on other projects, so I thought I would ask about the robots.txt here first. After reading into it a bit, I am now curious about something…
Since my site is “image-based”, geared to images I offer for licensing, what is the sitemap.xml file, and where can I access it in my wp-admin? Might I share that here for additional feedback? Keep in mind that I provide over 5,000 separate archived images throughout the site, which can be accessed from this link (see the various categories on the right side of the page; there are over 60).
Use this
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-content/
Disallow: /wp-includes/
Disallow: /recommended/
Disallow: /comments/feed/
Disallow: /trackback/
Disallow: /index.php
Disallow: /xmlrpc.php
Disallow: /wp-content/plugins/
User-agent: NinjaBot
Allow: /
User-agent: Mediapartners-Google*
Allow: /
User-agent: Googlebot-Image
Allow: /wp-content/uploads/
User-agent: Adsbot-Google
Allow: /
User-agent: Googlebot-Mobile
Allow: /
Why?
Please take the time to explain why you are recommending something, and why it is more appropriate to @danno2’s needs than what he is currently using. Otherwise, your post is of no real help to anybody.
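One way to see what the proposed file actually does is to feed it to a parser and ask about a few paths. This is only a sketch using Python's standard urllib.robotparser, whose matching details may differ from any particular search engine's; "Bingbot" stands in for any crawler without its own group, and the photo.jpg path is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt proposed above, copied as posted.
PROPOSED = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-content/
Disallow: /wp-includes/
Disallow: /recommended/
Disallow: /comments/feed/
Disallow: /trackback/
Disallow: /index.php
Disallow: /xmlrpc.php
Disallow: /wp-content/plugins/
User-agent: NinjaBot
Allow: /
User-agent: Mediapartners-Google*
Allow: /
User-agent: Googlebot-Image
Allow: /wp-content/uploads/
User-agent: Adsbot-Google
Allow: /
User-agent: Googlebot-Mobile
Allow: /
"""

parser = RobotFileParser()
parser.parse(PROPOSED.splitlines())


def allowed(agent, path):
    """Ask the parsed file whether `agent` may fetch `path`."""
    return parser.can_fetch(agent, path)


# A crawler with no group of its own falls back to the "*" group,
# where "Disallow: /wp-content/" also covers the uploads folder.
print(allowed("Bingbot", "/wp-content/uploads/photo.jpg"))          # False

# Googlebot-Image has its own group, so only that group applies to it.
print(allowed("Googlebot-Image", "/wp-content/uploads/photo.jpg"))  # True

# That group has no Disallow lines, so nothing is blocked for it.
print(allowed("Googlebot-Image", "/wp-admin/"))                     # True
```

Two things stand out for an image-licensing site: general crawlers are kept out of the uploads folder entirely, and the bots given their own group no longer inherit any of the Disallow lines from the "*" group.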
I’m not aware that Allow is a valid directive in robots.txt.
Where did you get this from?
Google seems to suggest it is…
Strange, there seems no mention of it on www.robotstxt.org (unless I’ve just missed it!)
In fact, that site goes against what Google says and has this:
This is currently a bit awkward, as there is no “Allow” field. The easy way is to put all files to be disallowed into a separate directory, say “stuff”, and leave the one file in the level above this directory:
I think it’s the other way around.
As I understand it, “Allow” was never part of the “official” robots protocol, but Google introduced support for it. I think (but I’m not certain) that other large search engines also now support it, but I would prefer not to rely upon it, as it is still “non-standard”, as far as I know.
Strange. I wonder if other search engines use Allow…
As I understand it, robots assume they are allowed unless told otherwise. So I think it’s kind of redundant, unless you maybe disallow all, then just allow a select few.
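For what it's worth, the way Google documents the Allow/Disallow interplay (and the way RFC 9309 has since written it down) is that the longest matching path wins, with Allow winning a tie. Here is a minimal sketch of that rule under simplifying assumptions: plain prefix matching only, no * or $ wildcards, and a hypothetical is_allowed helper, not any crawler's actual code.

```python
def is_allowed(path, rules):
    """Apply longest-match precedence to one User-agent group.

    `rules` is a list of (directive, path_prefix) pairs.
    Simplification: prefixes only, wildcards are not handled.
    """
    best = None  # (length of matching prefix, True if it is an Allow rule)
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        candidate = (len(prefix), directive.lower() == "allow")
        # Longer prefix wins; on equal length, Allow (True) beats Disallow.
        if best is None or candidate > best:
            best = candidate
    # With no matching rule at all, crawling is allowed by default.
    return True if best is None else best[1]


# The rules from the robots.txt at the top of this thread.
rules = [
    ("Disallow", "/wp-admin/"),
    ("Allow", "/wp-admin/admin-ajax.php"),
]

print(is_allowed("/wp-admin/admin-ajax.php", rules))  # True
print(is_allowed("/wp-admin/options.php", rules))     # False
print(is_allowed("/blog/some-post/", rules))          # True
```

So Allow is redundant on its own, but it does something useful when it carves an exception out of a broader Disallow, which is exactly the "disallow all, then allow a select few" case. A crawler that ignores Allow would simply treat admin-ajax.php as blocked along with the rest of /wp-admin/.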