Google Webmaster Tools reports URLs blocked by robots.txt in WordPress

I just completed my WordPress website and added it to Google Webmaster Tools. Verification completed successfully, and the robots.txt has also been submitted there.

I am using the “Yoast SEO” plugin to set up the SEO side, so for now I am using its sitemap. But when I test the sitemap, whose path is domain/sitemap_index.xml, Google Webmaster Tools gives the error “Url blocked by robots.txt”, and a total of 23 URLs are reported as blocked.

I have done the following to rule out possible causes:

  1. The “discourage search engines” option is not checked.

  2. I have installed a plugin to edit the default robots.txt and my current robots.txt is below:

User-Agent: *
Disallow:

  3. I have also checked the .htaccess file and it looks fine.

I am completely clueless about this issue and how to resolve it. I deleted my website from Webmaster Tools 7-8 hours ago and added it again without the www. I haven’t submitted the sitemap yet, because I hope to get some guidance first so that there is no error this time. I still run the test with the sitemap path, but it gives the same error about blocked links.

Looking forward to some help…

You need to edit that to allow robots.

User-Agent: *
Allow: /

Thanks for the reply. I just checked again in the Google sitemap tester and now it is not showing any error, although I haven’t made any changes to the robots.txt. The test result is below:

Type: Sitemap index
Number of children in this Sitemap index 23
Error details: No errors found.

Should I still change the robots.txt? Do you recommend changing it for better indexing, or should I leave it as it is now?

Actually, you don’t need to.

Although Googlebot and some other crawlers understand “Allow”, it’s not part of the original robots exclusion protocol.

User-Agent: *
Disallow:

allows all bots access to everything.
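
A minimal sketch of that behaviour, using Python’s standard-library robots.txt parser. This is not Google’s own parser, so it only illustrates the rule; the helper name is_allowed and the URL on domain.com are placeholders for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in-memory text, no network access
    return parser.can_fetch(agent, url)


# An empty Disallow value blocks nothing; "Disallow: /" blocks everything.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

url = "http://domain.com/page-sitemap.xml"  # placeholder URL
print(is_allowed(ALLOW_ALL, url))  # True
print(is_allowed(BLOCK_ALL, url))  # False
```

So the only difference between “allow everything” and “block everything” is the single slash after Disallow.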

http://www.robotstxt.org/robotstxt.html


Cool, so that means I don’t need to make any changes. :slight_smile:

I’ve not seen it written like that before. I have seen Disallow: / which disallows everything, so I assumed this was a variation on that, or an error with the / missing.
It seems strange to me to use “Disallow” to allow things; maybe it means Disallow: “nothing”.
I understand that any kind of “allow all” statement is somewhat redundant, since spiders crawl everything by default unless told not to. So robots.txt is mainly useful for saying what not to crawl, though it can also be used to point to your sitemap.

I think that’s the interpretation. (And the only reason for including a robots.txt file like that, AFAIK, is to stop your logs filling up with 404 errors when search engine bots look for a robots.txt and can’t find one.)


I got the same error again this morning, which is weird because Google was not showing any error earlier. So I looked for a website that uses the same WordPress theme for the same purpose and is doing well in Google indexing. I am now using their robots.txt for my website, which is as follows:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
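
As a rough local check, the rules above can be run through Python’s standard-library parser to see which paths they would block. This is only a sketch: the paths on domain.com are placeholders, and the stdlib parser is not Google’s. In particular, it applies rules in file order, while Google documents that the most specific matching rule wins, so the two may disagree about /wp-admin/admin-ajax.php; that path is left out here for that reason.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # in-memory text, no network access

# Placeholder paths: the sitemap and home page from the thread, plus a
# hypothetical admin page that the Disallow rule is meant to cover.
paths = ["/", "/page-sitemap.xml", "/wp-admin/options.php"]
results = {
    path: parser.can_fetch("Googlebot", "http://domain.com" + path)
    for path in paths
}

for path, allowed in results.items():
    print(path, "allowed" if allowed else "blocked")
```

Under these rules only the admin area is blocked, so neither the home page nor page-sitemap.xml should be affected by this file.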

Let’s see if this works, because I am out of options after this. Even when I allow Google to crawl freely, it says robots.txt is blocking URLs. If this works, fine; otherwise I will have to get help from an expert.

None of my sites is big enough or complex enough to warrant a sitemap, so I can’t help from experience. Have you seen this? https://support.google.com/webmasters/answer/183669?hl=en&ref_topic=4581190 (There is quite a bit of information about sitemaps in Search Console Help, but that one includes dealing with error messages.)

Thanks for the reply. What irritates me more than anything is that low-priority pages such as “terms and conditions” and “faq” are getting indexed, but the main page and other important pages are being rejected because of the robots.txt error.

Should I stop using the sitemap created by Yoast SEO? Or should I delete my robots.txt? Is any of this going to help?

About the Yoast sitemap: to find my home page in it, I have to do the following:

  1. Open domain.com/sitemap_index.xml (here I find all the miscellaneous pages of the website, not the main pages).
  2. Click on “domain.com/page-sitemap.xml”, which lists the main pages that I want to see in the Google index.

Do you think that there is something wrong with the sitemap provided by yoast seo?
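
For context, the two-step navigation described above is how a sitemap index normally works: the index file lists child sitemaps, and each child lists the actual page URLs. A minimal sketch of reading such an index with Python’s standard library follows; the sample XML is a hypothetical, cut-down index (not the real output of this site), and the function name child_sitemaps is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample: a sitemap index pointing at two child sitemaps.
SAMPLE_INDEX = """\
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>http://domain.com/post-sitemap.xml</loc></sitemap>
  <sitemap><loc>http://domain.com/page-sitemap.xml</loc></sitemap>
</sitemapindex>
"""


def child_sitemaps(index_xml: str) -> list[str]:
    """Return the child sitemap URLs listed in a sitemap index document."""
    root = ET.fromstring(index_xml)
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS)]


for url in child_sitemaps(SAMPLE_INDEX):
    print(url)
```

Seeing page-sitemap.xml as one child among several is the expected layout, not by itself a sign that the sitemap is broken.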

That link is all about sitemaps. But Google is saying that robots.txt is blocking particular sitemap links, and the link that is getting blocked is "domain.com/page-sitemap.xml"

That link lists all the main pages that I want indexed. If the main pages are blocked by robots.txt, then there is no point in getting the other pages indexed. :pensive:

It seems unlikely. Yoast is a very popular plug-in, so I’d expect it to be reliable in that respect. (I’ve never used it, so again, I can’t comment from experience.)

An xml sitemap is only there to help bots find and crawl your pages. If you have a good linking structure in your site, then your pages will be found anyway. You can also submit any important URLs directly through Webmaster tools. So blocking of the sitemap may be inconvenient, but it won’t prevent your pages being crawled and indexed.

Did you try the procedure recommended in the link I posted?

URLs not accessible

Indicates that Google encountered an error when attempting to view a URL in your sitemap.

  • Ensure that the file exists at the specified location.
  • Verify by using the robots.txt tester to confirm which file is blocking it.
  • Use the Fetch as Google tool to see if it’s blocked by robots.txt.
  • If we attempted to crawl the URL from your sitemap, make sure that your sitemap lists the URL correctly.
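
The robots.txt step in the list above can also be approximated offline. The sketch below takes the text of a sitemap and of a robots.txt file and reports which sitemap URLs the rules would block. It relies on Python’s standard-library parser rather than Google’s tester, so treat the result as a hint only; the sample sitemap, the wp-admin URL in it, and the function name blocked_urls are all hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Fully qualified tag name for <loc> in the sitemaps.org namespace.
LOC_TAG = "{http://www.sitemaps.org/schemas/sitemap/0.9}loc"

# Hypothetical sample sitemap with three page URLs.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://domain.com/</loc></url>
  <url><loc>http://domain.com/faq/</loc></url>
  <url><loc>http://domain.com/wp-admin/tools.php</loc></url>
</urlset>
"""


def blocked_urls(sitemap_xml: str, robots_txt: str, agent: str = "Googlebot") -> list[str]:
    """Return the sitemap URLs that the robots.txt rules disallow for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # in-memory text, no network access
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.iter(LOC_TAG)]
    return [url for url in urls if not parser.can_fetch(agent, url)]


print(blocked_urls(SAMPLE_SITEMAP, "User-agent: *\nDisallow: /wp-admin/\n"))
print(blocked_urls(SAMPLE_SITEMAP, "User-agent: *\nDisallow:\n"))
```

If a check like this reports nothing blocked for the live files while Webmaster Tools still shows the error, the report may be based on an older copy of robots.txt than the one currently being served.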

Ensure that the file exists at the specified location.
The XML sitemap is generated by Yoast, so the file location is provided by the plugin itself.

Verify by using the robots.txt tester to confirm which file is blocking it.
When I test robots.txt, the test button changes to “Allowed” in green.

Use the Fetch as Google tool to see if it’s blocked by robots.txt.
When I do this, the result is: Desktop -> Complete -> URL submitted to index

If we attempted to crawl the URL from your sitemap, make sure that your sitemap lists the URL correctly.
This is what I am not sure of, because the sitemap list is created by yoast and I don’t know how right or wrong it is.

Also, I just ran a sitemap test again and the result is as follows:

Sitemap: /sitemap_index.xml
Type: Sitemap index
Number of children in this Sitemap index 20
Error details: No errors found.

If you enter the URL(s) from your sitemap directly into your browser address bar, does the page display? (Google is just asking you to confirm that the URL is correctly listed for the resource it should point to.)

Yes, the page displays. I use the same URL, “domain.com/sitemap_index.xml”, to test the sitemap; the result comes back OK without any errors, and then I submitted the same link. Also, when I check it in the browser, it shows the sitemap generated by the Yoast plugin.

Sorry - I’m getting confused about where your problem actually lies. In your first post, you said that testing the sitemap gave the error “Url blocked by robots.txt” and that 23 URLs were blocked.

I took that to mean that 23 URLs listed on that sitemap are blocked by the robots.txt file. But now you seem to be saying that it’s the sitemap itself which is blocked by robots.txt. Please could you clarify this for me?

I understand. The reason for the confusion is that the error was gone when I tested the sitemap 2-3 times after posting. After that, I submitted the sitemap, since the test was showing no error.

But this morning, when Google processed the sitemap, it started showing the error again. The points to note are:

  1. When I test the sitemap, it says “no errors found”, even when I test it right now.
  2. But when I submit it and Google processes the sitemap, it gives the error “URLs blocked by robots.txt”.
  3. Also, when I do Fetch as Google, it says “Complete” and lets me submit the homepage link for indexing, which I did an hour ago.
  4. When I run the robots.txt test, it says “Allowed” in green.

What confuses me about all of the above is that testing is fine, but processing the sitemap gives the error. How is that possible? And if it is possible, why is it happening?

You may think that I am looking at an old error with an old date, but the report says:

Processed - Jun 1, 2016
Issues - 1 errors
Web Items Submitted - 268
Indexed - 16

The details of the “1 errors” mentioned above are as follows:

URL restricted by robots.txt
domain.com/page-sitemap.xml

So, basically, it is still indexing very low-priority links but rejecting the main pages, because all of them fall under “domain.com/page-sitemap.xml”, which is blocked by robots.txt.

I hope this clears up the confusion. Please let me know if I should clarify any of the points above. Thanks very much for the follow-ups.

No; as I explained earlier, pages listed on that sitemap will still be indexed, provided Google can find and crawl them, even if the sitemap itself is blocked. It is not actually necessary to have a sitemap at all, although it’s usually recommended for a large site. https://support.google.com/webmasters/answer/156184?hl=en

OK, I get your point. But then why is Google indexing unimportant pages like the FAQ and the terms and conditions? These pages are indexed because they do not fall under the blocked sitemap link. All the pages that do fall under the blocked sitemap link are not indexed.

Should I wait for some time?

Are those linked from every page on your site? If so, then they’re probably being indexed first because they’re being found first. If you don’t want them indexed, then you should block them with robots.txt or via the robots meta tag.

If the site is new and quite large, then you should expect it to take some time for all your pages to be crawled and indexed. Google doesn’t crawl the whole site in one visit. And adding URLs to a sitemap only helps Google find them; it doesn’t guarantee they will be indexed.

Using a sitemap doesn’t guarantee that all the items in your sitemap will be crawled and indexed, as Google processes rely on complex algorithms to schedule crawling.