XML sitemap - Google indexing less than submitted

About a month back, I updated a sitemap that Google has been indexing in full for a couple of years now.

In June, 230 pages were submitted, and 229 were indexed for several weeks. Then, most recently, when viewing the sitemap info under Webmaster Tools, Google started showing a message to the effect of “Oops, something is wrong here… we are looking into this.”

Now when I go in, 228 pages are indexed and there’s no error message, as if all is well and fine.

How do I find the pages being left out, and what should I do about it?

Just because a page is referenced in the sitemap doesn’t mean it will always appear in Google’s index. The page might be blocked, for example by robots.txt or a noindex directive, or the crawler may have been unable to access it for some reason.
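If you want to rule out the first two yourself, here’s a rough Python sketch that checks a single URL against robots.txt and looks for a noindex signal. The URL is just a placeholder, and the meta-tag check is a crude string match, so treat it as a starting point rather than a definitive test:

# Rough sketch: check whether one page is disallowed in robots.txt or
# carries a noindex signal. The URL is a placeholder, and the meta-tag
# test is a crude string match rather than a proper HTML parse.
import urllib.request
import urllib.robotparser
from urllib.parse import urlparse

url = "https://www.example.com/some-page.html"   # placeholder page

# 1. Is the URL disallowed for Googlebot in robots.txt?
parts = urlparse(url)
robots = urllib.robotparser.RobotFileParser()
robots.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
robots.read()
print("Allowed by robots.txt:", robots.can_fetch("Googlebot", url))

# 2. Does the page send a noindex signal (header or meta tag)?
req = urllib.request.Request(url, headers={"User-Agent": "index-check"})
with urllib.request.urlopen(req) as resp:
    header = resp.headers.get("X-Robots-Tag", "")
    body = resp.read().decode("utf-8", errors="ignore").lower()

print("X-Robots-Tag header:", header or "(none)")
print("Possible meta noindex:", "noindex" in body)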

Have you checked “Crawl Errors” and “Blocked URLs” in Webmaster Tools?

Mike

Thanks for the reply, Mike.

No blocked URLs, but there are crawl errors – it looks like a couple of external links truncated my URLs. I didn’t know this could be the reason the grand master says those two pages aren’t being indexed…

I’m wondering if Google is changing the way they report 404s.

About two and a half years ago the site was redone. I set up a custom 404 page for a list of old pages I felt shouldn’t be 301s, and only redirected those that related most closely to the new pages, in an effort not to overload the site with too many redirects. Then, well over a year later, Google started resurrecting pages that had naturally fallen off its index and reporting them as 404s, as if that’s not what it wants to see. Maybe I should go back and make them all 301s?

Also, I thought there might be some sort of penalty for overuse of the 301.

Datadriven,

An incorrect external link won’t prevent any pages from being indexed. Typically, these are links where the domain name is correct, but either the filename is wrong, or it’s got some spurious characters tagged on the end. Google will report these in Webmaster Tools, but they won’t do you any harm.

I’m wondering if Google is changing the way they report 404s

No, not that I’ve heard of. I’d be very surprised if they had.

Also, I thought there might be some sort of penalty for overuse of the 301

Again, no. I’ve never heard of that either, and I’d be surprised if 301s did you any harm.
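If you do decide to revisit those redirects, one low-effort starting point is to list how the old URLs respond today. Here’s a rough sketch in Python using only the standard library; the URLs in it are placeholders rather than anything from your site:

# Rough sketch: report the status code for a list of old URLs, and where
# any redirects end up, so you can decide which 404s are worth turning
# into 301s. The URLs below are placeholders.
import urllib.error
import urllib.request

OLD_URLS = [
    "https://www.example.com/old-page-1.html",
    "https://www.example.com/old-page-2.html",
]

for url in OLD_URLS:
    req = urllib.request.Request(url, method="HEAD",
                                 headers={"User-Agent": "redirect-audit"})
    try:
        with urllib.request.urlopen(req) as resp:
            # urlopen follows redirects, so resp.url is the final destination
            status, final = resp.status, resp.url
    except urllib.error.HTTPError as err:
        status, final = err.code, url   # e.g. 404 or 410
    except urllib.error.URLError as err:
        print(f"ERR  {url}  ({err.reason})")
        continue
    arrow = f"  ->  {final}" if final != url else ""
    print(f"{status}  {url}{arrow}")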

If you’re sure you haven’t intentionally blocked any pages (on many sites, things like contact pages or privacy policies are intentionally blocked), your only option is to try to figure out which pages aren’t indexed, which will be laborious and time-consuming. Alternatively, you can just not worry about it. You say you’ve got 228 out of 230 pages indexed. Chances are that’s enough to bring you the traffic you want.
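To take some of the drudgery out of it, you could at least confirm that every URL listed in the sitemap still resolves cleanly. A rough sketch along these lines (the sitemap address is a placeholder for your own) will flag anything that isn’t coming back with a 200:

# Rough sketch: pull every <loc> from the XML sitemap and flag any URL
# that doesn't come back with a 200. The sitemap location is a placeholder.
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP = "https://www.example.com/sitemap.xml"   # placeholder
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

with urllib.request.urlopen(SITEMAP) as resp:
    tree = ET.parse(resp)

urls = [loc.text.strip() for loc in tree.findall(".//sm:loc", NS)]
print(f"{len(urls)} URLs listed in the sitemap")

for url in urls:
    req = urllib.request.Request(url, headers={"User-Agent": "sitemap-audit"})
    try:
        with urllib.request.urlopen(req) as page:
            if page.status != 200:
                print(f"{page.status}  {url}")
    except urllib.error.HTTPError as err:
        print(f"{err.code}  {url}")
    except urllib.error.URLError as err:
        print(f"ERR  {url}  ({err.reason})")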

Mike