Crawl Errors Rising

Hi All,

I hope you can help me.

I have been getting quite a lot of crawl errors over the last few weeks, and I am becoming very concerned as I think it could be affecting my rankings.

A few weeks ago, I removed about 40 web pages from the site as they were all 99% the same as each other - duplicate content - so I deleted the pages from the website and requested that Google block the URL for each.

As a result of this action, my “not found” URL errors have gone from 24 on 12.1.14 to 96 as of today. I do not understand why this number has kept on rising - after all, it was only 40 pages. In addition, I would have thought that Google would have removed the links by now, so the number of errors should fall. Can you please tell me if I am doing something wrong, or if there is a way to resolve this issue?

The last thing I want is for the number to keep increasing and my rankings to suffer as a result.

Any help on this matter would be greatly appreciated

Thanks in advance

Hi si2010. Welcome to the forums. :slight_smile:

Whenever I remove a page from a site, I always set up a .htaccess redirect to a different, existing page. This keeps the search engines happy and prevents all those 404s.
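For example, a single line in .htaccess can send visitors (and search engines) from a removed page to an existing one with a 301 (permanent) redirect. This is just a sketch - it assumes an Apache server with mod_alias enabled, and the two paths are made-up examples:

  # Permanently redirect a removed page to an existing one (example paths only)
  Redirect 301 /old-duplicate-page.html /main-page.html
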

Did you try the links using “Fetch as Google”? If so, what were the results?

Have you tried clicking the offending links using Google Webmaster Tools? If so, what were the results?

Have you tried blocking the “old duplicate content pages” in your robots.txt file?

Did you remember to remove the “old duplicate content pages” from your Sitemap.xml file?

Some people use other search engines, so both suggestions - about the Sitemap and robots.txt files - will also be of benefit for Bing, Yahoo, etc.

I would go with Ralph’s suggestion and make sure you set up redirects for those pages. It’s not only search engines which may follow old links to non-existent pages; real human visitors may do so, too. Much better for them to end up at a relevant page than looking at a “404 Not Found” message.
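If the 40 removed pages all follow a similar naming pattern, you may not even need 40 separate rules - a single pattern-based rule can cover the lot. A rough sketch, again assuming Apache with mod_alias, where the file-name pattern is purely hypothetical:

  # Redirect every removed page matching the (hypothetical) pattern to one relevant existing page
  RedirectMatch 301 ^/duplicate-content-\d+\.html$ /main-page.html
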

Haven’t tried this one yet but I think you can also use a redirect plugin?

Thank you guys, your responses are much appreciated.

I will do the redirect

Hey John, in answer to your questions, please see below:

When I use Fetch as Google and type in some of the old links, they return a 404. I have only checked a few, but I suspect they are all the same.

The same happens if I click on the old links in Google Webmaster Tools - they all return a 404.

My robots.txt currently contains the following:

  #Begin Attracta SEO Tools Sitemap. Do not remove
  sitemap: http://cdn.attracta.com/sitemap/3088898.xml.gz
  #End Attracta SEO Tools Sitemap. Do not remove

I removed the old sitemap.xml and replaced it with a new one showing the latest links. This is currently waiting to be indexed.

On the subject of indexing, in Google Webmaster Tools it always shows 1 indexed (which is the sitemap) and 66 URLs submitted.

But when I look at the Google Index tab it shows 193 - why is there a difference between the two?

Very odd

Maybe try adding the following to your robots.txt (replacing the example file names with your actual removed URLs):

  User-agent: *
  Disallow: /duplicate-content-001.html
  Disallow: /duplicate-content-002.html
  Disallow: /duplicate-content-003.html


How come your sitemap has only one URL?

The problem with relying on robots.txt to block a page which no longer exists is that it will only work for search engines (not humans) and only when they enter from your own site. If another site has a link to a page which no longer exists, then bots - and humans - will follow that link and face a 404 error.

Very true, and the original poster was trying to keep Google Webmaster Tools happy. I have found that Google Webmaster Tools does take notice of robots.txt as far as reducing its crawl errors goes.

As far as humans and links from other sites are concerned, Ralph’s technique should work well for a site with only a few pages.

My approach is to try to find the nearest match to the requested URL and present valid page options. Where there are no matches, I pass the parsed URL string contents through to my Google search page.
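A custom 404 handler like that is usually wired up in .htaccess. A minimal sketch, assuming an Apache server and a hypothetical /not-found.php script that does the nearest-match lookup and builds the suggestions:

  # Send all 404s to a custom handler page (the path is hypothetical)
  ErrorDocument 404 /not-found.php
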

[ot]

Ah - I suspect we’re approaching GWT differently. When I get an error in GWT, I take it as an indication of something wrong in my site, and try to fix it to improve the site, rather than to keep GWT happy. :wink: And sometimes an error in GWT is just that - an error. i.e. theirs, not mine. :)[/ot]

You can use Google Webmaster Tools: go to Remove URLs and submit a removal request for each invalid URL to fix the crawl errors.

Hi haniv and welcome to the forum,

Please note the Original Poster (OP) stated the following in their first post:

A few weeks ago, I removed about 40 web pages from the site as they were all 99% the same as each other - duplicate content - so I deleted the pages from the website and requested that Google block the URL for each.

I assume the OP used Google Webmaster Tools to make that request to Google.