I have been getting quite a lot of crawl errors over the last few weeks, and I am becoming very concerned, as I think it could be affecting my rankings.
A few weeks ago, I removed about 40 web pages from the site, as they were all 99% the same as each other (duplicate content). I deleted the pages on the website and requested that Google block the URL for each one.
As a result, my "not found" URL errors have gone from 24 on 12.1.14 to 96 as of today. I do not understand why this number has kept rising; after all, it was only 40 pages. In addition, I would have thought that Google would have removed the links by now, so the number of errors should fall. Can you please tell me if I am doing something wrong, or if there is a way to resolve this issue?
The last thing I want is for the number to keep increasing and my rankings to suffer as a result.
Any help on this matter would be greatly appreciated.
Whenever I remove a page from a site, I always set up a .htaccess redirect to a different, existing page. This keeps the search engines happy and prevents all those 404s.
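For example, a couple of 301 redirects in .htaccess look something like this (the paths and domain are only placeholders, not real URLs):

# Send anyone requesting a removed page to its closest existing equivalent
Redirect 301 /old-duplicate-page.html http://www.example.com/replacement-page.html
Redirect 301 /another-old-page.html http://www.example.com/replacement-page.html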
I would go with Ralph’s suggestion and make sure you set up redirects for those pages. It’s not only search engines which may follow old links to non-existent pages; real human visitors may do so, too. Much better for them to end up at a relevant page than looking at a “404 Not Found” message.
Thank you, guys - your responses are much appreciated.
I will set up the redirects.
Hey John, in answer to your questions, please see below:
When I use the Fetch tool and type in some of the old links, they return a 404. I have only checked a few, but I suspect they are all the same.
The same happens if I click on the old links in Google Webmaster Tools; they all return a 404.
The robots.txt file currently contains the following:
#Begin Attracta SEO Tools Sitemap. Do not remove
sitemap: http://cdn.attracta.com/sitemap/3088898.xml.gz
#End Attracta SEO Tools Sitemap. Do not remove
I removed the old sitemap XML and replaced it with a new one showing the latest links. This is currently waiting to be indexed.
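For anyone following along, a minimal sitemap entry looks something like this (the URL is only a placeholder):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/current-page.html</loc>
  </url>
</urlset>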
The problem with relying on robots.txt to block a page which no longer exists is that it only stops well-behaved crawlers from requesting the URL; it does nothing for human visitors. If another site has a link to a page which no longer exists, people will still follow that link and face a 404 error.
Very true, and the original poster was trying to keep Google Webmaster Tools happy. I have found that Google Webmaster Tools does take notice of robots.txt as far as reducing crawl errors goes.
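For example, adding the removed URLs to robots.txt along these lines (the paths are just placeholders) tells compliant crawlers not to request them:

User-agent: *
Disallow: /old-duplicate-page.html
Disallow: /another-old-page.html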
As far as humans and links from other sites are concerned, Ralph's technique should work well for a site with only a few pages.
My approach is to try to find the nearest match to the requested URL and present valid page options. Where there are no matches, I pass the parsed URL string through to my Google search page. A rough sketch of this is shown below.
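Here is that nearest-match idea, written in Python with a made-up page list purely to illustrate; a real handler would use the site's actual URLs:

# Illustrative sketch: suggest existing pages that are close to a 404'd path,
# and fall back to a site-restricted Google search when nothing is close.
from difflib import get_close_matches
from urllib.parse import quote_plus

# Placeholder list standing in for the site's real page paths.
KNOWN_PAGES = [
    "/services.html",
    "/contact.html",
    "/portfolio/web-design.html",
    "/blog/seo-basics.html",
]

SEARCH_URL = "https://www.google.com/search?q=site:example.com+"  # placeholder domain

def suggest_for_404(requested_path):
    """Return close-match suggestions, or a search URL when there are none."""
    matches = get_close_matches(requested_path, KNOWN_PAGES, n=3, cutoff=0.5)
    if matches:
        return {"suggestions": matches}
    # No close match: build a search query from the words in the requested path.
    words = requested_path.strip("/").replace("-", " ").replace("/", " ")
    return {"search_url": SEARCH_URL + quote_plus(words)}

print(suggest_for_404("/portfollio/web-desing.html"))   # close match -> suggestions
print(suggest_for_404("/totally-unrelated-request"))    # no match -> search URL

That way a mistyped or outdated link still lands the visitor on a useful page rather than a bare 404.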
Ah - I suspect we're approaching GWT differently. When I get an error in GWT, I take it as an indication of something wrong with my site, and try to fix it to improve the site, rather than to keep GWT happy. And sometimes an error in GWT is just that - an error, i.e. theirs, not mine. :)
Please note the Original Poster (OP) stated the following in their first post:
A few weeks ago, I removed about 40 web pages from the site, as they were all 99% the same as each other (duplicate content). I deleted the pages on the website and requested that Google block the URL for each one.
I assume the OP used Google Webmaster Tools when they “requested that Google block the URL”.