Additional hacked pages still showing in Google two weeks after removal

Hi all, I haven’t come across this before, so I’m hoping someone can help.

Two weeks ago, I noticed a client’s PHP website had been hacked. I easily found the rogue JS file and removed it, which got rid of all of the Japanese pages (approximately 5,000 additional pages). All of those pages now return 404s.
However, searches in Google still return the results OVER TWO WEEKS LATER.

After one week, I edited the robots.txt file to block these pages. I can verify in Google Search Console that the ‘extra pages’ (which are now removed and do not exist) are blocked.
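For illustration, the disallow rule looks something like this (the /jp/ path is a made-up placeholder here; the real rule matches whatever prefix the hacked URLs share):

    User-agent: *
    Disallow: /jp/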

Does anyone have any ideas on what I’m missing? Why are these removed pages still showing in Google, and why is Google not respecting the disallow instructions in robots.txt?

When something similar happened to one of my sites, I ended up having to remove the URLs from Google’s index via the option in Search Console. They will vanish eventually, but I found it was taking an awfully long time. I didn’t have as many URLs to deal with; around 350, IIRC.

The URLs won’t automatically disappear until Google recrawls them enough times to “decide” they no longer exist. (I don’t know how many times that is, but I’m pretty sure it doesn’t do it the first time it encounters a 404, which may simply be a one-off glitch.) That could take a long time with so many URLs.

Unfortunately, robots.txt doesn’t guarantee that Google will drop your pages from its index; it only asks crawlers not to fetch them, and if there are links from other sites the URLs can still be indexed. https://support.google.com/webmasters/answer/6062608?hl=en


I had a similar problem. The site wasn’t actually hacked, thanks to the firewall, but somehow the would-be hackers convinced Google, Bing, Yahoo etc. that the pages existed and got them indexed, and despite serving 404s they are still indexed months after the event.


Hey, thanks for the reply.

Re the robots.txt, I’m ‘lucky’ in that the additional URLs are only reached via internal links - no external links at all. I do understand though that robots.txt is a ‘request’ and not a guarantee to a search engine.

Thanks for sharing your experience. It’s interesting that you had to remove the pages yourself - did you have to do each of the 350 manually? I don’t fancy doing that for this website (the site itself is not particularly popular or big).

Is that because of external links to your website? I wonder whether disavowing would help in your situation?

Yes. I did them in batches over a couple of days.

This was several years ago, in the days of GWT, rather than Search Console, and the actual mechanism for removing a URL has changed slightly, but as far as I know, you’d still need to do them one at a time.

I don’t think that’s the issue here. The issue is that the spurious URL on the site needs to be removed; disavow only tells Google to ignore the incoming link as a backlink (and they advise it should only be used if you have received a notice regarding low-quality links).


This may be relevant…

Quite some time ago I was using the canonical reference (rel="canonical") to redirect numerous pages, and read somewhere that Google gave more weight to pages with a 301 redirect.


I don’t think that’s the issue here. The issue is that the spurious URL on the site needs to be removed; disavow only tells Google to ignore the incoming link as a backlink (and they advise it should only be used if you have received a notice regarding low-quality links).

Yes, I meant that it might’ve helped with an issue like gandalf458’s. I agree that it would be no use for my issue.

Yes. I did them in batches over a couple of days.

Ugh, horrible!!

In my case, it is a dormant domain with no content, nothing other than a home page saying:

This website is dead.
It is an ex-website.
It has passed on.
This website is no more.
It has ceased to be.
It has expired and gone to meet its maker.

so it’s not something I have spent much time or energy on, but I am struggling to understand how the search engines can be persuaded that content exists when it blatantly doesn’t.


It can seem to take forever for 404 pages to get dropped from the index. I’ve never had a problem with hacking, but I have seen old, obsolete, removed pages appearing as 404 errors in GSC years after they were gone.
The only thing I have found that fixes these is to 301 them, but that’s only really a valid option if there is existing equivalent content, which in the case of “hack pages” there is not.
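In PHP terms, the 301 approach is just something like this sketch (both paths are placeholders; in practice you’d more likely handle it in the server config or a front controller):

    <?php
    // A sketch of a 301: permanently redirect an obsolete URL to its nearest
    // equivalent page. Both paths below are placeholders.
    if ($_SERVER['REQUEST_URI'] === '/old-page') {
        header('Location: /new-page', true, 301);
        exit;
    }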


You can also use 410 for pages that have ceased to be.
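On a PHP site that’s just a matter of sending the status code before any output; a minimal sketch, assuming the dead URLs share a recognisable prefix (the /removed/ path is made up here):

    <?php
    // A sketch: answer 410 Gone for URLs that have been permanently removed.
    // The '/removed/' prefix is a made-up placeholder.
    if (strpos($_SERVER['REQUEST_URI'], '/removed/') === 0) {
        http_response_code(410);
        exit('Gone');
    }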


I tried that too, thinking it would be a message to the spiders saying “Hey, forget about this, it’s gone, drop it from the index”, but the URLs still appear in “Crawl Errors” in GSC.


That’s odd. I understood that to be the main purpose of 410. Clearly these bots aren’t as clever as they are made out to be!


One thing that maybe I should have mentioned in my OP is that in GSC, crawl errors (for these hacked, now-deleted pages) appear every day. For example, in this pic you can see the error was detected on the 10th of this month, but the pages (and the links to the hacked pages) were removed back in July. I don’t understand how errors are being detected on a day when there is no error. It seems impossible.

Interesting, and do they still appear in actual Google search? When you do a search like site:example.com in Google, do you see your hacked pages, even after the 410?
