I had a sudden increase in Googlebot traffic on my website. During the last week of May the bot was burning about 25 GB per day of bandwidth, whereas "normal" users were using around 2 GB.
Eventually this caused a bandwidth overage costing me a hefty US$476! Pretty steep for a non-commercial website!
Anybody else around being hit by something similar?
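(In case it helps anyone check their own numbers: here is a rough Python sketch, assuming a standard Apache/Nginx "combined" access log at the path shown, that totals up the bytes served to Googlebot versus everyone else.)

import re
from collections import defaultdict

LOG = "/var/log/apache2/access.log"  # assumption: adjust to your own log location

bytes_by_agent = defaultdict(int)
# combined log format: ip - - [date] "request" status bytes "referer" "user-agent"
pattern = re.compile(r'\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (\d+|-) "[^"]*" "([^"]*)"')

with open(LOG, errors="replace") as f:
    for line in f:
        m = pattern.match(line)
        if not m:
            continue
        size, agent = m.groups()
        key = "Googlebot" if "Googlebot" in agent else "other"
        bytes_by_agent[key] += 0 if size == "-" else int(size)

for key, total in bytes_by_agent.items():
    print(f"{key}: {total / 1e9:.2f} GB")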
Are you sure it was Google?
There are a lot of bad bots out there that “pretend” to be google.
It does indeed look like the IP addresses point to Google; 66.249.70.78 alone "burned" 75 GB in 4 days!
My first thought would’ve been a malicious bot posing as the Google spider, but I did a reverse lookup on the IP, and it appears to be an actual Google IP address.
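If you want to script the check, something like this should work (a rough Python sketch; Google's documented procedure is a reverse lookup followed by a forward lookup to confirm the hostname resolves back to the same IP):

import socket

ip = "66.249.70.78"  # the address from the logs above

# Step 1: reverse DNS -- genuine Googlebot IPs resolve to *.googlebot.com or *.google.com
host = socket.gethostbyaddr(ip)[0]
print(host)  # e.g. crawl-66-249-70-78.googlebot.com

# Step 2: forward-confirm -- the hostname must resolve back to the original IP,
# otherwise the reverse record could be spoofed
forward_ips = socket.gethostbyname_ex(host)[2]
print("genuine Googlebot" if ip in forward_ips else "impostor")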
Strangely enough, in Google Webmaster Tools nothing out of the ordinary showed up.
I changed the crawl rate in Webmaster Tools to 125 seconds between requests. Nada! Zilch! Rien! Nichts!
How often do you really need the Google bot to crawl? In my case, it only shows up once every two weeks (and I don’t update my personal site much, if at all, so that’s fine). You may want to consider once every x hours instead of seconds.
Also, you may want to create a robots.txt file for the bots, to tell them where they should (and shouldn’t) go, so they don’t crawl areas of no interest that still eat into your bandwidth.
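Something along these lines, for example (the paths are just placeholders, swap in whatever heavy areas of your own site you don’t need indexed):

User-agent: *
Disallow: /downloads/
Disallow: /photos/originals/
Allow: /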
Every day would be just fine for me!
Webmaster Tools doesn’t allow setting it to more than 125 seconds. Does Googlebot respect a crawl rate set in robots.txt?
Supposedly it takes the sitemap.xml file as a “suggestion”.
No. Just the paths it must/mustn’t visit when indexing the pages.
I had a similar problem with crawlers and managed to solve it.
Try this, nicked from my robots.txt file.
# Google suggestion because of too many "failed crawls" - 2012-12-01
User-agent: Mediapartners-Google
Disallow:
Allow: /
Crawl-delay: 10
Please ignore spelling and formatting, I am using a tablet.