Watch your bandwidth usage!

Hi,

On two unrelated websites I ran into the same bandwidth problem in June: BaiduSpider and Googlebot “burned” a huge amount of bandwidth. On one site in particular, normal web traffic used 20 GB for June, whereas Googlebot used 96 GB and BaiduSpider used 89 GB. That is 185 GB for just two bots! I have reduced Google’s crawl rate from the Webmaster Tools panel and disallowed Baidu in robots.txt. Let’s see what happens.
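In case it helps anyone get the same kind of per-bot totals from raw access logs, here is a rough Python sketch. It assumes the server writes the common "combined" log format; the sample lines, IP addresses, and the `bytes_by_client` helper are made up for illustration and are not taken from the sites above.

```python
import re
from collections import defaultdict

# Assumed combined log format: ... "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"[^"]*" (?P<status>\d{3}) (?P<size>\d+|-) "[^"]*" "(?P<agent>[^"]*)"\s*$'
)

# Substrings to look for in the user-agent string; extend as needed.
BOT_MARKERS = ("Googlebot", "Baiduspider")


def bytes_by_client(lines):
    """Sum the response bytes per bot marker; everything else goes to 'other'."""
    totals = defaultdict(int)
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        size = match.group("size")
        sent = 0 if size == "-" else int(size)
        agent = match.group("agent").lower()
        label = next((m for m in BOT_MARKERS if m.lower() in agent), "other")
        totals[label] += sent
    return dict(totals)


# Made-up sample lines (documentation-range IP addresses).
SAMPLE = [
    '192.0.2.10 - - [10/Jun/2013:10:00:00 +0000] "GET /a HTTP/1.1" 200 5000 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.10 - - [10/Jun/2013:10:00:01 +0000] "GET /b HTTP/1.1" 200 2000 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [10/Jun/2013:10:00:02 +0000] "GET /a HTTP/1.1" 200 3000 "-" '
    '"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"',
    '203.0.113.5 - - [10/Jun/2013:10:00:03 +0000] "GET /a HTTP/1.1" 304 - "-" '
    '"Mozilla/5.0 (Windows NT 6.1) Firefox/21.0"',
]

totals = bytes_by_client(SAMPLE)
for label, sent in sorted(totals.items()):
    print(label, sent)
```

Matching on the user-agent string only tells you what a client claims to be, so the totals can include impostors.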

Did you check the bot’s IP address to verify that it was a genuine Googlebot? Some bots spoof the Googlebot user agent. I’ve had to use htaccess to block a number of undesirable bots that were sucking up a lot of data transfer. That Brandwatch bot hammered my site with up to 8 page requests per second. I emailed the folks over there and asked them to stop the bot from spidering my site. They said it would stop, but it didn’t. Now all it gets is a denied error and zero bytes transferred.

If you don’t want Chinese traffic, blocking Baidu is a good idea. When I blocked Baidu, the amount of spam I got on my forum dropped. Blocking Baidu and Yandex, and denying referrers from the .cn, .ru, and .ua TLDs, cut my forum spam to next to nothing. I have had only a few spam posts all year, fewer than I used to get in a single day.

It isn’t just data transfer these bots consume. They also use server resources, and there is no point in getting into hot water with your host for using too many server resources to serve data to bots that do nothing for you.
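For the IP check, the usual approach is a reverse DNS lookup on the address followed by a forward lookup on the returned hostname. Below is a minimal Python sketch of that idea. The function name, the injectable lookup parameters, and the suffix list are my own assumptions for illustration; the suffixes reflect the googlebot.com and google.com hostnames Google has documented for its crawlers, so check the current documentation before relying on them.

```python
import socket

# Hostname suffixes assumed to belong to Google's crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_genuine_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-resolve the IP, check the domain, then forward-resolve to confirm.

    The two lookup parameters are hypothetical hooks so the logic can be
    exercised without network access; by default the socket module is used.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex only returns IPv4 addresses.
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse_lookup(ip)
        if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
            return False
        # The hostname must resolve back to the same address.
        return ip in forward_lookup(host)
    except OSError:
        # Covers socket.herror and socket.gaierror (failed lookups).
        return False
```

A bot that only fakes the user-agent string fails the first step, and one that also fakes its reverse DNS record fails the forward lookup.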

Thanks for your input. It appears to be genuine Googlebot traffic, so I lowered the crawl rate. As for Baidu, I guess I’ll have to find something else, as I am not sure htaccess is available on that Windows server.

Quite some time ago I experienced similar bot bandwidth problems. I Googled and applied the following recommendation, although Google itself ignores it.

robots.txt:

Crawl-delay: 10

It may have been coincidental, but I am pleased to say the bot bandwidth dropped.
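If you want to check how a robots.txt like this reads to a well-behaved crawler, Python’s standard `urllib.robotparser` can parse it. This is only a sketch: the `User-agent` lines, the Baiduspider block, and the bot name "SomeOtherBot" are assumptions added for illustration, since the post above shows only the `Crawl-delay` line. It also shows only what a compliant crawler would read; nothing here forces a bot to obey.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a default crawl delay plus a full block for Baidu.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10

User-agent: Baiduspider
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Delay (in seconds) that applies to a bot with no dedicated block.
default_delay = parser.crawl_delay("SomeOtherBot")

# Whether each bot may fetch a sample path.
baidu_allowed = parser.can_fetch("Baiduspider", "/forum/index.html")
other_allowed = parser.can_fetch("SomeOtherBot", "/forum/index.html")

print(default_delay, baidu_allowed, other_allowed)
```

Note that this parser only honors `Crawl-delay` when it sits under a `User-agent` line, which is why the sketch adds one.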

BaiduSpider activity:

Hey Cheesedude, sorry to hear that our bot hasn’t been obeying. Would you mind sending another email to joel@ brandwatch .com with the relevant information? I’ll try to get that sorted for you.

Thanks,

Joel Windels
LCM at Brandwatch