Hey there,
I am currently using a PHP script with the cURL library to extract results from Google. As some of you may know, Google doesn't like being scraped, which is why I am using several private HTTP proxies to do it.
Here is the problem. After a while, the proxies get blocked by Google.
Here is what I did to investigate.
When I notice in my script that a proxy has been blocked by Google, I immediately open Google manually in a browser that uses the same proxy, and strangely I am not blocked at all.
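A check of this kind can be sketched roughly as follows. This is only an illustration, not the exact code from my script: the function name `looks_blocked` is hypothetical, and the signals it tests for (an HTTP 429 or 503 status, or a redirect to a URL whose path contains "/sorry/") are assumptions about how Google reports a block.

```php
<?php
// Hypothetical helper: decide whether a response looks like a Google block.
// Assumption (not confirmed anywhere in this post): a block shows up as
// HTTP 429 or 503, or as a redirect to a URL whose path contains "/sorry/".
function looks_blocked(int $httpCode, string $effectiveUrl): bool
{
    if ($httpCode === 429 || $httpCode === 503) {
        return true;
    }
    // Only inspect the path, so a search for the word "sorry" is not flagged.
    $path = parse_url($effectiveUrl, PHP_URL_PATH);
    return is_string($path) && strpos($path, '/sorry/') !== false;
}
```

After curl_exec($ch), the two arguments would come from curl_getinfo($ch, CURLINFO_HTTP_CODE) and curl_getinfo($ch, CURLINFO_EFFECTIVE_URL).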
Here is my simple cURL code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'GOOGLE QUERY HERE');
curl_setopt($ch, CURLOPT_POST, 0);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent); // $user_agent is randomly selected from a list of the most popular user agents
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, "my_cookies.txt");
curl_setopt($ch, CURLOPT_COOKIEFILE, "my_cookies.txt");
curl_setopt($ch, CURLOPT_COOKIESESSION, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
curl_setopt($ch, CURLOPT_PROXY, $proxies); // $proxies holds a single proxy, randomly selected from my proxy list
$source = curl_exec($ch);
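One way to look for a footprint is to compare the headers the script actually sends with the ones a browser sends. The sketch below is only a suggestion under assumptions: `missing_headers` is a hypothetical helper, and the list of browser header names passed to it would have to be copied from a real browser session. The raw request string is what curl_getinfo($ch, CURLINFO_HEADER_OUT) returns, which is only filled in when curl_setopt($ch, CURLINFO_HEADER_OUT, true) is set before curl_exec().

```php
<?php
// Hypothetical helper: list header names a browser sends that the cURL
// request did not. $rawRequest is the outgoing request as text
// (request line followed by "Name: value" header lines).
function missing_headers(string $rawRequest, array $browserHeaderNames): array
{
    $lines = preg_split('/\r\n|\n/', $rawRequest);
    $sent = [];
    // Skip the first line: it is the request line, not a header.
    foreach (array_slice($lines, 1) as $line) {
        $pos = strpos($line, ':');
        if ($pos !== false) {
            // Header names are case-insensitive, so normalise to lowercase.
            $sent[strtolower(trim(substr($line, 0, $pos)))] = true;
        }
    }
    $missing = [];
    foreach ($browserHeaderNames as $name) {
        if (!isset($sent[strtolower($name)])) {
            $missing[] = $name;
        }
    }
    return $missing;
}
```

For example, calling missing_headers($raw, ['Accept', 'Accept-Language', 'Accept-Encoding']) returns the names from that list that the script never sent.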
Is there anything wrong in my code that could leave a footprint, create undesirable cookies, etc.?
What I really don't understand is why Google blocks me when I access its website from a script but not when I access it manually, even though I am sending the SAME query.