  1. #1
    SitePoint Member
    Join Date
    Jun 2010
    Posts
    16

    CURL and Proxies

    Hey there,

    I am currently using a PHP script with the cURL library to extract results from Google. As some of you may know, Google doesn't like to be scraped, which is why I am using several private HTTP proxies to do it.

    Here is the problem. After a while, the proxies get blocked by Google.

    Here is what I did to track down the problem.

    When I notice in my script that a proxy has been blocked by Google, I immediately go to Google manually through that same proxy, and strangely I am not blocked at all.

    Here is my simple cURL code:
    PHP Code:
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, 'GOOGLE QUERY HERE');
    curl_setopt($ch, CURLOPT_POST, 0);
    curl_setopt($ch, CURLOPT_USERAGENT, $user_agent); // $user_agent is randomly selected from a list which contains the most popular user agents
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_COOKIEJAR, "my_cookies.txt");
    curl_setopt($ch, CURLOPT_COOKIEFILE, "my_cookies.txt");
    curl_setopt($ch, CURLOPT_COOKIESESSION, true);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
    curl_setopt($ch, CURLOPT_PROXY, $proxies); // $proxies is randomly selected from my proxies list
    $source = curl_exec($ch);

    Is there anything wrong in my code that could leave a footprint, create undesirable cookies, etc.?

    The thing that I really don't understand is why Google blocks me when I access its website using a script but not when I access it manually, even though I am sending the SAME query.

  2. #2
    Cups (SitePoint Wizard)
    Join Date
    Oct 2006
    Location
    France, deep rural.
    Posts
    6,869
    I cannot understand why you do not simply use their official API, unless for some reason you cannot comply with their terms and conditions, in which case what you are doing violates them and is therefore probably illegal.
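
For readers wondering what the API route might look like, here is a minimal sketch. It assumes the API in question is Google's Custom Search JSON API (the thread does not name one), and the helper names `build_search_url` and `fetch_results`, as well as the key and engine ID values, are hypothetical placeholders rather than anything from the original posts.

```php
<?php
// Hypothetical helper: builds a request URL for the Custom Search JSON API.
// Assumption: you have an API key and a search engine ID ("cx") of your own.
function build_search_url(string $apiKey, string $engineId, string $query): string
{
    // http_build_query() URL-encodes each value (spaces become "+").
    $params = http_build_query([
        'key' => $apiKey,
        'cx'  => $engineId,
        'q'   => $query,
    ]);

    return 'https://www.googleapis.com/customsearch/v1?' . $params;
}

// Hypothetical helper: fetches the URL and decodes the JSON response.
function fetch_results(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $body = curl_exec($ch);
    if ($body === false) {
        // Transport-level failure (DNS, timeout, TLS, ...).
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }

    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($status !== 200) {
        throw new RuntimeException('Unexpected HTTP status: ' . $status);
    }

    $data = json_decode($body, true);
    return is_array($data) ? $data : [];
}
```

A call would look roughly like `fetch_results(build_search_url('YOUR_KEY', 'YOUR_CX', 'php curl'))`. Unlike the scraping code, this needs no proxies, cookies, or user-agent rotation, and it leaves TLS verification at its default.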

  3. #3
    AnthonySterling (Twitter: @AnthonySterling)
    Join Date
    Apr 2008
    Location
    North-East, UK.
    Posts
    6,111
    Sorry Dieuz, this type of approach is against Google's TOS, and therefore forbidden to be discussed on SitePoint.

    Have you considered seeing if Google allows you API access to the data you require?
    @AnthonySterling: I'm a PHP developer, a consultant for oopnorth.com and the organiser of @phpne, a PHP User Group covering the North-East of England.

  4. #4
    SitePoint Member
    Join Date
    Jun 2010
    Posts
    16
    Quote: Originally Posted by AnthonySterling
    Sorry Dieuz, this type of approach is against Google's TOS, and therefore forbidden to be discussed on SitePoint.

    Have you considered seeing if Google allows you API access to the data you require?
    I have not checked the Google API yet.

    I will take a look, thanks!

  5. #5
    ScallioXTX (Utopia, Inc.)
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    9,083
    Quote: Originally Posted by AnthonySterling
    Sorry Dieuz, this type of approach is against Google's TOS, and therefore forbidden to be discussed on SitePoint.
    That being said: thread closed.

    Dieuz, if you have any questions regarding the API you were pointed to, please start a new thread.
    Rémon - Hosting Advisor


