  1. #1
    SitePoint Member
    Join Date
    Aug 2010
    Posts
    2

    Reading data from html site - problem

    Hi,
    I need to read information from a website via PHP, and in general I know how to do that. The problem is that the site shows an advert page at the same URL as the page I want to read. In a browser, the advert appears only once per day, but when I fetch the HTML with file_get_contents(), I always get the code of the advert page. The advert page contains a link to the page I actually want to read.
    Here is the code that I use:

    // Page I want to read data from
    $url = "http://eu.battle.net/sc2/en/profile/355351/1/GoodWill/";
    $html = file_get_contents($url);
    // This is the HTML of the advert page; its "Continue to the StarCraft II Community Site" link leads to the page I need
    echo $html;
    preg_match('~id="continue"><a href="(?<jojo>.*?)">Continue to~', $html, $match);
    // This is the link to the page I want; it is almost the same as the URL above
    echo $match['jojo'];

    Can anyone help me?
    By the way, I also tried importing the website into Excel and extracting the data from there. The import worked, but I don't know much about combining PHP and Excel.

  2. #2
    AnthonySterling (Twitter: @AnthonySterling)
    Join Date
    Apr 2008
    Location
    North-East, UK.
    Posts
    6,111
    What you're trying to achieve will not work.

    The reason you see the advert is that the website cannot detect a cookie stating that you have already seen it. These are the cookies it sets:
    Code:
    Set-Cookie: perm=1; Domain=battle.net; Path=/ 
    Set-Cookie: int-SC2=1; Domain=.battle.net; Path=/
    I suspect you're trying to obtain the link in order to bypass this advert. Unfortunately, that link is exactly the one you already have. Instead, send those cookies with the request yourself, for example with cURL.
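
    To make the relationship between the Set-Cookie lines above and the cookie string sent back with a request more concrete, here is a minimal sketch. The helper name cookieHeaderFromSetCookie is hypothetical and not part of the thread; it assumes that only the leading name=value pair of each Set-Cookie line needs to be returned, which is how clients normally treat attributes such as Domain and Path.

```php
<?php
// Hypothetical helper (not from the thread): turn Set-Cookie response
// lines into the value of a Cookie request header.
function cookieHeaderFromSetCookie(array $setCookieLines)
{
    $pairs = array();
    foreach ($setCookieLines as $line) {
        // Drop the "Set-Cookie:" prefix if it is present.
        $value = preg_replace('/^Set-Cookie:\s*/i', '', trim($line));
        // The first ";"-separated part is name=value; the remaining parts
        // are attributes (Domain, Path, ...) and are skipped here.
        $parts = explode(';', $value);
        $pair  = trim($parts[0]);
        if ($pair !== '' && strpos($pair, '=') !== false) {
            $pairs[] = $pair;
        }
    }
    return implode('; ', $pairs);
}

$header = cookieHeaderFromSetCookie(array(
    'Set-Cookie: perm=1; Domain=battle.net; Path=/',
    'Set-Cookie: int-SC2=1; Domain=.battle.net; Path=/',
));
echo $header, "\n"; // prints "perm=1; int-SC2=1"
```

    The resulting string is the kind of value that can be passed to CURLOPT_COOKIE.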



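    PHP Code:
    <?php
    // Request the profile page, sending the cookies the site sets once the advert has been shown.
    $handle = curl_init('http://eu.battle.net/sc2/en/profile/355351/1/GoodWill/');
    curl_setopt_array(
      $handle,
      array(
        CURLOPT_COOKIE          => 'int-SC2=1; perm=1; Domain=battle.net;',
        CURLOPT_RETURNTRANSFER  => true,  // return the response instead of printing it
        CURLOPT_MAXREDIRS       => 3,
        CURLOPT_FOLLOWLOCATION  => true
      )
    );
    $html = curl_exec($handle);
    curl_close($handle);

    // Escape the markup so the fetched HTML source is displayed rather than rendered.
    echo htmlentities($html);
    ?>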
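
    Since the original question used file_get_contents(), the same idea can probably be expressed without cURL by passing a stream context that carries a Cookie header. This is only a sketch under assumptions: the helper names cookieContext and fetchWithCookies are hypothetical, the cookie string is taken from the Set-Cookie lines above, and it relies on allow_url_fopen being enabled (which the original file_get_contents() call already requires).

```php
<?php
// Hypothetical helper: build an HTTP stream context that sends a Cookie header.
function cookieContext($cookieHeader)
{
    return stream_context_create(array(
        'http' => array(
            'method' => 'GET',
            'header' => "Cookie: " . $cookieHeader . "\r\n",
        ),
    ));
}

// Hypothetical helper: fetch a URL with the given cookies.
// Returns the response body, or false on failure.
function fetchWithCookies($url, $cookieHeader)
{
    return file_get_contents($url, false, cookieContext($cookieHeader));
}
```

    Calling fetchWithCookies('http://eu.battle.net/sc2/en/profile/355351/1/GoodWill/', 'int-SC2=1; perm=1') would then stand in for the plain file_get_contents($url) call from the first post.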
    @AnthonySterling: I'm a PHP developer, a consultant for oopnorth.com and the organiser of @phpne, a PHP User Group covering the North-East of England.

  3. #3
    SitePoint Member
    Join Date
    Aug 2010
    Posts
    2
    Thanks a lot for that code.
    It works, but it throws one warning:
    curl_setopt_array() [function.curl-setopt-array]: CURLOPT_FOLLOWLOCATION cannot be activated when in safe_mode or an open_basedir
    I have never used that function before.

  4. #4
    AnthonySterling (Twitter: @AnthonySterling)
    Join Date
    Apr 2008
    Location
    North-East, UK.
    Posts
    6,111
    You're welcome.

    It appears you may be able to omit CURLOPT_FOLLOWLOCATION (and CURLOPT_MAXREDIRS along with it).

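    PHP Code:
    <?php
    // Same request as before, without the redirect options that trigger the warning.
    $handle = curl_init('http://eu.battle.net/sc2/en/profile/355351/1/GoodWill/');
    curl_setopt_array(
      $handle,
      array(
        CURLOPT_COOKIE          => 'int-SC2=1; perm=1; Domain=battle.net;',
        CURLOPT_RETURNTRANSFER  => true
      )
    );
    $html = curl_exec($handle);
    curl_close($handle);

    // Escape the markup so the fetched HTML source is displayed rather than rendered.
    echo htmlentities($html);
    ?>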
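
    If the page did turn out to redirect on a host where CURLOPT_FOLLOWLOCATION cannot be enabled, one possible workaround is to follow the Location header by hand. The following is a rough sketch, not something tested against battle.net: the function names extractLocation and fetchFollowingRedirects are hypothetical, and it assumes the server sends absolute URLs in Location.

```php
<?php
// Hypothetical helper: pull the Location value out of a raw header block.
// Returns null when no Location header is present.
function extractLocation($rawHeaders)
{
    if (preg_match('/^Location:\s*(\S+)/mi', $rawHeaders, $m)) {
        return $m[1];
    }
    return null;
}

// Hypothetical helper: fetch $url with the given cookie string, following
// up to $maxRedirects redirects manually. Returns the body or false.
function fetchFollowingRedirects($url, $cookies, $maxRedirects = 3)
{
    for ($i = 0; $i <= $maxRedirects; $i++) {
        $handle = curl_init($url);
        curl_setopt_array($handle, array(
            CURLOPT_COOKIE         => $cookies,
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_HEADER         => true, // include headers in the result
        ));
        $response = curl_exec($handle);
        if ($response === false) {
            curl_close($handle);
            return false;
        }
        $status     = curl_getinfo($handle, CURLINFO_HTTP_CODE);
        $headerSize = curl_getinfo($handle, CURLINFO_HEADER_SIZE);
        curl_close($handle);

        $headers  = substr($response, 0, $headerSize);
        $body     = substr($response, $headerSize);
        $location = extractLocation($headers);

        // Anything that is not a redirect with a target is the final answer.
        if ($status < 300 || $status >= 400 || $location === null) {
            return $body;
        }
        $url = $location; // assumption: Location holds an absolute URL
    }
    return false; // too many redirects
}
```

    Relative Location values are not resolved here, so this would need extending for servers that send them.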

