Reading data from an HTML site - problem

Hi,
I need to read information from a website via PHP, and I know how to do that in general. The problem is that the website shows an advert page first, and that advert page has the same URL as the page I want to get data from. When you visit the site in a browser, the advert page appears only once per day. But when I try to read the HTML with file_get_contents(), it always returns the code of the advert page. The advert page contains a link to the page I want to read.
Here is the code that I use:

$url = "http://eu.battle.net/sc2/en/profile/355351/1/GoodWill/"; // website I want to get data from
$html = file_get_contents($url);
// HTML of the advert page; its link "Continue to the StarCraft II Community Site" leads to the site that I need
echo $html;
preg_match('~id="continue"><a href="(?<jojo>.*?)">Continue to~', $html, $match);
// here is the link to the site that I want - it's almost the same as the one above
echo $match['jojo'];

Can anyone help me?
BTW, I was trying to import the website into Excel and then somehow get the data I want. The import worked, but I don't know much about how to code with PHP + Excel.

What you're trying to achieve will not work.

The reason you see the advert is that the website cannot detect a cookie stating that you have already seen it.


Set-Cookie: perm=1; Domain=battle.net; Path=/ 
Set-Cookie: int-SC2=1; Domain=.battle.net; Path=/ 
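
To make the cookie idea concrete, here is a minimal sketch, assuming you have the Set-Cookie lines at hand. The helper names cookie_header_from_set_cookie() and fetch_with_cookies() are made up for illustration; only stream_context_create() and file_get_contents() are PHP built-ins. Only the name=value part of each Set-Cookie line goes back to the server; Domain and Path are instructions for the client and are not sent.

```php
<?php
// Hypothetical helper (not from the thread): turn "Set-Cookie" response
// header lines into the value of a "Cookie" request header.
function cookie_header_from_set_cookie(array $setCookieLines)
{
    $pairs = array();
    foreach ($setCookieLines as $line) {
        // Drop the "Set-Cookie:" prefix if present (case-insensitive).
        $value = preg_replace('/^Set-Cookie:\s*/i', '', trim($line));
        // Keep only the first "name=value" segment; attributes such as
        // Domain, Path or Expires are not sent back to the server.
        $parts = explode(';', $value, 2);
        $first = trim($parts[0]);
        if ($first !== '' && strpos($first, '=') !== false) {
            list($name, $val) = explode('=', $first, 2);
            $pairs[trim($name)] = trim($val); // a repeated name overwrites the earlier value
        }
    }
    $out = array();
    foreach ($pairs as $name => $val) {
        $out[] = $name . '=' . $val;
    }
    return implode('; ', $out);
}

// Sketch of sending those cookies with file_get_contents() through an
// HTTP stream context. Defined only; it performs a network request when called.
function fetch_with_cookies($url, $cookieHeader)
{
    $context = stream_context_create(array(
        'http' => array(
            'method' => 'GET',
            'header' => "Cookie: " . $cookieHeader . "\r\n",
        ),
    ));
    return file_get_contents($url, false, $context);
}
```

Passing the two header lines above to cookie_header_from_set_cookie() gives a string you can hand to fetch_with_cookies() together with the profile URL. Whether the site accepts these exact values on any given day is an assumption.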

I suspect you're trying to obtain the link to bypass this advert. Unfortunately, the link you're trying to obtain is exactly the one you already have.

:wink:


<?php
$handle = curl_init('http://eu.battle.net/sc2/en/profile/355351/1/GoodWill/');
curl_setopt_array(
  $handle,
  array(
    CURLOPT_COOKIE          => 'int-SC2=1; perm=1',
    CURLOPT_RETURNTRANSFER  => true,
    CURLOPT_MAXREDIRS       => 3,
    CURLOPT_FOLLOWLOCATION  => true
  )
);
$html = curl_exec($handle);
curl_close($handle);

echo htmlentities($html);
?>

You're welcome. :wink:

It appears you may be able to omit CURLOPT_FOLLOWLOCATION (and CURLOPT_MAXREDIRS with it).


<?php
$handle = curl_init('http://eu.battle.net/sc2/en/profile/355351/1/GoodWill/');
curl_setopt_array(
  $handle,
  array(
    CURLOPT_COOKIE          => 'int-SC2=1; perm=1',
    CURLOPT_RETURNTRANSFER  => true
  )
);
$html = curl_exec($handle);
curl_close($handle);

echo htmlentities($html);
?>

Thanks a lot for that code.
It works, but it throws one warning:
curl_setopt_array() [function.curl-setopt-array]: CURLOPT_FOLLOWLOCATION cannot be activated when in safe_mode or an open_basedir
I have never used that function.
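
If the page ever does need a redirect while CURLOPT_FOLLOWLOCATION is unavailable, one possible workaround is to follow the Location header by hand. This is only a sketch: extract_location() and fetch_following_redirects() are hypothetical names, and it assumes the server sends an absolute URL in Location.

```php
<?php
// Hypothetical helper: pull the Location header value out of a raw
// HTTP header block, or return null when there is none.
function extract_location($rawHeaders)
{
    if (preg_match('/^Location:\s*(.+?)\s*$/mi', $rawHeaders, $m)) {
        return $m[1];
    }
    return null;
}

// Sketch of a manual redirect loop that avoids CURLOPT_FOLLOWLOCATION.
// Defined only; it performs network requests when called.
function fetch_following_redirects($url, $cookies, $maxRedirects = 3)
{
    for ($i = 0; $i <= $maxRedirects; $i++) {
        $handle = curl_init($url);
        curl_setopt_array($handle, array(
            CURLOPT_COOKIE         => $cookies,
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_HEADER         => true, // include headers in the returned string
        ));
        $response = curl_exec($handle);
        if ($response === false) {
            curl_close($handle);
            return false;
        }
        $status     = curl_getinfo($handle, CURLINFO_HTTP_CODE);
        $headerSize = curl_getinfo($handle, CURLINFO_HEADER_SIZE);
        curl_close($handle);

        $headers = substr($response, 0, $headerSize);
        $body    = substr($response, $headerSize);
        $next    = extract_location($headers);

        // Anything that is not a 3xx response with a Location ends the loop.
        if ($status < 300 || $status >= 400 || $next === null) {
            return $body;
        }
        $url = $next; // assumption: Location is an absolute URL
    }
    return false; // too many redirects
}
```

You would call it like fetch_following_redirects($url, 'int-SC2=1; perm=1') and then echo htmlentities() on the result, as in the code above.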