cURL on site that redirects to disclaimer

Hello,

I am trying to scrape some data from Environment Canada about water levels. For example: http://www.wateroffice.ec.gc.ca/graph/graph_e.html?stn=08KA007&prm1=6&mode=text

When you first visit this site it redirects you to a disclaimer page with an “I Agree” button that must be clicked in order to proceed to the data I want. What I need to do is write a PHP script that follows the redirect, submits “I Agree” and then outputs the data from the actual page I need to view. I have tried to do this a few different ways but to be frank it is beyond my basic understanding of PHP. I know it is likely rather simple using cURL and am hoping someone can help me. Here is what I have so far. This gives me the disclaimer page but gets stuck there:


$url = 'http://www.wateroffice.ec.gc.ca/graph/graph_e.html?stn=08KA007&prm1=6&mode=text';
$postdata = array('disclaimer_action' => urlencode('I Agree'));

$ch = curl_init();
if($ch){
   curl_setopt($ch, CURLOPT_URL, $url);
   curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
   curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
   curl_setopt($ch, CURLOPT_POST, 1);
   curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);
   curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies_wo.txt'); // set cookie file to given file
   curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies_wo.txt'); // set same file as cookie jar

   $content = curl_exec($ch);
   $headers = curl_getinfo($ch);

   curl_close($ch);

   print_r($headers);

   echo $content;

}

Any help is greatly appreciated.

Here’s what I guess the disclaimer form on Environment Canada is doing:

It probably submits the field “disclaimer_action” with the value “I Agree” to the page “/include/disclaimer.php”. The PHP script “disclaimer.php” then probably sets a cookie or a session variable to flag the user as “disclaimer accepted”, and redirects you based on the “referrer” sent in the HTTP header.

So what I would try:
Write a cURL script that sends a POST to the page “/include/disclaimer.php” with the field “disclaimer_action” and the value “I Agree”. Also set the referrer to the page you want to reach next.
Article on how to set the referrer with curl: http://www.electrictoolbox.com/php-curl-http-referer/
Article on how to post data with curl: http://superuser.com/questions/149329/what-is-the-curl-command-line-syntax-to-do-a-post-request
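
A minimal sketch of that suggestion in PHP, assuming the form really does post a field named “disclaimer_action” to “/include/disclaimer.php” (that is the guess made above, not something verified against the site). The function names and the cookie file path are made up for illustration:

```php
<?php
// Hypothetical helper: builds the cURL options for the "I Agree" POST.
// The endpoint and the field name are the guesses described above.
function buildDisclaimerOptions(string $targetUrl, string $cookieFile): array
{
    return array(
        CURLOPT_URL            => 'http://www.wateroffice.ec.gc.ca/include/disclaimer.php',
        CURLOPT_POST           => true,
        // http_build_query() URL-encodes the value, so no manual urlencode() is needed.
        CURLOPT_POSTFIELDS     => http_build_query(array('disclaimer_action' => 'I Agree')),
        // Referrer set to the page we want to reach after agreeing.
        CURLOPT_REFERER        => $targetUrl,
        // Read and write cookies from the same file so the "accepted" flag persists.
        CURLOPT_COOKIEJAR      => $cookieFile,
        CURLOPT_COOKIEFILE     => $cookieFile,
        CURLOPT_RETURNTRANSFER => true,
    );
}

// Hypothetical helper: sends the POST and returns the response body,
// or false if the request failed.
function acceptDisclaimer(string $targetUrl, string $cookieFile)
{
    $ch = curl_init();
    curl_setopt_array($ch, buildDisclaimerOptions($targetUrl, $cookieFile));
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}
```

Calling acceptDisclaimer() with the graph URL and a writable cookie file, then requesting the graph URL again with the same cookie file, would be the way to try it out.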

If it doesn’t work, the problem is probably the “disclaimer accepted” flag. Use an HTTP header sniffer (such as the Live HTTP Headers add-on for Firefox, or Fiddler) and see what the site does at that point: does it set a regular cookie or a session cookie?
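
If a browser sniffer is not handy, cURL can show the same thing: with CURLOPT_HEADER set to true, curl_exec() returns the response headers in front of the body. Here is a small sketch for pulling the cookies out of such a response. The function name and the sample response text are invented for illustration; the real site may send something quite different:

```php
<?php
// Hypothetical helper: returns the "name=value" part of every Set-Cookie
// header found in a raw HTTP response (headers followed by the body).
function extractSetCookies(string $rawResponse): array
{
    preg_match_all('/^Set-Cookie:\s*([^;\r\n]+)/mi', $rawResponse, $matches);
    return $matches[1];
}

// Made-up response text, only to show the shape of the input.
$sample = "HTTP/1.1 302 Found\r\n"
        . "Set-Cookie: disclaimer=agree; path=/\r\n"
        . "Location: /graph/graph_e.html\r\n\r\n";

print_r(extractSetCookies($sample));
```

If the list comes back empty after the POST, the flag is probably not stored in a cookie set by that response, which narrows down where to look next.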

Great. I appreciate the help. I will see what I can make happen using that information. Cheers,
Eric

I can’t quite get it to work. Does anyone else have some thoughts, or can anyone point me towards another tutorial that may help?

It could be that they are using sessions. Have you tried contacting them to ask about their API?

Perhaps I should ask them. If I could get raw access to .txt or .csv files, life would be much simpler, but the Canadian government moves at a snail’s pace when it comes to things like this. I guess it doesn’t hurt to call, though. Who knows, maybe it is actually rather simple and they can give me a direct link to the data, like most provinces provide.

Here you go:


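// Share one cookie file between both requests so the "accepted" flag persists.
$cookieOptions = array(
    CURLOPT_COOKIEJAR => '/tmp/curl-cookies',
    CURLOPT_COOKIEFILE => '/tmp/curl-cookies',
);

// Step 1: submit the "I Agree" form to the disclaimer script.
curl_setopt_array(($ch = curl_init()), $cookieOptions + array(
    CURLOPT_URL => 'http://www.wateroffice.ec.gc.ca/include/disclaimer.php',
    CURLOPT_POST => true,
    CURLOPT_POSTFIELDS => http_build_query(array('disclaimer_action' => 'I Agree')),
    CURLOPT_RETURNTRANSFER => true, // keep this response out of the output
));

curl_exec($ch);
curl_close($ch); // closing the handle writes the cookies to the jar file

// Step 2: request the data page, sending the cookies saved in step 1.
curl_setopt_array(($ch = curl_init()), $cookieOptions + array(
    CURLOPT_URL => 'http://www.wateroffice.ec.gc.ca/graph/graph_e.html?stn=08KA007&prm1=6&mode=text',
    CURLOPT_RETURNTRANSFER => true,
));

$res = curl_exec($ch);
curl_close($ch);

echo $res;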

:)


THANK-YOU!!

I would never have gotten that on my own. Not that it makes any difference, but the site is a community-run wiki for a group of whitewater paddlers in Western Canada, not anything I am making money off of. I really appreciate it. Cheers,

E