PHP - Scraping JSP with cURL - HTTP Status 400

I’m trying to scrape a reserved (login-protected) JSP area with PHP. With the “Advanced REST client” extension for Chrome I can log in and view the contents of the page this way:

  1. GET request to www.website.com. The response header gives me the JSESSIONID that I will use for future requests

  2. GET request to www.website.com/login.jsp

  3. POST request to www.website.com/i_security_check with the POST variables j_username, j_password, submit, utente

  4. GET request to www.website.com/reserved_area.jsp
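Step 1 depends on reading the JSESSIONID out of the Set-Cookie response header. As a minimal sketch of that extraction (the helper name `extract_jsessionid` and the sample header block are made up for illustration, not taken from the real site):

```php
<?php
// Hypothetical helper: pull the JSESSIONID value out of a raw HTTP
// response header block (as returned when CURLOPT_HEADER is enabled).
function extract_jsessionid(string $rawHeaders): ?string
{
    // Match "Set-Cookie: JSESSIONID=<value>" at the start of a header line,
    // case-insensitively, stopping at the first ';' or whitespace.
    if (preg_match('/^Set-Cookie:\s*JSESSIONID=([^;\s]+)/mi', $rawHeaders, $m)) {
        return $m[1];
    }
    return null; // no session cookie in this response
}

// Invented sample response headers, for illustration only.
$sample = "HTTP/1.1 200 OK\r\n"
        . "Set-Cookie: JSESSIONID=ABC123; Path=/; Secure\r\n"
        . "Content-Type: text/html\r\n\r\n";

var_dump(extract_jsessionid($sample));
```

Returning null when the cookie is absent makes it easy to notice a response that did not start a session, instead of silently sending an empty cookie in the later steps.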

If I try to reproduce these steps with cURL in PHP, the third step fails with the following error: HTTP Status 400 - Invalid direct reference to form login page

This is the PHP code:

<?php
// Step 1: GET the home page and read the JSESSIONID from the response headers
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,"https://www.website.com");
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)');
$server_output = curl_exec ($ch);
curl_close ($ch);
$step1 = explode("JSESSIONID=", $server_output);
$step2 = explode("; ", $step1[1]);

$JSESSIONID = $step2[0];

// Step 2: GET the login page, sending the session cookie
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,"https://www.website.com/login.jsp");
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Cookie: JSESSIONID=".$JSESSIONID));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)');
$server_output = curl_exec ($ch);
curl_close ($ch);

// Step 3: POST the credentials (this is the request that returns HTTP Status 400)
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://www.website.com/i_security_check");
curl_setopt($ch, CURLOPT_POST, 1);
$data = array("j_password" => "PASSWORD", "j_username" => "USERNAME", "submit" => "Entra", "utente" =>"USERNAME");
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Cookie: JSESSIONID=".$JSESSIONID, "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Accept-Language: it-IT,it;q=0.8,en-US;q=0.5,en;q=0.3", "Connection: keep-alive", "Host: www.website.com", "Referer: https://www.website.com/login.jsp", "User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0"));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$server_output = curl_exec ($ch);
curl_close ($ch);

// Step 4: GET the reserved area with the same session cookie
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://www.website.com/reserved_area.jsp");
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Cookie: JSESSIONID=".$JSESSIONID, "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Accept-Language: it-IT,it;q=0.8,en-US;q=0.5,en;q=0.3", "Connection: keep-alive", "Host: www.website.com", "User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0"));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$server_output = curl_exec ($ch);
curl_close ($ch);
?>
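For comparison, here is a minimal sketch of the same four steps using a single cURL handle and cURL's built-in cookie engine instead of a hand-built Cookie header. The function names `build_login_steps` and `run_login_steps` are hypothetical; the URLs and field names are the ones from the steps above. One difference from the code above that may or may not matter: passing an array to CURLOPT_POSTFIELDS sends the body as multipart/form-data, while `http_build_query()` produces the application/x-www-form-urlencoded body an HTML form would normally send. I can't say for certain that this is the cause of the 400. Certificate verification is left at its default here, unlike in the code above.

```php
<?php
// Sketch only: function names are hypothetical, URLs come from the question.

// Build the ordered list of requests for the four steps.
function build_login_steps(string $base, string $user, string $pass): array
{
    return [
        ['method' => 'GET',  'url' => $base . '/'],
        ['method' => 'GET',  'url' => $base . '/login.jsp'],
        ['method' => 'POST', 'url' => $base . '/i_security_check',
         // URL-encoded body; an array passed to CURLOPT_POSTFIELDS
         // would be sent as multipart/form-data instead.
         'body' => http_build_query([
             'j_username' => $user,
             'j_password' => $pass,
             'submit'     => 'Entra',
             'utente'     => $user,
         ])],
        ['method' => 'GET',  'url' => $base . '/reserved_area.jsp'],
    ];
}

// Run the steps on one handle so cookies set by earlier responses
// are sent automatically with later requests.
function run_login_steps(array $steps): array
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_COOKIEFILE     => '', // empty string enables the in-memory cookie engine
    ]);

    $results = [];
    foreach ($steps as $step) {
        curl_setopt($ch, CURLOPT_URL, $step['url']);
        if ($step['method'] === 'POST') {
            curl_setopt($ch, CURLOPT_POST, true);
            curl_setopt($ch, CURLOPT_POSTFIELDS, $step['body']);
        } else {
            curl_setopt($ch, CURLOPT_HTTPGET, true); // switch the handle back to GET
        }
        $body = curl_exec($ch);
        $results[] = [
            'url'    => $step['url'],
            'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
            'body'   => $body,
        ];
    }
    curl_close($ch);
    return $results;
}

// List the requests that would be made, without touching the network.
print_r(array_column(build_login_steps('https://www.website.com', 'USERNAME', 'PASSWORD'), 'url'));
```

Calling `run_login_steps(build_login_steps('https://www.website.com', 'USERNAME', 'PASSWORD'))` would perform the requests and return the status code and body for each step, which makes it easy to see exactly where the sequence starts to differ from what the REST client does.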

Any ideas? Thank you
