cURL Failing To Download Page

superappsbuilder · September 10, 2019, 6:07pm

I know what htmlspecialchars, strip_tags and htmlentities are. I just did not understand the code you were showing in the text boxes.

I’m guessing now that in the 1st text box you were showing the fetched page’s whole source code (HTML) after passing it through htmlspecialchars(), in the 2nd the same source code after removing the HTML tags with strip_tags(), and in the 3rd the source code after passing it through htmlentities().
Is that what you were doing? If so, why were you doing all that? Please tell me, as I may learn something here.
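A minimal sketch of what the three functions do, run on a made-up one-line snippet rather than the real fetched page. Note that htmlspecialchars() and htmlentities() do not remove anything: they convert characters into entities so the browser displays the markup as text. strip_tags() is the one that deletes the tags.

```php
<?php
declare(strict_types=1);

// Hypothetical snippet standing in for a fetched page (not real potentpages.com markup).
$html = '<p class="menu">Fish & Chips at the Café: "£5"</p>';

// htmlspecialchars(): converts only & < > " (and ' with ENT_QUOTES) into entities,
// so the browser SHOWS the markup as text instead of rendering it.
$escaped = htmlspecialchars($html, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// strip_tags(): removes the tags and keeps only the text between them.
$stripped = strip_tags($html);

// htmlentities(): like htmlspecialchars(), but also converts every other character
// that has a named HTML entity, such as é and £.
$entities = htmlentities($html, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $escaped, "\n", $stripped, "\n", $entities, "\n";
```

Showing the page once per function side by side is a quick way to see which characters each one touches.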

Btw, what book are you reading? Lord of the Rings? The Hobbit? Dragonlance Chronicles? Lol!

superappsbuilder · September 10, 2019, 6:21pm

Ok. Reply to this post tomorrow then …

I did as you asked. Look:

1. `echo gettype($body);`
2. `echo '<pre>'; var_dump($body); echo '</pre>';`
3. `echo '<pre>'; print_r($body); echo '</pre>';`

<?php 

//ERROR REPORTING CODES. 
declare(strict_types=1); 
ini_set('display_errors', '1'); 
ini_set('display_startup_errors', '1'); 
error_reporting(E_ALL); 
mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT); 

/*
Download a Webpage via the HTTP GET Protocol using libcurl
*/
function _http ( $target, $referer ) {
	//Initialize Handle
	$handle = curl_init();
	//Define Settings
	curl_setopt ( $handle, CURLOPT_HTTPGET, true );
	curl_setopt ( $handle, CURLOPT_HEADER, true );
	curl_setopt ( $handle, CURLOPT_COOKIEJAR, "cookie_jar.txt" );
	curl_setopt ( $handle, CURLOPT_COOKIEFILE, "cookies.txt" );
	curl_setopt ( $handle, CURLOPT_USERAGENT, "web-crawler-tutorial-test" );
	curl_setopt ( $handle, CURLOPT_URL, $target );
	curl_setopt ( $handle, CURLOPT_REFERER, $referer );
	curl_setopt ( $handle, CURLOPT_FOLLOWLOCATION, true );
	curl_setopt ( $handle, CURLOPT_MAXREDIRS, 4 );
	curl_setopt ( $handle, CURLOPT_RETURNTRANSFER, true );
	//Execute Request
	$output = curl_exec ( $handle );
	//Close cURL handle
	curl_close ( $handle );
	//Separate Header and Body
	$separator = "\r\n\r\n";
	$header = substr( $output, 0, strpos( $output, $separator ) );
	$body_start = strlen( $header ) + strlen( $separator );
	$body = substr( $output, $body_start, strlen( $output ) - $body_start );
	//Parse Headers
	$header_array = Array();
	foreach ( explode ( "\r\n", $header ) as $i => $line ) {
		if($i === 0) {
			$header_array['http_code'] = $line;
			$status_info = explode( " ", $line );
			$header_array['status_info'] = $status_info;
		} else {
			list ( $key, $value ) = explode ( ': ', $line );
			$header_array[$key] = $value;
		}
	}
	//Form Return Structure
	$ret = Array("headers" => $header_array, "body" => $body );
	return $ret;
}
$page = _http( "https://potentpages.com", "" );
$headers = $page['headers'];
$http_status_code = $headers['http_code'];
$body = $page['body'];


    echo gettype($body);
    echo '<pre>'; var_dump($body); echo '</pre>';
    echo '<pre>'; print_r($body); echo '</pre>';


foreach($body as $b => $b_value)
{
  if( is_string($b_values) ): 
    echo "Key=" . $b . ", Value=" . $b_value;
  else:
    // added pre to add line feeds
    echo '<pre>'; print_r($b_value); echo '</pre>';
  endif;
  echo "<br>";
}

echo "Headers:"; ?> <br> <?php 
print_r($headers) ?> <br> <?php 
echo "BODY:"; ?> <br> <?php 
print_r($body)

?>

My browser just shows this:
string

string(133615) "

So, what do I do now?
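A likely explanation for the output above, sketched with a short made-up string in place of the real 133,615-byte body: var_dump() most likely did print the whole string, but because it is HTML, the browser interpreted it as markup instead of displaying it as text. Capturing the dump and escaping it before echoing makes the source visible.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for $body (the real one is the ~130 KB page from cURL).
$body = "<!DOCTYPE html>\n<html><body><h1>Hello</h1></body></html>";

// gettype() answers "what is this?": here "string", not "array".
$type = gettype($body);

// var_dump() writes straight to the output. Capture it with output buffering,
// then escape it so the browser displays the HTML instead of rendering it.
ob_start();
var_dump($body);
$dump = (string) ob_get_clean();

echo $type, "\n";
echo '<pre>', htmlspecialchars($dump, ENT_QUOTES, 'UTF-8'), '</pre>', "\n";
```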

Actually, I've messed up the code. If you do not mind, being the Good Samaritan you are, would you go through my code, weed out all the chaff and then show me your final result? That way, I can start learning ASAP (as soon as possible).
Remember, I want cURL to fetch my chosen page and then save the data into my MySQL database in these columns:

url|header|body|page_plain_text_content
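A sketch of saving those four columns, assuming a hypothetical table named `pages` and a hypothetical helper `save_page()`. It uses PDO with a prepared statement rather than the mysqli extension hinted at by the `mysqli_report()` call in the code above, and the demo runs on an in-memory SQLite database so it needs no server; switching the PDO connection string is the assumed route to MySQL.

```php
<?php
declare(strict_types=1);

// Hypothetical MySQL table for the columns url|header|body|page_plain_text_content.
// A ~130 KB page will not fit in a MySQL TEXT column (max 65,535 bytes), hence MEDIUMTEXT:
//   CREATE TABLE pages (
//     id INT AUTO_INCREMENT PRIMARY KEY,
//     url VARCHAR(2048) NOT NULL,
//     header TEXT,
//     body MEDIUMTEXT,
//     page_plain_text_content MEDIUMTEXT
//   ) DEFAULT CHARSET=utf8mb4;

// Insert one fetched page with a prepared statement, so quotes inside the HTML
// cannot break (or inject into) the SQL.
function save_page(PDO $db, string $url, array $headers, string $body, string $plain_text): void
{
    $stmt = $db->prepare(
        'INSERT INTO pages (url, header, body, page_plain_text_content) VALUES (?, ?, ?, ?)'
    );
    // The parsed header array is stored as JSON text (one choice among several).
    $stmt->execute([$url, json_encode($headers), $body, $plain_text]);
}

// Demo against an in-memory SQLite database (needs the pdo_sqlite extension).
// For MySQL you would instead connect with something like
// new PDO('mysql:host=localhost;dbname=your_db;charset=utf8mb4', $user, $pass).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT, header TEXT,'
        . ' body TEXT, page_plain_text_content TEXT)');

save_page($db, 'https://potentpages.com', ['http_code' => 'HTTP/1.1 200 OK'],
          '<p>It\'s "quoted"</p>', 'It\'s "quoted"');

$row = $db->query('SELECT url, header, body, page_plain_text_content FROM pages')
          ->fetch(PDO::FETCH_ASSOC);
print_r($row);
```

The prepared statement is the important part: page source is full of quotes and backslashes, which would break a query built by string concatenation.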

The "body" column contains the source code of the page.
The "page_plain_text_content" column contains just the text content that you would usually show a visitor. That means no HTML tags, no XML tags, no CSS, etc. Just plain text.
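One way to approximate that plain-text column, as a sketch: the hypothetical helper `html_to_plain_text()` removes `<script>`, `<style>` and `<noscript>` blocks and HTML comments, strips the remaining tags with strip_tags(), decodes entities and collapses whitespace. It is regex-based, so treat it as an approximation; a real parser such as DOMDocument would cope better with malformed pages.

```php
<?php
declare(strict_types=1);

// Approximate "visible text" of an HTML page, for a page_plain_text_content column.
// Regex-based sketch: good enough for typical pages, but not a full HTML parser.
function html_to_plain_text(string $html): string
{
    // Drop elements whose contents are never shown to a visitor as text.
    $html = preg_replace('#<(script|style|noscript)\b[^>]*>.*?</\1\s*>#is', ' ', $html) ?? $html;
    // Drop HTML comments.
    $html = preg_replace('#<!--.*?-->#s', ' ', $html) ?? $html;
    // Add a space before every tag so "<p>a</p><p>b</p>" becomes "a b", not "ab".
    $text = strip_tags(str_replace('<', ' <', $html));
    // Turn entities such as &amp; and &nbsp; back into real characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse runs of whitespace (including non-breaking spaces) into one space.
    $text = preg_replace('/[\s\x{00A0}]+/u', ' ', $text) ?? $text;
    return trim($text);
}

$html = '<html><head><style>p { color: red; }</style><script>var x = 1;</script></head>'
      . '<body><h1>Hello</h1><p>Fish &amp; Chips</p><!-- hidden --></body></html>';
echo html_to_plain_text($html), "\n";
```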
Now, if you do not mind, please update your page to show the data in the formats I just mentioned:
http://johns-jokes.com/downloads/sp-d/johnyboy-curl-test/?url=https://potentpages.com

Anyway, I will be rolling off to sleep soon, so I will check back for your reply tomorrow.
Thanks for bending over backwards trying to help me. And don't forget this thread and post.

Cheers!

m_hutley · September 10, 2019, 7:44pm

As I said back in post 12, $body is not an array. Stop treating it as one. It’s a string. Output it as you would any other string.
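To illustrate the point with a made-up `$body`: a string can be echoed directly, and if you do want to loop over it, split it into an array of lines first (here with preg_split() on `\R`, which matches any line break, including `\r\n`).

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for $page['body']: a plain string, not an array.
$body = "<html>\r\n<body>\r\n<p>Hi</p>\r\n</body>\r\n</html>";

// A string is not iterable, so foreach ($body as ...) only raises a warning.
// Print it directly, or, to walk through it, split it into an array of lines first.
$lines = preg_split('/\R/', $body) ?: [];

$out = '';
foreach ($lines as $index => $line) {
    // Number each line and escape it so the markup is displayed, not rendered.
    $out .= ($index + 1) . ': ' . htmlspecialchars($line, ENT_QUOTES, 'UTF-8') . "\n";
}
echo '<pre>' . $out . '</pre>';
```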

superappsbuilder · September 10, 2019, 7:56pm

Hi,

This is working fine now …

<?php 

//ERROR REPORTING CODES. 
declare(strict_types=1); 
ini_set('display_errors', '1'); 
ini_set('display_startup_errors', '1'); 
error_reporting(E_ALL); 
mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT); 

/*
Download a Webpage via the HTTP GET Protocol using libcurl
*/
function _http ( $target, $referer ) {
	//Initialize Handle
	$handle = curl_init();
	//Define Settings
	curl_setopt ( $handle, CURLOPT_HTTPGET, true );
	curl_setopt ( $handle, CURLOPT_HEADER, true );
	curl_setopt ( $handle, CURLOPT_COOKIEJAR, "cookie_jar.txt" );
	curl_setopt ( $handle, CURLOPT_COOKIEFILE, "cookies.txt" );
	curl_setopt ( $handle, CURLOPT_USERAGENT, "web-crawler-tutorial-test" );
	curl_setopt ( $handle, CURLOPT_URL, $target );
	curl_setopt ( $handle, CURLOPT_REFERER, $referer );
	curl_setopt ( $handle, CURLOPT_FOLLOWLOCATION, true );
	curl_setopt ( $handle, CURLOPT_MAXREDIRS, 4 );
	curl_setopt ( $handle, CURLOPT_RETURNTRANSFER, true );
	//Execute Request
	$output = curl_exec ( $handle );
	//curl_exec() returns false on failure: stop with the reason instead of parsing false
	if ( $output === false ) {
		$error = curl_error ( $handle );
		curl_close ( $handle );
		throw new RuntimeException( "cURL request failed: " . $error );
	}
	//Total size of ALL header blocks (one per response when redirects are followed)
	$header_size = curl_getinfo ( $handle, CURLINFO_HEADER_SIZE );
	//Close cURL handle
	curl_close ( $handle );
	//Separate Header and Body
	$all_headers = substr( $output, 0, $header_size );
	$body = substr( $output, $header_size );
	//Keep only the last header block: the final response after any redirects
	$header_blocks = explode( "\r\n\r\n", trim( $all_headers ) );
	$header = end( $header_blocks );
	//Parse Headers
	$header_array = Array();
	foreach ( explode ( "\r\n", $header ) as $i => $line ) {
		if($i === 0) {
			$header_array['http_code'] = $line;
			$status_info = explode( " ", $line );
			$header_array['status_info'] = $status_info;
		} elseif ( strpos( $line, ': ' ) !== false ) {
			//Split on the first ": " only, so values that contain ": " stay whole
			list ( $key, $value ) = explode ( ': ', $line, 2 );
			$header_array[$key] = $value;
		}
	}
	//Form Return Structure
	$ret = Array("headers" => $header_array, "body" => $body );
	return $ret;
}
$page = _http( "https://potentpages.com", "" );
$headers = $page['headers'];
$http_status_code = $headers['http_code'];
$body = $page['body'];

echo "$body"; 
//$source_code = htmlspecialchars($body); 
//echo "$source_code"; 
echo "<pre>".htmlspecialchars($body)."</pre>"; //Nog Dog's suggestion: https://www.webdeveloper.com/d/385760-how-to-display-on-your-screen-browser-the-curl-fetched-page-s-source-code/3

?>
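One caveat with splitting a response at the first blank line, checked offline with a hand-made response: when CURLOPT_FOLLOWLOCATION and CURLOPT_HEADER are both on and the server redirects, the output contains one header block per response, so a first-blank-line split returns the redirect's headers and leaves the final headers inside `$body`. The hypothetical helper `split_response()` below takes the total header size (what `curl_getinfo($handle, CURLINFO_HEADER_SIZE)` reports) and keeps only the last header block.

```php
<?php
declare(strict_types=1);

// Split a raw cURL response (CURLOPT_HEADER on) into the FINAL header block and the body.
// $header_size is the total size of all header blocks received, including redirects.
function split_response(string $raw, int $header_size): array
{
    $all_headers = substr($raw, 0, $header_size);
    $body = substr($raw, $header_size);

    // Each response in a redirect chain ends with a blank line; keep the last block.
    $blocks = explode("\r\n\r\n", trim($all_headers));
    $lines = explode("\r\n", end($blocks));

    $headers = ['http_code' => array_shift($lines)];
    foreach ($lines as $line) {
        // Split on the first ": " only; skip anything that is not "Name: value".
        $parts = explode(': ', $line, 2);
        if (count($parts) === 2) {
            $headers[$parts[0]] = $parts[1];
        }
    }
    return ['headers' => $headers, 'body' => $body];
}

// Hand-made example: a 301 redirect followed by the final 200 response.
$head = "HTTP/1.1 301 Moved Permanently\r\nLocation: https://example.com/\r\n\r\n"
      . "HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=UTF-8\r\n\r\n";
$raw = $head . "<html>Hi</html>";

$page = split_response($raw, strlen($head));

// For comparison: splitting at the FIRST blank line leaves the 200 headers inside the "body".
$naive_body = substr($raw, (int) strpos($raw, "\r\n\r\n") + 4);

echo $page['headers']['http_code'], "\n", $page['body'], "\n";
```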

Anyway, how do I echo the fetched page’s source code inside a text box like the one on this page:
http://johns-jokes.com/downloads/sp-d/johnyboy-curl-test/?url=https://potentpages.com

m_hutley · September 10, 2019, 8:01pm

That would be the <pre> tag.

superappsbuilder · September 10, 2019, 8:09pm

I’m just learning about the “pre” tag. I just realised the HTML code is being shown in a text box, but the box is so wide that I can hardly see its borders. How do I make it the same size as John’s?
http://johns-jokes.com/downloads/sp-d/johnyboy-curl-test/?url=https://potentpages.com

Care to show me a sample line?

John_Betong · September 11, 2019, 12:09am

Click the “home” tab and you will see the source code used to display the data.

John_Betong · September 11, 2019, 12:12am

Check the source code and note the CSS used to wrap the text, which would otherwise extend outside the pre box.
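A sketch of such a box, using a hypothetical helper `source_box()`. The CSS values are assumptions for illustration, not the CSS from John's page: `white-space: pre-wrap` keeps the line breaks of `<pre>` but lets long lines wrap, `overflow-wrap: break-word` breaks very long unbroken strings, and a fixed height with `overflow: auto` adds scroll bars.

```php
<?php
declare(strict_types=1);

// Render fetched HTML source inside a bordered, scrollable, wrapping <pre> box.
// The CSS values are illustrative assumptions, not the CSS used on John's page.
function source_box(string $source, string $title = 'Source code'): string
{
    $style = 'width: 90%; max-width: 60em; height: 20em; overflow: auto;'
           . ' border: 1px solid #999; padding: 0.5em;'
           . ' white-space: pre-wrap; overflow-wrap: break-word;';
    return '<h3>' . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . '</h3>'
         . '<pre style="' . $style . '">'
         . htmlspecialchars($source, ENT_QUOTES, 'UTF-8')
         . '</pre>';
}

echo source_box('<p class="intro">Hello</p>'), "\n";
```

Calling it once per column (header, body, plain text) would give one box per field.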

John_Betong · September 11, 2019, 2:05am

Rough Weather, a Spenser (American private detective) novel by Robert B. Parker.

John_Betong · September 11, 2019, 2:07am

Try the PHP functions strip_tags(), htmlspecialchars(), etc.

system · December 11, 2019, 9:08am

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.
