cURL Failing To Download Page

I know what htmlspecialchars, strip_tags and htmlentities are. I just did not understand the code you were showing in the text boxes.
http://johns-jokes.com/downloads/sp-d/johnyboy-curl-test/?url=https://potentpages.com

I’m guessing now that in the 1st text box you were showing the fetched page’s whole source code (HTML) after running it through htmlspecialchars, in the 2nd the same source code after removing the HTML tags (strip_tags), and in the 3rd the same source code after running it through htmlentities.
Is that what you were doing? If so, why were you doing all that? Tell me, as I may learn something here.

Btw, what book are you reading? Lord of the Rings? The Hobbit? Dragonlance Chronicles? Lol!

Ok. Reply to this post tomorrow then …

I did as you asked. Look:

1. `echo gettype($body);`
2. `echo '<pre>'; var_dump($body); echo '</pre>';`
3. `echo '<pre>'; print_r($body); echo '</pre>';`
<?php 

//ERROR REPORTING CODES. 
declare(strict_types=1); 
ini_set('display_errors', '1'); 
ini_set('display_startup_errors', '1'); 
error_reporting(E_ALL); 
mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT); 

/*
Download a Webpage via the HTTP GET Protocol using libcurl
*/
function _http ( $target, $referer ) {
	//Initialize Handle
	$handle = curl_init();
	//Define Settings
	curl_setopt ( $handle, CURLOPT_HTTPGET, true );
	curl_setopt ( $handle, CURLOPT_HEADER, true );
	curl_setopt ( $handle, CURLOPT_COOKIEJAR, "cookie_jar.txt" );
	curl_setopt ( $handle, CURLOPT_COOKIEFILE, "cookies.txt" );
	curl_setopt ( $handle, CURLOPT_USERAGENT, "web-crawler-tutorial-test" );
	curl_setopt ( $handle, CURLOPT_URL, $target );
	curl_setopt ( $handle, CURLOPT_REFERER, $referer );
	curl_setopt ( $handle, CURLOPT_FOLLOWLOCATION, true );
	curl_setopt ( $handle, CURLOPT_MAXREDIRS, 4 );
	curl_setopt ( $handle, CURLOPT_RETURNTRANSFER, true );
	//Execute Request
	$output = curl_exec ( $handle );
	//Close cURL handle
	curl_close ( $handle );
	//Separate Header and Body
	$separator = "\r\n\r\n";
	$header = substr( $output, 0, strpos( $output, $separator ) );
	$body_start = strlen( $header ) + strlen( $separator );
	$body = substr( $output, $body_start, strlen( $output ) - $body_start );
	//Parse Headers
	$header_array = Array();
	foreach ( explode ( "\r\n", $header ) as $i => $line ) {
		if($i === 0) {
			$header_array['http_code'] = $line;
			$status_info = explode( " ", $line );
			$header_array['status_info'] = $status_info;
		} else {
			list ( $key, $value ) = explode ( ': ', $line );
			$header_array[$key] = $value;
		}
	}
	//Form Return Structure
	$ret = Array("headers" => $header_array, "body" => $body );
	return $ret;
}
$page = _http( "https://potentpages.com", "" );
$headers = $page['headers'];
$http_status_code = $headers['http_code'];
$body = $page['body'];


    echo gettype($body);
    echo '<pre>'; var_dump($body); echo '</pre>';
    echo '<pre>'; print_r($body); echo '</pre>';


foreach($body as $b => $b_value)
{
  if( is_string($b_values) ): 
    echo "Key=" . $b . ", Value=" . $b_value;
  else:
    // added pre to add line feeds
    echo '<pre>'; print_r($b_value); echo '</pre>';
  endif;
  echo "<br>";
}

echo "Headers:"; ?> <br> <?php 
print_r($headers) ?> <br> <?php 
echo "BODY:"; ?> <br> <?php 
print_r($body)

?>

My browser just shows this:
string

string(133615) "

So, what do I do now?

Actually, I've messed up the code. Being the Good Samaritan that you are, would you mind going through my code, weeding out all the chaff, and showing me your final result? That way I can get learning ASAP (as soon as possible).
Remember, I want cURL to fetch my chosen page and then store the data in my MySQL database like so:

url|header|body|page_plain_text_content

The "body" column contains the source code of the page.
The "page_plain_text_content" column contains just the text content that you would usually show a visitor. That means no HTML tags, no XML tags, no CSS, etc. Just plain text.
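
A minimal sketch of how those four columns could be filled, assuming the headers array has the shape returned by _http() above. The function names build_page_row and save_page_row and the table name "pages" are made up for illustration, and plain strip_tags() is only the simplest way to get the text:

```php
<?php
declare(strict_types=1);

/**
 * Assemble the four column values: url | header | body | page_plain_text_content.
 * $headers is expected in the shape built by _http(): 'http_code' holds the
 * status line, 'status_info' is a nested array, everything else is name => value.
 */
function build_page_row(string $url, array $headers, string $body): array
{
    $header_lines = [];
    foreach ($headers as $name => $value) {
        if (is_array($value)) {
            continue; // skip the nested status_info list
        }
        // The status line is stored under 'http_code'; keep it as-is.
        $header_lines[] = ($name === 'http_code') ? $value : $name . ': ' . $value;
    }

    return [
        'url'                     => $url,
        'header'                  => implode("\n", $header_lines),
        'body'                    => $body,
        'page_plain_text_content' => trim(strip_tags($body)),
    ];
}

/**
 * Store one row with a prepared statement.
 * Assumption: a table named "pages" with these four columns exists.
 */
function save_page_row(mysqli $db, array $row): void
{
    $stmt = $db->prepare(
        'INSERT INTO pages (url, header, body, page_plain_text_content) VALUES (?, ?, ?, ?)'
    );
    $stmt->bind_param(
        'ssss',
        $row['url'],
        $row['header'],
        $row['body'],
        $row['page_plain_text_content']
    );
    $stmt->execute();
    $stmt->close();
}
```

With the script above this would be called as save_page_row($db, build_page_row("https://potentpages.com", $headers, $body)), where $db is an open mysqli connection.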
Now, if you do not mind, please update your page to show the data in the format I just mentioned:
http://johns-jokes.com/downloads/sp-d/johnyboy-curl-test/?url=https://potentpages.com

Anyway, I will be rolling off to sleep soon, so I will check back for your reply tomorrow.
Thanks for bending over backwards trying to help me. And don't forget this thread and post.

Cheers!

As I said back in post 12, $body is not an array. Stop treating it as one. It’s a string. Output it as you would any other string.
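
A small sketch of what "output it as a string" could look like. The helper name render_body is made up for illustration. Running foreach over a string only makes PHP emit a warning and skip the loop, and echoing the raw HTML lets the browser render it instead of showing it, so the string is escaped first:

```php
<?php
declare(strict_types=1);

/**
 * Render a fetched page body for display in the browser.
 * A string is escaped and wrapped in <pre>; anything else (for example an
 * array) falls back to print_r() so it still shows up in readable form.
 */
function render_body($body): string
{
    $text = is_string($body) ? $body : print_r($body, true);

    // Escape <, >, & and quotes so the browser shows the markup as text
    // instead of interpreting it.
    return '<pre>' . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . '</pre>';
}
```

In the script above, echo render_body($body); would replace the whole foreach loop.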


Hi,

This is working fine now …

<?php 

//ERROR REPORTING CODES. 
declare(strict_types=1); 
ini_set('display_errors', '1'); 
ini_set('display_startup_errors', '1'); 
error_reporting(E_ALL); 
mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT); 

/*
Download a Webpage via the HTTP GET Protocol using libcurl
*/
function _http ( $target, $referer ) {
	//Initialize Handle
	$handle = curl_init();
	//Define Settings
	curl_setopt ( $handle, CURLOPT_HTTPGET, true );
	curl_setopt ( $handle, CURLOPT_HEADER, true );
	curl_setopt ( $handle, CURLOPT_COOKIEJAR, "cookie_jar.txt" );
	curl_setopt ( $handle, CURLOPT_COOKIEFILE, "cookies.txt" );
	curl_setopt ( $handle, CURLOPT_USERAGENT, "web-crawler-tutorial-test" );
	curl_setopt ( $handle, CURLOPT_URL, $target );
	curl_setopt ( $handle, CURLOPT_REFERER, $referer );
	curl_setopt ( $handle, CURLOPT_FOLLOWLOCATION, true );
	curl_setopt ( $handle, CURLOPT_MAXREDIRS, 4 );
	curl_setopt ( $handle, CURLOPT_RETURNTRANSFER, true );
	//Execute Request
	$output = curl_exec ( $handle );
	//Close cURL handle
	curl_close ( $handle );
	//Separate Header and Body
	$separator = "\r\n\r\n";
	$header = substr( $output, 0, strpos( $output, $separator ) );
	$body_start = strlen( $header ) + strlen( $separator );
	$body = substr( $output, $body_start, strlen( $output ) - $body_start );
	//Parse Headers
	$header_array = Array();
	foreach ( explode ( "\r\n", $header ) as $i => $line ) {
		if($i === 0) {
			$header_array['http_code'] = $line;
			$status_info = explode( " ", $line );
			$header_array['status_info'] = $status_info;
		} else {
			list ( $key, $value ) = explode ( ': ', $line );
			$header_array[$key] = $value;
		}
	}
	//Form Return Structure
	$ret = Array("headers" => $header_array, "body" => $body );
	return $ret;
}
$page = _http( "https://potentpages.com", "" );
$headers = $page['headers'];
$http_status_code = $headers['http_code'];
$body = $page['body'];

echo "$body"; 
//$source_code = htmlspecialchars($body); 
//echo "$source_code"; 
echo "<pre>".htmlspecialchars($body)."</pre>"; //Nog Dog's suggestion: https://www.webdeveloper.com/d/385760-how-to-display-on-your-screen-browser-the-curl-fetched-page-s-source-code/3

?>
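
One caveat with the header/body split in this script: with CURLOPT_FOLLOWLOCATION enabled, every redirect hop adds its own header block to the output, so cutting at the first blank line can leave later headers inside the body. A hedged sketch of an alternative, assuming the header length is read with curl_getinfo($handle, CURLINFO_HEADER_SIZE) before curl_close() is called. The function name split_response is made up for illustration:

```php
<?php
declare(strict_types=1);

/**
 * Split a raw cURL response (fetched with CURLOPT_HEADER enabled) into
 * parsed headers and body. $header_size is the value reported by
 * curl_getinfo($handle, CURLINFO_HEADER_SIZE).
 */
function split_response(string $raw, int $header_size): array
{
    $header_text = substr($raw, 0, $header_size);
    $body        = (string) substr($raw, $header_size);

    // When redirects were followed, each hop contributes one header block.
    // Keep only the last one, which belongs to the final page.
    $blocks = array_values(array_filter(
        explode("\r\n\r\n", trim($header_text)),
        'strlen'
    ));
    $last = ($blocks === []) ? '' : $blocks[count($blocks) - 1];

    $headers = [];
    foreach (explode("\r\n", $last) as $i => $line) {
        if ($i === 0) {
            // First line is the status line, e.g. "HTTP/1.1 200 OK".
            $headers['http_code']   = $line;
            $headers['status_info'] = explode(' ', $line);
            continue;
        }
        // A limit of 2 keeps values that contain ":" intact;
        // lines without a colon are skipped instead of raising a notice.
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $headers[trim($parts[0])] = trim($parts[1]);
        }
    }

    return ['headers' => $headers, 'body' => $body];
}
```

Inside _http() this would be used as return split_response($output, $size); after checking that curl_exec() did not return false.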

Anyway, how do I echo the fetched page’s source code inside a blocktext box like the one on this page:
http://johns-jokes.com/downloads/sp-d/johnyboy-curl-test/?url=https://potentpages.com

That would be the <pre> tag.


I am just learning about the “pre” tag. I just realised the HTML code is being shown in a blocktext box, but the box is so wide that I can hardly see the borders. How do I make it the same size as John’s?
http://johns-jokes.com/downloads/sp-d/johnyboy-curl-test/?url=https://potentpages.com

Care to show me a sample line?

Click the “home” tab and you will see the source code used to display the data.


Check the source code and note the CSS used to wrap the text, which normally extends outside the pre box.
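
For illustration only, a sketch of the kind of CSS that keeps long lines inside a pre box. The exact rules on John's page are not shown in this thread, so the property values, the 60em width and the function name source_box are assumptions:

```php
<?php
declare(strict_types=1);

/**
 * Wrap escaped source code in a bordered <pre> box whose long lines wrap
 * instead of running off the right-hand side of the page.
 */
function source_box(string $html): string
{
    $style = 'white-space: pre-wrap; '      // keep line breaks, but allow wrapping
           . 'overflow-wrap: break-word; '  // break very long unbroken strings
           . 'max-width: 60em; '            // assumed width, adjust to taste
           . 'border: 1px solid #999; padding: 0.5em;';

    return '<pre style="' . $style . '">'
         . htmlspecialchars($html, ENT_QUOTES, 'UTF-8')
         . '</pre>';
}
```

Calling echo source_box($body); would then show the fetched source in a box of limited width.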

Rough Weather, a Spenser novel (American private detective) by Robert B. Parker.

Try the PHP functions strip_tags, htmlspecialchars, etc.
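
A rough sketch of how those functions could be combined to get the plain text of a page. The function name html_to_plain_text is made up for illustration, and the regular expression is a simple approximation, not a full HTML parser:

```php
<?php
declare(strict_types=1);

/**
 * Reduce an HTML document to the text a visitor would normally read.
 */
function html_to_plain_text(string $html): string
{
    // strip_tags() keeps the text inside <script> and <style>, so remove
    // those elements together with their contents first.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html) ?? $html;

    // Put a space in front of every tag so words from neighbouring
    // elements do not fuse together once the tags are gone.
    $html = str_replace('<', ' <', $html);

    $text = strip_tags($html);

    // Turn entities such as &amp; back into the characters they stand for.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace into single spaces.
    $text = preg_replace('/\s+/', ' ', $text) ?? $text;

    return trim($text);
}
```

The result of html_to_plain_text($body) is the kind of value the page_plain_text_content column asks for.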
