Ok. Reply to this post tomorrow then …
I did as you asked. Look:
1. `echo gettype($body);`
2. `echo '<pre>'; var_dump($body); echo '</pre>';`
3. `echo '<pre>'; print_r($body); echo '</pre>';`
<?php
//ERROR REPORTING CODES.
declare(strict_types=1);
ini_set('display_errors', '1');
ini_set('display_startup_errors', '1');
error_reporting(E_ALL);
mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);
/*
Download a Webpage via the HTTP GET Protocol using libcurl
*/
function _http ( $target, $referer ) {
//Initialize Handle
$handle = curl_init();
//Define Settings
curl_setopt ( $handle, CURLOPT_HTTPGET, true );
curl_setopt ( $handle, CURLOPT_HEADER, true );
curl_setopt ( $handle, CURLOPT_COOKIEJAR, "cookie_jar.txt" );
curl_setopt ( $handle, CURLOPT_COOKIEFILE, "cookies.txt" );
curl_setopt ( $handle, CURLOPT_USERAGENT, "web-crawler-tutorial-test" );
curl_setopt ( $handle, CURLOPT_URL, $target );
curl_setopt ( $handle, CURLOPT_REFERER, $referer );
curl_setopt ( $handle, CURLOPT_FOLLOWLOCATION, true );
curl_setopt ( $handle, CURLOPT_MAXREDIRS, 4 );
curl_setopt ( $handle, CURLOPT_RETURNTRANSFER, true );
//Execute Request
$output = curl_exec ( $handle );
//Close cURL handle
curl_close ( $handle );
//Separate Header and Body
$separator = "\r\n\r\n";
$header = substr( $output, 0, strpos( $output, $separator ) );
$body_start = strlen( $header ) + strlen( $separator );
$body = substr( $output, $body_start, strlen( $output ) - $body_start );
//Parse Headers
$header_array = Array();
foreach ( explode ( "\r\n", $header ) as $i => $line ) {
if($i === 0) {
$header_array['http_code'] = $line;
$status_info = explode( " ", $line );
$header_array['status_info'] = $status_info;
} else {
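// Split on the first ': ' only, and skip lines that are not "Name: value"
$parts = explode ( ': ', $line, 2 );
if ( count ( $parts ) === 2 ) {
$header_array[$parts[0]] = $parts[1];
}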
}
}
//Form Return Structure
$ret = Array("headers" => $header_array, "body" => $body );
return $ret;
}
$page = _http( "https://potentpages.com", "" );
$headers = $page['headers'];
$http_status_code = $headers['http_code'];
$body = $page['body'];
echo gettype($body);
echo '<pre>'; var_dump($body); echo '</pre>';
echo '<pre>'; print_r($body); echo '</pre>';
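// foreach only accepts arrays and objects, so skip the loop when $body is a string
if( is_iterable($body) )
{
foreach($body as $b => $b_value)
{
if( is_string($b_value) ):
echo "Key=" . $b . ", Value=" . $b_value;
else:
// added pre to add line feeds
echo '<pre>'; print_r($b_value); echo '</pre>';
endif;
echo "<br>";
}
}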
echo "Headers:"; ?> <br> <?php
print_r($headers) ?> <br> <?php
echo "BODY:"; ?> <br> <?php
print_r($body)
?>
My browser just shows this:
string
string(133615) "
So, what do I do now?
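One possible explanation, offered as a guess rather than a confirmed diagnosis: `$body` is a string of HTML, so when it is dumped unescaped the browser renders the markup instead of displaying it. A minimal sketch of escaping the string first is shown here; `dump_as_source` is a hypothetical helper name, not something from the original script.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: wrap a string in <pre> with HTML special
 * characters escaped, so the browser shows the markup as text.
 */
function dump_as_source(string $body): string
{
    // ENT_SUBSTITUTE replaces invalid byte sequences instead of returning ''
    $escaped = htmlspecialchars($body, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    return '<pre>' . $escaped . '</pre>';
}

// Example with a short literal string instead of a fetched page
echo dump_as_source('<h1>Hello</h1>');
```

In the script above, this would mean calling `echo dump_as_source($body);` instead of the bare `var_dump` and `print_r` lines.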
Actually, I've messed up the code. If you don't mind, being the Good Samaritan that you are, could you go through my code, weed out all the chaff, and show me your final result? That way, I can get learning ASAP (as soon as possible).
Remember, I want cURL to fetch my chosen page and then store the data in my MySQL database in these columns:
url|header|body|page_plain_text_content
The "body" column contains the source code of the page.
The "page_plain_text_content" column contains only the text content that you would usually show a visitor. That means no HTML tags, no XML tags, no CSS, etc. Just plain text.
Now, if you don't mind, please update your page to show the output in the format I just mentioned:
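To make the four columns concrete, here is a rough sketch of how one row could be assembled and stored. Everything named here is an assumption for illustration: the helpers `html_to_plain_text`, `build_page_row` and `save_page_row`, the table name `pages`, and the choice to keep the header as a single string. It relies on `strip_tags`, which is only a simple approach to getting plain text, not a full HTML parser.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: reduce an HTML document to visible plain text.
 */
function html_to_plain_text(string $html): string
{
    // strip_tags() keeps the text inside <script> and <style>,
    // so remove those blocks together with their contents first.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html) ?? $html;
    // Add a space before each tag so neighbouring blocks do not fuse together.
    $html = str_replace('<', ' <', $html);
    $text = strip_tags($html);
    // Turn entities such as &amp; back into characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/u', ' ', $text) ?? $text);
}

/**
 * Hypothetical helper: build one row in the layout
 * url|header|body|page_plain_text_content.
 */
function build_page_row(string $url, string $header, string $body): array
{
    return [
        'url' => $url,
        'header' => $header,
        'body' => $body,
        'page_plain_text_content' => html_to_plain_text($body),
    ];
}

/**
 * Hypothetical helper: insert the row with a prepared statement.
 * Assumes a table named `pages` with the four columns already exists.
 */
function save_page_row(mysqli $db, array $row): void
{
    $stmt = $db->prepare(
        'INSERT INTO pages (url, header, body, page_plain_text_content) VALUES (?, ?, ?, ?)'
    );
    $stmt->bind_param(
        'ssss',
        $row['url'],
        $row['header'],
        $row['body'],
        $row['page_plain_text_content']
    );
    $stmt->execute();
    $stmt->close();
}
```

With the script above, the row could be built from the raw header string and `$body`, then passed to `save_page_row` along with an open `mysqli` connection.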
http://johns-jokes.com/downloads/sp-d/johnyboy-curl-test/?url=https://potentpages.com
Anyway, I will be heading off to sleep soon, so I will check back for your reply tomorrow.
Thanks for bending over backwards trying to help me. And don't forget this thread and post.
Cheers!