Folks,
This is how I would extract links from a webpage:
<?php
//Assuming your contents are in a variable called $contents
//New DOM Document
$document = new DOMDocument;
//Load HTML in $contents variable
$document->loadHTML($contents);
//Get all links
if($links = $document->getElementsByTagName('a')) {
    //Loop through all links
    foreach($links as $node) {
        //Get link location (href)
        $link_href = $node->getAttribute('href');
        //Get link text
        $link_text = $node->nodeValue;
    }
}
?>
This is how I would extract images from a webpage:
<?php
//Assuming your contents are in a variable called $contents
//New DOM Document
$document = new DOMDocument;
//Load HTML in $contents variable
$document->loadHTML($contents);
//Get all images
if($images = $document->getElementsByTagName('img')) {
    //Loop through all images
    foreach($images as $node) {
        //Get source of the image (src attribute)
        $img_src = $node->getAttribute('src');
        //Get alt text of the image (alt attribute)
        $img_alt = $node->getAttribute('alt');
    }
}
?>
This is how I would extract JSON from a webpage:
<?php
//Assuming your contents are in a variable called $contents
//Check if the JSON is valid
//Attempt to decode; return true for valid if no errors were found.
//Otherwise return false for an error
function checkIfJSONValid($t) {
    json_decode($t);
    if(json_last_error() == JSON_ERROR_NONE) {
        return true;
    }
    return false;
}
//Match all JSON candidates (balanced {...} groups) and filter for valid JSON contents
$json_matches = Array();
$pattern = '/\{(?:[^{}]|(?R))*\}/x';
preg_match_all($pattern, $contents, $json_matches);
//$json_matches[0] holds the full matched strings
$json_valid = array_filter($json_matches[0], 'checkIfJSONValid');
//Loop through all valid JSON strings
foreach( $json_valid as $json ) {
    //Decode JSON
    //Second parameter specifies to use an associative array for the decoded JSON data
    $data = json_decode($json, true);
    //JSON is now in an array in the $data variable
}
?>
But …
Q1. How do I extract email addresses from the webpage?
Care to show me a code sample?
Q2. How do I extract the page title from the webpage?
Care to show me a code sample?
Q3. How do I extract the meta keywords from the webpage?
Care to show me a code sample?
Q4. How do I extract the meta description from the webpage?
Care to show me a code sample?
You may take my code above, modify it, and paste it here for us newbies to learn from. Do you see how similar my three code samples look? How about showing me four more similar samples to extract the four things I just mentioned?
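As a possible starting point, here is a minimal sketch of how the same DOMDocument pattern might extend to Q1 through Q4. It is not a definitive answer: the sample HTML in $contents is made up for illustration, the variable names are hypothetical, and the email pattern is a deliberately simple one that will miss some valid addresses and may match some invalid ones.

```php
<?php
//Hypothetical sample page; in practice $contents would hold the fetched HTML
$contents = '<html><head><title>Example Page</title>'
    . '<meta name="keywords" content="php, dom, scraping">'
    . '<meta name="description" content="A small test page.">'
    . '</head><body><p>Write to info@example.com or '
    . '<a href="mailto:sales@example.com">sales</a>.</p></body></html>';

//New DOM Document
$document = new DOMDocument;
//Collect parser warnings internally instead of printing them (real pages are often malformed)
libxml_use_internal_errors(true);
//Load HTML in $contents variable
$document->loadHTML($contents);
libxml_clear_errors();

//Q2: page title (text of the first <title> element, if there is one)
$titles = $document->getElementsByTagName('title');
$page_title = $titles->length > 0 ? trim($titles->item(0)->nodeValue) : '';

//Q3 and Q4: meta keywords and meta description
$meta_keywords = '';
$meta_description = '';
foreach($document->getElementsByTagName('meta') as $node) {
    //The name attribute may be written in any case, so normalise it
    $name = strtolower($node->getAttribute('name'));
    if($name === 'keywords') {
        $meta_keywords = $node->getAttribute('content');
    } elseif($name === 'description') {
        $meta_description = $node->getAttribute('content');
    }
}

//Q1: email addresses, matched against the raw HTML with a simple (not RFC-complete) pattern
$email_matches = array();
$pattern = '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/';
preg_match_all($pattern, $contents, $email_matches);
//Remove duplicates and reindex
$emails = array_values(array_unique($email_matches[0]));
```

Searching the raw HTML for emails (rather than only visible text) also picks up addresses inside mailto: links, which seems to be what a scraper would usually want; the title and meta lookups reuse getElementsByTagName() and getAttribute() exactly as in the link and image samples.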