Using XPath on an HTML file

Hi Guys,

What I'm trying to do is read an HTML file into a DOMDocument, then use XPath to retrieve the relevant fields.

So far I have:


<?php
if (isset($_GET['searchDeep']))
{
    // Deep search code
    //$searchString = str_replace(" ", "+", $searchString);
    $search_url = "http://www.clickbank.com/mkplSearchResult.htm?dores=true&includeKeywords=$searchString&firstResult=1";
    print $search_url; print "<br />";

    // make the cURL request to $search_url
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_USERAGENT, 'Firefox (WindowsXP) - Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6');
    curl_setopt($ch, CURLOPT_URL, $search_url);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    $html = curl_exec($ch);
    if (!$html) {
        echo "<p class=\"fcs-message-error\">cURL error:" . curl_error($ch) . " (Error Number " . curl_errno($ch) . ")</p>";
    }
    curl_close($ch);

    // parse the html into a DOMDocument
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    //print_r($dom)
    $xpath = new DOMXPath($dom);
    //print_r($xpath);
    // Loop
    foreach ($xpath as $item)
    {
        //print_r($item);
        print $item->query("//div[@id='results']//tr/td[@class='details']/h4/a");

        // URLs
        $cbURL = $item->getAttribute('href');

        // Replace with my hoplink
        $cbURL = str_replace("zzzzz", "graham25s", $cbURL);
        print $cbURL;

        $xpath = new DOMXPath($dom);
        $paras = $xpath->query("//div[@id='results']//td[@class='details']//div[@class='description']");
        $para = $paras->item(0);
        $description = $para->textContent;

        $xpath = new DOMXPath($dom);
        $paras = $xpath->query("//div[@id='results']//td[@class='details']//h4/a");
        $para  = $paras->item(0);
        $title = $para->textContent;

        $link = '<a rel="nofollow" href="'.$cbURL.'">'.$title.'</a>';

        print "<br/><strong>".$link."</strong><br/>".$description;
        //print $link;
    }
}
?>

I have pieced this together from my limited knowledge :) but I can't seem to loop over the results that come back.

Any help would be appreciated.

Thanks guys

Graham

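The code could be tidied up somewhat, as a number of things are happening that don't really need to. The loop doesn't work because a DOMXPath object is not a list of results: $xpath->query() returns a DOMNodeList, and that is what you loop over with foreach. You also only need one DOMXPath per document, and a query starting with // searches the whole document, so item(0) always gives you the first result on the page rather than the one for the current row. A basic example would be to replace everything below your //print_r($dom) line with something like the following (notes: this will not output anything, since you know how to print HTML; also, no sanity checking is used, so the HTML is assumed to have the right structure):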


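$xpath = new DOMXPath($dom);
// query() returns a DOMNodeList: one node per result row (its first cell)
$results = $xpath->query("//div[@id='results']//tr[@class='result']/td[1]");
foreach ($results as $result)
{
	// Search only inside this cell, not the whole document
	$anode  = $result->getElementsByTagName("a")->item(0);
	$title  = $anode->textContent;
	// Swap the placeholder ID in the link for your own hoplink ID
	$hopurl = str_replace("zzzzz", "graham25s", $anode->getAttribute('href'));
	$desc   = $result->getElementsByTagName("div")->item(0)->textContent;

	// Write your HTML
}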
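For reference, here is a self-contained sketch of the same approach with some sanity checking added back in. It runs against a small hypothetical sample of markup shaped the way the snippet above assumes (the real ClickBank results page may well be structured differently), and extractResults() is a made-up helper name, not part of any library. It keeps libxml warnings about messy HTML out of the page, passes each cell as the context node to relative XPath queries (starting with .//) so each row only searches itself, skips rows without a link instead of failing on a null node, and escapes the scraped text before printing it.

```php
<?php
// Hypothetical sample markup, shaped like the structure assumed above:
// div#results > table > tr.result > td (h4 > a, div.description)
$html = <<<HTML
<html><body>
<div id="results">
  <table>
    <tr class="result">
      <td class="details">
        <h4><a href="http://zzzzz.product1.example.com/">First product</a></h4>
        <div class="description">Description of the first product.</div>
      </td>
    </tr>
    <tr class="result">
      <td class="details">
        <h4><a href="http://zzzzz.product2.example.com/">Second product</a></h4>
        <div class="description">Description of the second product.</div>
      </td>
    </tr>
    <tr class="result">
      <td class="details"><h4>No link in this row</h4></td>
    </tr>
  </table>
</div>
</body></html>
HTML;

/**
 * Hypothetical helper: collect title, affiliate URL and description
 * for every result row. Rows without a link are skipped.
 */
function extractResults(string $html, string $affiliateId): array
{
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // One DOMXPath per document is enough.
    $xpath = new DOMXPath($dom);
    $results = [];

    // query() returns a DOMNodeList; that list is what foreach iterates.
    foreach ($xpath->query("//div[@id='results']//tr[@class='result']/td[1]") as $cell) {
        // Relative query evaluated with $cell as the context node.
        $link = $xpath->query(".//h4/a", $cell)->item(0);
        if ($link === null) {
            continue; // sanity check: this row has no link
        }
        $descNode = $xpath->query(".//div[@class='description']", $cell)->item(0);

        $results[] = [
            'title' => trim($link->textContent),
            'url'   => str_replace("zzzzz", $affiliateId, $link->getAttribute('href')),
            'desc'  => $descNode !== null ? trim($descNode->textContent) : '',
        ];
    }
    return $results;
}

$results = extractResults($html, "graham25s");
foreach ($results as $r) {
    // Escape scraped text before writing it back out as HTML.
    printf(
        "<br/><strong><a rel=\"nofollow\" href=\"%s\">%s</a></strong><br/>%s\n",
        htmlspecialchars($r['url']),
        htmlspecialchars($r['title']),
        htmlspecialchars($r['desc'])
    );
}
```

With the sample markup, the third row has no link, so it is skipped and two results are printed.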

Thanks very much mate, it worked great :)

Graham