Hi Guys,
I have been working on this for a few days and am getting nowhere fast lol. From this site query:
http://paydotcom.com/marketplace.php?category=0&subcategory=0&search=tv&Submit=Search
I’m trying to parse out a few bits of information, namely the product name and description:
<?php
// $html is assumed to already hold the page source for the URL above.
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$results = $xpath->query("//*[@class='acat']");
//$results = $xpath->getElementsByTagName('a');
// XPaths reported by Firebug:
// /html/body/div[3]/center/table/tbody/tr/td/table/tbody/tr/td[2]/div[2]/table/tbody/tr/td
// /html/body/div[3]/center/table/tbody/tr/td/table/tbody/tr/td[2]/div[2]/table[2]/tbody/tr/td
// /html/body/div[3]/center/table/tbody/tr/td/table/tbody/tr/td[2]/div[2]/table[3]/tbody/tr/td
//$results = $xpath->query("/html/body/div/center/table/tr/td/table/tr/td/div[@id='content']/table/tr/td");
foreach ($results as $result) {
    // Title
    $title = $result->nodeValue;
    print $title;
}
As it stands, the code parses out the product title, but I really need to parse out the description as well. I used Firebug to find the proper XPath, but from further reading, other people say Firebug gives inaccurate results because it adds a tbody tag.
If anyone could point me in the right direction, that would be great. I’m not normally stuck this long on projects, but this has me snookered.
Thanks guys
Graham
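For anyone who would rather stay with DOMDocument, here is a minimal sketch of one way to sidestep the tbody problem: query by class name instead of by an absolute path, and run a second query relative to each result table to pick up the description. The sample markup is made up for illustration, and the 'subtitle_s' class for the description cell is an assumption about the page, so check both against the real source.

```php
<?php
// Hypothetical markup standing in for the real results page.
$html = <<<'HTML'
<html><body><div id="content">
<table><tr><td class="acat">TV Guide $19.95</td></tr>
<tr><td class="subtitle_s">Watch TV on your PC.</td></tr></table>
<table><tr><td class="acat">Satellite TV $29.00</td></tr>
<tr><td class="subtitle_s">Channels from around the world.</td></tr></table>
</div></body></html>
HTML;

// Real-world HTML is often invalid, so collect parser warnings quietly.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$items = array();

// "//" matches at any depth, so it does not matter whether a tbody
// element sits between table and tr.
foreach ($xpath->query("//div[@id='content']//table") as $table) {
    // The leading "." makes each query relative to the current table.
    $title = $xpath->query(".//td[@class='acat']", $table)->item(0);
    $desc  = $xpath->query(".//td[@class='subtitle_s']", $table)->item(0);
    if ($title === null) {
        continue; // not a product table
    }
    $items[] = array(
        'title'       => trim($title->nodeValue),
        'description' => $desc !== null ? trim($desc->nodeValue) : '',
    );
}

print_r($items);
```

Firebug shows the DOM after the browser has normalised it, which is where the extra tbody comes from. The raw HTML that DOMDocument sees may not contain it, so class-based queries tend to be less brittle than copied absolute paths.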
I suggest using a tool like SimpleHTMLDOM to parse the HTML. It handles broken HTML pretty easily, and with it I’ve put together a small script that dumps the names and prices of the items into a two-dimensional array.
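<?php
include('simplehtmldom/simple_html_dom.php');
$url = 'http://paydotcom.com/marketplace.php?category=0&subcategory=0&search=tv&Submit=Search';
$html = file_get_html($url);
$items = array();
// Each table inside the content div is treated as one result.
$results = $html->find('div[id=content]', 0)->find('table');
foreach ($results as $result) {
    $row = $result->find('td[class=acat]', 0);
    if (!$row) {
        continue; // skip tables without a product cell
    }
    // Split "name $price" on the dollar sign; pad in case no price is present.
    list($name, $price) = array_pad(explode('$', $row->plaintext, 2), 2, '');
    $items[] = array('name' => $name, 'price' => $price);
}
print '<pre>';
print_r($items);
print '</pre>';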
You can get SimpleHTMLDOM here: http://sourceforge.net/projects/simplehtmldom/
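One thing to watch with the explode('$', ...) approach is a cell with no dollar sign, or a product name that itself contains one. As a rough sketch (the helper name splitNameAndPrice is made up for this example, not part of SimpleHTMLDOM), you could split on the last dollar sign and fall back to a null price:

```php
<?php
// Hypothetical helper: split "Product name $19.95" into name and price.
function splitNameAndPrice($text)
{
    // Use the last "$" so a dollar sign inside the name does not break the split.
    $pos = strrpos($text, '$');
    if ($pos === false) {
        // No price found in this cell.
        return array('name' => trim($text), 'price' => null);
    }
    return array(
        'name'  => trim(substr($text, 0, $pos)),
        'price' => trim(substr($text, $pos + 1)),
    );
}

print_r(splitNameAndPrice('TV Guide $19.95'));
print_r(splitNameAndPrice('Free TV Listing'));
```

Inside the loop you would pass it $row->plaintext and append the returned array to $items.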
I’ve used SimpleHTMLDOM before, but it often hits PHP’s memory limit. I’d recommend phpQuery instead. It’s a wrapper around DOMDocument, and I think it’s a bit better than SimpleHTMLDOM.
Hi Guys,
Thanks for that mate, I like the look of SimpleHTMLDOM. I managed to get the second bit of information I needed using $row = $result->find('td[class=subtitle_s]', 0); and even the third part.
code:
<?php
include('simplehtmldom/simple_html_dom.php');
$url = 'http://paydotcom.com/marketplace.php?category=0&subcategory=0&search=tv&Submit=Search';
$html = file_get_html($url);
$results = $html->find('div[id=content]', 0)->find('table');
foreach ($results as $result) {
    // Title
    $row1 = $result->find('td[class=acat]', 0);
    // Description
    $row2 = $result->find('td[class=subtitle_s]', 0);
    // URL
    $row3 = $result->find('a', 1);
    list($name, $price) = explode('$', $row1->plaintext);
    $items[] = array('name'=>$name, 'price'=>$price);
}
print '<pre>';
print_r($items);
print '</pre>';
This brings back the third part I need, which is great, but it comes back as a hyperlink: <a href="http://paydotcom.com/r/79808/XXXXX/">Visit Site</a>
Is there a way I can fish out just the URL (http://paydotcom.com/r/79808/XXXXX/)?
thanks mate
Graham
There absolutely is.
Use the getAttribute() method on the object holding the anchor tag, like so:
// URL
$row3 = $result->find('a', 1)->getAttribute('href');
I’d recommend you check out the manual that comes with SimpleHTMLDOM; it provides great documentation on the package.
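If you ever end up on plain DOMDocument instead, the same idea works there too, since DOMElement also has a getAttribute() method. A minimal sketch, using the anchor from your post as a hard-coded sample string rather than fetching the live page:

```php
<?php
// Sample anchor taken from the post above; a real script would parse the full page.
$snippet = '<a href="http://paydotcom.com/r/79808/XXXXX/">Visit Site</a>';

libxml_use_internal_errors(true); // keep parser warnings quiet
$dom = new DOMDocument();
$dom->loadHTML($snippet);
libxml_clear_errors();

// Grab the first anchor and read its href attribute.
$link = $dom->getElementsByTagName('a')->item(0);
$url  = $link !== null ? $link->getAttribute('href') : '';

print $url . "\n"; // prints http://paydotcom.com/r/79808/XXXXX/
```

The null check matters on real pages, where a result may not contain a link at all.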
Thanks a lot mate, I have been studying the guides. It makes DOM a breeze.
cheers
Graham