Working with HTML DOM to crawl websites

labour · May 19, 2016, 11:30am

I have this file for crawling sites; it works with HTML DOM.

I have a problem. Look at the examples below:


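<?php
include_once('simple_html_dom.php');

$target_url = "http://php.net/";

// Load the page into a simple_html_dom object
$html = new simple_html_dom();
$html->load_file($target_url);

// Print the src attribute of every <img> on the page
foreach ($html->find('img') as $img) {
    echo $img->src . "<br />";
}
?>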

It crawls the php.net page and returns the src of every image on it. It's clear to me how it returns the attributes of tags…

But what if I need to get some text from the page? See the code below:


<section id="item-details">
    <h1>Bike delivery</h1>

    <p>
        <time datetime="2016-05-19 15:22:55" class="small-text icon-clock">
            25 minutes ago        </time>
        <span class="small-text">
          Address: new york   </span>
                    <span class="item-price"><strong>200000</strong> Dollar</span>
            </p>

    
    <p>hi its our service<br />
you can trust us<br />
we are the best<br />
follow us<br />

</section>

I want to extract
“hi its our service
you can trust us
we are the best
follow us” as content,
then “200000 Dollar” as price, “Address: new york” as address, “25 minutes ago” as time, and “Bike delivery” as title, and save them to my database.

How can I do this with html-dom (the file is available for download above, with an example),
or with something else?

oddz · May 19, 2016, 5:49pm

Use the well-known and supported Symfony DomCrawler component. Don’t use a one-off library pulled from a script site with no documentation or support. DomCrawler can also be used together with the CssSelector component, which makes it easy to crawl a page using CSS selector syntax.
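
A rough sketch of how that could look for the markup in the question. It assumes both components were installed with Composer (`composer require symfony/dom-crawler symfony/css-selector`) and that the page structure is exactly as posted. The `clean()` and `extractItem()` helpers and the array keys are names made up for this example, not part of Symfony.

```php
<?php
// Assumption: symfony/dom-crawler and symfony/css-selector are installed via Composer.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical helper: collapse runs of whitespace and trim the ends,
// so the result does not depend on how the library version handles whitespace.
function clean(string $text): string
{
    return trim((string) preg_replace('/\s+/', ' ', $text));
}

// Hypothetical helper: pull the wanted fields out of the #item-details section.
function extractItem(string $html): array
{
    $crawler = new Crawler($html);
    $details = $crawler->filter('#item-details');

    return [
        'title'    => clean($details->filter('h1')->text()),
        'time'     => clean($details->filter('time')->text()),
        'datetime' => $details->filter('time')->attr('datetime'),
        'address'  => clean($details->filter('span.small-text')->text()),
        'price'    => clean($details->filter('.item-price')->text()),
        // The description is the second <p> inside the section.
        'content'  => clean($details->filter('p')->eq(1)->text()),
    ];
}

// Sample markup copied from the question. For a live page the HTML
// could be fetched first, e.g. with file_get_contents($url).
$html = <<<'HTML'
<section id="item-details">
    <h1>Bike delivery</h1>
    <p>
        <time datetime="2016-05-19 15:22:55" class="small-text icon-clock">
            25 minutes ago        </time>
        <span class="small-text">
          Address: new york   </span>
        <span class="item-price"><strong>200000</strong> Dollar</span>
    </p>
    <p>hi its our service<br />
you can trust us<br />
we are the best<br />
follow us<br />
</section>
HTML;

$item = extractItem($html);
print_r($item);
```

Note that `clean()` joins the description into a single line. If the line breaks matter, the `<br />` tags would need to be handled separately. The resulting array can then be bound to a PDO prepared statement to store it in a database.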

labour · May 20, 2016, 7:59am

Thanks, Mr. @oddz.

Do you have an account where I can contact you?
You are very professional…
I need your help with Laravel, and I have many questions.

oddz · May 21, 2016, 1:19am

It’s best to post your questions on SitePoint. If it’s something I can provide info on, I’ll answer.
