PHP crawler

labour · March 14, 2016, 7:02pm

Hi, can you recommend a PHP crawler framework or class?

I found something that looks like a PHP crawler framework, but it's not clear how to work with it.

Gandalf · March 14, 2016, 7:04pm

What is it you want to do?

labour · March 14, 2016, 7:06pm

I want to crawl websites and save their posts, text, and images, something like RSS. I want to read them and save them to a database.

tosta · March 14, 2016, 9:41pm

The first thing you should do is get the website content with PHP, so use this:

function get_web_page( $url )
{
    $options = array(
        CURLOPT_RETURNTRANSFER => true,     // return the page as a string instead of printing it
        CURLOPT_HEADER         => false,    // don't include response headers in the output
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects
        CURLOPT_ENCODING       => "",       // accept every encoding cURL supports
        CURLOPT_USERAGENT      => "GOOGLEboy; (+http://www.google.com/)", // User-Agent header sent with the request
        CURLOPT_AUTOREFERER    => true,     // set Referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,      // connect timeout, in seconds
        CURLOPT_TIMEOUT        => 120,      // timeout for the whole request, in seconds
        CURLOPT_MAXREDIRS      => 10,       // stop after 10 redirects
    );

    $ch      = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );            // page body, or false on failure
    $err     = curl_errno( $ch );           // 0 means no error
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );         // status code, final URL, timings, ...
    curl_close( $ch );

    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}

$url    = 'http://example.com/';            // placeholder: the page you want to fetch
$result = get_web_page( $url );
if ( $result['errno'] !== 0 ) {
    die( 'cURL error: ' . $result['errmsg'] );
}
$cont = $result['content'];                 // the page's HTML

// To pull a specific piece of text out of the page, use:

preg_match_all('/<span class="ssss">(.*?)<\/span>/s', $cont, $youwantedthis);

// (.*?) is the part you want to capture; the captured strings end up in $youwantedthis[1]
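
A regex like the one above tends to break as soon as the markup changes (an extra attribute, a second class name, nested tags). As an alternative, here is a minimal sketch of the same extraction using PHP's bundled DOM extension. The function name extract_span_texts and the sample HTML are made up for illustration, and it assumes the class name is a plain token with no quotes in it.

```php
<?php
// Hypothetical helper: returns the trimmed text of every <span> that carries
// the given class. Assumes $html is non-empty and $class contains no quotes.
function extract_span_texts(string $html, string $class): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Match the class as a whole word, so "ssss big" matches but "ssssx" does not.
    $query = sprintf(
        '//span[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $texts = array();
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Sample input standing in for a downloaded page.
$sample = '<html><body>'
        . '<span class="ssss">First post</span>'
        . '<p>not wanted</p>'
        . '<span class="ssss big">Second post</span>'
        . '</body></html>';

print_r(extract_span_texts($sample, 'ssss'));
```

On a real page you would pass $cont from the code above instead of $sample.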

I hope this helps you get what you want.

labour · March 14, 2016, 9:43pm

Thanks, I'll try it.

oddz · March 17, 2016, 12:02am

I would recommend the Guzzle HTTP client together with the Symfony DomCrawler component. Using low-level cURL code in this day and age is for the birds.

system · June 16, 2016, 7:02am

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.
