PHP crawler

Hi, can you recommend a PHP crawler framework or class?

I found something like a PHP crawler framework, but it's not clear how to work with it.

What is it you want to do?

I want to crawl websites and save their posts, text and images, something like RSS. I want to read the content and save it to a database.

The first thing you should do is get the website content with PHP using cURL:

// Fetch $url with cURL; returns the curl_getinfo() array plus 'errno', 'errmsg' and 'content'.
function get_web_page( $url )
{
    $options = array(
        CURLOPT_RETURNTRANSFER => true,     // return web page
        CURLOPT_HEADER         => false,    // don't return headers
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects
        CURLOPT_ENCODING       => "",       // handle all encodings
        CURLOPT_USERAGENT      => "MyCrawler/1.0 (+http://example.com/bot)", // who am I (hypothetical name/URL; don't impersonate Googlebot)
        CURLOPT_AUTOREFERER    => true,     // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,      // timeout on connect
        CURLOPT_TIMEOUT        => 120,      // timeout on response
        CURLOPT_MAXREDIRS      => 10,       // stop after 10 redirects
    );

    $ch      = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err     = curl_errno( $ch );
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );
    curl_close( $ch );

    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}
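$url    = 'http://example.com/';  // hypothetical: the page you want to crawl
$result = get_web_page($url);

// Stop if cURL failed or the server did not answer 200 OK.
if ($result['errno'] !== 0 || $result['http_code'] != 200) {
    die("Fetch failed (cURL error {$result['errno']}, HTTP {$result['http_code']}): {$result['errmsg']}\n");
}
$cont = $result['content']; // the page's HTML

// To extract a specific piece of text from the page ("ssss" is a placeholder class name):
preg_match_all('/<span class="ssss">(.*?)<\/span>/s', $cont, $youwantedthis);

// (.*?) captures the part you want; all captured texts end up in $youwantedthis[1].
// The /s modifier lets "." also match newlines inside the span.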
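To see what `preg_match_all()` puts into its result array without hitting the network, here is a minimal sketch that runs the same pattern on a hard-coded HTML string. The markup and the `ssss` class are made up for illustration.

```php
<?php
// Same pattern as above, applied to a hypothetical HTML snippet.
$cont = '<div><span class="ssss">First title</span>'
      . '<span class="other">skip me</span>'
      . '<span class="ssss">Second' . "\n" . 'title</span></div>';

// (.*?) is non-greedy, so each match stops at the first </span>.
// The /s modifier lets "." match the newline inside the second span.
$count = preg_match_all('/<span class="ssss">(.*?)<\/span>/s', $cont, $matches);

// $matches[0] holds the whole matched tags, $matches[1] only the captured text.
foreach ($matches[1] as $text) {
    echo $text, "\n";
}
```

Regexes like this are fine for quick, known markup, but they break as soon as the attribute order, quoting or extra classes change.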

I hope this helps you get what you want.
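For the "save to database" part of the question, a minimal sketch with PDO could look like this. It assumes the `pdo_sqlite` extension is available; the in-memory SQLite DSN is only there so the example runs without a server, and the `posts` table and its columns are hypothetical. In real use you would swap in your own DSN (MySQL, PostgreSQL, etc.).

```php
<?php
// Insert scraped titles using a prepared statement, so scraped text
// can't break or inject into the SQL. Table/column names are hypothetical.
function save_posts(PDO $db, string $sourceUrl, array $titles): int
{
    $stmt = $db->prepare('INSERT INTO posts (source_url, title) VALUES (:url, :title)');
    $saved = 0;
    foreach ($titles as $title) {
        $stmt->execute([':url' => $sourceUrl, ':title' => $title]);
        $saved++;
    }
    return $saved;
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE posts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    source_url TEXT NOT NULL,
    title TEXT NOT NULL
)');

// In the crawler, $titles would be $youwantedthis[1] from the preg_match_all() call.
$titles = ['First title', 'Second title'];
$saved  = save_posts($db, 'http://example.com/', $titles);

// Read the rows back to check what was stored.
$rows = $db->query('SELECT title FROM posts ORDER BY id')->fetchAll(PDO::FETCH_COLUMN);
print_r($rows);
```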

Thanks, I'll try some of this.

I would recommend the Guzzle HTTP client together with the Symfony DomCrawler component. Writing low-level cURL code in this day and age is for the birds.
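Guzzle and DomCrawler are Composer packages, so they can't be shown self-contained here. As a dependency-free illustration of the same "parse the HTML, then query it" idea, here is a sketch using PHP's bundled `DOMDocument` and `DOMXPath`. This is not DomCrawler's API; the helper name `extract_by_class` and the sample markup are hypothetical, and the class name is assumed not to contain quotes.

```php
<?php
// Return the trimmed text of every element whose class list contains $class.
function extract_by_class(string $html, string $class): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Whole-word class match, so class="ssss big" is found but "ssssx" is not.
    $query = sprintf(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );

    $texts = [];
    foreach ($xpath->query($query) as $node) {
        // textContent includes nested tags' text, e.g. <b>title</b>.
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

$html = '<html><body>'
      . '<span class="ssss big">First title</span>'
      . '<p class="other">skip me</p>'
      . '<span class="ssss">Second <b>title</b></span>'
      . '</body></html>';

print_r(extract_by_class($html, 'ssss'));
```

Unlike the regex, this keeps working when the element has extra classes or nested tags, which is the main argument for a DOM-based crawler.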
