  1. #1
    SitePoint Enthusiast
    Join Date
    Mar 2003
    Location
    Brisbane, Australia
    Posts
    48
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Processing search engine results

    Is it possible to retrieve search engine results using PHP? For example, just say I want to find the total number of search results at Altavista or Google for a list of search terms, and display them in a table? Am I off my head or is that kind of thing possible? (I couldn't find anything remotely close to it in past threads)

  2. #2
    SitePoint Evangelist
    Join Date
    Feb 2004
    Location
    Sofia, Bulgaria
    Posts
    421
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Yes, I think this can be done with PHP. Here are some starting points:
    1. Examine the query URLs of the search engines you want to use.
    2. Examine how each search engine returns its results. Look through the HTML to find unique start/end marker pairs to grab the text between.
    3. Find a class for fetching and parsing pages, and configure it according to the rules you worked out in steps 1 and 2.
    4. Get the results from the class and display them the way you want.
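
Step 2 (grabbing the text between a unique start/end pair) can be sketched roughly as follows. The function name `extractBetween` and the sample markup are made up for illustration; a real results page will need its own markers, found by inspecting its HTML.

```php
<?php
// Hypothetical helper: returns the text between the first occurrence of
// $start and the next occurrence of $end, or null if either marker is missing.
function extractBetween(string $html, string $start, string $end): ?string
{
    $from = strpos($html, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);

    $to = strpos($html, $end, $from);
    if ($to === false) {
        return null;
    }
    return substr($html, $from, $to - $from);
}

// Made-up sample markup standing in for a downloaded results page.
$sample = '<div id="stats">Results 1 - 10 of about 12,100,000 for php</div>';
echo extractBetween($sample, 'of about ', ' for'), "\n";
```

Plain string functions are enough when the markers are fixed text; a regular expression is only needed when the markers themselves vary.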

  3. #3
    SitePoint Enthusiast kaklz's Avatar
    Join Date
    Mar 2004
    Location
    Latvia, Riga
    Posts
    37
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by johncook
    Is it possible to retrieve search engine results using PHP? For example, just say I want to find the total number of search results at Altavista or Google for a list of search terms, and display them in a table? Am I off my head or is that kind of thing possible? (I couldn't find anything remotely close to it in past threads)
    You can take a look at the Google API homepage.

  4. #4
    SitePoint Enthusiast
    Join Date
    Mar 2003
    Location
    Brisbane, Australia
    Posts
    48
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Almost there but not quite

    I'm kind of there, but it only works some of the time. I created a page, googleresults.php, that takes a key phrase, inserts it into the Google results page URL, and looks at the output. It then runs preg_match to find the number of results in the sentence "Results 1 - 10 of about 12,100,000 for...". The weird thing is that sometimes it works perfectly, returning just the number, but sometimes it returns a huge portion of the web page, so I'm not sure what's going on there. If any more seasoned hands can see weaknesses in my code, I'm all ears.
    Code:
    <?php
    function GoogleCount($KeyPhrase)
    	{
    	$KeyPhrase = str_replace(" ","+",$KeyPhrase);
    	$url = "http://www.google.com/search?sourceid=navclient&ie=UTF-8&oe=UTF-8&q=$KeyPhrase";
    	$fp = @fopen($url,"r"); 
    	$content = fread($fp,100000);
    	$title = preg_match('/of about(.*)for/',$content,$matchesarray);
    	return $matchesarray[1];
      	}
    	
    if (!isset($KeyPhrase))
    	{
    	echo "<form name='searchform' method='post' action='googleresults.php'>\r\n";
    	echo "<input type='text' name='KeyPhrase'>\r\n";
    	echo "<input type='submit' name='Submit' value='Submit'>\r\n";
    	echo "</form>\r\n";
    	}
    else
    	{
    	echo "Google finds ".GoogleCount($KeyPhrase)." matches for '$KeyPhrase'";
    	}
    ?>
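
One weakness in building the URL with str_replace is that only spaces are handled, so a phrase containing characters such as & or # would corrupt the query string. A minimal sketch of a safer way to build the same URL, assuming the query parameters from the code above; the helper name `buildSearchUrl` is hypothetical:

```php
<?php
// Hypothetical helper: builds the search URL with every parameter URL-encoded.
function buildSearchUrl(string $keyPhrase): string
{
    // http_build_query() encodes each value, turning spaces into "+"
    // and reserved characters such as "&" into percent escapes.
    $query = http_build_query([
        'sourceid' => 'navclient',
        'ie'       => 'UTF-8',
        'oe'       => 'UTF-8',
        'q'        => $keyPhrase,
    ]);
    return 'http://www.google.com/search?' . $query;
}

echo buildSearchUrl('fish & chips'), "\n";
```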

  5. #5
    SitePoint Evangelist
    Join Date
    Feb 2004
    Location
    Sofia, Bulgaria
    Posts
    421
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    try this..
    PHP Code:
    $title = preg_match('/of about (.*?) for/',$content,$matchesarray);
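
The difference between the two patterns comes down to greedy versus lazy matching: `(.*)` grabs as much as it can and so runs to the last "for" on the line, while `(.*?)` stops at the first one. A small sketch on a made-up fragment (the sample text is an assumption, not real Google output):

```php
<?php
// Made-up page fragment: note the second "for" later on the same line.
$content = 'Results 1 - 10 of about 12,100,000 for php. Tips for beginners';

// Greedy: (.*) extends to the LAST "for" it can reach on the line.
preg_match('/of about(.*)for/', $content, $greedy);

// Lazy: (.*?) stops at the FIRST " for" after the number.
preg_match('/of about (.*?) for/', $content, $lazy);

echo 'greedy: ', trim($greedy[1]), "\n";
echo 'lazy:   ', $lazy[1], "\n";
```

Since `.` does not match newlines by default, the greedy version only misbehaves when another "for" follows on the same line, which would explain why the original pattern worked for some searches and not others.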

  6. #6
    SitePoint Enthusiast
    Join Date
    Mar 2003
    Location
    Brisbane, Australia
    Posts
    48
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Better, getting better...

    Quote Originally Posted by dacool
    try this..
    PHP Code:
    $title = preg_match('/of about (.*?) for/',$content,$matchesarray);
    Improvement. Now every result I get is either just the number (e.g. 3,040,304) or no match at all. I'm not sure why it gives no match in some cases (about 2 out of every 10 attempts).
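
One possible cause of the intermittent misses, offered as a guess rather than a diagnosis: on a network stream a single fread() call can return fewer bytes than requested, so the text being matched may not have arrived yet. A sketch of reading until end-of-file instead; the helper name `readAll` is hypothetical, and the demonstration uses an in-memory stream rather than a live request:

```php
<?php
// Hypothetical helper: keeps reading from an open stream until end-of-file.
function readAll($fp): string
{
    $content = '';
    while (!feof($fp)) {
        $chunk = fread($fp, 8192);
        if ($chunk === false) {
            break; // read error: return what was collected so far
        }
        $content .= $chunk;
    }
    return $content;
}

// Demonstration on an in-memory stream with the target text near the end.
$fp = fopen('php://memory', 'r+');
fwrite($fp, str_repeat('x', 20000) . 'of about 3,040,304 for');
rewind($fp);
$content = readAll($fp);
fclose($fp);

preg_match('/of about (.*?) for/', $content, $matchesarray);
echo $matchesarray[1], "\n";
```

The built-in stream_get_contents($fp) or file_get_contents($url) does the same job in one call. It is also worth checking that fopen() did not return false before reading, since the @ operator hides that failure.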

    Incidentally, I strip the commas from the number with a regex that removes every non-numeric character: preg_replace("/[^0-9]/","",$ResultFromGoogle)
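
Wrapped in a function, that clean-up step might look like the sketch below. The name `parseCount` is hypothetical, and returning null for input without any digits is one design choice among several:

```php
<?php
// Hypothetical helper: turns a string such as "3,040,304" into an integer,
// or returns null when the input contains no digits at all.
function parseCount(string $raw): ?int
{
    // Same regex as above: drop everything that is not a digit.
    $digits = preg_replace('/[^0-9]/', '', $raw);
    return $digits === '' ? null : (int) $digits;
}

var_dump(parseCount('3,040,304'));
var_dump(parseCount('no match'));
```

Returning null keeps a failed match distinguishable from a genuine count of zero when the results are shown in a table.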

    Seemed the easiest way to do it. Thanx for your help! :-)

