Thread: web crawler PHP

  1. #1
    SitePoint Enthusiast
    Join Date
    Oct 2005
    Posts
    74

    web crawler PHP

    Hello,
    I need to build a page that searches specific websites and extracts information from them, for example sites where you can rent a car. The user fills in a form on my site (for instance, departure and drop-off dates), and the data is submitted to another page that searches 2-3 sites and finds which cars are available on those dates.
    I wanted to ask whether there are ready-made scripts that do this or, if not, for some hints on how to start.
    I am familiar with PHP forms and with extracting data from MySQL databases, but when it comes to extracting data from other sites, I have no clue how to begin...

  2. #2
    SitePoint Addict
    Join Date
    Sep 2006
    Posts
    219
    You have to be careful if these sites do not ALLOW this. I'm not sure of the legalities.

    You could go about it by learning the layout of each site, then reading and parsing its pages based on that layout. You will come unstuck, though, every time one of these sites changes its layout.
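
One way to limit the damage from layout changes is to keep each site's layout rules in one place and to notice when a page stops matching them. The following is only a rough sketch: it swaps in PHP's DOMDocument and DOMXPath for the parsing, and the site keys, XPath queries, and the `parse_offers()` helper are all hypothetical, made up for illustration.

```php
<?php
// Hypothetical per-site layout rules. The site keys and XPath queries
// are assumptions; each real site needs rules written for its own markup.
$layouts = [
    'site-a' => ['row' => '//tr[@class="offer"]', 'name' => './td[1]', 'price' => './td[2]'],
    'site-b' => ['row' => '//div[@class="car"]', 'name' => './h3', 'price' => './span[@class="price"]'],
];

/**
 * Parse one page with one site's layout rules.
 * Returns a list of ['name' => ..., 'price' => ...] arrays.
 * An empty list is a hint that the site may have changed its layout.
 */
function parse_offers(string $html, array $layout): array
{
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Real pages are often malformed; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $offers = [];
    foreach ($xpath->query($layout['row']) as $row) {
        // Queries starting with "./" are evaluated relative to the row.
        $name = $xpath->query($layout['name'], $row)->item(0);
        $price = $xpath->query($layout['price'], $row)->item(0);
        if ($name === null || $price === null) {
            continue; // row does not match the expected layout
        }
        $offers[] = [
            'name' => trim($name->textContent),
            'price' => trim($price->textContent),
        ];
    }
    return $offers;
}
```

If `parse_offers()` comes back empty for a site that normally lists cars, that is a reasonable signal to log a warning and recheck that site's rules, rather than showing the user "no cars available".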

  3. #3
    SitePoint Wizard silver trophy
    Join Date
    Mar 2006
    Posts
    6,132
    You can read the HTML from a URL into a string with file_get_contents().

    Then you can extract whatever data you want from it. Most likely you would want to use a regular expression for this, with a function such as preg_match() or preg_match_all().
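
As a minimal sketch of those two steps: the markup pattern below (`<li class="car">Name - $Price</li>`) and the helper names `fetch_html()` and `extract_cars()` are assumptions made up for illustration, so the regular expression would have to be rewritten for each real site.

```php
<?php
/**
 * Fetch a page into a string with file_get_contents().
 * Returns null if the request fails. Fetching URLs this way
 * requires allow_url_fopen to be enabled in php.ini.
 */
function fetch_html(string $url): ?string
{
    $html = @file_get_contents($url);
    return $html === false ? null : $html;
}

/**
 * Pull car names and prices out of the HTML with preg_match_all().
 * Assumes the hypothetical markup <li class="car">Name - $Price</li>.
 */
function extract_cars(string $html): array
{
    // Group 1: car name (lazy match), group 2: numeric price.
    $pattern = '~<li class="car">\s*(.+?)\s*-\s*\$(\d+(?:\.\d+)?)\s*</li>~s';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $cars = [];
    foreach ($matches as $m) {
        $cars[] = [
            'name' => html_entity_decode($m[1]),
            'price' => (float) $m[2],
        ];
    }
    return $cars;
}
```

A call would look something like `$html = fetch_html($url);` followed by `extract_cars($html)` once `$html` is known not to be null, where `$url` is the results page of whichever site is being searched.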

    By the way, this is commonly called screen scraping or web scraping.

    You may want to look at the Snoopy class:
    http://sourceforge.net/projects/snoopy/

