  1. #1
    3MTA3
    Join Date
    Jul 2003
    Location
    Florida
    Posts
    1,016

    Need to Crawl Entire Site

    I need to find the best way to crawl an entire site and return all the pages that belong to that site. I don't need images, only the pages.

    I am able to type in a URL and extract all the links on that page, but I'm not sure how to follow those links so I can extract the deeper links as well. How do you get a script to extract all the links from the entire site, assuming the starting point is the root of the domain, http://www.example.com ?

    Can you do this with cURL or Snoopy, or is there a better way?

    Does anyone have experience with this?

  2. #2
    markl999
    Join Date
    Aug 2003
    Location
    Manchester, UK
    Posts
    4,007
    Does it have to be PHP? If not, I'd use wget.

  3. #3
    3MTA3
    Join Date
    Jul 2003
    Location
    Florida
    Posts
    1,016
    Quote Originally Posted by markl999:
    "Does it have to be PHP? If not, I'd use wget."

    Yes, it needs to work with PHP, since the crawl will be triggered by a PHP script.

    Can't wget be called within PHP or are there known issues when doing so?
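
    For reference, a minimal sketch of how wget might be invoked from PHP, assuming the host permits exec() and has wget installed. The helper names buildWgetCommand and canExec, the log path, and the depth of 5 are all made up for illustration; the wget flags used (--spider, --recursive, --level, --no-verbose, --output-file) are standard ones.

```php
<?php
// Hypothetical helper: builds a wget command that walks a site in spider mode
// (pages are requested but not saved) and writes its log to $logFile.
function buildWgetCommand(string $url, string $logFile, int $depth = 5): string
{
    return sprintf(
        'wget --spider --recursive --level=%d --no-verbose --output-file=%s %s',
        $depth,
        escapeshellarg($logFile), // always escape anything passed to the shell
        escapeshellarg($url)
    );
}

// Hypothetical helper: true only if exec() exists and is not listed in
// the disable_functions ini setting, which many shared hosts use.
function canExec(): bool
{
    $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));
    return function_exists('exec') && !in_array('exec', $disabled, true);
}

// Usage (commented out so that nothing runs when the file is included):
// if (canExec()) {
//     exec(buildWgetCommand('http://www.example.com', '/tmp/crawl.log'), $out, $status);
//     // $status holds wget's exit code; the requested URLs are recorded in the log file.
// }
```

    Checking canExec() first avoids a fatal error or warning on hosts that block shell access.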

  4. #4
    3MTA3
    Join Date
    Jul 2003
    Location
    Florida
    Posts
    1,016
    My hosts don't allow the wget command to be called. Are there any other options? Can Snoopy do recursive downloads?
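
    One option that needs no shell access is to do the recursion in PHP itself: keep a queue of URLs to visit and a set of URLs already seen, fetch each page, and enqueue any same-host links not seen before. The following is a rough sketch of that idea, not a finished crawler. All function names (fetchPage, extractLinks, resolveUrl, crawlSite) and the 500-page cap are made up for illustration. The regex link extractor is a simple stand-in for whatever link extraction you already have, and resolveUrl does not collapse "../" segments or handle query-only links properly.

```php
<?php
// Hypothetical helper: fetch a page with cURL; null on failure or non-HTML content.
function fetchPage(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 10,
    ]);
    $body = curl_exec($ch);
    $type = (string) curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    return ($body === false || stripos($type, 'text/html') === false) ? null : $body;
}

// Hypothetical helper: pull href values out of <a> tags (simplified regex approach).
function extractLinks(string $html): array
{
    preg_match_all('/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1/is', $html, $m);
    return array_map('html_entity_decode', $m[2]);
}

// Hypothetical helper: turn an href into an absolute http(s) URL, or null to skip it.
function resolveUrl(string $base, string $href): ?string
{
    $href = trim(explode('#', $href, 2)[0]);          // drop any #fragment
    if ($href === '') {
        return null;
    }
    if (preg_match('#^[a-z][a-z0-9+.\-]*:#i', $href)) { // has a scheme already
        return preg_match('#^https?://#i', $href) ? $href : null; // skip mailto: etc.
    }
    $b    = parse_url($base);
    $root = $b['scheme'] . '://' . $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');
    if (substr($href, 0, 2) === '//') {
        return $b['scheme'] . ':' . $href;             // protocol-relative link
    }
    if ($href[0] === '/') {
        return $root . $href;                          // root-relative link
    }
    $path = $b['path'] ?? '/';
    return $root . substr($path, 0, strrpos($path, '/') + 1) . $href; // relative to base dir
}

// Breadth-first crawl limited to the start URL's host; returns the pages fetched.
function crawlSite(string $startUrl, callable $fetch, int $maxPages = 500): array
{
    if (parse_url($startUrl, PHP_URL_PATH) === null) {
        $startUrl .= '/';                              // treat the bare domain as "/"
    }
    $host  = parse_url($startUrl, PHP_URL_HOST);
    $seen  = [$startUrl => true];                      // every URL ever queued
    $queue = [$startUrl];
    $pages = [];
    while ($queue && count($pages) < $maxPages) {
        $url  = array_shift($queue);
        $html = $fetch($url);
        if ($html === null) {
            continue;                                  // failed fetch or not a page
        }
        $pages[] = $url;
        foreach (extractLinks($html) as $href) {
            $next = resolveUrl($url, $href);
            if ($next === null || isset($seen[$next])) {
                continue;
            }
            if (parse_url($next, PHP_URL_HOST) !== $host) {
                continue;                              // stay on the same site
            }
            $seen[$next] = true;
            $queue[]     = $next;
        }
    }
    return $pages;
}

// Usage (commented out so that nothing hits the network on include):
// $pages = crawlSite('http://www.example.com', 'fetchPage');
```

    Passing the fetcher in as a callable means the same loop should also work with Snoopy or any other HTTP client: wrap its fetch call in a function that returns the HTML or null. The page cap and the seen-set are what keep the crawl from looping forever on sites whose pages link back to each other.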

