  1. #1
    googlicious graymatter bvarvel's Avatar
    Join Date
    Sep 2002
    Location
    Katy, TX
    Posts
    956

    Preg Match Problems...

    PregMatch?

    Can anyone clue me in on how to scrape a website?

    For example, I have the following code in a simple web page:

    PHP Code:
    <td align="left" vAlign="top" class="contents">This is a test</td>
    I need to programmatically extract "This is a test". Does anyone have any quick examples?

    thanks!

  2. #2
    SitePoint Addict battra's Avatar
    Join Date
    Oct 2004
    Location
    Asylum
    Posts
    277

  3. #3
    googlicious graymatter bvarvel's Avatar
    Join Date
    Sep 2002
    Location
    Katy, TX
    Posts
    956
    I'll be reading this from another website, and there are specific things I need to look for, so strip tags won't work for me. I'm trying to use preg_match to scrape specific data.

    The actual lines I'm pulling data from might look like this:

    PHP Code:
    <td align="left" vAlign="top" width="100" class="headers">Address:</td>
    <td align="left" vAlign="top" class="contents">810 Bluebonnet Lane</td>
    And I'll need to parse it like so:

    <p>
    <strong>Address</strong>&nbsp;810 Bluebonnet Lane
    </p>

    There are about 12 fields I need to pull. Can you help?

  4. #4
    eschew sesquipedalians silver trophy sweatje's Avatar
    Join Date
    Jun 2003
    Location
    Iowa, USA
    Posts
    3,749
    PHP Code:
    function testFindAddress() {
        $src = '<td align="left" vAlign="top" width="100" class="headers">Address:</td>
    <td align="left" vAlign="top" class="contents">810 Bluebonnet Lane</td> ';

        $regex = '~<td[^>]+class="headers"[^>]*>(.*?)</td>.*?<td[^>]+class="contents"[^>]*>(.*?)</td>~ims';

        preg_match($regex, $src, $match);

        $this->assertEqual('Address:', $match[1]);
        $this->assertEqual('810 Bluebonnet Lane', $match[2]);
    }
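
Since about 12 fields need pulling, here is a rough sketch of how the same pattern might be run over a whole page with preg_match_all and turned into the paragraph markup asked for earlier in the thread. The function names extractFields and renderFields are made up for illustration, the second sample row (City) is invented, and the sketch assumes every label cell uses class="headers" and every value cell uses class="contents", as in the quoted rows.

```php
<?php
// Hypothetical helper (not from the thread): collect every
// headers/contents cell pair into a label => value array.
function extractFields(string $html): array
{
    // Same pattern as the test above; the "s" flag lets "." match newlines.
    $regex = '~<td[^>]+class="headers"[^>]*>(.*?)</td>.*?'
           . '<td[^>]+class="contents"[^>]*>(.*?)</td>~ims';

    $fields = [];
    // PREG_SET_ORDER returns one array per matched label/value pair.
    if (preg_match_all($regex, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // Trim whitespace and drop the trailing colon from the label.
            $label = rtrim(trim($m[1]), ':');
            $fields[$label] = trim($m[2]);
        }
    }
    return $fields;
}

// Hypothetical helper: render each field in the requested paragraph format.
// Captured text is emitted as-is, on the assumption it is already valid HTML.
function renderFields(array $fields): string
{
    $out = '';
    foreach ($fields as $label => $value) {
        $out .= "<p>\n<strong>" . $label . "</strong>&nbsp;" . $value . "\n</p>\n";
    }
    return $out;
}

// Sample input: the Address row is quoted from the thread, the City row is invented.
$src = '<tr><td align="left" vAlign="top" width="100" class="headers">Address:</td>
<td align="left" vAlign="top" class="contents">810 Bluebonnet Lane</td></tr>
<tr><td align="left" vAlign="top" width="100" class="headers">City:</td>
<td align="left" vAlign="top" class="contents">Katy</td></tr>';

echo renderFields(extractFields($src));
```

If the real page nests tags inside the cells or lays the cells out differently, the pattern will need adjusting, and for messier markup a DOM parser may hold up better than a regex.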
    Jason Sweat ZCE - jsweat_php@yahoo.com
    Book: PHP Patterns
    Good Stuff: SimpleTest PHPUnit FireFox ADOdb YUI
    Detestable (adjective): software that isn't testable.

