  1. #1
    SitePoint Enthusiast
    Join Date
    Sep 2000
    Posts
    81

    Stop checking for a regular expression after X matches?

    I have a script that searches its input for specific text. I also pass in how many items I need to find, usually 7-15. The problem is that the file usually contains 100 or more matches. The pattern matches correctly, but I don't need all 100. I currently match all of them, then loop through the array, copy the first 7-15 items into a temporary array, and use only that array. The script isn't very quick, so I was hoping that stopping it from matching the 85 or so items I don't need would speed it up. This runs in a loop about 225 times, so with 85 extra matches per pass I end up matching 19,125 things that I never use.

  2. #2
    SitePoint Evangelist
    Join Date
    May 2006
    Location
    Austin
    Posts
    401
    You could increment a counter each time the script finds a match, and stop matching in that function once it reaches 15.

    Something like:
    PHP Code:

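    // $words and $pattern are placeholders for your own input and regular expression
    $count = 0;

    foreach ($words as $word) { // loop through words
        if ($count >= 15) {
            break; // stop once 15 matches have been found
        }
        if (preg_match($pattern, $word)) { // stuff here
            $count++;
        }
    }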

    There are other ways to do similar tasks, but without the code you are using it is hard to say what is most appropriate.

  3. #3
    kyberfabrikken, SitePoint Wizard
    Join Date
    Jun 2004
    Location
    Copenhagen, Denmark
    Posts
    6,157
    Which pattern are you searching for?

  4. #4
    Mittineague, Programming Team
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    17,044

    limit matches

    It seems there could also be a way to increment the offset after each match, up to the limit set by the $count flag.
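The offset idea can be sketched with preg_match(), which accepts a starting offset as its fifth argument and, with the PREG_OFFSET_CAPTURE flag, reports where each match begins. This is only a minimal sketch: the helper name firstMatches(), the sample markup, and the pattern are made up for illustration, and the helper assumes the pattern has at least one capture group.

```php
<?php
// Hypothetical helper: collect at most $limit matches of $pattern in $subject,
// restarting the search just past the previous match each time.
function firstMatches(string $pattern, string $subject, int $limit): array
{
    $found = [];
    $offset = 0;
    while (count($found) < $limit
        && preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE, $offset) === 1) {
        $found[] = $m[1][0]; // text of the first capture group
        // $m[0][1] is the byte offset of the whole match. Step at least one
        // byte forward so an empty match cannot loop forever.
        $offset = $m[0][1] + max(1, strlen($m[0][0]));
    }
    return $found;
}

// Illustrative input only; four candidates, but only two are matched.
$html = '<b>one</b><b>two</b><b>three</b><b>four</b>';
print_r(firstMatches('!<b>([^<]+)</b>!', $html, 2));
```

The loop stops as soon as the limit is reached, so the remaining text is never scanned. Whether that is noticeably faster than preg_match_all() followed by array_slice() depends on the input and would need to be measured.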

  5. #5
    SitePoint Enthusiast
    Join Date
    Sep 2000
    Posts
    81
    Originally posted by kyberfabrikken:
    Which pattern are you searching for?
    Input is:
    Code:
    <td><font face="arial" size="3">&nbsp;Name:</font></td>
    <td>&nbsp;<a href="/learnmore"><font color="#000000" face="arial" size="3">What I Want</font></a></td>
    </tr>
    <tr>
    <td><font face="arial" size="3">&nbsp;Zone:</font></td>
    <td>&nbsp;<a href="/learnmore"><font color="#000000" face="arial" size="3">More of what I want</font></a></td>
    </tr>
    <tr bgcolor="#333333">
    <td><font color="#CCCCCC" face="arial" size="4">&nbsp;Level:</font></td>
    <td><font color="#CCCCCC" face="arial" size="4">&nbsp;The last thing I want</font></td>
    Regular Expression is:
    Code:
                  $regexp = '!<font face="arial" size=(?:3|4) color=(?:#000000|CCCCCC)>(?:&nbsp;)?([a-zA-Z0-9\s]+)<\/a>!s';
    Output is:
    What I Want
    More of what I want
    The last thing I want

  6. #6
    kyberfabrikken, SitePoint Wizard
    Join Date
    Jun 2004
    Location
    Copenhagen, Denmark
    Posts
    6,157
    If the source is well-formed XML (e.g. XHTML), you can use the event-based expat parser. Since it appears that the markup may be badly formed, you can use the PEAR package XML_HTMLSax, which works in basically the same way but tolerates badly formed input.
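For the well-formed case, a rough sketch with PHP's expat-based XML functions might look like the following. It assumes the xml extension is available, and the sample document is a simplified, well-formed stand-in for the real input: the &nbsp; entities are left out because plain XML does not define them, and the "has a color attribute" test is just an illustrative filter.

```php
<?php
// Assumed sample: a simplified, well-formed version of the markup in this thread.
$xml = '<table><tr>'
     . '<td><font face="arial" size="3">Name:</font></td>'
     . '<td><a href="/learnmore"><font color="#000000" face="arial" size="3">What I Want</font></a></td>'
     . '</tr></table>';

$texts = [];         // text of each matching <font> element
$collecting = false; // true while inside a matching <font> element

$parser = xml_parser_create();

// expat upper-cases element and attribute names by default (case folding).
xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$texts, &$collecting) {
        if ($name === 'FONT' && isset($attrs['COLOR'])) {
            $collecting = true;
            $texts[] = '';
        }
    },
    function ($p, $name) use (&$collecting) {
        if ($name === 'FONT') {
            $collecting = false;
        }
    }
);

// Character data can arrive in several chunks, so append rather than assign.
xml_set_character_data_handler(
    $parser,
    function ($p, $data) use (&$texts, &$collecting) {
        if ($collecting) {
            $texts[count($texts) - 1] .= $data;
        }
    }
);

if (!xml_parse($parser, $xml, true)) {
    echo 'XML error: ', xml_error_string(xml_get_error_code($parser)), "\n";
}

print_r($texts);
```

On markup that is not well formed, xml_parse() reports an error instead of recovering, which is why a tolerant parser is the safer choice for input like the sample posted above.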

  7. #7
    SitePoint Enthusiast
    Join Date
    Sep 2000
    Posts
    81
    I modified the example that came with it to just print out the tags and the data, so I could see what it was doing. It does get everything I want it to, but how do I get just the stuff I need? Am I still going to have to use regular expressions?

  8. #8
    kyberfabrikken, SitePoint Wizard
    Join Date
    Jun 2004
    Location
    Copenhagen, Denmark
    Posts
    6,157
    With an event based parser, you have to maintain a stack of tags in order to parse it. I think the following will work for you:
    PHP Code:
    require_once('XML/XML_HTMLSax.php');

    // looking for contents of font tags matching:
    // <font face="arial" size=(?:3|4) color=(?:#000000|CCCCCC)>
    class MyHandler {
      var $callback;
      var $stack = Array();

      function MyHandler($callback) {
        $this->callback = $callback;
      }

      function openHandler(& $parser, $name, $attrs) {
        if (strtoupper($name) == 'FONT') {
          $this->stack[] = Array($attrs, "");
        }
      }

      function closeHandler(& $parser, $name) {
        if (strtoupper($name) == 'FONT') {
          $this->fontTagHandler(array_pop($this->stack));
        }
      }

      function dataHandler(& $parser, $data) {
        $size = count($this->stack);
        if ($size > 0) {
          $this->stack[$size - 1][1] .= $data;
        }
      }

      function fontTagHandler($token) {
        if (
          strtoupper(@$token[0]['face']) == 'ARIAL'
          && in_array(@$token[0]['size'], Array(3, 4))
          && in_array(strtoupper(@$token[0]['color']), Array('#000000', '#CCCCCC'))
        ) {
          call_user_func($this->callback, $token[1]);
        }
      }
    }

    $doc=<<<EOD
    <td><font face="arial" size="3">&nbsp;Name:</font></td>
    <td>&nbsp;<a href="/learnmore"><font color="#000000" face="arial" size="3">What I Want</font></a></td>
    </tr>
    <tr>
    <td><font face="arial" size="3">&nbsp;Zone:</font></td>
    <td>&nbsp;<a href="/learnmore"><font color="#000000" face="arial" size="3">More of what I want</font></a></td>
    </tr>
    <tr bgcolor="#333333">
    <td><font color="#CCCCCC" face="arial" size="4">&nbsp;Level:</font></td>
    <td><font color="#CCCCCC" face="arial" size="4">&nbsp;The last thing I want</font></td>
    EOD;

    // A dummy handler to retrieve the tokens
    function onToken($data) {
      echo "$data\n";
    }

    // Instantiate the handler
    $handler=new MyHandler('onToken');

    // Instantiate the parser
    $parser=& new XML_HTMLSax();

    // Register the handler with the parser
    $parser->set_object($handler);

    // Set a parser option
    $parser->set_option('XML_OPTION_TRIM_DATA_NODES');

    // Set the handlers
    $parser->set_element_handler('openHandler','closeHandler');
    $parser->set_data_handler('dataHandler');

    // Parse the document
    $parser->parse($doc);

