  1. #1
    SitePoint Addict
    Join Date
    Nov 2009
    Posts
    328
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    preg_match_all() puts all into $result[0] each time

    Hi,

    I got preg_match_all() working with the below pattern. It searches through an entire file read into a buffer.

    $match_pattern = '/[\r\n]*<my_custom_tag>[\r\n]*(.*)[\r\n]*<\\/my_custom_tag>/s';

    If there is only one instance of <my_custom_tag> </my_custom_tag> in the buffer then it works well.

    If however I include two or more instances of <my_custom_tag> </my_custom_tag>, rather than give me each one in its own array ID (i.e. $result[0], $result[1], $result[2] etc.), it packs everything into $result[0].

    Say there is only one instance of <my_custom_tag> </my_custom_tag> in the buffer (read from a file): $result[0] will be 27 chars long. If there are two instances, $result[0] will be 1027 chars long, and if there are three, $result[0] will be 2220 chars long (and so forth). When there's more than one instance of <my_custom_tag> </my_custom_tag>, not only the text between the tags gets kicked into $result[0] but also all the html/php code in between the tag pairs.

    As you can see, it all gets crammed into $result[0] for some reason. It's as if preg_match_all() can't tell where each <my_custom_tag> ... </my_custom_tag> pair ends (though it handles a single pair fine).

    What gives?


    $rss_source_file = fopen("$rss_from_file", "r") or die("can't open file [SOURCE]");
    $rss_write_file = fopen("$rss_to_file", "a") or die("can't open file [DESTINATION]");

    while (!feof($rss_source_file))
    {
        $buffer = fgets($rss_source_file);
        $lines[] = $buffer;
    }
    $array_count = count($lines);

    $match_pattern = '/[\r\n]*<my_custom_tag>[\r\n]*(.*)[\r\n]*<\\/my_custom_tag>/s';

    for ($i = 0; $i < $array_count; $i++)
    {
        $current_line = $lines[$i];
        $content = get_all_content_between2($current_line, $match_pattern);

        print var_dump($content[$i]);

        fwrite($rss_write_file, trim($content[$i]) ."\r\n");
        fwrite($rss_write_file, " " ."\r\n");
    }

    fclose($rss_source_file) or die("can't close file [SOURCE]");
    fclose($rss_write_file) or die("can't close file [DESTINATION]");



    Is something wrong in $match_pattern?

    Thanks,

  2. #2
    SitePoint Wizard
    Join Date
    Jul 2008
    Posts
    5,757
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    * means zero or more but you can have a variation on it:

    *? means zero or more, but match as little as possible

    otherwise, * matches as much as possible, aka "greedy"

    Since you use .*, you tell it to match as much of anything as possible (and with the s modifier, . matches newlines too). It obliges: it doesn't stop until it reaches the last </my_custom_tag> in the string.
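
To make the greedy/lazy difference concrete, here is a minimal sketch. The sample buffer is made up for illustration and the pattern is simplified from the one in the question (the [\r\n]* parts are left out); only the tag name comes from the thread.

```php
<?php
// Made-up sample buffer: two tagged sections with unrelated markup between them.
$buffer = "<my_custom_tag>first</my_custom_tag>\n"
        . "<p>other markup</p>\n"
        . "<my_custom_tag>second</my_custom_tag>";

// Greedy: (.*) runs on to the LAST closing tag, so everything lands in one match.
preg_match_all('/<my_custom_tag>(.*)<\/my_custom_tag>/s', $buffer, $greedy);

// Lazy: (.*?) stops at the first closing tag it can reach, so each section matches on its own.
preg_match_all('/<my_custom_tag>(.*?)<\/my_custom_tag>/s', $buffer, $lazy);

var_dump(count($greedy[1])); // one match that swallows the markup in between
var_dump($lazy[1]);          // two separate captures
```

Note that preg_match_all() puts the full matches in index 0 and the text captured by the parentheses in index 1, so the individual sections are in $lazy[1][0], $lazy[1][1] and so on.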

  3. #3
    Unobtrusively zen
    paul_wilkins
    Join Date
    Jan 2007
    Location
    Christchurch, New Zealand
    Posts
    14,717
    Mentioned
    103 Post(s)
    Tagged
    4 Thread(s)
    It's a greedy search that's being performed.

    The following information from the PHP pattern modifiers page should help.

    U (PCRE_UNGREEDY)
    This modifier inverts the "greediness" of the quantifiers so that they are not greedy by default, but become greedy if followed by ?. It is not compatible with Perl. It can also be set by a (?U) modifier setting within the pattern or by a question mark behind a quantifier (e.g. .*?).
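
As a rough illustration of the quoted description, the sketch below adds U to a simplified version of the pattern. The sample string and variable names are assumptions made up for the example.

```php
<?php
// Made-up sample input; only the tag name comes from the thread.
$buffer = "<my_custom_tag>one</my_custom_tag> filler <my_custom_tag>two</my_custom_tag>";

// With the U modifier, a plain .* is no longer greedy.
preg_match_all('/<my_custom_tag>(.*)<\/my_custom_tag>/sU', $buffer, $withU);

// Under U, a trailing ? flips the quantifier back to greedy.
preg_match_all('/<my_custom_tag>(.*?)<\/my_custom_tag>/sU', $buffer, $withUAndQuestion);

var_dump($withU[1]);                   // each section captured separately
var_dump(count($withUAndQuestion[1])); // a single greedy match again
```

Because U inverts every quantifier in the pattern, writing .*? on just the quantifier that needs it is often the less surprising choice.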

  4. #4
    SitePoint Addict
    Join Date
    Nov 2009
    Posts
    328
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks, both of you. I put a "/U" at the end of $match_pattern and it's working just fine now, filling the array 1, 2, 3 etc. instead of just 0 all the time.
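
For anyone landing here later, a minimal sketch of the working setup, under some assumptions: the whole file is read into one string with file_get_contents() rather than line by line, the U modifier is appended to the existing s modifier, and the temp file and its contents are stand-ins invented for the example.

```php
<?php
// Hypothetical stand-in for the source file: write sample content to a temp file.
$path = tempnam(sys_get_temp_dir(), 'rss');
file_put_contents(
    $path,
    "<my_custom_tag>\nalpha\n</my_custom_tag>\n"
    . "<div>unrelated</div>\n"
    . "<my_custom_tag>\nbeta\n</my_custom_tag>\n"
);

// Read the whole file into one string so a tag pair may span several lines.
$buffer = file_get_contents($path);

// The pattern from the question with the U modifier added.
$match_pattern = '/[\r\n]*<my_custom_tag>[\r\n]*(.*)[\r\n]*<\\/my_custom_tag>/sU';
preg_match_all($match_pattern, $buffer, $result);

// $result[0] holds the full matches, $result[1] the text captured by (.*).
$items = array_map('trim', $result[1]);
var_dump($items);

unlink($path); // clean up the temp file
```

The trim() call is there because U also makes the [\r\n]* parts lazy, so line breaks next to the tags can end up inside the captured text.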

