  1. #1
    SitePoint Zealot s21825's Avatar
    Join Date
    Oct 2003
    Location
    Canada
    Posts
    162

    Regex Help - Multi-line matching

    Hi all,

    I'm trying to write a regular expression that will match everything between two tags, including newlines, but not in a 'greedy' way.

    i.e. I'd like to match everything between [lsit] and [/lsit] and capture everything between those tags.

    e.g. If this is my haystack:
    Code:
    This is a list ...
    [lsit]
    a bunch of text that
    spans several
    lines
    [/lsit]
    ... I'd like to be able to capture

    Code:
    a bunch of text that
    spans several
    lines
    This is what I've got so far:

    [^\[lsit\](.+?)\[/lsit\]$]m

    But I can't figure out how to match newlines. I first remove all \r characters from my haystack, then I look for \n characters, but they don't seem to match.

    Can anyone offer any suggestions?

    BTW I added some typos to my spelling of 'list' so that the forum would not try to replace them.
    s21825

  2. #2
    SitePoint Evangelist
    Join Date
    Aug 2005
    Posts
    453
    PHP Code:
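    // Find the opening tag, then move past it (strlen('[lsit]') is 6).
    $start = strpos( $text_to_parse, '[lsit]' );
    $start += 6;
    // strpos() returns the offset where '[/lsit]' begins, so $end needs no adjustment.
    $end = strpos( $text_to_parse, '[/lsit]', $start );
    $capture = substr( $text_to_parse, $start, $end - $start );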
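
    The snippet above assumes both tags are present. A slightly more defensive sketch of the same string-function approach follows; the helper name extract_between() is made up for illustration, and returning null for a missing tag is just one possible convention.

```php
<?php
// Hypothetical helper: returns the text between the first $open tag and the
// next $close tag, or null if either tag is missing.
function extract_between(string $text, string $open, string $close): ?string
{
    $start = strpos($text, $open);
    if ($start === false) {
        return null;
    }
    $start += strlen($open);               // skip past the opening tag
    $end = strpos($text, $close, $start);  // search only after the opening tag
    if ($end === false) {
        return null;
    }
    return substr($text, $start, $end - $start);
}

$haystack = "This is a list ...\n[lsit]\na bunch of text that\nspans several\nlines\n[/lsit]\n";
$capture = extract_between($haystack, '[lsit]', '[/lsit]');

// The capture still carries the newline after [lsit] and the one before
// [/lsit]; trim() removes them.
echo trim($capture), "\n";
```

    Using strlen() instead of a hard-coded 6 keeps the offset correct if the tag name changes.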

  3. #3
    Worship the Krome kromey's Avatar
    Join Date
    Sep 2006
    Location
    Fairbanks, AK
    Posts
    1,621
    You are anchoring your pattern with ^ and $, which means that (with the m modifier) it will only match if [lsit] starts a line and [/lsit] ends one. Since this is obviously not what you're looking for, drop the anchors. Second, you're using [ and ] as your pattern delimiters. PHP does accept a matching bracket pair as delimiters, but because [ and ] are also metacharacters that you have to escape inside the pattern, the result is hard to read and easy to get wrong.

    Your pattern should look like this:
    #\[lsit\](.+?)\[/lsit\]#m

    You can use something other than '#' if you prefer. Whatever you use, pick a non-alphanumeric, non-backslash, non-whitespace character and use the same character to start and end the pattern. Preferably choose one that does not appear within the pattern (otherwise it has to be escaped there) and that is not a metacharacter.

    Edit:


    Oops, I goofed, the correct pattern modifier should be s, not m, like so:
    #\[lsit\](.+?)\[/lsit\]#s
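
    To see the difference between the two modifiers, here is a small sketch run against the sample haystack from the first post. The variable names are arbitrary; preg_match() is the standard PHP PCRE function.

```php
<?php
$haystack = "This is a list ...\n[lsit]\na bunch of text that\nspans several\nlines\n[/lsit]\n";

// With m only, "." still stops at newlines, so the group cannot get past
// the "\n" that follows [lsit] and nothing matches.
$with_m = preg_match('#\[lsit\](.+?)\[/lsit\]#m', $haystack, $m1);

// With s (PCRE_DOTALL), "." also matches "\n", so the lazy group can span lines.
$with_s = preg_match('#\[lsit\](.+?)\[/lsit\]#s', $haystack, $m2);

var_dump($with_m);   // int(0)
var_dump($with_s);   // int(1)

// $m2[1] includes the newlines next to the tags; trim() drops them.
echo trim($m2[1]), "\n";
```

    The m modifier only changes where ^ and $ may match; it has no effect on what "." matches.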
    Last edited by kromey; Jun 27, 2007 at 12:16. Reason: correction

  4. #4
    SitePoint Zealot mwasif's Avatar
    Join Date
    Apr 2007
    Location
    Pakistan
    Posts
    102
    You can also use \s to match whitespace (space, tab, newline, etc.). With the s modifier in place, though, the . already matches newlines, so this is optional.
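
    One place where \s does help with newlines is inside a character class: [\s\S] means "whitespace or non-whitespace", which is any character at all. A sketch of that alternative, without the s modifier (variable names are arbitrary):

```php
<?php
$haystack = "[lsit]\na bunch of text that\nspans several\nlines\n[/lsit]";

// No s modifier: [\s\S] matches every character, newlines included.
$found = preg_match('#\[lsit\]([\s\S]+?)\[/lsit\]#', $haystack, $matches);

// Putting \s* outside the group absorbs the newlines next to the tags,
// so the capture needs no trim() afterwards.
$found_trimmed = preg_match('#\[lsit\]\s*([\s\S]+?)\s*\[/lsit\]#', $haystack, $trimmed);

echo $trimmed[1], "\n";
```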

  5. #5
    SitePoint Zealot s21825's Avatar
    Join Date
    Oct 2003
    Location
    Canada
    Posts
    162
    Excellent, thanks a lot!

    This is what I've ended up with:

    #\[list\](.+?)\[/list\]#s

    The key piece was the s modifier, which makes the . match every character, including newline characters.

    I wasn't aware of the problem with the pattern delimiters ... I have been using [ and ] for months now and they seemed to be working fine. I'll look at going back to update my existing patterns.

    Thanks again everyone for your help!
    s21825
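
    A note on why [ and ] appeared to work as delimiters: PHP's PCRE functions accept a matching bracket pair as delimiters, so the outer brackets only mark where the pattern starts and ends. The sketch below (sample text and variable names made up for illustration) compares both delimiter styles and also shows the lazy (.+?) keeping two lists separate.

```php
<?php
$haystack = "[list]\none\n[/list]\ntext\n[list]\ntwo\n[/list]\n";

// Bracket-style delimiters: the outer [ ... ] only delimit the pattern,
// while the escaped \[ and \] inside match literal brackets.
preg_match_all('[\[list\](.+?)\[/list\]]s', $haystack, $brackets);

// The same pattern with '#' as the delimiter.
preg_match_all('#\[list\](.+?)\[/list\]#s', $haystack, $hashes);

var_dump($brackets[1] === $hashes[1]);   // bool(true)

// Because .+? is lazy, each [list] block is captured on its own.
print_r(array_map('trim', $hashes[1]));
```

    A greedy (.+) would instead produce a single capture running from the first [list] to the last [/list].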

