  1. #1
    SitePoint Member
    Join Date
    Apr 2011
    0 Post(s)
    0 Thread(s)

    preg_match_all matches too much text

    I have a problem with the following.

    I'm trying to edit H2 tags to add id attributes to them, using the following code:

    PHP Code:
    preg_match_all("/\<h2 id=\"(.*)\">(.*)\<\/h2\>/i",$content,$matches); 
    This code works fine with most of my texts, but not when I have text like:

    <h2>Title</h2>more text without space

    It doesn't stop at the closing </h2> and instead matches the whole string up to the next </h2> tag. When there is a \r\n (newline) right after the </h2>, the script works perfectly. Does anyone have an idea how to fix this? I think I'm missing some kind of delimiter. (I've tried \b and \B without success.)

    Your help is greatly appreciated,


  2. #2
    Utopia, Inc. silver trophy
    ScallioXTX
    Join Date
    Aug 2008
    The Netherlands
    153 Post(s)
    2 Thread(s)
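    The problem is that (.*) is greedy: it matches as much as it possibly can and only gives characters back when the rest of the pattern would otherwise fail. So it eats what we think it shouldn't, running past the first </h2> and stopping at the last one it can reach. It works when there is a newline after the </h2> because . does not match a newline by default, so the match can't run on into the next line.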
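To make the greedy behaviour concrete, here is a small sketch that runs the pattern from the question against two made-up inputs. The sample strings and the variable names ($sameLine, $withNewline, $greedy, $perLine) are my own assumptions for illustration, not taken from the original post.

```php
<?php
// Hypothetical input: two headings on one line, nothing but text between them.
$sameLine = '<h2 id="a">First</h2>text<h2 id="b">Second</h2>';

// Hypothetical input: the same markup with a newline after the first </h2>.
$withNewline = "<h2 id=\"a\">First</h2>\ntext<h2 id=\"b\">Second</h2>";

// The pattern from the question, unchanged (greedy quantifiers).
$pattern = '/\<h2 id="(.*)">(.*)\<\/h2\>/i';

preg_match_all($pattern, $sameLine, $greedy);
preg_match_all($pattern, $withNewline, $perLine);

// One line: the first (.*) runs to the LAST "> it can find,
// so both headings collapse into a single match.
var_dump(count($greedy[0]));  // int(1)
var_dump($greedy[1][0]);      // string(26) "a">First</h2>text<h2 id="b"

// With a newline: . stops at "\n", so each heading is matched separately.
var_dump(count($perLine[0])); // int(2)
```

The second result is why the code appears to work whenever each heading sits on its own line.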
    There are two things you could do:

    1) Replace (.*) with an atom that says exactly what to match, for example ([a-zA-Z0-9\s]+) to match letters, digits and whitespace, OR
    2) if you don't know what you will be matching, make the (.*) lazy by adding a question mark: (.*?)

    Method one is preferred, but method two also works.
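Both options can be sketched side by side. The input string and the variable names ($content, $strict, $lazy) are assumptions made up for this example. The patterns are the one from the question with only the two (.*) groups swapped out and the redundant backslashes before < and > dropped.

```php
<?php
// Hypothetical input: two headings on one line with text directly after the first.
$content = '<h2 id="intro">Title</h2>more text without space<h2 id="next">Other</h2>';

// Method 1: an explicit character class. Neither " nor < is in the class,
// so each group has to stop before the closing quote or the closing tag.
preg_match_all('/<h2 id="([a-zA-Z0-9\s]+)">([a-zA-Z0-9\s]+)<\/h2>/i', $content, $strict);

// Method 2: lazy quantifiers. (.*?) takes as little as possible,
// so it stops at the first "> and the first </h2> it reaches.
preg_match_all('/<h2 id="(.*?)">(.*?)<\/h2>/i', $content, $lazy);

print_r($strict[1]); // ids from method 1: intro, next
print_r($strict[2]); // titles from method 1: Title, Other
print_r($lazy[1]);   // ids from method 2: intro, next
print_r($lazy[2]);   // titles from method 2: Title, Other
```

Note that the class in method 1 only allows letters, digits and whitespace, so a heading containing punctuation would be skipped entirely. If that matters for your content, a negated class such as ([^<]*) is a common variation on the same idea.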
    Rémon - Hosting Advisor
