Thread: what is greedy?

  1. #1
    SitePoint Enthusiast
    Join Date
    Jan 2003
    Posts
    25
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    what is greedy?

    could someone explain to me in very simple terms ( ) what is meant by greedy matching in regexps? preferably via an example??

    cheers!

  2. #2
    SitePoint Zealot Egghead's Avatar
    Join Date
    Feb 2002
    Posts
    197
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

  3. #3
    SitePoint Wizard Chris82's Avatar
    Join Date
    Mar 2002
    Location
    Osnabrück
    Posts
    1,003
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hi.

    Greedy means that a quantifier such as * or + matches as much text as it can, so the match runs from the first occurrence of the opening pattern to the last occurrence of the closing one.

    Example: take this string, where we want to capture the words inside the bold tags.

    PHP Code:
    $string = 'This is my <b>black</b> cat her <b>name</b> is Furball.';
    Now if you use a "greedy" regexp like:

    PHP Code:
    preg_match('#<b>(.*)</b>#', $string, $match);
    print_r($match); 
    The full match in $match[0] also includes the surrounding bold tags. Since this regexp is "greedy", it matches from the first "<b>" to the last "</b>" in the string, so the captured group in $match[1] is:

    Code:
    black</b> cat her <b>name
    Not what we want. If we add a "?" after the quantifier to make it ungreedy, we get the desired result.

    PHP Code:
    preg_match('#<b>(.*?)</b>#', $string, $match);
    print_r($match); 
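    which results in $match[1] containing "black". preg_match stops after the first match, so to get both "black" and "name" use preg_match_all with the same pattern.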
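
    To see the difference side by side, here is a minimal sketch that runs the greedy and ungreedy patterns from this post against the same string. The variable names ($greedy, $lazy, $all) are only for illustration.

```php
<?php
$string = 'This is my <b>black</b> cat her <b>name</b> is Furball.';

// Greedy: .* grabs as much as it can, so the match ends at the LAST </b>.
preg_match('#<b>(.*)</b>#', $string, $greedy);

// Ungreedy: .*? grabs as little as it can, so the match ends at the FIRST </b>.
preg_match('#<b>(.*?)</b>#', $string, $lazy);

// preg_match_all() keeps searching after each match, so it finds every pair.
preg_match_all('#<b>(.*?)</b>#', $string, $all);

echo $greedy[1], "\n"; // black</b> cat her <b>name
echo $lazy[1], "\n";   // black
print_r($all[1]);      // Array ( [0] => black [1] => name )
```

    In each result array, index 0 holds the full match including the tags and index 1 holds the text captured by the parentheses.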

  4. #4
    ********* Member website's Avatar
    Join Date
    Oct 2002
    Location
    Iceland
    Posts
    1,238
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Let's take this string for example:
    PHP Code:
    $string = 'Here is | an example code | of something big | and strange';
    Then let's run this command:
    PHP Code:
    $string = preg_replace('#^(.+)\|#', 'replacement', $string);
    Doing echo($string) would then output 'replacement and strange', because it replaces everything from the start of the string to the last occurrence of |.
    This is called greedy.

    However, if we run this command on the original string instead:
    PHP Code:
    $string = preg_replace('#^(.+?)\|#', 'replacement', $string);
    echo($string) would output 'replacement an example code | of something big | and strange'. See how it goes from the start of the string to the first occurrence of |? This is called ungreedy.
    To switch from greedy to ungreedy, add ? after the quantifier or use the 'U' modifier, which inverts the greediness of every quantifier in the pattern.
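
    As a rough sketch of the two ways to switch, the snippet below applies the "?" suffix and the 'U' modifier to the same string. The variable names are made up for the example. Note that under 'U' the meaning of "?" is inverted as well, so ".+?" becomes greedy again.

```php
<?php
$string = 'Here is | an example code | of something big | and strange';

// Default: .+ is greedy and runs up to the LAST pipe.
$greedy = preg_replace('#^(.+)\|#', 'replacement', $string);

// A trailing ? makes this one quantifier ungreedy: it stops at the FIRST pipe.
$lazy = preg_replace('#^(.+?)\|#', 'replacement', $string);

// The U modifier makes every quantifier in the pattern ungreedy by default.
$withU = preg_replace('#^(.+)\|#U', 'replacement', $string);

// Under U, a trailing ? flips the quantifier back to greedy.
$flipped = preg_replace('#^(.+?)\|#U', 'replacement', $string);

echo $greedy, "\n";  // replacement and strange
echo $lazy, "\n";    // replacement an example code | of something big | and strange
echo $withU, "\n";   // same as $lazy
echo $flipped, "\n"; // same as $greedy
```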

    Hope this helps

    Edit:

    Too slow
    - website

  5. #5
    SitePoint Wizard Chris82's Avatar
    Join Date
    Mar 2002
    Location
    Osnabrück
    Posts
    1,003
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by website
    Edit:

    Too slow
    Ah, you're not the only one. It has happened to me quite often as well.

  6. #6
    SitePoint Enthusiast
    Join Date
    Jan 2003
    Posts
    25
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    thanks! i *think* i get it lol!!!

    i will have a look in more depth when i get home...for example if i did a replace that swapped my custom formatting tag [b ][ eb ] for html bold tags, and did a replace on a string like

    [ b ]here[ eb ] are some [ b ]words[ eb ], some of them are in [ b ]bold[ eb ] and some of them are [ b ]not[ eb ]

    i would get something like
    < b >here[ eb ] are some [ b ]words[ eb ], some of them are in [ b ]bold[ eb ] and some of them are [ b ]not< / b >

    is that right?

  7. #7
    SitePoint Wizard Chris82's Avatar
    Join Date
    Mar 2002
    Location
    Osnabrück
    Posts
    1,003
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Yes, if you use a greedy regexp that will be the result. With an ungreedy one, each pair of tags is replaced on its own.
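
    A minimal sketch of that replacement, assuming the custom tags are written without spaces as [b] and [eb] (the spaces in the posts above only stop the forum from parsing them). The patterns are illustrative, not the asker's actual code.

```php
<?php
$text = '[b]here[eb] are some [b]words[eb], some of them are in [b]bold[eb] '
      . 'and some of them are [b]not[eb]';

// Greedy: one single match from the first [b] to the last [eb].
$greedy = preg_replace('#\[b\](.*)\[eb\]#', '<b>$1</b>', $text);

// Ungreedy: each [b]...[eb] pair is matched and replaced separately.
$lazy = preg_replace('#\[b\](.*?)\[eb\]#', '<b>$1</b>', $text);

echo $greedy, "\n";
// <b>here[eb] are some [b]words[eb], some of them are in [b]bold[eb] and some of them are [b]not</b>

echo $lazy, "\n";
// <b>here</b> are some <b>words</b>, some of them are in <b>bold</b> and some of them are <b>not</b>
```

    The square brackets are escaped because they would otherwise start a character class.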

