  1. #1
    Non-Member

    preg_replace, more than 4

    I have a string containing sequences of 0s and 1s. I need to replace all zero sequences whose length is less than 5 with the same number of 1s. The zero sequences with length 5 or more should be left as is.

    For example

    source : 11000001100010011000001
    result : 11000001111111111000001


    <snip/>
    Last edited by ScallioXTX; Mar 25, 2011 at 09:43. Reason: Snipped unnecessary link

  2. #2
    chris.upjohn
    Try the code below; it should work fine.

    PHP Code:
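    $numbers = '00011010011000001011011101';
    echo $numbers . '<br />';
    // Turns every block of five consecutive 0s into 11111; shorter runs of 0s are left unchanged.
    $numbers = preg_replace('/0{5}/', '11111', $numbers);
    echo $numbers;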
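A quick way to check a suggestion like this is to run it on the example from the first post and compare the output with the expected result given there. This is a small sketch; the variable names are only for illustration.

```php
<?php
// Example input and expected output taken from the first post.
$source   = '11000001100010011000001';
$expected = '11000001111111111000001';

// The pattern from the reply above: it matches five consecutive 0s.
$actual = preg_replace('/0{5}/', '11111', $source);

echo $actual, PHP_EOL;                                       // 11111111100010011111111
echo $actual === $expected ? 'matches' : 'differs', PHP_EOL; // differs
```

Here the two runs of five 0s are the ones that get replaced, while the runs of two and three 0s stay as they were, so the result is the reverse of what was asked.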
    Blog/Portfolio | Evolution Xtreme | DFG Design | DFG Hosting | CSS-Tricks | Stack Overflow | Paul Irish
    Having lame problems with your code? Let us help by using a jsFiddle

  3. #3
    ScallioXTX
    PHP Code:
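    $string = '11000001100010011000001';
    // The e modifier evaluates the replacement string as PHP code for each match.
    // It was deprecated in PHP 5.5 and removed in PHP 7.0.
    $pattern = '/(0+)/e';
    // Keep runs of more than four 0s; otherwise output that many 1s.
    $replacement = "strlen('\\1') > 4 ? '\\1' : str_repeat('1', strlen('\\1'))";

    echo preg_replace(
      $pattern,
      $replacement,
      $string
    );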
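Because the /e modifier was removed in PHP 7.0, the snippet above no longer works on current PHP versions. A direct port of the same expression to preg_replace_callback() could look like this; it is a sketch that assumes PHP 7.4 or newer for the fn arrow-function syntax.

```php
<?php
$string = '11000001100010011000001';

// Same logic as the /e expression: keep runs of five or more 0s,
// otherwise replace the run with the same number of 1s.
$result = preg_replace_callback(
    '/0+/',
    fn (array $m): string => strlen($m[0]) > 4 ? $m[0] : str_repeat('1', strlen($m[0])),
    $string
);

echo $result, PHP_EOL; // 11000001111111111000001
```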
    Rémon - Hosting Advisor

    Minimal Bookmarks Tree
    My Google Chrome extension: browsing bookmarks made easy

  4. #4
    Salathe
    ScallioXTX, I would always advise using preg_replace_callback() rather than the evil /e modifier!

    PHP Code:
    $numbers     = '0001101001100000101101110100001110000010000';
    $pattern     = '/0+/';
    // Called once per run of 0s; $m[0] holds the whole run.
    $replacement = function ($m) {
        if (strlen($m[0]) < 5) {
            return strtr($m[0], '0', '1');
        }
        return $m[0];
    };

    echo preg_replace_callback($pattern, $replacement, $numbers);
    // 1111111111100000111111111111111110000011111
    This could also be done with plain replacement by crafting a regex which looks only for 0s within a sequence of between one and four 0s.

    PHP Code:
    $numbers = '0001101001100000101101110100001110000010000';
    echo preg_replace('/\G0|(?<=^|1)0(?=0{0,3}(?:1|$))/', '1', $numbers);
    // 1111111111100000111111111111111110000011111
    However, just because something can be done does not mean it should be. But for educational purposes, go wild.
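One thing worth checking before reusing the regex version: \G is also true at offset 0, so the \G0 branch accepts every 0 at the very start of the subject, whatever the length of that run. A string that begins with five or more 0s is therefore converted completely, while the callback version leaves it alone. The sketch below shows this, together with a hypothetical adjustment (not from this thread) that adds a (?!^) guard so \G0 only applies after a previous match.

```php
<?php
$regex   = '/\G0|(?<=^|1)0(?=0{0,3}(?:1|$))/';
// Hypothetical variant: (?!^) stops the \G branch from firing at offset 0,
// so the first 0 of every run has to pass the length check in the second branch.
$guarded = '/(?!^)\G0|(?<=^|1)0(?=0{0,3}(?:1|$))/';

// Reference implementation: the preg_replace_callback() approach from above.
$viaCallback = function (string $s): string {
    return preg_replace_callback('/0+/', function ($m) {
        return strlen($m[0]) < 5 ? strtr($m[0], '0', '1') : $m[0];
    }, $s);
};

foreach (['000001', '0001101001100000101101110100001110000010000'] as $s) {
    echo preg_replace($regex, '1', $s), PHP_EOL;
    echo preg_replace($guarded, '1', $s), PHP_EOL;
    echo $viaCallback($s), PHP_EOL, PHP_EOL;
}
// For '000001' this prints 111111, 000001 and 000001;
// for the longer sample string all three lines are identical.
```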
    Salathe
    Software Developer and PHP Manual Author.

  5. #5
    ScallioXTX
    I agree, preg_replace_callback is nicer. I'm not entirely sure why I suggested /e instead!

    And I thought I was pretty okay with regex, but this is going over my head. Would you mind giving a breakdown of what the different parts do?

  6. #6
    Salathe
    Quote Originally Posted by ScallioXTX
    And I thought I was pretty okay with regex, but this is going over my head. Would you mind giving a breakdown of what the different parts do?

    Of course, I'll try and explain.

    The regex is essentially split into two parts, one fairly simple (but extremely useful) and the other a bit less simple but mostly because it's a bit ugly. Those parts are separated by the alternation operator (a pipe, |) such that the main regex can match either of the two alternatives. Those two parts are a) \G0 and b) (?<=^|1)0(?=0{0,3}(?:1|$)). I'll cover part b first (because part a won't make much sense without it).

    Matching (?<=^|1)0(?=0{0,3}(?:1|$))

    Taking this to pieces, there are three main parts:
    1. (?<=^|1)
    2. 0
    3. (?=0{0,3}(?:1|$))


    The two complicated parts use lookarounds (lookbehind, and lookahead, respectively).

    Part 1 looks "behind" the current matching position for either the start of the subject string or a number 1. So given the subject string above, this part would match successfully at the start of the string and immediately after any number 1.

    Part 2 matches just a number 0. So, building up what can be matched, that's only a zero preceded by the start of the string or a number 1. Easy enough so far?

    Part 3 is a little more complex. It looks ahead (after the number 0) to see if there are between 0 and 3 more 0s followed by either a number 1 or the end of the string. This is the part that limits the number of sequential 0s to between 1 and 4 inclusive (or as the OP stated, "all zero sequences whose length less than 5"). If that's not clear, here are a few examples. Say we have just matched the first zero of each example and want to check this lookahead:
    • 00000 = FAIL
      because the matched zero is followed by four more zeros, one more than the lookahead allows
    • 01 = PASS
      because the zero is followed by a 1
    • 0<end of string> = PASS
      because the zero is at the end of the string
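The lookahead cases above can be tried directly with preg_match(). In this sketch the ^ anchor and the leading 0 stand in for "a zero we have just matched", so only the lookahead decides the result; the extra 00001 case is added to show that a run of four zeros still passes.

```php
<?php
// ^0 plays the role of part 2 (a 0 that was just matched); the lookahead is part 3.
$lookahead = '/^0(?=0{0,3}(?:1|$))/';

foreach (['00000', '01', '0', '00001'] as $subject) {
    $result = preg_match($lookahead, $subject) === 1 ? 'PASS' : 'FAIL';
    echo str_pad($subject, 6), $result, PHP_EOL;
}
// 00000 FAIL
// 01    PASS
// 0     PASS
// 00001 PASS
```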


    That is the end of the complicated part of the main regex. So in English, it matches:
    • any 0,
    • either at the start of the string or preceded by a 1,
    • and followed by at most three more 0s and then a 1 or the end of the string.


    Visually, this part would match the zeros marked with ^ below:
    0001101001100000101101110100001110000010000
    ^    ^ ^         ^  ^   ^ ^            ^

    Great, but that only matches the first 0 of each sequence of up to four 0s! This is where the super-concise other alternative comes in.
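To see which zeros the lookaround alternative picks up on its own, preg_match_all() with PREG_OFFSET_CAPTURE can list the offsets of its matches in the sample string. Only the first 0 of each short run is reported, and neither run of five 0s (starting at offsets 11 and 33) appears.

```php
<?php
$numbers = '0001101001100000101101110100001110000010000';
$partB   = '/(?<=^|1)0(?=0{0,3}(?:1|$))/';

preg_match_all($partB, $numbers, $matches, PREG_OFFSET_CAPTURE);
// Each entry of $matches[0] is [matched text, offset]; keep only the offsets.
$offsets = array_column($matches[0], 1);

echo implode(', ', $offsets), PHP_EOL; // 0, 5, 7, 17, 20, 24, 26, 39
```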

    Matching \G0

    The \G start of match assertion is the key here, and will take some explaining. This is a special check which is only true when the current matching position is at the start point of the match.

    The start point of the match is the point at which the current matching run starts ("well, duh" some might say). In practice, this means either the very start of the whole process (when the start point of the match is the beginning of the string) or the point where matching starts again after a replacement (when the start point of the match is essentially the point after the replacement).

    So this part matches a 0 which is at the start of the subject string (aside: for the observant reader, this means the ^ alternative in the lookbehind for the other part is redundant!) or exactly where matching resumes after a replacement (so, directly after a 0 that has just been replaced).
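The aside about the ^ alternative can be checked by brute force. This sketch runs the full pattern with and without ^ in the lookbehind over every string of 0s and 1s up to length 10 and counts the outputs that differ; the length limit is an arbitrary choice.

```php
<?php
$withCaret    = '/\G0|(?<=^|1)0(?=0{0,3}(?:1|$))/';
$withoutCaret = '/\G0|(?<=1)0(?=0{0,3}(?:1|$))/';

$differences = 0;
for ($length = 1; $length <= 10; $length++) {
    for ($i = 0; $i < (1 << $length); $i++) {
        // Every binary string of this length, zero-padded on the left.
        $subject = str_pad(decbin($i), $length, '0', STR_PAD_LEFT);
        if (preg_replace($withCaret, '1', $subject) !== preg_replace($withoutCaret, '1', $subject)) {
            $differences++;
        }
    }
}

echo $differences, PHP_EOL; // 0
```

The count stays at zero because a 0 at offset 0 is always taken by the \G0 branch first, so the ^ branch never gets a chance to decide anything.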

    Again, let's describe this visually. Given a subject string of ababcabab, let's see what happens:
    • preg_replace('/\Gab/', '|$0', 'ababcabab') gives
      |ab|abcabab (only the first two ab pairs are replaced)
    • preg_replace('/ab/', '|$0', 'ababcabab') gives |ab|abc|ab|ab


    The difference above is caused by the \G, which means that the letters ab can only be matched at a start point. After the second ab (the one before the c) is matched, the start point is just before the letter c. The c does not match the regex, so it is skipped and the next character is examined, but now this is not at the match start point, so \G fails.
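The two calls from the list above are easy to reproduce:

```php
<?php
$subject = 'ababcabab';

// With \G, "ab" may only match where the previous match ended (or at offset 0).
echo preg_replace('/\Gab/', '|$0', $subject), PHP_EOL; // |ab|abcabab

// Without \G, every "ab" in the string is matched.
echo preg_replace('/ab/', '|$0', $subject), PHP_EOL;   // |ab|abc|ab|ab
```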

    So back to \G0, it matches:
    • any 0,
    • at the start point of a match (i.e. at the start of the string or directly after a replacement)
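On its own, \G0 can only continue a chain of matches that begins at offset 0, so it converts a leading run of 0s and nothing else. The two subject strings below are arbitrary examples chosen to show this.

```php
<?php
$gZero = '/\G0/';

// The leading 000 is replaced; after the 1 the start point is lost, so the last 00 survives.
echo preg_replace($gZero, '1', '000100'), PHP_EOL; // 111100

// The string starts with a 1, so \G0 never matches at all.
echo preg_replace($gZero, '1', '100100'), PHP_EOL; // 100100
```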


    Putting the pieces together

    Our full regex looks for:
    • any 0, that is
      • at the start of the string or preceded by a 1, and
      • followed by at most three more 0s and then a 1 or the end of the string;

      or
      • immediately follows one of the above.

  7. #7
    ScallioXTX
    I've read it several times and I think I get it. So, if we call \G0 a and (?<=^|1)0(?=0{0,3}(?:1|$)) b, am I correct in stating the following happens?

    Code:
    0001101001100000101101110100001110000010000
    aaa  b ba  baaaa b  b   b baaa   baaaa baaa
    I put an a below every zero that will be replaced by a 1 through part a, and likewise a b below every 0 that will be replaced by a 1 through part b.

    Is the above correct?

  8. #8
    Salathe
    Quote Originally Posted by ScallioXTX
    I've read it several times and I think I get it.
    This kind of thing can take a while to grasp (my explanation probably didn't help). If you think that you get it, then awesome!

    Quote Originally Posted by ScallioXTX
    Code:
    0001101001100000101101110100001110000010000
    aaa  b ba  baaaa b  b   b baaa   baaaa baaa
    Is the above correct?
    The idea is correct, but the sequences of 5 zeros would not be matched.

    Code:
    0001101001100000101101110100001110000010000
    aaa  b ba        b  b   b baaa         baaa
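The a/b diagram can also be generated rather than worked out by hand. This sketch wraps each alternative in its own capturing group and uses preg_match_all() with PREG_SET_ORDER | PREG_OFFSET_CAPTURE; when the b branch matches, group 1 is reported with offset -1, which tells the two branches apart. As Salathe points out, the two runs of five 0s come out unmarked.

```php
<?php
$numbers = '0001101001100000101101110100001110000010000';
// Group 1 is part a (\G0), group 2 is part b (the lookaround part).
$pattern = '/(\G0)|((?<=^|1)0(?=0{0,3}(?:1|$)))/';

preg_match_all($pattern, $numbers, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

// Start with a blank line of the same length and mark each matched 0.
$diagram = str_repeat(' ', strlen($numbers));
foreach ($matches as $m) {
    // $m[1][1] is the offset of group 1, or -1 if the b branch matched instead.
    $diagram[$m[0][1]] = $m[1][1] >= 0 ? 'a' : 'b';
}

echo $numbers, PHP_EOL;
echo rtrim($diagram), PHP_EOL;
```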

  9. #9
    ScallioXTX
    Quote Originally Posted by Salathe
    This kind of thing can take a while to grasp (my explanation probably didn't help). If you think that you get it, then awesome!
    Ah, I already knew most of the concepts (except for \G, but you explained that well!), but just never saw them in such a complex setting.
    After reading your explanation about 4 or 5 times (because it's such a complex subject, not because your explanation is bad, it isn't!) I'm pretty sure I fully get what it does now.

    Quote Originally Posted by Salathe
    The idea is correct, but the sequences of 5 zeros would not be matched.

    Code:
    0001101001100000101101110100001110000010000
    aaa  b ba        b  b   b baaa         baaa
    Yes, of course

    Thanks, Salathe !

