  1. #1
    SitePoint Wizard Dean C
    Join Date
    Mar 2003
    Location
    England, UK
    Posts
    2,906

    Regular expression problem

    Hey,

    I'm sure this is an easy one for a regex guru.

    PHP Code:
    $string = '[b]hmmz[/b]testing[b]test[/b]';
    $pattern = '/\[b\](.*)\[\/b\]/';
    if (preg_match($pattern, $string, $array))
    {
        //$array[1] = strip_tags($array[1]);
        //echo $array[1];
        $string = preg_replace($pattern, '<b>\1</b>', $string);
        echo $string;
    }
    This outputs:
    HTML Code:
    <b>hmmz[/b]testing[b]test</b>


    I can see why this happens, but the question is how to stop it happening, so that it matches the next closing [/b] instead of the very last one!

    Edit:


    Hmm, it seems that changing the pattern to:
    PHP Code:
    $pattern = '/\[b\](.*?)\[\/b\]/';
    fixes the problem. Why is this?



    Edit:


    PHP Code:
    $string = '[b][b]hmmz[/b]testing[b]test[/b][/b]';
    $pattern = '/\[b\](.*?)\[\/b\]/';
    if (preg_match_all($pattern, $string, $array))
    {
        //$array[1] = strip_tags($array[1]);
        //echo $array[1];
        $string = preg_replace($pattern, '<b>\1</b>', $string);
        echo $string;
    }
    The problem with the code above is that it doesn't also put the surrounding <b></b> around the whole block.

  2. #2
    SitePoint Addict devil cat
    Join Date
    Apr 2003
    Location
    Reno
    Posts
    344
    With something as simple as this, why not just use str_replace?

    PHP Code:
    $string = '[b]hmmz[/b]testing[b]test[/b]';
    $string = str_replace('[b]', '<b>', $string);
    $string = str_replace('[/b]', '</b>', $string);

    echo $string;
    Is there a specific reason why not?

    This function returns a string or an array with all occurrences of search in subject replaced with the given replace value. If you don't need fancy replacing rules (like regular expressions), you should always use this function instead of ereg_replace() or preg_replace().

  3. #3
    SitePoint Wizard Dean C
    Because ideally it should break in certain circumstances, e.g.:

    HTML Code:
    [b]test[/b][/b]
    That should output <b>test[/b]</b>
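
    For comparison, here is a rough sketch of what each approach from this thread does with that unbalanced input. The variable names ($viaStrReplace, $viaLazy, $viaGreedy) are made up for illustration; the patterns are the ones posted above.

    ```php
    <?php
    $input = '[b]test[/b][/b]';

    // Plain string replacement converts every tag, whether it is paired or not.
    $viaStrReplace = str_replace(['[b]', '[/b]'], ['<b>', '</b>'], $input);

    // The lazy pattern pairs [b] with the nearest [/b]; the stray [/b] is left as text.
    $viaLazy = preg_replace('/\[b\](.*?)\[\/b\]/', '<b>\1</b>', $input);

    // The greedy pattern pairs [b] with the last [/b]; the inner [/b] is left as text.
    $viaGreedy = preg_replace('/\[b\](.*)\[\/b\]/', '<b>\1</b>', $input);

    echo $viaStrReplace, "\n"; // <b>test</b></b>
    echo $viaLazy, "\n";       // <b>test</b>[/b]
    echo $viaGreedy, "\n";     // <b>test[/b]</b>
    ```

    Only the regex versions leave the unmatched tag untouched, which is the behaviour str_replace() cannot give you.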

  4. #4
    SitePoint Evangelist
    Join Date
    Jan 2005
    Posts
    502
    Hey,
    It's because, by default, a quantifier in a regular expression is "greedy", meaning it matches as much as it can while still satisfying the pattern.
    What you did in the edit was add the '?', which makes the quantifier non-greedy (lazy), so it matches the minimum amount while still satisfying the pattern.
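
    A minimal sketch of the difference, using the string from the first post. The names $greedy and $lazy are just for illustration.

    ```php
    <?php
    $string = '[b]hmmz[/b]testing[b]test[/b]';

    // Greedy: (.*) grabs as much as possible, then backtracks to the LAST [/b].
    preg_match('/\[b\](.*)\[\/b\]/', $string, $greedy);

    // Lazy: (.*?) grabs as little as possible, so it stops at the FIRST [/b].
    preg_match('/\[b\](.*?)\[\/b\]/', $string, $lazy);

    echo $greedy[1], "\n"; // hmmz[/b]testing[b]test
    echo $lazy[1], "\n";   // hmmz
    ```

    Index 1 of each result array holds whatever the parenthesised group captured, so it shows exactly how far the quantifier stretched.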

  5. #5
    SitePoint Addict devil cat
    I think I know why it is doing this.

    It seems to do a progressive search through the string looking for matches. Since the second (b) follows the first without a (/b) in between, it matches the first set and skips that second tag, then carries on looking for a start tag after the first matched (/b). I really hope that makes sense. If not, I'll try to explain better. It might help to take a look at which tags are not being used:

    This will force it past that, and still leave incomplete pairs untouched:

    PHP Code:
    $string = '[b][b]hmmz[/b]testing[b]test[/b][/b]';
    $pattern = '/\[b\](.*?)\[\/b\]/';
    // Keep replacing until no complete [b]...[/b] pair is left.
    while (preg_match_all($pattern, $string, $array))
    {
        //$array[1] = strip_tags($array[1]);
        //echo $array[1];
        $string = preg_replace($pattern, '<b>\1</b>', $string);
    }
    echo $string;
    Edit: possibly a better explanation:

    After it replaces a match, it resumes scanning at the end of that match (a complete (b)(/b) set). Since the second (b) lies within the first matched set, it gets skipped when the search resumes.
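
    To see this pass by pass, here is a small sketch that records the string after each round of replacements. It is a variation on the loop above rather than the same code: it uses the optional $count argument of preg_replace() to decide when to stop, and $passes is a name invented for this example.

    ```php
    <?php
    $string  = '[b][b]hmmz[/b]testing[b]test[/b][/b]';
    $pattern = '/\[b\](.*?)\[\/b\]/';

    $passes = [];
    do {
        // $count receives the number of replacements made in this pass.
        $string = preg_replace($pattern, '<b>\1</b>', $string, -1, $count);
        if ($count > 0) {
            $passes[] = $string; // snapshot of the string after this pass
        }
    } while ($count > 0);

    foreach ($passes as $i => $snapshot) {
        echo 'pass ', $i + 1, ': ', $snapshot, "\n";
    }
    // pass 1: <b>[b]hmmz</b>testing<b>test</b>[/b]
    // pass 2: <b><b>hmmz</b>testing<b>test</b></b>
    ```

    The first pass leaves the inner (b) and the final (/b) behind because they were never seen as a pair; the second pass picks them up.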

