  1. #1
    SitePoint Enthusiast
    Join Date
    Nov 2006
    Posts
    60
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    regular expression help

    I have custom tags in a template file which look like the following:

    PHP Code:
    [tagname]text[/tagname]
    I'm really terrible with regular expressions and my knowledge is basic. Assuming 'tagname' changes from tag to tag, and could contain any number of different characters and the same is true of 'text', how would I go about removing these tags and keeping the text?

  2. #2
    Twitter: @AnthonySterling silver trophy AnthonySterling's Avatar
    Join Date
    Apr 2008
    Location
    North-East, UK.
    Posts
    6,111
    Mentioned
    3 Post(s)
    Tagged
    0 Thread(s)
    PHP Code:
    <?php
    echo preg_replace(
        '~\[[^\]]+?](.+?)\[[^\]]+?]~i',
        '$1',
        '[tagname]text[/tagname]'
    ); #text
    ?>
    @AnthonySterling: I'm a PHP developer, a consultant for oopnorth.com and the organiser of @phpne, a PHP User Group covering the North-East of England.

  3. #3
    SitePoint Enthusiast
    Join Date
    Nov 2006
    Posts
    60
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by AnthonySterling View Post
    PHP Code:
    <?php
    echo preg_replace(
        '~\[[^\]]+?](.+?)\[[^\]]+?]~i',
        '$1',
        '[tagname]text[/tagname]'
    ); #text
    ?>
    Thank you. This partially works, but if there is nothing between the tags or certain things, it does not work. For example:

    Code HTML4Strict:
    [authentication]
    	<div id="login">
    		<form id="form1" name="form1" method="post" action="">
    			<input type="text" name="username" id="textfield" />
    			<input type="text" name="password" id="textfield2" />
    			<input type="submit" name="button" id="button" value="Sign Up" />
    			<input type="submit" name="button" id="button" value="Login" />
    		</form>
    		{login_message}
    	</div>
    [/authentication]

    This will not be modified by the preg_replace. Regular expressions are a kick in the balls.

  4. #4
    @php.net Salathe's Avatar
    Join Date
    Dec 2004
    Location
    Edinburgh
    Posts
    1,397
    Mentioned
    61 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by openarmy View Post
    if there is nothing between the tags or certain things, it does not work.
    You have two problems. If there is nothing between the tags, it will fail because the regular expression explicitly asks for at least one character between the tags. This part .+? is the culprit and a quick fix would be to use .*?

    Now, the second problem occurs when the content between the tags spans multiple lines. The dot special character (like we just used above) by default matches anything except new lines. To allow it to flow over multiple lines, we can either use more than just the dot or turn on a special flag (called a modifier since it modifies the behavior of the regex) to make dot match the new lines also. To do the latter, we can use the s modifier on the end of your pattern like: …~is
    Salathe
    Software Developer and PHP Manual Author.
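
Both fixes described above can be tried in a minimal sketch. It assumes the pattern from post #2 with only those two changes applied; the sample inputs and variable names are made up for illustration.

```php
<?php
// Pattern from post #2, with .+? changed to .*? and the "s" modifier added.
$pattern = '~\[[^\]]+?](.*?)\[[^\]]+?]~is';

// Hypothetical sample inputs: an empty tag pair and a tag pair spanning several lines.
$empty = '[tagname][/tagname]';
$multi = "[authentication]\n<div id=\"login\">{login_message}</div>\n[/authentication]";

$emptyResult = preg_replace($pattern, '$1', $empty);
$multiResult = preg_replace($pattern, '$1', $multi);

var_dump($emptyResult); // empty string: the tags are gone and there was no text to keep
var_dump($multiResult); // the HTML between the tags, including its newlines
```

With the original .+? and no s modifier, both inputs would come back unchanged, because neither one matches.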

  5. #5
    SitePoint Enthusiast
    Join Date
    Nov 2006
    Posts
    60
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Salathe View Post
    You have two problems. If there is nothing between the tags, it will fail because the regular expression explicitly asks for at least one character between the tags. This part .+? is the culprit and a quick fix would be to use .*?

    Now, the second problem occurs when the content between the tags spans multiple lines. The dot special character (like we just used above) by default matches anything except new lines. To allow it to flow over multiple lines, we can either use more than just the dot or turn on a special flag (called a modifier since it modifies the behavior of the regex) to make dot match the new lines also. To do the latter, we can use the s modifier on the end of your pattern like: …~is
    Thank you and AnthonySterling for your help. That's working perfectly. I have one final problem.

    If I want to store the name of the tag [tagname][/tagname] in a variable to be used by PHP, how do I trap it? I'm thinking I'll use a preg_match to find the name of the tag and then a preg_replace like the one above to remove the tags. How could I go about finding the name of the tag?

  6. #6
    SitePoint Evangelist AlienDev's Avatar
    Join Date
    Feb 2007
    Location
    UK
    Posts
    591
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Regex shouldn't be used for non-regular markups >.<
    Me on StackOverflow | Blog & personal website.

    I mostly use: PHP, Java, JavaScript, Android.

  7. #7
    SitePoint Enthusiast
    Join Date
    Nov 2006
    Posts
    60
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by AlienDev View Post
    Regex shouldn't be used for non-regular markups >.<
    What would you suggest?

  8. #8
    @php.net Salathe's Avatar
    Join Date
    Dec 2004
    Location
    Edinburgh
    Posts
    1,397
    Mentioned
    61 Post(s)
    Tagged
    0 Thread(s)
    Off Topic:

    Quote Originally Posted by AlienDev View Post
    Regex shouldn't be used for non-regular markups >.<
    PCRE went beyond the 'regular' of formal language theory a long time ago. Feel free to use your preferred alternative approach, but while you're doing that we'll use regex to get the job done and move on to the next thing.
    Salathe
    Software Developer and PHP Manual Author.

  9. #9
    @php.net Salathe's Avatar
    Join Date
    Dec 2004
    Location
    Edinburgh
    Posts
    1,397
    Mentioned
    61 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by openarmy View Post
    I'm thinking I'll use a preg_match to find the name of the tag and then a preg_replace like the one above to remove the tags. How could I go about finding the name of the tag?
    Just do what you described: grab the tag name with preg_match.
    Salathe
    Software Developer and PHP Manual Author.
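
As a rough sketch of that two-step approach: wrap the tag name in a capture group and read it from the matches array, then strip the tags as before. The variable names ($template, $tagName, $text) are assumptions for illustration, not part of the original posts.

```php
<?php
// Hypothetical template fragment.
$template = '[tagname]text[/tagname]';

// Step 1: capture everything between the first "[" and "]" as the tag name.
$tagName = null;
if (preg_match('~\[([^\]]+)\]~', $template, $m)) {
    $tagName = $m[1];
}

// Step 2: remove the tags and keep the text, as in the earlier posts.
$text = preg_replace('~\[[^\]]+?](.*?)\[[^\]]+?]~is', '$1', $template);

var_dump($tagName); // "tagname"
var_dump($text);    // "text"
```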

  10. #10
    SitePoint Enthusiast
    Join Date
    Nov 2006
    Posts
    60
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Okay, I have been doing much better with this and have gotten a lot further with regex. Still having a little problem though:

    PHP Code:
    $seek_me = "/\[[a-zA-Z0-9_.-]+\](.)+\[[\/][a-zA-Z0-9_.-]+\]/";
    preg_match_all($seek_me, $this->template, $matchz, PREG_SET_ORDER);
    I need to read for white space as well where the (.)+ is, until the next pattern starts. Does anybody know how to do this?

  11. #11
    Unobtrusively zen silver trophybronze trophy
    paul_wilkins's Avatar
    Join Date
    Jan 2007
    Location
    Christchurch, New Zealand
    Posts
    14,684
    Mentioned
    99 Post(s)
    Tagged
    4 Thread(s)
    Quote Originally Posted by openarmy View Post
    I need to read for white space as well where the (.)+ is, until the next pattern starts. Does anybody know how to do this?
    You may do better by using (.*) instead of (.)+
    Programming Group Advisor
    Reference: JavaScript, Quirksmode Validate: HTML Validation, JSLint
    Car is to Carpet as Java is to JavaScript
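
To see why the position of the quantifier matters, here is a small comparison on a made-up one-line input. With (.)+ the group is repeated, so it ends up holding only the last character it matched; with (.*) the repetition sits inside the group, so the whole run is captured.

```php
<?php
// Hypothetical single-line input.
$subject = '[tag]abc[/tag]';

// (.)+ : a one-character group repeated; the group keeps only its last iteration.
preg_match('/\[[a-zA-Z0-9_.-]+\](.)+\[[\/][a-zA-Z0-9_.-]+\]/', $subject, $repeatedGroup);

// (.*) : the repetition is inside the group, so everything between the tags is captured.
preg_match('/\[[a-zA-Z0-9_.-]+\](.*)\[[\/][a-zA-Z0-9_.-]+\]/', $subject, $wholeRun);

var_dump($repeatedGroup[1]); // "c"
var_dump($wholeRun[1]);      // "abc"
```

Neither variant lets the dot cross a newline on its own, so content spanning several lines still needs the s modifier mentioned earlier in the thread.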

  12. #12
    SitePoint Enthusiast
    Join Date
    Nov 2006
    Posts
    60
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by pmw57 View Post
    You may do better by using (.*) instead of (.)+
    That's still not quite getting everything between the [tags][/tags]

  13. #13
    Unobtrusively zen silver trophybronze trophy
    paul_wilkins's Avatar
    Join Date
    Jan 2007
    Location
    Christchurch, New Zealand
    Posts
    14,684
    Mentioned
    99 Post(s)
    Tagged
    4 Thread(s)
    Examples please?
    Programming Group Advisor
    Reference: JavaScript, Quirksmode Validate: HTML Validation, JSLint
    Car is to Carpet as Java is to JavaScript

  14. #14
    SitePoint Enthusiast
    Join Date
    Nov 2006
    Posts
    60
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Here's some context. The point of this is to build a custom templating system. When I have finished putting data into my template, I want to remove any unused tags, which are in the form [tag][/tag] and contain some HTML.

    Okay, so here is a section of my template file with my tagging method in place:

    Code HTML4Strict:
    <div id="head_right">
    [authentication]
    	<div id="login">
    		<form id="form1" name="form1" method="post" action="">
    			<input type="text" name="username" id="textfield" />
    			<input type="text" name="password" id="textfield2" />
    			<input type="submit" name="button" id="button" value="Sign Up" />
    			<input type="submit" name="button" id="button" value="Login" />
    		</form>
    		{login_message}
    	</div>
    [/authentication]
    </div>
    [tags2]intags[/tags2]
    [tags3]sploog[/tags3]

    In this code, the [authentication] tag area is not matched, but the [tags2] and [tags3] tags are.

    Here is the PHP I am using:

    PHP Code:
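    $seek_me = "/\[[a-zA-Z0-9_.-]+\](.*)\[[\/][a-zA-Z0-9_.-]+\]/";
    preg_match_all($seek_me, $template, $matches, PREG_SET_ORDER);
    // Start from the full template so that each removal builds on the previous one
    $output = $template;
    foreach ($matches as $match)
    {
        $original_tag = $match[0];
        $output = str_replace($original_tag, '', $output);
    }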

    I need help with the regular expression, so that it can match something like the [authentication] area in my example. It would be even better if I could contain anything within the tag name ( [tagname] ), such as [-$%TAGname].
    Thank you all for your help so far, just need a little more to complete this thing.

  15. #15
    Unobtrusively zen silver trophybronze trophy
    paul_wilkins's Avatar
    Join Date
    Jan 2007
    Location
    Christchurch, New Zealand
    Posts
    14,684
    Mentioned
    99 Post(s)
    Tagged
    4 Thread(s)
    As you want anything at all as the tag name, you will want to match against 1 or more characters that are not the closing square bracket [^\]]+

    You can also use \1 as a back-reference, so that you can ensure that the end tag matches the start tag.

    And, using .*? gives you a non-greedy match, so that the first matching closing tag (instead of the last) will be used.

    / start of regex
    \[ match an opening square bracket
    ( capture group used later on for back reference
    [^\]]+ match anything that is not a closing square bracket
    ) end capture group
    \] match a closing square bracket
    ( start a capture group
    .*? match anything up until the first appropriate closing tag
    ) end capture group
    \[ match an opening square bracket
    \/ the forward slash denoting an end tag
    \1 the same tag name matched at the start
    \] match a closing square bracket
    / end of regex


    /\[([^\]]+)\](.*?)\[\/\1\]/
    Programming Group Advisor
    Reference: JavaScript, Quirksmode Validate: HTML Validation, JSLint
    Car is to Carpet as Java is to JavaScript
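
The breakdown above can be exercised with a short sketch. The inputs are invented for illustration; the pattern is the one given at the end of the post.

```php
<?php
$pattern = '/\[([^\]]+)\](.*?)\[\/\1\]/';

// Two tag pairs on one line: the lazy .*? stops at the first closing tag that matches \1.
preg_match_all($pattern, '[a]one[/a] [b]two[/b]', $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
    // $match[1] is the tag name, $match[2] the text between the tags.
    echo $match[1], ' => ', $match[2], "\n";
}

// A closing tag that does not repeat the opening name fails the back-reference.
$mismatch = preg_match($pattern, '[a]one[/b]');
var_dump($mismatch); // int(0)
```

The first call should find two separate matches, one per tag pair, instead of one long match running from the first opening tag to the last closing tag.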

  16. #16
    SitePoint Enthusiast
    Join Date
    Nov 2006
    Posts
    60
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    That had the same effect as the one I had in place. It's much better than mine, and you explained it so that I understood it. It's still not catching tags which span more than one line though. Is there something I could add to make it work on tags which span a couple of lines?

  17. #17
    Unobtrusively zen silver trophybronze trophy
    paul_wilkins's Avatar
    Join Date
    Jan 2007
    Location
    Christchurch, New Zealand
    Posts
    14,684
    Mentioned
    99 Post(s)
    Tagged
    4 Thread(s)
    Here is the PHP documentation for pattern modifiers where you can find out how to specify multiline searches.
    Programming Group Advisor
    Reference: JavaScript, Quirksmode Validate: HTML Validation, JSLint
    Car is to Carpet as Java is to JavaScript
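
One thing that is easy to trip over in the modifier list: the m (multiline) modifier only changes where ^ and $ match, while the s modifier is the one that lets the dot match newlines. A small sketch, using a made-up multi-line template and the pattern from post #15:

```php
<?php
// Hypothetical template fragment with the tag content on its own lines.
$template = "[authentication]\n<div id=\"login\"></div>\n[/authentication]";

// Pattern body from post #15, without delimiters so modifiers can be varied.
$base = '\[([^\]]+)\](.*?)\[\/\1\]';

$noModifier = preg_match("/$base/", $template);  // dot stops at the newline
$multiline  = preg_match("/$base/m", $template); // m affects ^ and $ only
$dotAll     = preg_match("/$base/s", $template); // s lets the dot cross newlines

var_dump($noModifier, $multiline, $dotAll); // int(0) int(0) int(1)
```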

  18. #18
    SitePoint Enthusiast
    Join Date
    Nov 2006
    Posts
    60
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thank you for the help. My final pattern was:

    PHP Code:
    /\[([^\]]+)\](.*?)\[\/\1\]/
    This works a treat, thanks.
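
For anyone reading later, here is a hedged sketch of how the pieces of this thread might fit together. It assumes the s modifier is appended to the final pattern (as the earlier advice suggests for multi-line content) and uses a shortened, made-up version of the template from post #14; $tagNames and $output are illustrative names.

```php
<?php
// Assumption: "s" is appended so tags spanning several lines also match.
$pattern = '/\[([^\]]+)\](.*?)\[\/\1\]/s';

// Shortened, hypothetical version of the template from post #14.
$template = "<div id=\"head_right\">\n[authentication]\n<div id=\"login\">{login_message}</div>\n[/authentication]\n</div>\n[tags2]intags[/tags2]";

// Collect the names of the tags still present in the template.
preg_match_all($pattern, $template, $matches, PREG_SET_ORDER);
$tagNames = array_column($matches, 1);

// Remove each unused tag block, including the HTML it wraps.
$output = preg_replace($pattern, '', $template);

print_r($tagNames); // authentication, tags2
echo $output;
```

Replacing with '$2' instead of '' would keep the enclosed HTML and drop only the tags themselves.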

