I have custom tags in a template file which look like the following:
[tagname]text[/tagname]
I’m really terrible with regular expressions, and my knowledge is basic. Assuming ‘tagname’ changes from tag to tag and could contain any number of different characters (and the same is true of ‘text’), how would I go about removing these tags while keeping the text?
You have two problems. First, if there is nothing between the tags, the match fails because the regular expression explicitly asks for at least one character between them. The culprit is the .+? part; a quick fix is to use .*? instead.
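A quick PHP sketch of the difference. The pattern from the question isn’t quoted here, so this assumes a hypothetical pattern with \w+ tag names:

```php
<?php
// Hypothetical pattern of the form [name]content[/name] with \w+ names.
$empty = '[tagname][/tagname]';

// .+? demands at least one character between the tags, so an empty tag fails.
$plus = preg_match('~\[(\w+)\](.+?)\[/\1\]~i', $empty);

// .*? allows zero characters, so the empty tag matches.
$star = preg_match('~\[(\w+)\](.*?)\[/\1\]~i', $empty);

var_dump($plus); // int(0)
var_dump($star); // int(1)
```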
Now, the second problem occurs when the content between the tags spans multiple lines. By default, the dot special character (which we just used above) matches anything except newlines. To let it flow over multiple lines, we can either use something broader than the dot, such as the character class [\s\S], which matches every character including newlines, or turn on a special flag (called a modifier, since it modifies the behavior of the regex) that makes the dot match newlines as well. To do the latter, add the s modifier to the end of your pattern, like: …~is
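A minimal sketch comparing the options on content that spans two lines. The sample string and the \w+ tag-name pattern are made up for illustration:

```php
<?php
// Tag content that spans two lines.
$multi = "[tagname]first line\nsecond line[/tagname]";

// Without the s modifier, . stops at the newline, so there is no match.
$noS = preg_match('~\[(\w+)\](.*?)\[/\1\]~i', $multi);

// With the s modifier, . also matches newlines.
$withS = preg_match('~\[(\w+)\](.*?)\[/\1\]~is', $multi, $m);

// Alternative without the modifier: [\s\S] matches any character, newlines included.
$classAlt = preg_match('~\[(\w+)\]([\s\S]*?)\[/\1\]~i', $multi);

var_dump($noS);      // int(0)
var_dump($withS);    // int(1)
var_dump($classAlt); // int(1)
echo $m[2], "\n";    // prints both lines of the captured content
```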
Thank you and AnthonySterling for your help. That’s working perfectly. I have one final problem.
If I want to store the name of the tag [tagname][/tagname] in a variable for use in PHP, how do I capture it? I’m thinking I’ll use preg_match to find the name of the tag and then a preg_replace like the one above to remove the tags. How could I go about finding the name of the tag?
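One way to do it, sketched in PHP: wrap the tag name in a capture group, and preg_match hands it back in the matches array. The sample template and the \w+ tag-name pattern are assumptions for illustration:

```php
<?php
$template = 'Hello [greeting]World[/greeting]!';
$pattern  = '~\[(\w+)\](.*?)\[/\1\]~is';

$tagName = null;
$inner   = null;

// Group 1 captures the tag name, group 2 the text between the tags.
if (preg_match($pattern, $template, $m)) {
    $tagName = $m[1]; // "greeting"
    $inner   = $m[2]; // "World"
}

// Remove the tags but keep the text ('$2' refers to capture group 2).
$stripped = preg_replace($pattern, '$2', $template);

echo $tagName, "\n";  // greeting
echo $stripped, "\n"; // Hello World!
```

If a template holds several tags, preg_match_all collects every name at once, and preg_replace_callback can record each name and strip the tags in a single pass.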
PCRE went beyond the ‘regular’ of formal language theory a long time ago. Feel free to use your preferred alternative approach, but while you’re doing that we’ll use regex to get the job done and move on to the next thing. (:
Here’s some context. The point of this is to build a custom templating system. When I have finished putting data into my template, I want to remove any unused tags, which take the form [tag][/tag] and contain some HTML.
Okay, so here is a section of my template file with my tagging method in place:
I need help with the regular expression so that it can match something like the [authentication] area in my example. It would be even better if the tag name could contain any characters at all, such as [-£$%TAGname].
Thank you all for your help so far, just need a little more to complete this thing.
As you want anything at all as the tag name, you will want to match one or more characters that are not a closing square bracket: [^\]]+
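A small check, on a made-up string, that [^\]]+ copes with an unusual name like the one asked about. The back-reference \1 matches the captured name literally, so characters such as $ and % in it are not treated as regex syntax:

```php
<?php
// Opening tag, content, and closing tag sharing an unusual name.
$text = '[-£$%TAGname]some text[/-£$%TAGname]';

// [^\]]+ : one or more characters that are not a closing square bracket.
if (preg_match('/\[([^\]]+)\](.*?)\[\/\1\]/', $text, $m)) {
    echo $m[1], "\n"; // -£$%TAGname
    echo $m[2], "\n"; // some text
}
```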
You can also use \1 as a back-reference to ensure that the end tag matches the start tag.
And using .*? gives you a non-greedy match, so the first matching closing tag (rather than the last) is used.
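A sketch of both points on a made-up string: the greedy .* runs on to the last [/a], while .*? stops at the first closing tag whose name matches, and \1 rejects the mismatched [b]three[/a] pair:

```php
<?php
$s = '[a]one[/a] [a]two[/a] [b]three[/a]';

// Greedy: .* grabs as much as possible, then backtracks to the LAST [/a].
preg_match('/\[([^\]]+)\](.*)\[\/\1\]/', $s, $greedy);

// Non-greedy: .*? stops at the first matching [/a]; \1 requires the same name.
preg_match_all('/\[([^\]]+)\](.*?)\[\/\1\]/', $s, $lazy);

echo $greedy[2], "\n";              // one[/a] [a]two[/a] [b]three
echo implode(', ', $lazy[2]), "\n"; // one, two
```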
/ start of regex
\[ match an opening square bracket
( capture group used later on for back reference
[^\]]+ match anything that is not a closing square bracket
) end capture group
\] match a closing square bracket
( start a capture group
.*? match anything up until the first appropriate closing tag
) end capture group
\[ match an opening square bracket
\/ the forward slash denoting an end tag
\1 the same tag name matched at the start
\] match a closing square bracket
/ end of regex
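Putting the pieces of this breakdown together gives /\[([^\]]+)\](.*?)\[\/\1\]/. Here is a minimal PHP sketch that uses it with preg_replace to strip tag pairs and keep their content (the template string is invented for the example):

```php
<?php
// The regex assembled from the piece-by-piece breakdown.
$pattern = '/\[([^\]]+)\](.*?)\[\/\1\]/';

$template = '<p>[greeting]Hello[/greeting], [name]World[/name]!</p>';

// Replace each tag pair with its captured content (group 2).
$clean = preg_replace($pattern, '$2', $template);

echo $clean, "\n"; // <p>Hello, World!</p>
```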
That had the same effect as the one I had in place, but it’s much better than mine, and you explained it so that I understood it. It’s still not catching tags that span more than one line, though. Is there something I could add to make it work on tags that span a couple of lines?
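Following the s modifier advice from earlier in the thread, here is a sketch of the same regex with /s appended. The multi-line [authentication] block below is invented, since the original template isn’t shown:

```php
<?php
// Hypothetical multi-line block.
$template = "[authentication]\n<form>\n  <input name=\"user\">\n</form>\n[/authentication]";

// Without s: . cannot cross the newlines, so nothing is replaced.
$withoutS = preg_replace('/\[([^\]]+)\](.*?)\[\/\1\]/', '$2', $template);

// With s: . also matches newlines, so the tags are stripped and the HTML kept.
$withS = preg_replace('/\[([^\]]+)\](.*?)\[\/\1\]/s', '$2', $template);

echo $withS;
```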