Regular expression help

openarmy · March 14, 2010, 7:41pm

I have custom tags in a template file which look like the following:

[tagname]text[/tagname]

I’m really terrible with regular expressions and my knowledge is basic. Assuming ‘tagname’ changes from tag to tag, and could contain any number of different characters and the same is true of ‘text’, how would I go about removing these tags and keeping the text?

AnthonySterling · March 14, 2010, 8:18pm


<?php
echo preg_replace(
    '~\\[[^\\]]+?](.+?)\\[[^\\]]+?]~i',
    '$1',
    '[tagname]text[/tagname]'
); #text
?>

openarmy · March 15, 2010, 10:36pm

Thank you. This partially works, but if there is nothing between the tags or certain things, it does not work. For example:


[authentication]
	<div id="login">
		<form id="form1" name="form1" method="post" action="">
			<input type="text" name="username" id="textfield" />
			<input type="text" name="password" id="textfield2" />
			<input type="submit" name="button" id="button" value="Sign Up" />
			<input type="submit" name="button" id="button" value="Login" />
		</form>
		{login_message}
	</div>
[/authentication]

This will not be modified by the preg_replace. Regular expressions are a kick in the balls.

salathe · March 15, 2010, 10:50pm

You have two problems. If there is nothing between the tags, it will fail because the regular expression explicitly asks for at least one character between the tags. The .+? part is the culprit; a quick fix is to use .*? instead, which also accepts zero characters.

Now, the second problem occurs when the content between the tags spans multiple lines. By default, the dot special character (like we just used above) matches anything except newlines. To let it flow over multiple lines, we can either use something other than the dot or turn on a special flag (called a modifier, since it modifies the behavior of the regex) that makes the dot match newlines too. To do the latter, add the s modifier to the end of your pattern, like: …~is
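
A minimal sketch of both fixes together, assuming the pattern posted earlier in the thread; the sample strings are made up for illustration:

```php
<?php
// Earlier pattern with .+? relaxed to .*? and the s modifier added.
$pattern = '~\[[^\]]+?](.*?)\[[^\]]+?]~is';

// Nothing between the tags: .*? accepts zero characters.
$empty = preg_replace($pattern, '$1', '[tagname][/tagname]');

// Content spanning several lines: s lets the dot match newlines.
$multiline = preg_replace($pattern, '$1', "[tagname]\nline one\nline two\n[/tagname]");

var_dump($empty === '');                              // bool(true)
var_dump($multiline === "\nline one\nline two\n");    // bool(true)
```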

openarmy · March 16, 2010, 11:06pm

Thank you and AnthonySterling for your help. That’s working perfectly. I have one final problem.

If I want to store the name of the tag [tagname][/tagname] in a variable to be used by PHP, how do I capture it? I’m thinking I’ll use preg_match to find the name of the tag and then a preg_replace like the one above to remove the tags. How could I go about finding the name of the tag?

AlienDev · March 16, 2010, 11:14pm

Regex shouldn’t be used for non-regular markups >.<

openarmy · March 17, 2010, 8:57am

What would you suggest?

salathe · March 17, 2010, 1:02pm

Just do what you described: grab the tag name with preg_match.
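
As a rough sketch of that idea, assuming the pattern from earlier in the thread with one extra pair of parentheses around the tag name (the $template value and variable names here are made up for illustration):

```php
<?php
$template = "[authentication]\n<div id=\"login\">{login_message}</div>\n[/authentication]";

$tagName  = null;
$stripped = $template;

// Group 1 captures the tag name, group 2 the content between the tags.
if (preg_match('~\[([^\]]+?)](.*?)\[[^\]]+?]~is', $template, $m)) {
    $tagName  = $m[1];
    // Swap the whole tagged block for just its content.
    $stripped = str_replace($m[0], $m[2], $template);
}

var_dump($tagName); // string(14) "authentication"
```

Note that this pattern does not check that the closing tag carries the same name as the opening one.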

salathe · March 17, 2010, 1:00pm

[ot]

PCRE went beyond the ‘regular’ of formal language theory a long time ago. Feel free to use your preferred alternative approach, but while you’re doing that we’ll use regex to get the job done and move on to the next thing. (:[/ot]

openarmy · April 5, 2010, 4:19pm

Okay, I have been doing much better with this and have gotten a lot further with regex. Still having a little problem though:

$seek_me = "/\\[[a-zA-Z0-9_.-]+\\](.)+\\[[\\/][a-zA-Z0-9_.-]+\\]/";
preg_match_all($seek_me, $this->template, $matchz, PREG_SET_ORDER);

I need to match whitespace as well where the (.)+ is, up until the next pattern starts. Does anybody know how to do this?

Paul_Wilkins · April 5, 2010, 9:23pm

You may do better by using (.*) instead of (.)+
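
A small sketch of the difference, using a made-up one-line subject. It only illustrates what ends up in the capture group; neither form lets the dot cross newlines on its own:

```php
<?php
$subject = '[tags2]intags[/tags2]';

// (.)+ repeats a one-character group, so group 1 keeps only the last character.
preg_match('/\[[a-zA-Z0-9_.-]+\](.)+\[\/[a-zA-Z0-9_.-]+\]/', $subject, $repeated);

// (.*) puts the repetition inside the group, so group 1 holds the whole run.
preg_match('/\[[a-zA-Z0-9_.-]+\](.*)\[\/[a-zA-Z0-9_.-]+\]/', $subject, $whole);

var_dump($repeated[1]); // string(1) "s"
var_dump($whole[1]);    // string(6) "intags"
```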

openarmy · April 6, 2010, 12:48am

That’s still not quite getting everything between the [tags][/tags]

Paul_Wilkins · April 6, 2010, 12:51am

Examples please?

openarmy · April 10, 2010, 10:07pm

Here’s some context. The point of this is to build a custom templating system. When I have finished putting data into my template, I want to remove any unused tags, which are in the form [tag][/tag] and contain some HTML.

Okay, so here is a section of my template file with my tagging method in place:


<div id="head_right">
[authentication]
	<div id="login">
		<form id="form1" name="form1" method="post" action="">
			<input type="text" name="username" id="textfield" />
			<input type="text" name="password" id="textfield2" />
			<input type="submit" name="button" id="button" value="Sign Up" />
			<input type="submit" name="button" id="button" value="Login" />
		</form>
		{login_message}
	</div>
[/authentication]
</div>
[tags2]intags[/tags2]
[tags3]sploog[/tags3]

In this code, the [authentication] tag area is not matched, but the [tags2] and [tags3] tags are.

Here is the PHP I am using:


$seek_me = "/\\[[a-zA-Z0-9_.-]+\\](.*)\\[[\\/][a-zA-Z0-9_.-]+\\]/";
preg_match_all($seek_me, $template, $matches, PREG_SET_ORDER);	
foreach ($matches as $match)
{
	$original_tag = $match[0];
	$output = str_replace($original_tag, '', $template);
}

I need help with the regular expression, so that it can match something like the [authentication] area in my example. It would be even better if the tag name ( [tagname] ) could contain any characters, such as [-£$%TAGname].
Thank you all for your help so far, just need a little more to complete this thing.

Paul_Wilkins · April 10, 2010, 10:19pm

As you want anything at all as the tag name, you will want to match against 1 or more characters that are not the closing square bracket: [^\]]+

You can also use \1 as a back-reference, so that you can ensure that the end tag matches the start tag.

And, using .*? gives you a non-greedy match, so that the first matching closing tag (instead of the last) will be used.

/         start of regex
\[        match an opening square bracket
(         capture group used later on for back reference
[^\]]+    match anything that is not a closing square bracket
)         end capture group
\]        match a closing square bracket
(         start a capture group
.*?       match anything up until the first appropriate closing tag
)         end capture group
\[        match an opening square bracket
\/        the forward slash denoting an end tag
\1        the same tag name matched at the start
\]        match a closing square bracket
/         end of regex

/\[([^\]]+)\](.*?)\[\/\1\]/
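
A short sketch of that pattern in use, assuming single-line tags like the [tags2] and [tags3] examples (the $pairs array is just an illustrative name):

```php
<?php
$template = "[tags2]intags[/tags2]\n[tags3]sploog[/tags3]";
$pattern  = '/\[([^\]]+)\](.*?)\[\/\1\]/';

// Each match holds the tag name in group 1 and the content in group 2.
preg_match_all($pattern, $template, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $match) {
    $pairs[$match[1]] = $match[2]; // tag name => content
}
// $pairs is ['tags2' => 'intags', 'tags3' => 'sploog']

// The back-reference rejects a closing tag with a different name.
$mismatch = preg_match($pattern, '[a]text[/b]'); // 0
```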

openarmy · April 10, 2010, 10:30pm

That had the same effect as the one I had in place. It’s much better than mine, and you explained it so that I understood it. It’s still not catching the tags which span more than one line, though. Is there something I could add to make it work on tags which span a couple of lines?

Paul_Wilkins · April 10, 2010, 10:39pm

Here is the PHP documentation for pattern modifiers, where you can find out how to make the dot match newlines so that a match can span several lines.

openarmy · April 10, 2010, 10:45pm

Thank you for the help. My final pattern was:

/\\[([^\\]]+)\\](.*?)\\[\\/\\1\\]/s

This works a treat, thanks.
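
For anyone finding this later, a sketch of how the final pattern might be applied, assuming the goal is to drop every unused [tag]…[/tag] block in one pass. The template below is a cut-down, made-up version of the one posted earlier. A single preg_replace also sidesteps an issue in the earlier foreach loop, which rebuilt $output from the original $template on every pass, so only the last match was actually removed:

```php
<?php
$template = <<<'TPL'
<div id="head_right">
[authentication]
	<div id="login">{login_message}</div>
[/authentication]
</div>
[tags2]intags[/tags2]
TPL;

$pattern = '/\[([^\]]+)\](.*?)\[\/\1\]/s';

// Remove each tag pair together with everything between the tags.
$without_blocks = preg_replace($pattern, '', $template);

// Or strip only the tags and keep the content (group 2).
$content_only = preg_replace($pattern, '$2', $template);
```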