What I am trying to do (as illogical as this goal may sound) is eliminate open/close tag pairs and their content. For example, in the following, I want to eliminate the span pair:
<b class='test another' id='x'>hey<i><span class='sp'> some more stuff </span><em>
In this example I want to target the opening span tag, its class, its content, and its closing span tag, so that I get this:
<b class='test another' id='x'>hey<i><em>
(Sorry, I'm being redundant.)
I figured this was a job for preg_replace and a GOOD regular expression. This is what I have thus far…
My thinking is that the regular expression I created means the following:
( look for a pattern
^< that begins with “<” and is followed immediately by
(.+) a pattern containing one or more characters (captured pattern #2)
\s? maybe followed by a space or no space
.* maybe followed by 0 or more characters
>) and lastly a “>”
.* after that there may be 0 or more characters
( then another pattern
</ which starts with “</” and is followed by
(?(2)(.+)) a pattern containing one or more characters which MATCH the characters of captured pattern #2, and is followed by
>) / and lastly a “>”, end of search…
Somewhere I am off… I would appreciate any fresh perspective on this.
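Two pieces of the walkthrough above behave differently from the description. ^ anchors the match to the very start of the string, so a tag pair in the middle is never reached. (?(2)(.+)) is a conditional that only tests whether group 2 took part in the match; it does not require the same text. Repeating the captured text is done with a backreference such as \1. Below is a minimal sketch, not the regex posted by anyone in this thread. The function name is borrowed from the call shown later in the thread, its body is an assumption, and it does not cope with a tag nested inside another tag of the same name.

```php
<?php
// Hypothetical body for removeTagPairs(); only the name comes from the thread.
function removeTagPairs(string $html): string
{
    // <(\w+)    opening tag name, captured as group 1
    // \b[^>]*>  optional attributes, up to the ">" that ends the opening tag
    // .*?       the content, matched as lazily as possible
    // </\1>     a closing tag whose name repeats group 1 (backreference)
    // s flag    lets "." match newlines inside the content
    // Single quotes matter: in a double-quoted PHP string "\1" is an octal escape.
    $pattern = '#<(\w+)\b[^>]*>.*?</\1>#s';

    $result = preg_replace($pattern, '', $html);

    // preg_replace() returns null on failure; fall back to the input unchanged.
    return $result ?? $html;
}
```

Tags that are opened but never closed, such as the b, i and em in the example, are left alone because no matching closing tag is found for them.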
Hi Tom - I’m not arguing that DOMDocument wouldn’t be a safer bet, but to my eyes neither of your examples breaks the regex I posted. It handles nesting (as in your 1st example), and your second example is invalid HTML and as such shouldn’t be removed, if I’m understanding the OP’s goal correctly.
Well, I'm not altering the DOM… I am writing a PHP script parser.
The idea is kind of one-upping WordPress, in a way. The data you saw will be reversed and the tags closed.
So the input:
“<b class='test another' id='x'>hey<i><span class='sp'> some more stuff </span><em>”
will output:
“</em></i></b>”
and both of those will be wrapped around some other generated code…
so as to complete a wrap around a script tag. I have got the whole thing working… except when there is an already closed tag pair, as shown above…
I'm not sure if DOMDocument applies here.
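The reverse-and-close step described above could be sketched roughly like this. It is only an illustration: the helper name closingTagsFor is made up, and it assumes that complete tag pairs have already been stripped from the input and that no attribute value contains a “>”.

```php
<?php
// Hypothetical helper: build the closing tags for a run of unclosed opening tags.
function closingTagsFor(string $openTags): string
{
    // Collect the name of every opening tag; "</...>" never matches
    // because "/" is not a word character.
    preg_match_all('#<(\w+)\b[^>]*>#', $openTags, $matches);

    // The last tag opened must be the first one closed.
    $names = array_reverse($matches[1]);

    $closing = '';
    foreach ($names as $name) {
        $closing .= '</' . $name . '>';
    }
    return $closing;
}
```

Tags opened in the order b, i, em are closed in the order em, i, b, so the two strings can be wrapped around the generated code.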
Oh, one more question, because I like your format…
why did you use ‘#’ instead of ‘/’ to open and close the expression?
I like to use symbols that are less likely to occur in the target string. It requires fewer escape characters and therefore becomes easier to read: slashes occur in HTML often, but pound signs do not.
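To make the difference concrete, here is a small comparison of the same pattern written with each delimiter. The sample string and variable names are only for illustration.

```php
<?php
$html = '<p>one</p><p>two</p>';

// With "/" as the delimiter, the literal slash in "</p>" has to be escaped.
$withSlash = '/<\/p>/';

// With "#" as the delimiter, the slash can be written as-is.
$withHash = '#</p>#';

// Both patterns find the same closing tags.
$countSlash = preg_match_all($withSlash, $html);
$countHash  = preg_match_all($withHash, $html);
```

The two counts are equal; only the readability of the pattern changes.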
Almost: “\1” (without the quotes). The / is part of the closing tag.
I call it with:
$wraped=removeTagPairs($wraped);
(I know execution reaches removeTagPairs, as I have tested this already.)
But the preg_ call always returns false…
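One way to narrow this down: preg_replace() returns null, not false, when the pattern is invalid or the engine hits an error, and null compares loosely equal to false. A rough sketch of a wrapper that makes the failure visible is below. The name replaceOrExplain is made up for this example, and the exact error code reported may differ between PHP versions.

```php
<?php
// Hypothetical wrapper: run preg_replace() and fail loudly instead of returning null.
function replaceOrExplain(string $pattern, string $replacement, string $subject): string
{
    // "@" hides the compile warning so the failure is reported in one place below.
    $result = @preg_replace($pattern, $replacement, $subject);

    if ($result === null) {
        // preg_last_error() gives the numeric PCRE error code of the last call.
        throw new RuntimeException(
            'preg_replace failed for pattern ' . $pattern
            . ' (error code ' . preg_last_error() . ')'
        );
    }
    return $result;
}
```

Checking with === null rather than a plain if (!$result) also avoids mistaking an empty string, which is a legitimate result, for a failure.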