Regex Conundrum

Hi, I really hope someone can help with this as it’s driving me nuts!

Original Text:

challenge<div><span class="challenge">This is challenge first of all <b>challenge</b> ending today </span><a href="challenge.com">This <em>challenge</em> <b>is</b> <b>a</b> here</a></div><p>Another challenge there</p><a href="challenge.com">Test this <b>challenge</b></a></div><p>Another challenge there</p><PRE>Big challenge</PRE><a href="challenge.com">This <b>challenge</b> is a challenge help here</a></div><p>Another challenge there</p></div>

preg_replace pattern so far:

challenge((?!([^<]+)?((<\/a)|>)))

Using REPLACEMENT as the replacement string gives:

REPLACEMENT<div><span class="challenge">This is REPLACEMENT first of all <b>REPLACEMENT</b> ending today </span><a href="challenge.com">This <em>REPLACEMENT</em> <b>is</b> <b>a</b> here</a></div><p>Another REPLACEMENT there</p><a href="challenge.com">Test this <b>REPLACEMENT</b></a></div><p>Another REPLACEMENT there</p><PRE>Big REPLACEMENT</PRE><a href="challenge.com">This <b>REPLACEMENT</b> is a challenge help here</a></div><p>Another REPLACEMENT there</p></div>

This is so close to being correct, but matches that sit inside nested elements (e.g. <em> or <b>) within the anchor text of the <a> tags are still being replaced. I don’t want this. I want every match inside an HTML tag itself (e.g. in an attribute value) AND every match anywhere within anchor text to be excluded.

The perfect result should be:

REPLACEMENT<div><span class="challenge">This is REPLACEMENT first of all <b>REPLACEMENT</b> ending today </span><a href="challenge.com">This <em>challenge</em> <b>is</b> <b>a</b> here</a></div><p>Another REPLACEMENT there</p><a href="challenge.com">Test this <b>challenge</b></a></div><p>Another REPLACEMENT there</p><PRE>Big REPLACEMENT</PRE><a href="challenge.com">This <b>challenge</b> is a challenge help here</a></div><p>Another REPLACEMENT there</p></div>

Many thanks in advance.

Gary

So you only want that very first “challenge” to be replaced, since all of the others are within tags?

No, text between tags is okay to replace, apart from anchor text, i.e. anything between <a> and </a>.

So NO replacement inside ANY HTML tag itself, e.g. <span style="challenge">
And NO replacement within (between) <a> </a> tags, regardless of how many more tags are nested inside the anchor.

Would you be open to perhaps looking at non-regex ways of doing what you want? It would mean learning new things but should make this particular task much easier, not to mention any future times where you need to parse and manipulate HTML.

If that sounds OK, then take a look at PHP's DOM extension (DOMDocument).
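
For what it's worth, here is a minimal sketch of how the DOM route might look, assuming PHP with the DOM extension enabled. The function name replaceOutsideAnchors and the wrapper id are made up for illustration and are not from this thread. The idea is to parse the fragment, then touch only text nodes that have no <a> ancestor. Attribute values are not text nodes, so they are left alone automatically.

```php
<?php
// Hypothetical helper (not from the thread): replace $search in text nodes only,
// skipping any text that has an <a> ancestor at any depth.
function replaceOutsideAnchors(string $html, string $search, string $replacement): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy markup quietly
    // Wrap the fragment in a known container; the XML declaration is a common
    // trick to make loadHTML() read the input as UTF-8.
    $dom->loadHTML('<?xml encoding="utf-8"?><div id="regex-wrap">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $wrapper = $xpath->query('//div[@id="regex-wrap"]')->item(0);

    // Attribute values are not text nodes, so class="challenge" is never touched.
    foreach ($xpath->query('.//text()[not(ancestor::a)]', $wrapper) as $textNode) {
        $textNode->nodeValue = str_replace($search, $replacement, $textNode->nodeValue);
    }

    // Serialise only the wrapper's children, so the container itself is dropped.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

// Small, well-formed sample (simplified from the original text above).
$html = 'challenge <span class="challenge">a challenge</span> '
      . '<a href="challenge.com">my <em>challenge</em></a>';
echo replaceOutsideAnchors($html, 'challenge', 'REPLACEMENT'), "\n";
```

One caveat: the original text in the first post has more closing </div> tags than opening ones, and the parser will repair that in its own way, so content after a stray </div> may be moved or lost by this sketch. It would be worth tidying the markup first, or checking the output carefully.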