Replace one occurrence of each in an array, but outside tags

I’m looking to run through a string and replace just one occurrence of each keyword in an array with the linked keyword.
I also want to avoid replacing any keywords if they appear inside other tags such as <h2>, <h3>, <img>, <a>, <div> etc.

Right now I have:
foreach ($keyword_array as $kw=>$kwlink){ $string = preg_replace('@(?<=\W|^)('.$kw.')(?=\W|$)@i', '<a href="'.$kwlink.'">$1</a>', $string, 1); }

But how do I avoid replacing a keyword if it is inside tags?

So which tags is it allowed to be in and which ones is it not allowed to be in?

<html>
<body>

The string is a content column in a table, so there won’t be any tags it can be inside

So it will be inside the following tags

<html>
<body>
<table>
<tbody>
<tr>

and possibly others just in order to be where you say it is. There is nothing in a web page that is not inside several tags.

Sorry, what I mean is that I am applying the code to the content field, so there are no other page-level tags like <html>, <body> etc. to worry about!

So if there are not any tags why are you asking how to ignore them?

No, it could be inside <h2>, <h3>, <img>, <a>,<div> tags, but not top level HTML tags such as <html>, <body> etc.

OK, let’s just say it can’t be inside any tags, if that’s easier?

So now you have finally answered my first question, although none of those apart from <img> should ever appear in a table. An <img> would normally fill an entire cell, and if one cell in a column is an image then all the others would be as well.

So if you are processing a column of the table that contains text, there wouldn’t be any tags in those cells to skip over. At least not any tags that can contain anything that can be misinterpreted as text.

I did actually say that in my first post!

I really don’t think I’ve made myself clear. So I’ll try again.

Let’s just say I have a string of text that can contain tags such as <h2>, <h3>, <img>, <a>,<div>

I want to search for a set of keywords in this text and replace with my keywords links (just one time for each keyword though). My code in the first post does this just fine :smile:

However, the problem I have is that it will find the first instance of a keyword inside e.g. a heading tag:

<h2>....keyword</h2>

I want to skip any first instance of a keyword that my regex finds inside a tag like this.

I hope that makes it 100% clear?

Has anyone else got any tips for this?

Putting together good regex can be difficult for even seasoned coders and can get gnarly fast.
I would consider using one of the DOM parsers to whittle the “haystack” down before looking for the “needles”.

As long as your HTML is structured consistently you should be able to gather up the p tag text content and go from there.
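As a rough sketch of the DOM-parser idea, the snippet below uses PHP's built-in DOMDocument and DOMXPath to visit only text nodes that are not inside an excluded element, and links the first match for each keyword. The function name link_keywords_outside_tags, the "kw-root" wrapper id, and the default list of excluded tags are made up for this example. It also assumes the content is a UTF-8 HTML fragment, and the parser may normalise sloppy markup, so treat it as a starting point rather than a finished solution.

```php
<?php
// Sketch only: link the first occurrence of each keyword that sits in a text
// node outside the $skip elements. Names here are hypothetical.
function link_keywords_outside_tags(string $html, array $keywords, array $skip = ['a', 'h2', 'h3']): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy or HTML5 markup
    // The meta tag hints UTF-8; the wrapper div gives us one known root.
    $doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div id="kw-root">' . $html . '</div>'
    );
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $root  = $xpath->query('//div[@id="kw-root"]')->item(0);

    // Builds e.g. "ancestor::a or ancestor::h2 or ancestor::h3".
    $excluded = implode(' or ', array_map(fn($t) => "ancestor::$t", $skip));
    $query    = $excluded === '' ? './/text()' : ".//text()[not($excluded)]";

    foreach ($keywords as $kw => $link) {
        $pattern = '/(?<!\w)' . preg_quote((string) $kw, '/') . '(?!\w)/iu';
        // Re-query for every keyword because the tree changes after a replacement.
        foreach ($xpath->query($query, $root) as $node) {
            $text = $node->nodeValue;
            if (!preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE)) {
                continue;
            }
            [$match, $offset] = $m[0]; // $offset is a byte offset into $text

            $a = $doc->createElement('a');
            $a->setAttribute('href', $link);
            $a->appendChild($doc->createTextNode($match));

            // Replace the text node with: text before, the link, text after.
            $parent = $node->parentNode;
            $parent->insertBefore($doc->createTextNode(substr($text, 0, $offset)), $node);
            $parent->insertBefore($a, $node);
            $parent->insertBefore($doc->createTextNode(substr($text, $offset + strlen($match))), $node);
            $parent->removeChild($node);
            break; // only the first eligible occurrence of this keyword
        }
    }

    // Serialize only the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$content = '<h2>PHP basics</h2><p>Learn PHP and more PHP.</p>';
echo link_keywords_outside_tags($content, ['php' => 'https://example.com/php']);
```

Because only text nodes are examined, keywords inside attributes such as an <img> alt text are never touched, and the heading is left alone while the first "PHP" in the paragraph gets linked.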

Consider replacing all the tags regardless and then going back and manually removing tags from <h2… $tag …/h2>. I should imagine there will not be a lot of $tags to remove.
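If you would rather stay with a regex and not undo replacements afterwards, one possible alternative is to let the pattern consume the parts you want to ignore and discard them with PCRE's (*SKIP)(*FAIL) verbs. The sketch below is an assumption about what you want rather than a tested drop-in: the function name link_first_outside_tags and the default $skip list are invented for the example, and it keeps the $string, $keyword_array, $kw and $kwlink names from the first post. It is still a regex run over HTML, so nested or malformed markup can trip it up.

```php
<?php
// Sketch only: skip whole excluded elements and every remaining tag, then
// link the first keyword found in what is left. Names are hypothetical.
function link_first_outside_tags(string $string, array $keyword_array, array $skip = ['a', 'h2', 'h3']): string
{
    // $skip is assumed to be non-empty.
    $skipNames = implode('|', array_map(fn($t) => preg_quote($t, '~'), $skip));

    foreach ($keyword_array as $kw => $kwlink) {
        $pattern =
            // 1) a whole excluded element, e.g. <h2 ...> ... </h2>: match, then discard
            '~<(' . $skipNames . ')\b[^>]*>.*?</\1\s*>(*SKIP)(*FAIL)'
            // 2) any other single tag, so attribute values are never touched
            . '|<[^>]*>(*SKIP)(*FAIL)'
            // 3) the keyword itself as a whole word (capture group 2)
            . '|(?<!\w)(' . preg_quote((string) $kw, '~') . ')(?!\w)~is';

        $string = preg_replace_callback(
            $pattern,
            fn($m) => '<a href="' . htmlspecialchars($kwlink, ENT_QUOTES) . '">' . $m[2] . '</a>',
            $string,
            1 // discarded matches do not count, so this is the first real hit
        );
    }
    return $string;
}

$string = '<h2>PHP basics</h2><p>Learn PHP and more PHP. <img alt="php logo" src="x.png"></p>';
echo link_first_outside_tags($string, ['php' => 'https://example.com/php']);
```

A side effect of skipping <a> elements is that links added for earlier keywords are not matched again by later keywords. The preg_quote call also keeps keywords containing regex metacharacters from breaking the pattern.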
