Regex to grab subject when wrapped in span or not

I am trying to come up with a regex that grabs some text whether or not it is wrapped in span tags. Sometimes there are also strong tags inside the span. I thought I could use (<span.+|)(Qualifiers)(.+<\/span>|), but it sometimes lets other things leak in. The word Qualifiers is dynamic here: it is supplied by the application and changes. This is for JavaScript. I was testing in a program called RegexBuddy, and I show a picture of what is wrong. Note that RegexBuddy does not escape forward slashes. Thanks

Sample text

<p>Qualifiers are used to adjust qualities of an object or a variable.</p>
<p>There are two types of qualifiers in C++. CV <span style="color: #3665f3;"><strong>Qualifiers</strong></span> and Storage Duration <span style="color: #3665f3;"><strong>Qualifiers</strong></span>. CV stands for constant and volatile.</p>
<p>CV <span style="color: #3665f3;"><strong>Qualifiers</strong></span></p>
<ul>
<li>const - marks a variable as read-only or immutable.</li>
<li>mutable - is used on data members to make them writable from a const qualified member function.</li>
<li>volatile - mark a variable that may be changed by another process. This is partly deprecated in C++ 20.</li>
</ul>
<p>Storage Duration Qualifiers are used to define the duration or lifetime of a variable. By default, a variable defined within a block has an automatic lifetime.</p>
<p>Storage Duration <span style="color: #3665f3;"><strong>Qualifiers</strong></span></p>
<ul>
<li>static - variables defined to have life <span>beyond</span> the execution of a block. Static variables live for the duration of the program. Commonly used for keeping state between usages between a given function or a method. By default a variable defined outside of any block is static.</li>
<li>register - are variables stored in processor registers. This can make them faster <span style="color: #3665f3;"><strong>Qualifiers</strong></span> and more efficient. This qualifier is taken by the compiler as a suggestion. The compiler may or may not store the variable in a register.</li>
<li>extern variables are defined in a separate translation unit. These are linked with your code with the linker step of the compilation process.</li>
</ul>

Just for starters.

<span[^>]*>.*?<\/span>

It will trip if there are nested spans, though. JS is missing balancing groups.
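To illustrate the nested-span caveat, here is a minimal sketch. The `nested` string is a made-up input, not taken from the sample text; it only shows that the lazy `.*?` stops at the first closing tag it meets.

```javascript
// The pattern suggested above, with the global flag so match() returns all hits.
const spanRegex = /<span[^>]*>.*?<\/span>/g;

// Hypothetical input with one span nested inside another.
const nested = "<span>outer <span>inner</span> tail</span>";

// The lazy quantifier ends the match at the FIRST </span>,
// so the outer span is cut short and " tail</span>" is left behind.
const nestedMatches = nested.match(spanRegex);
console.log(nestedMatches);
```

The single match ends at the inner closing tag, which is why this approach only suits markup where spans are not nested.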

Hi, thanks for the reply. The tricky part here is I need to also grab the word Qualifiers when it is not wrapped in a span or strong.

I may have misunderstood

/Qualifiers|<span[^>]*>.*?<\/span>/gm

Test here

Are you saying to ignore spans/strongs that contain ‘Qualifiers’?

That is almost there. I am saying get the span or strong tags only when Qualifiers is inside them. Also get Qualifiers by itself. In the last example it still picks up <span>beyond</span>, which is not wanted.

It should find

<span style="color: #3665f3;"><strong>Qualifiers</strong></span>
<span>Qualifiers</span>
<strong>Qualifiers</strong>
Qualifiers

Thanks so much

(<span[^>]*>)?(<strong>)?Qualifiers(<\/strong>)?(<\/span>)?

(Based solely on your example. If you need to get more complex than that, break it down into multiple steps.)
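Since the word is supplied by the application, one way to use the pattern above is to build it at run time. This is only a sketch: `escapeRegExp` and `buildWrappedWordRegex` are hypothetical helper names, and the `sample` string is a shortened stand-in for the sample text.

```javascript
// Escape regex metacharacters so the dynamic word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds the pattern above around any word.
// Inside a RegExp constructor string, "/" does not need escaping.
function buildWrappedWordRegex(word) {
  return new RegExp(
    "(<span[^>]*>)?(<strong>)?" + escapeRegExp(word) + "(</strong>)?(</span>)?",
    "g"
  );
}

// Shortened stand-in for the sample text.
const sample =
  '<p>CV <span style="color: #3665f3;"><strong>Qualifiers</strong></span> and ' +
  "<span>beyond</span> plus <strong>Qualifiers</strong> and plain Qualifiers.</p>";

// Lists the wrapped and unwrapped occurrences; <span>beyond</span> is skipped
// because every tag group is optional but the word itself is required.
const matches = sample.match(buildWrappedWordRegex("Qualifiers"));
console.log(matches);
```

Escaping matters once the word can contain characters such as `+` or `.`, which would otherwise be read as regex syntax.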


That is a super string! I like the idea of multiple steps. We use multiple steps because this is not a web application, though it runs JavaScript just as a browser would. If I add \b to catch word boundaries, it works well for us and catches the different versions of the word. Thanks so much!
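A rough sketch of the `\b` variant follows. It assumes the word consists only of word characters, and the `i` flag is an addition of this example (one possible reading of "different versions of the word"), not something stated in the thread. The `text` string is made up for illustration.

```javascript
const word = "Qualifiers"; // assumed to contain only word characters

// \b on both sides keeps the word from matching inside a longer word.
// The "i" flag is an assumption, added so that "qualifiers" is caught too.
const bounded = new RegExp(
  "(<span[^>]*>)?(<strong>)?\\b" + word + "\\b(</strong>)?(</span>)?",
  "gi"
);

// Hypothetical input: two real hits and two longer words that must be ignored.
const text =
  "Qualifiers and <strong>qualifiers</strong>, but not Qualifiersets or subQualifiers.";

const found = text.match(bounded);
console.log(found);
```

Note that `\b` only behaves this way when the word starts and ends with a letter, digit, or underscore; a dynamic word like `C++` would need a different boundary check.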
