Searched but haven’t found a solution to this.
I want to remove everything from HTML code that is not a <div> or </div> tag (opening or closing).
Since this matches the divs:
<div.*?>|</div>
I thought I could just negate it somehow, such as:
~ - Start regex
</ - match </ literally
(?!div) - Negative lookahead for the literal string div
.*? - match anything, lazily. Shouldn’t be needed here, but without it the regex doesn’t work!?
> - match > literally
| - OR match the following:
< - match < literally
(?!/) - Negative lookahead for the literal string /
(?!div) - Negative lookahead for the literal string div
.*? - match anything, lazily
> - match > literally
~ - End regex
is - Modifiers: Case Insensitive (i) and Single Line mode (s)
Single line mode (s) makes . match newlines too, so HTML tags that span multiple lines are also removed.
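For anyone who wants to try this outside PHP, here is a minimal Python sketch of the pattern described in the breakdown. It assumes the ~...~is delimiters and modifiers translate to re.IGNORECASE | re.DOTALL; the function name strip_non_div_tags and the sample input are made up for illustration.

```python
import re

# Pattern assembled from the token-by-token breakdown. The PHP-style
# ~...~is delimiters and modifiers become flags in Python (assumption:
# i -> re.IGNORECASE, s -> re.DOTALL).
NON_DIV_TAG = re.compile(
    r"</(?!div).*?>|<(?!/)(?!div).*?>",
    re.IGNORECASE | re.DOTALL,
)


def strip_non_div_tags(html: str) -> str:
    """Remove every tag except <div ...> and </div>; text content stays."""
    return NON_DIV_TAG.sub("", html)


# Hypothetical sample input, including a <p> tag that spans two lines.
sample = '<div id="main"><p\n class="intro">Hello <b>world</b></p></div>'
result = strip_non_div_tags(sample)
print(result)
```

Without re.DOTALL the two-line <p ...> tag would be left behind, because . would stop at the newline. Note that the lazy .*?> also stops at the first > it sees, so a > inside an attribute value would cut a tag short.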
How I understood it is that the OP wished to remove all tags except for div tags, thus leaving everything outside tags (content) and div tags intact, which is exactly what my regex provided in post #3 does.
Thanks ScallioXTX! That’s pretty much what I was after. And thanks for the detailed explanation. I remember look-ahead now, but it’s been a while. Thanks also to the other comments.
I was, in fact, trying to get a string containing only div tags (as mentioned by philip). The reason for this is that when examining generated pages (e.g. from WordPress) it can be useful to have a skeleton outline of the (potentially bloated) div structure. This can be done by hand, of course, but seems to be against the spirit of computing.
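If the goal is a string with only the div tags, one option is to collect the matches of the div pattern from the first post instead of deleting everything else. A rough Python sketch, where div_skeleton and the sample page are hypothetical names and data:

```python
import re

# The div-matching pattern from the original question. IGNORECASE and
# DOTALL are added here (an assumption) so upper-case or multi-line
# tags are matched too.
DIV_TAG = re.compile(r"<div.*?>|</div>", re.IGNORECASE | re.DOTALL)


def div_skeleton(html: str) -> str:
    """Return only the opening and closing div tags, one per line."""
    return "\n".join(DIV_TAG.findall(html))


# Hypothetical input for illustration.
page = '<div id="wrap"><h1>Title</h1><div class="post"><p>Text</p></div></div>'
skeleton = div_skeleton(page)
print(skeleton)
```

This drops the text content as well as the other tags, so only the nesting and the id/class attributes remain. The pattern would also match any other tag whose name starts with "div".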
Since the tags contain id and class attributes, which are useful to know, combining the regex from Scallio with the following gives a visual guide viewable in a browser, showing the nesting and naming of each div without other clutter:
Is this the story where someone asks how to move a mountain because they want to lay a pipeline from point A to point B?
Possibly :). Although I knew it would be relatively straightforward to some. I know there are various tools for examining source code, but this seems like a fair use of regexps and can be done in a text editor.
They can be, just be careful. Regular expressions work on regular languages, and HTML isn’t a regular language. That means a regex will be fine for small things, but when there’s complicated nesting and possibly strange content floating around, you’ll want to check by hand afterwards if it matters.
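When the nesting does matter, a real parser avoids the problem. Below is a minimal sketch using Python’s standard-library html.parser to print an indented outline of the divs with their id and class, in the spirit of the skeleton described earlier in the thread. The class name DivOutline, the label format and the sample input are all made up for illustration.

```python
from html.parser import HTMLParser


class DivOutline(HTMLParser):
    """Collect an indented outline of div nesting with id and class."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # current div nesting level
        self.lines = []  # one outline line per opening div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attr = dict(attrs)
        label = "div"
        if attr.get("id"):
            label += "#" + attr["id"]
        if attr.get("class"):
            label += "." + ".".join(attr["class"].split())
        self.lines.append("  " * self.depth + label)
        self.depth += 1

    def handle_endtag(self, tag):
        # Guard against stray closing tags in sloppy markup.
        if tag == "div" and self.depth > 0:
            self.depth -= 1


def outline(html: str) -> str:
    parser = DivOutline()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


# Hypothetical input; the ">" inside the title attribute is the kind of
# content a lazy .*?> pattern would mistake for the end of the tag.
page = ('<div id="wrap"><a title="a > b">x</a>'
        '<div class="post wide"><p>Text</p></div></div>')
print(outline(page))
```

The output is plain text rather than something styled in a browser, but it shows the same information: which div sits inside which, and what each one is called.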
I actually find that the hierarchical HTML view shown when you use the “Inspect element” contextual menu option, in for example Chrome and Firefox, is invaluable for this.