I’m working with TinyMCE and want to let the editor toggle off HTML mode. What I’m struggling with is converting list items into asterisks:
<ul>
<li>Bullet 1</li>
<li>Bullet 2</li>
<li>Bullet 3</li>
</ul>
Should become
* Bullet 1
* Bullet 2
* Bullet 3
I’ve used a similar regex to convert paragraphs to "\n$1\n" and that is working, but I can’t seem to get the regex to work for list items. Here’s my code:
// replace p tags with line breaks
strippedValue = strippedValue.replace(/<p>([^<\/p>]*)<\/p>/ig, "\n\n$1\n\n");
alert(strippedValue);
// replace list items with asterisks
strippedValue = strippedValue.replace(/<li>([^<\/li>]*)<\/li>/ig, "* $1\n");
alert(strippedValue);
At both alerts, the content remains the same:
<ul><li>Bullet 1</li><li>Bullet 2</li><li>Bullet 3
</li></ul>
<li>([^<\/li>]*)<\/li>
You are looking for a string that begins with <li>, ends with </li>, and has only characters other than <, /, l, i and > in between. Since the text "Bullet" contains the letter l, no match is made and no substitutions are done.
Try
<li>(.*?)<\/li>
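To make the difference concrete, here is a small sketch comparing the two patterns. The sample string and variable names are made up for illustration, and the input is deliberately on a single line.

```javascript
// Hypothetical sample input, all on one line (no newlines inside the tags).
const sampleList = "<ul><li>Bullet 1</li><li>Bullet 2</li></ul>";

// Character class: the captured text may not contain <, /, l, i or >.
const charClassPattern = /<li>([^<\/li>]*)<\/li>/ig;
// Lazy quantifier: capture as little as possible up to the next </li>.
const lazyPattern = /<li>(.*?)<\/li>/ig;

const withCharClass = sampleList.replace(charClassPattern, "* $1\n");
const withLazy = sampleList.replace(lazyPattern, "* $1\n");

// "Bullet" contains an "l", so the character class never matches
// and the string comes back unchanged.
console.log(withCharClass === sampleList);
// The lazy version replaces each item separately.
console.log(withLazy);
```

Note that the lazy version still relies on . matching every character between the tags, so it only works here because the sample has no line breaks inside a list item.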
Ah, yes, I see the problem: square brackets match any single one of the characters inside them. That greedy .* was dumping all list items onto one line, so I’ve got it working with this:
strippedValue = strippedValue.replace(/<li[^>]*>([^<]*)<\/li>/ig, "* $1\n");
But it’s asking for trouble when someone uses < within a list item. Is there a way to use regex to match what I originally wanted: assign to $1 all characters after <li> and before the next occurrence of </li>? I thought maybe
[^(?:<\/li>)]
would do it, or maybe
(^<\/li>)
but the ^ doesn’t appear to work within parentheses…
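One way to express "anything that is not the sequence </li>" is a negative lookahead, since ^ only negates inside square brackets. This is a sketch rather than the approach used in this thread, and the sample string and names are invented for illustration.

```javascript
// (?!<\/li>) is a negative lookahead: it succeeds only at positions
// where "</li>" does NOT start. Pairing it with [\s\S] (any one character,
// including newlines) and repeating consumes text up to the next </li>.
const lookaheadPattern = /<li[^>]*>((?:(?!<\/li>)[\s\S])*)<\/li>/ig;

// Hypothetical input: a "<" inside one item and a newline inside another.
const trickyList = "<ul><li>a < b</li><li>Bullet 2\n</li></ul>";

const lookaheadResult = trickyList.replace(lookaheadPattern, "* $1\n");
console.log(lookaheadResult);
```

The stray < in the first item is kept in $1, and the newline before the closing tag in the second item is captured as well, so that item ends up followed by two line breaks.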
Did a bit more reading and found that (.*?) is not greedy. The problem was caused by the markup having a newline character before the last closing </li> tag. The . operator doesn’t match line breaks, so I’ve switched to the common workaround, [\s\S], and it works. Here’s the final code:
strippedValue = strippedValue.replace(/<p[^>]*>([\s\S]*?)<\/p>/ig, "$1\n");
strippedValue = strippedValue.replace(/<li[^>]*>([\s\S]*?)<\/li>/ig, "* $1\n");
Thanks for helping, Philip!
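In the same way that you put i (case insensitive) and g (global) at the end, you can also add s (treat the input as a single line), so that . matches across line breaks. Note that JavaScript only supports the s flag from ES2018 onward; in older engines the [\s\S] workaround is still needed.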
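As a rough illustration of the s flag, assuming a runtime where it is available, the sketch below runs the same lazy pattern with and without it. The markup mirrors the sample from the question, with the newline before the last closing tag, and the variable names are made up.

```javascript
// Sample markup with a newline before the last closing </li>.
const listMarkup = "<ul><li>Bullet 1</li><li>Bullet 3\n</li></ul>";

// Without "s", "." stops at the newline, so the last item is left untouched.
const withoutDotAll = listMarkup.replace(/<li[^>]*>(.*?)<\/li>/ig, "* $1\n");

// With "s", "." also matches line terminators, so every item is converted.
const withDotAll = listMarkup.replace(/<li[^>]*>(.*?)<\/li>/igs, "* $1\n");

console.log(withoutDotAll);
console.log(withDotAll);
```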