  1. #1
    SitePoint Enthusiast
    Join Date
    Nov 2005
    Posts
    33

    Regex to replace li tags with asterisk

    I'm working with TinyMCE to let the editor toggle off HTML mode. What I'm struggling with is converting list items into asterisks:

    <ul>
    <li>Bullet 1</li>
    <li>Bullet 2</li>
    <li>Bullet 3</li>
    </ul>

    Should become

    * Bullet 1
    * Bullet 2
    * Bullet 3

    I've used a similar regex to convert paragraphs to "\n$1\n\n" and that is working, but I can't get the regex to work for list items. Here's my code:

    Code:
    // replace p tags with line breaks
    strippedValue = strippedValue.replace(/<p>([^<\/p>]*)<\/p>/ig, "\n\n$1\n\n");
    
    alert(strippedValue);
    
    // replace list items with asterisks
    strippedValue = strippedValue.replace(/<li>([^<\/li>]*)<\/li>/ig, "* $1\n");
    
    alert(strippedValue);
    At both alerts, the content remains the same:

    <ul><li>Bullet 1</li><li>Bullet 2</li><li>Bullet 3
    </li></ul>

  2. #2
    SitePoint Evangelist
    Join Date
    Jun 2007
    Location
    North Yorkshire, UK
    Posts
    483
    <li>([^<\/li>]*)<\/li>
    You are looking for a string that begins with <li>, finishes with </li>, and has only characters other than <, /, l, i and > in between. Since the text "Bullet" contains the letter l, the match is not made and no substitutions are done.

    Try

    Code:
    <li>(.*?)<\/li>
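
    To make the difference concrete, here is a minimal sketch that runs both patterns over the same markup. The variable names and the shortened sample string are only for illustration:

```javascript
// The negated class [^<\/li>] excludes the single characters <, /, l, i and >.
const charClass = /<li>([^<\/li>]*)<\/li>/ig;
// The lazy quantifier .*? takes as few characters as possible before </li>.
const lazy = /<li>(.*?)<\/li>/ig;

const html = "<ul><li>Bullet 1</li><li>Bullet 2</li></ul>";

// "Bullet" contains an "l", so the class-based pattern never matches
// and replace() hands back the input untouched.
const unchanged = html.replace(charClass, "* $1\n");

// The lazy pattern matches each list item separately.
const converted = html.replace(lazy, "* $1\n");

console.log(unchanged === html); // true
console.log(JSON.stringify(converted)); // "<ul>* Bullet 1\n* Bullet 2\n</ul>"
```

    Note that the surrounding <ul> tags are left alone; only the list items are rewritten.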

  3. #3
    SitePoint Enthusiast
    Join Date
    Nov 2005
    Posts
    33
    Ah, yes, I see the problem: square brackets match any single one of the characters inside them. That greedy .* was dumping all list items onto one line, so I've got it working with this:

    Code:
    strippedValue = strippedValue.replace(/<li[^>]*>([^<]*)<\/li>/ig, "* $1\n");
    But it's asking for trouble when someone uses < within the list item. Is there a way to use a regex to match what I originally wanted?

    Assign to $1 all characters after <li> and before the next occurrence of </li>. I thought maybe
    Code:
    [^(?:<\/li>)]
    would do it, or maybe
    Code:
    (^<\/li>)
    but the ^ doesn't appear to work within parentheses.
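
    For reference, neither attempt negates a sequence: inside square brackets, ( ? : and ) are just more literal characters in the set, and outside brackets ^ is the start-of-input anchor. One way to say "any character, as long as </li> does not start here" is a negative lookahead applied one character at a time. This is a sketch of that technique, not the approach the thread settled on, and the sample string is invented:

```javascript
// (?!<\/li>) asserts that the text at this position does not start with </li>.
// [\s\S] then consumes one character of any kind, including "<" and newlines.
const upToClosingTag = /<li[^>]*>((?:(?!<\/li>)[\s\S])*)<\/li>/ig;

// Hypothetical input: one item containing "<", one containing a line break.
const sample = "<li>a < b</li><li>two\nlines</li>";

const result = sample.replace(upToClosingTag, "* $1\n");
console.log(JSON.stringify(result)); // "* a < b\n* two\nlines\n"
```

    A lazy quantifier such as [\s\S]*? gives the same result here with less typing; the lookahead form is mainly useful when the stop condition is more complicated than a single closing tag.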

  4. #4
    SitePoint Enthusiast
    Join Date
    Nov 2005
    Posts
    33
    I did a bit more reading and found that (.*?) is not greedy. The problem was caused by the markup having a newline character before the last closing </li> tag. The . operator doesn't match line breaks, so I've switched to the common workaround, [\s\S], and it works. Here's the final code:

    Code:
    strippedValue = strippedValue.replace(/<p[^>]*>([\s\S]*?)<\/p>/ig, "$1\n");
    strippedValue = strippedValue.replace(/<li[^>]*>([\s\S]*?)<\/li>/ig, "* $1\n");
    Thanks for helping, Philip!
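
    As a quick check, the two replacements above can be run against the markup from the first post, including the line break before the last </li>. The function name htmlToPlainText is hypothetical and only wraps the two calls:

```javascript
// Hypothetical wrapper around the two replace() calls from the post.
function htmlToPlainText(html) {
  let text = html;
  // [\s\S] matches any character, line breaks included; *? keeps it lazy.
  text = text.replace(/<p[^>]*>([\s\S]*?)<\/p>/ig, "$1\n");
  text = text.replace(/<li[^>]*>([\s\S]*?)<\/li>/ig, "* $1\n");
  return text;
}

// Markup as reported in the first post, with a newline inside the last item.
const markup = "<ul><li>Bullet 1</li><li>Bullet 2</li><li>Bullet 3\n</li></ul>";

const plain = htmlToPlainText(markup);
console.log(JSON.stringify(plain));
// "<ul>* Bullet 1\n* Bullet 2\n* Bullet 3\n\n</ul>"
```

    The line break inside the last item is carried into the output, and the <ul> tags still need stripping in a separate step.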

  5. #5
    SitePoint Evangelist
    Join Date
    Jun 2007
    Location
    North Yorkshire, UK
    Posts
    483
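    In the same way as you put the i (case insensitive) and g (global) flags at the end, you can also add s (treat as a single line), so that . matches line breaks and matches can span lines. Note that JavaScript only supports the s flag from ES2018 onward; in older engines [\s\S] remains the workaround.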
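
    A small sketch of the s flag in action, assuming an engine with ES2018 support. The variable names are illustrative:

```javascript
// Same pattern twice; only the second has the s (dotAll) flag.
const withoutS = /<li>(.*?)<\/li>/ig;
const withS = /<li>(.*?)<\/li>/igs;

// A list item with a line break before the closing tag.
const item = "<li>Bullet 3\n</li>";

// Without s, "." refuses the newline, so nothing is replaced.
const keptAsIs = item.replace(withoutS, "* $1");

// With s, "." also matches the newline and the item is converted.
const replaced = item.replace(withS, "* $1");

console.log(keptAsIs === item); // true
console.log(JSON.stringify(replaced)); // "* Bullet 3\n"
```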

