  1. #1
    SitePoint Member
    Join Date
    May 2007
    Posts
    2

    Regular expression with opening & closing tags

    Hi

    I need some help building a regular expression in JavaScript. The idea is to remove anchor tags pasted from a Word document.

    The regular expression must take the example text below, remove all the <a name> tags (opening and closing), and leave only <i><b>1 VISION AND MISSION</b></i>. (Ignore the <b> and <i> for now; I have an expression to replace them with <strong> and <em>.)


    <a name="_Toc153934773"></a><a name="_Toc121546821"></a><a name="_Toc121541109"><i><b>1 VISION AND MISSION</b></i></a>


    If I use the following expression, it removes the first two tag sets (i.e. the ones without any enclosed text) but leaves the third one intact,
    i.e. it returns <a name="_Toc121541109"><i><b>1 VISION AND MISSION</b></i></a>

    code = code.replace(/<a name=[^>]*>([^<\/a>])<\/a>/gi,"$1");

    The first and last parts, <a name=[^>]*> and <\/a>, are fine; it's the middle part, ([^<\/a>]), that I'm struggling with:

    (/<a name=[^>]*>([^<\/a>])<\/a>/gi,"$1");

    The middle part is supposed to match all text up to, but excluding, </a> and store it in a backreference, which is then used as the replacement string $1.
    To group <\/a> so that it is treated as a phrase, you should enclose it in (), and that's where my confusion comes in: when are the parentheses treated as a backreference and when as a grouping clause?

    Your help is appreciated
    Tom

  2. #2
    SitePoint Wizard kyberfabrikken
    Join Date
    Jun 2004
    Location
    Copenhagen, Denmark
    Posts
    6,157
    Code:
    '<a name="_Toc153934773"></a><a name="_Toc121546821"></a><a name="_Toc121541109"><i><b>1 VISION AND MISSION</b></i></a>'.replace(/<a name=[^>]*>(.*?)<\/a>/gi, "$1");
    The question mark makes the match non-greedy (i.e. it matches as little as possible).

    [^<\/a>] won't work as you expect, by the way. Anything between square brackets is interpreted as a set of individual characters, not as a whole string.
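To make the point about square brackets concrete, here is a small sketch. The variable names are made up for illustration and are not from the posts above. It also touches on the grouping question from the first post: plain parentheses always group and capture at the same time, while (?:...) groups without capturing. Note that . does not match line breaks by default, so this assumes each anchor sits on a single line.

```javascript
// Sample input from the first post.
const input =
  '<a name="_Toc153934773"></a><a name="_Toc121546821"></a>' +
  '<a name="_Toc121541109"><i><b>1 VISION AND MISSION</b></i></a>';

// A character class is a set of single characters. [^<\/a>] means
// "one character that is none of <, /, a, >", not "anything except </a>".
const charClass = /[^<\/a>]/;
const matchesLetterA = charClass.test("a"); // "a" is excluded by the class
const matchesLetterB = charClass.test("b"); // "b" is allowed

// The lazy group (.*?) captures as little as possible before the next </a>.
const cleaned = input.replace(/<a name=[^>]*>(.*?)<\/a>/gi, "$1");

// (?:...) treats </a> as a unit without creating a backreference.
const collapsed = "x</a></a>y".replace(/(?:<\/a>)+/, "");

console.log(matchesLetterA, matchesLetterB);
console.log(cleaned);
console.log(collapsed);
```

So a pair of parentheses is never "either" a group or a backreference: it is both unless you write (?:...), in which case it only groups.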

  3. #3
    SitePoint Member
    Join Date
    May 2007
    Posts
    2

    Thanks

    Problem solved. Your help is much appreciated.

    I think I tried it with (.*), which didn't work.

  4. #4
    SitePoint Wizard kyberfabrikken
    Join Date
    Jun 2004
    Location
    Copenhagen, Denmark
    Posts
    6,157
    Originally posted by ccatth:
    I think I tried it with (.*), which didn't work.
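    That's because .* is greedy: it matches as much as possible, so it runs on to the last </a> in the string. .*? is non-greedy and stops at the first </a> it can.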
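A quick way to see the difference is to run both versions on the same string. This is only a sketch; the short input below is a made-up example, not the text from the first post.

```javascript
// Hypothetical input: one empty anchor followed by one anchor with text.
const sample = '<a name="x"></a><a name="y">text</a>';

// Greedy: (.*) runs to the end of the string and backtracks only as far as
// the LAST </a>, so the whole string is one match and the inner tags survive.
const greedy = sample.replace(/<a name=[^>]*>(.*)<\/a>/gi, "$1");

// Non-greedy: (.*?) stops at the FIRST </a>, so each anchor is matched
// and unwrapped separately.
const lazy = sample.replace(/<a name=[^>]*>(.*?)<\/a>/gi, "$1");

console.log(greedy); // </a><a name="y">text
console.log(lazy);   // text
```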

