  1. #1
    SitePoint Member
    Join Date
    Nov 2010
    Posts
    1

    Regular Expressions - Remove some items

    I have a document that contains hundreds of links. Next to each link, the actual URL was also placed in the document as text, like this (not using the real domain name in the example):

    Code:
    <a href="http://www.demodomain.com/">www.demodomain.com</a><em>[link to&nbsp;:</em><em>http://www.demodomain.com/]
    So I want a regular expression that will remove the [link to : http://www.demodomain.com/] from the code. There is also a series of those empty <em> tags. I would like the new code to look like this:

    Code:
    <a href="http://www.demodomain.com/" title="Link to:">www.demodomain.com</a>
    I'd even be happy just to have the [link to : http://www.demodomain.com/] removed by a regular expression. I can remove all the empty tags afterwards.

    HELP

  2. #2
    SitePoint Evangelist
    Join Date
    Jun 2007
    Location
    North Yorkshire, UK
    Posts
    483
    This handles the string you gave. I am a bit concerned about the <em> tags, though, because the second <em> has no closing tag.

    Code Perl:
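    #!perl.exe

    use strict;
    use warnings;

    # Sample input from the question. Named $html rather than $a,
    # because $a is one of the special variables used by sort.
    my $html = '<a href="http://www.demodomain.com/">www.demodomain.com</a><em>[link to&nbsp;:</em><em>http://www.demodomain.com/]';

    # Look for an anchor tag and add a title attribute
    $html =~ s/(<a [^>]+)/$1 title="Link to:"/sg;

    # Look for "[link to ...]" (may be preceded by <em>) and remove it
    $html =~ s/(<em>)?\[link to[^\]]+\]//sg;

    print $html;
    exit;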
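
If the leftover <em> tags also need to go, something along these lines might work. This is only a rough sketch: the sub name clean_links is made up for illustration, and it assumes that in the real document a closing </em> follows the "]" and that further empty <em></em> pairs may trail it. Neither is confirmed by the sample string, which has no closing tag for the second <em>.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): tidy one chunk of HTML.
sub clean_links {
    my ($html) = @_;

    # Drop the "[link to ...]" note together with any <em> or </em> tags
    # directly in front of it and one optional </em> right after the "]".
    # ASSUMPTION: the note never contains a "]" before its real end.
    $html =~ s{(?:</?em>)*\[link to[^\]]*\](?:</em>)?}{}gi;

    # Remove <em></em> pairs that are empty or hold only whitespace/&nbsp;.
    $html =~ s{<em>(?:\s|&nbsp;)*</em>}{}gi;

    return $html;
}

# Sample modelled on the question, with the assumed closing tags added.
my $sample = '<a href="http://www.demodomain.com/">www.demodomain.com</a>'
           . '<em>[link to&nbsp;:</em><em>http://www.demodomain.com/]</em><em></em>';

print clean_links($sample), "\n";
```

The title attribute is left out here on purpose; the first substitution in the script above can be run before or after this cleanup. For anything beyond a one-off tidy-up, a real HTML parser is usually a safer choice than regular expressions.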

