Regular Expressions - Remove some items

I have a document that contains hundreds of links. However, the link text was also placed in the document like this (the real domain name is not used in this example):

<a href=""></a><em>[link to&nbsp;:</em><em>]

So I want a regular expression that will remove the [link to :] text from the code. There is also a series of empty <em> tags. I would like the new code to look like this:

<a href="" title="Link to:"></a>

I'd even be happy just to have the [link to :] text removed with a regular expression; I can remove all the empty tags afterwards.
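The "remove the empty tags afterwards" step can be done as a separate pass. A minimal Perl sketch, assuming the leftovers are <em> pairs with nothing but whitespace inside. It deliberately targets only <em>, because the desired <a href="" title="Link to:"></a> is itself an empty element and must survive. The sample string is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical sample: markup left over after the [link to :] text is gone
my $html = '<a href="" title="Link to:"></a><em></em><em> </em>';

# Remove empty <em></em> pairs (whitespace allowed between the tags).
# Repeat until nothing changes, so nested pairs like <em><em></em></em> also go.
1 while $html =~ s{<em>\s*</em>}{}g;

print "$html\n";
```

An unclosed <em> with no matching </em> is left untouched by this pass.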


This handles the string you gave. I am a bit concerned about the <em> tags, mainly because the second <em> has no closing tag.


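use strict;
use warnings;

# Renamed from $a, which Perl reserves for sort
my $html = '<a href=""></a><em>[link to&nbsp;:</em><em>]';

# Look for an anchor's opening tag and add a title attribute
$html =~ s/(<a [^>]+)/$1 title="Link to:"/g;

# Remove the [link to ...] text, which may be preceded by <em>
$html =~ s/(?:<em>)?\[link to[^\]]+\]//g;

# Prints: <a href="" title="Link to:"></a>
print "$html\n";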
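Since the document holds hundreds of links, the same two substitutions can be wrapped in a function and run over the whole file at once. A minimal sketch, assuming the file fits in memory and that no <a> tag already has a title attribute; the function name `clean_links` and the example.com sample are hypothetical:

```perl
use strict;
use warnings;

# Apply both substitutions to an entire HTML string (hypothetical helper)
sub clean_links {
    my ($html) = @_;
    # Add a title attribute to every opening <a> tag
    $html =~ s/(<a [^>]+)/$1 title="Link to:"/g;
    # Remove every [link to ...] fragment, plus a leading <em> if present.
    # [^\]]+ stops at the first ], so each fragment is removed separately.
    $html =~ s/(?:<em>)?\[link to[^\]]+\]//g;
    return $html;
}

# Hypothetical two-link sample standing in for the real document
my $sample = join "\n",
    '<p><a href="http://example.com/a"></a><em>[link to&nbsp;:</em><em>]</p>',
    '<p><a href="http://example.com/b"></a><em>[link to&nbsp;:</em><em>]</p>';

# Slurp the file named on the command line if there is one, else use the sample
my $doc = @ARGV ? do { local $/; <> } : $sample;

my $result = clean_links($doc);
print "$result\n";
```

Run as, for example, `perl clean_links.pl page.html > page-clean.html` (script name hypothetical). Running it twice on the same file would add a second title attribute, so apply it to the original document only once.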