  1. #1
    SitePoint Member
    Join Date
    Oct 2003
    Location
    bris vegas
    Posts
    16

    Regular Expression - Removing invalid links

    Hi,

    I have a regular expression problem that I can't quite get my head around. I need to remove invalid links like the one below:

    <a href="javascript: alert('blah');" onClick="alert('blah')">Bad Link</a>

    A valid link would look like this:

    <a href="http://www.hotmail.com">Good Link</a>

    I believe I have the regex for removing valid links:

    str.replace(/<a href="[^">]+">/gi,'')
    // result on valid link: Good Link</a>

    Any help would be much appreciated.

    Cheers,
    Gerard.

  2. #2
    SitePoint Wizard silver trophy
    Join Date
    May 2003
    Posts
    1,843
    You didn't mention what constitutes an 'invalid' link. Is it just one with a javascript: URL, or any link with an onclick handler assigned?

    Anyway...
    Code:
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" 
        "http://www.w3.org/TR/html4/loose.dtd">
    
    <html>
    <head>
    <title>untitled</title>
    <script type="text/javascript">
    
    function RBL()
    {
    	var oRegExp = /<a\s+[^>]+javascript:[^>]+>[^<]+<\/a>/gi;
    	var docbody = document.getElementsByTagName('body').item(0);
    	docbody.innerHTML = docbody.innerHTML.replace(oRegExp, '');
    	return false;
    }
    
    </script>
    </head>
    <body>
    <button onclick="return RBL()">remove bad links</button>
    <hr />
    <a href="javascript&#58;alert('blah')" onClick="alert('blah')">Bad Link</a> |
    <a href="http://www.hotmail.com">Good Link</a>
    <br /><br />
    <A HREF="javascript&#58;hoohah()">Bad Link</A> | 
    <A href="http://www.google.com">Good Link</a>
    <br /><br />
    <A OnClick="alert(location.pathname)" HREF="javascript&#58;void print()">Bad Link</A> | 
    <a  href="http://www.sitepoint.com/forums/showthread.php?p=1093268#post1093268">Good Link</a>
    </body>
    </html>
    Btw, String.replace() doesn't do the replacement 'in place': the string you invoke it on remains unchanged, so you need to assign the result to something. Hope that's close.
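
    To make the point about assignment concrete, here is a minimal sketch using the regex from the first post. The variable names html, pattern and stripped are only illustrative.

    ```javascript
    // String.prototype.replace returns a new string; it never modifies the original.
    var html = '<a href="http://www.hotmail.com">Good Link</a>';
    var pattern = /<a href="[^">]+">/gi;

    // The return value is discarded here, so html still holds the full link.
    html.replace(pattern, '');

    // Assigning the return value keeps the result.
    var stripped = html.replace(pattern, '');
    ```

    After this runs, html is unchanged and only stripped holds the text with the opening tag removed.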
    ::: certified wild guess :::

  3. #3
    SitePoint Member
    Join Date
    Oct 2003
    Location
    bris vegas
    Posts
    16
    Thanks for the code. How would you extend it to clean the link tags so that only <a href="url">link</a> remains?

  4. #4
    SitePoint Wizard silver trophy
    Join Date
    May 2003
    Posts
    1,843
    Code:
    
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" 
        "http://www.w3.org/TR/html4/loose.dtd">
    
    <html>
    <head>
    <title>untitled</title>
    <script type="text/javascript">
    
    function RBL()
    {
    	var oRegExp1 = /<a\s+[^>]+javascript:[^>]+>[^<]+<\/a>/gi;
    	var oRegExp2 = /(<a\s*[^>]*)onclick[^ ]+([^>]*>[^<]+<\/a>)/gi;
    	var docbody = document.getElementsByTagName('body').item(0);
    	docbody.innerHTML = docbody.innerHTML.replace(oRegExp1, '').replace(oRegExp2, '$1$2');
    	return false;
    }
    
    </script>
    </head>
    <body>
    <button onclick="return RBL()">remove bad links</button>
    <hr />
    <a href="javascript&#58;alert('blah')" onClick="alert('blah')">Bad Link</a> |
    <a href="http://www.hotmail.com">Good Link</a>
    <br /><br />
    <A HREF="javascript&#58;hoohah()">Bad Link</A> | 
    <A href="http://www.google.com">Good Link</a>
    <br /><br />
    <A OnClick="alert(location.pathname)" HREF="javascript&#58;void print()">Bad Link</A> | 
    <a  href="http://www.sitepoint.com/forums/showthread.php?p=1093268#post1093268">Good Link</a>
    <br /><br />
    <A OnClick="alert(location.pathname)" HREF="http://www.msn.com">Fixable Link</A> | 
    <a href="mysite.net" onClick="alert('blah')">Fixable Link</a>
    </body>
    </html>
    
    This'll probably get tripped up sooner or later; regexes usually need to be fine-tuned to match all possible arrangements...
    ::: certified wild guess :::

  5. #5
    SitePoint Member
    Join Date
    Oct 2003
    Location
    bris vegas
    Posts
    16
    Cool, thanks for the code!

    I'll have to study regex a bit more. Ideally, I'd like to be able to test for the following format and remove every link that doesn't comply:

    str.replace(/<a href=" any chars except "> ">link allowed</a>/gi, '');
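
    One way that whitelist idea might be written as real code is sketched below. It is an assumption-laden sketch rather than a tested solution: keepPlainLinks is a hypothetical helper name, it only handles anchors whose text contains no nested tags, and it will be fooled by a ">" inside an attribute value.

    ```javascript
    // Hypothetical helper: keep only anchors of the exact form <a href="url">text</a>
    // and delete every other anchor element.
    function keepPlainLinks(html) {
      // A whole anchor element whose link text contains no nested tags.
      var anyAnchor = /<a\b[^>]*>[^<]*<\/a>/gi;
      // The only opening tag allowed: one double-quoted href and nothing else.
      var plainOpenTag = /^<a\s+href="([^">]+)">/i;

      return html.replace(anyAnchor, function (whole) {
        var m = plainOpenTag.exec(whole);
        // Drop anchors with extra attributes or with a javascript: URL.
        if (!m || /^\s*javascript:/i.test(m[1])) {
          return '';
        }
        return whole;
      });
    }
    ```

    Passing a function as the second argument to replace lets each matched anchor be checked against the strict format separately, so there is no need to list every unwanted attribute.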

    I'm actually using it for a server-side ASP script. I think I'll just have to do multiple tests using your code, e.g. onclick, ondblclick, onmouseover, etc.
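
    Instead of one test per handler name, a single pattern could target any attribute whose name starts with "on". The sketch below assumes a hypothetical stripHandlers helper. It has not been tried in ASP, and it can misfire if an attribute value itself contains text that looks like an on... attribute.

    ```javascript
    // Hypothetical helper: remove every on* attribute (onclick, ondblclick,
    // onmouseover, ...) from opening <a> tags and leave the rest untouched.
    function stripHandlers(html) {
      // Opening anchor tags only.
      var openTag = /<a\b[^>]*>/gi;
      // Whitespace, a name starting with "on", then a double-quoted,
      // single-quoted, or unquoted value.
      var handler = /\s+on\w+\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)/gi;

      return html.replace(openTag, function (tag) {
        return tag.replace(handler, '');
      });
    }
    ```

    Matching the quoted value as a unit means a handler such as onclick="alert('a b')" is removed whole, even though it contains a space.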

    Cheers,
    g.
