  1. #1
    SitePoint Member
    Join Date
    Sep 2007
    Posts
    5

    Removing JavaScript event attributes

    Hi,

    I have some code to clean up an HTML document before doing additional processing. One of the cleanup steps is to remove all JavaScript event attributes (such as onclick, onblur, etc.) from HTML tags. I have the following code, but it seems to have problems when the JavaScript contains a \". I'm not so great with regular expressions, so I'm not really sure how to have it exclude the \" sub-pattern. Any help on how to make this regex better would be appreciated!

    PHP Code:
    // Strip double-quoted JavaScript event attributes (onclick, onblur, ...).
    // ".*?" ends at the first double quote, so a \" inside the handler cuts the match short.
    $html = preg_replace(
        '#(onabort|onactivate|onafterprint|onafterupdate|onbeforeactivate|onbeforecopy'
        . '|onbeforecut|onbeforedeactivate|onbeforeeditfocus|onbeforepaste|onbeforeprint'
        . '|onbeforeunload|onbeforeupdate|onblur|onbounce|oncellchange|onchange|onclick'
        . '|oncontextmenu|oncontrolselect|oncopy|oncut|ondataavailable|ondatasetchanged'
        . '|ondatasetcomplete|ondblclick|ondeactivate|ondrag|ondragdrop|ondragend'
        . '|ondragenter|ondragleave|ondragover|ondragstart|ondrop|onerror|onerrorupdate'
        . '|onfilterupdate|onfinish|onfocus|onfocusin|onfocusout|onhelp|onkeydown'
        . '|onkeypress|onkeyup|onlayoutcomplete|onload|onlosecapture|onmousedown'
        . '|onmouseenter|onmouseleave|onmousemove|onmouseout|onmouseover|onmouseup'
        . '|onmousewheel|onmove|onmoveend|onmovestart|onpaste|onpropertychange'
        . '|onreadystatechange|onreset|onresize|onresizeend|onresizestart|onrowexit'
        . '|onrowsdelete|onrowsinserted|onscroll|onselect|onselectionchange|onselectstart'
        . '|onstart|onstop|onsubmit|onunload)\s*=\s*".*?"#is',
        '',
        $html
    );
    Thanks!
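
    One possible way to handle the \" case described above is to replace ".*?" with a sub-pattern that consumes a backslash together with the character that follows it. The sketch below is only an illustration under some assumptions: the function name strip_event_attributes is hypothetical, the long list of event names is swapped for a generic on[a-z]+ match to keep the example short, and only double-quoted values are handled. Note that HTML itself has no backslash escaping inside attribute values, so this matches what the markup's author intended rather than what a browser would parse.

```php
<?php
// Hypothetical helper (not from the thread): removes double-quoted on* attributes,
// treating a backslash-escaped character such as \" as part of the value.
function strip_event_attributes(string $html): string
{
    // "(?:[^"\\]|\\.)*" matches a quoted value made of either
    //   - any character that is neither a double quote nor a backslash, or
    //   - a backslash followed by any single character (so \" does not end the value).
    // The backslashes are doubled again because this is a PHP single-quoted string.
    $pattern = '#\s+on[a-z]+\s*=\s*"(?:[^"\\\\]|\\\\.)*"#is';

    // preg_replace() returns null on failure; fall back to the unchanged input.
    return preg_replace($pattern, '', $html) ?? $html;
}

$sample = '<a href="#" onclick="alert(\"hi\"); return false;">link</a>';
echo strip_event_attributes($sample), "\n";
```

    The two alternatives inside the group cannot match the same character, which keeps backtracking cheap. Single-quoted and unquoted attribute values would still slip through, and the generic on[a-z]+ match could also hit ordinary text that happens to look like an attribute.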

  2. #2
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2008
    Posts
    5,757
    Use htmlpurifier. Trying to parse html and javascript using regular expressions is very difficult to do correctly.

  3. #3
    SitePoint Zealot Ripe's Avatar
    Join Date
    Oct 2006
    Location
    Australia
    Posts
    146
    This might be helpful.

  4. #4
    I solve practical problems. bronze trophy
    Michael Morris's Avatar
    Join Date
    Jan 2008
    Location
    Knoxville TN
    Posts
    2,034
    Quote: Originally Posted by crmalibu
    Use htmlpurifier. Trying to parse html and javascript using regular expressions is very difficult to do correctly.
    I'll second this with the addendum that if you're filtering input from untrusted sources (guest users) you're probably better off using a bbcode library. Direct attachment of event handlers like this isn't the only way to attach javascript events to an object.

  5. #5
    SitePoint Member
    Join Date
    Sep 2007
    Posts
    5
    Htmlpurifier looks pretty solid. Unfortunately, it only parses what is within the body tags. I'm writing something where I need to parse an entire web page. Basically, I want to strip out the JavaScript in order to clean out some crud before the parser attempts to extract data from the page.
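
    For a whole-page cleanup like the one described here, one alternative to a regex is to let an HTML parser find the attributes. The following is a minimal sketch using PHP's bundled DOM extension (DOMDocument and DOMXPath), not something proposed in the thread. The function name strip_javascript is hypothetical, and the choice to drop script elements as well as on* attributes is an assumption about what "strip out the JavaScript" should cover.

```php
<?php
// Hypothetical helper: removes <script> elements and every attribute whose
// name starts with "on" from a complete page, head section included.
function strip_javascript(string $html): string
{
    $doc = new DOMDocument();

    // Real-world markup is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Copy each result set into an array before removing nodes from the tree.
    foreach (iterator_to_array($xpath->query('//script')) as $script) {
        $script->parentNode->removeChild($script);
    }

    foreach (iterator_to_array($xpath->query('//@*')) as $attr) {
        // Case-insensitive prefix check covers onclick, onload, onmouseover, ...
        if (stripos($attr->nodeName, 'on') === 0) {
            $attr->ownerElement->removeAttributeNode($attr);
        }
    }

    return $doc->saveHTML();
}

$page = '<html><head><title>T</title><script>var a = 1;</script></head>'
      . '<body onload="init()"><a href="#" onclick="go(1)">link</a></body></html>';
echo strip_javascript($page);
```

    Because the parser decides where each attribute value ends, quoting inside the handlers is no longer the regex's problem. The trade-off is that saveHTML() re-serializes the page, so the output may gain a doctype and have its markup normalized, and loadHTML() may need a charset hint for non-ASCII pages. javascript: URLs in href or src attributes are left untouched by this sketch.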

  6. #6
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2008
    Posts
    5,757
    I'm not familiar enough with it to tell you what to do, but I'm sure you just need to tweak it a bit. There's TONS of configuration options and extensibility. They have a forum.

  7. #7
    Unobtrusively zen silver trophy bronze trophy
    paul_wilkins's Avatar
    Join Date
    Jan 2007
    Location
    Christchurch, New Zealand
    Posts
    14,716
    From the HTML Purifier configuration documentation it appears that the head section is not supported at all.

    From the HTML.AllowedElements section:

    Note that this method is subtractive: it does its job by taking away from HTML Purifier usual feature set, so you cannot add a tag that HTML Purifier never supported in the first place (like embed, form or head).
    Programming Group Advisor
    Reference: JavaScript, Quirksmode Validate: HTML Validation, JSLint
    Car is to Carpet as Java is to JavaScript

