SitePoint Sponsor

User Tag List

Results 1 to 13 of 13
  1. #1
    SitePoint Evangelist winterheat's Avatar
    Join Date
    Aug 2007
    Posts
    508
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    to prevent XSS attacks, using inline onclick="..." is trickier...

    it seems that to prevent XSS attacks (Cross site scripting), using

    <a href="#" onclick="change('<?= htmlspecialchars($title) ?>'); return false;">Click me</a>

    is trickier. The reason is that if $title has something malicious, and htmlspecialchars() changes that to &lt; But then, since the value part of onclick is parsed first as HTML, so it can very well become:

    change('<script> do something bad </script>'); return false;

    (that is, the "<" was changed to &lt; by htmlspecialchars() but changed back to "<" by the browser.

    Now, if change() actually does something like

    document.getElementById('divForTitle').innerHTML = title;

    then the XSS attack can occur.

  2. #2
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2008
    Posts
    5,757
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    htmlspecialchars() is only really suitable for values which will be used as display text, and not in an elements attribute list. It's even more important to validate data there.

  3. #3
    . shoooo... silver trophy logic_earth's Avatar
    Join Date
    Oct 2005
    Location
    CA
    Posts
    9,013
    Mentioned
    8 Post(s)
    Tagged
    0 Thread(s)
    Logic without the fatal effects.
    All code snippets are licensed under WTFPL.


  4. #4
    SitePoint Evangelist winterheat's Avatar
    Join Date
    Aug 2007
    Posts
    508
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by crmalibu View Post
    htmlspecialchars() is only really suitable for values which will be used as display text, and not in an elements attribute list. It's even more important to validate data there.

    what if i change the code to:

    HTML Code:
    <a href="#" id="link1">Click me</a>
    
    <script>
    
    document.getElementById("link1").onclick = function() {
        change("<?= htmlspecialchars($title) ?>"); 
        return false;
    }
    
    </script>

  5. #5
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2008
    Posts
    5,757
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    json_encode() would be a more suitable way to prep a value known to be a string in your latest example. I would still validate. Even something as simple as enforcing a maximum string length can go a long way, and imo should generally be included as part of a standard practice.

    There's a lot of info on xss available on the net. I'd recomend reading about it a bit. It will give you an idea of how some of the more difficult to defend against attacks work. It's hard to defend against that which you don't understand.

  6. #6
    SitePoint Evangelist winterheat's Avatar
    Join Date
    Aug 2007
    Posts
    508
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by crmalibu View Post
    json_encode() would be a more suitable way to prep a value known to be a string in your latest example. I would still validate. Even something as simple as enforcing a maximum string length can go a long way, and imo should generally be included as part of a standard practice.
    a friend of mine suggested "double escape"...
    seems like lazy but it should work:

    <a href="#" onclick="change('<?= htmlspecialchars(htmlspecialchars($title)) ?>'); return false;">Click me</a>

  7. #7
    SitePoint Wizard
    Join Date
    Mar 2008
    Posts
    1,149
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    The proper way is to first escape for HTML, then for a JavaScript string, and then to further escape it for HTML.

    First:
    PHP Code:
    $str htmlspecialchars($strENT_QUOTES); 
    Second:
    PHP Code:
    $str preg_replace("/['\"\\\\]/""\\\\\\0"$str);
    $str str_replace("\r""\\r"$str);
    $str str_replace("\n""\\n"$str);
    $str str_replace("\0""\\0"$str);
    $str str_replace("\x08""\\x08"$str); 
    Third:
    PHP Code:
    $str htmlspecialchars($strENT_QUOTES); 
    And there you have it.

    OR:

    You can forget doing #1 and use JavaScript instead to escape the HTML:

    Code js:
    str = str.replace(/</g, '&lt;');
    str = str.replace(/>/g, '&gt;');
    str = str.replace(/&/g, '&amp;');
    str = str.replace(/'/g, '&' + '#39;'); // Had to add ' + ' due to a vBulletin bug / poor design decision that's munging the text
    str = str.replace(/"/g, '&quot;');

    By the way, what encoding are you using for that page?

  8. #8
    SitePoint Wizard silver trophybronze trophy Cups's Avatar
    Join Date
    Oct 2006
    Location
    France, deep rural.
    Posts
    6,869
    Mentioned
    17 Post(s)
    Tagged
    1 Thread(s)
    Quote Originally Posted by crmalibu View Post
    Even something as simple as enforcing a maximum string length can go a long way, and imo should generally be included as part of a standard practice.
    I agree stongly.

    What's more maybe $title can be constrained to only contain a small list of char types.

    Then you are left wondering shouldn't $title be filtered first, the problem is of course that you cannot always be sure where $title is coming from.

    (OK, maybe it came out of the database, but was it filtered before I put it in?)

    Good thread.

  9. #9
    SitePoint Evangelist winterheat's Avatar
    Join Date
    Aug 2007
    Posts
    508
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by crmalibu View Post
    json_encode() would be a more suitable way to prep a value known to be a string in your latest example. I would still validate. Even something as simple as enforcing a maximum string length can go a long way, and imo should generally be included as part of a standard practice.

    There's a lot of info on xss available on the net. I'd recomend reading about it a bit. It will give you an idea of how some of the more difficult to defend against attacks work. It's hard to defend against that which you don't understand.
    seems like json_encode() can be a good way since it is to represent the data in javascript.

    will the following way be a good way:

    Code HTML4Strict:
    <a href="#" id="link1">Click me</a>
     
    <script>
     
    document.getElementById("link1").onclick = function() {
        change(<?= json_encode($title) ?>); 
        return false;
    }
     
    </script>

  10. #10
    SitePoint Evangelist winterheat's Avatar
    Join Date
    Aug 2007
    Posts
    508
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    so i think the following code:

    Code PHP:
    var title = "<?= htmlspecialchars($title) ?>";

    is not so good. A good way is:


    Code PHP:
    // note: don't use double quote here
    var title = <?= json_encode($title) ?>;


    and even if title contain single quote, double quote, or newline character, json_encode() handles it well....

    is this the best way? thanks for commenting on this. thanks sk89q, i will look into your method too... trying to figure why it is done in 3 steps... the page usually is in UTF-8. thanks.
    Last edited by winterheat; Apr 7, 2009 at 15:11.

  11. #11
    SitePoint Wizard
    Join Date
    Mar 2008
    Posts
    1,149
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Well, ignoring step #1, you have data in a JavaScript string in an HTML attribute. To make a valid string in JavaScript, you surround it with quote characters and then escape the special characters (\, ", ', etc.). So to make a valid JavaScript string in this case, we have to do step #2. To make a valid attribute value in HTML, you need to escape some special characters too. So after you have made a valid JavaScript string, you now have to escape it for HTML!

    And we do step #1 because after you get the data into the page, you want to set some part of the HTML with the raw text you had originally. Because we don't want this piece of text to be parsed as HTML in change(), you need to escape it. It makes more sense if you do this as the last step, but then you would have to escape it using JavaScript (i.e. the alternate method I posted).

    ---

    If you still don't understand, try this explanation:

    PHP Code:
    $some_string "He said \"how are you?\""// 1
    echo $some_string "\n"// 2
    echo "<script>alert(\"$some_string\")</script>\n"// 3 
    Line 2 outputs:
    He said "how are you?"
    You got past the PHP parser by escaping the right characters.

    Line 3 outputs:
    <script>alert("He said "how are you?"")</script>
    See how we have a problem? We also need to escape it for JavaScript. You can use json_encode for this or the code I wrote for step #2.

    There's two steps right there:
    PHP Code:
    $some_string "He said \"how are you?\""// 1
    echo $some_string "\n"// 2
    echo "<script>alert(\"" escape_js($some_string) . "\")</script>\n"// 3
    // escape_js = code I wrote above for step #2 as a function 
    ---

    By the way, json_encode can be used in place of the code that I wrote for #2, but you should cast the input to a string:
    PHP Code:
    $str json_encode((string) $str); 
    (Although step #1 already munged the data into a string anyway.)

  12. #12
    SitePoint Evangelist winterheat's Avatar
    Join Date
    Aug 2007
    Posts
    508
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    that's interesting... i just did a test using

    Code HTML4Strict:
    <script>
     
    	abc = 123;
    	foo = '<script> alert("hello world"); ' + '<' + '/script>';
    	document.getElementById('divContent').innerHTML = abc + foo;
     
    </script>

    and the alert didn't pop up... so i thought when we set the innerHTML, the <script> portion also gets interpreted and executed?

    I used '<' + '/script>' so that it won't be taken to be a </script> by the HTML parser... any other way to work around it?

  13. #13
    SitePoint Wizard
    Join Date
    Mar 2008
    Posts
    1,149
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Adding script tags using innerHTML won't work with most browsers.

    As for </script>, there is no way around it in HTML. It's by design:
    http://www.w3.org/TR/REC-html40/types.html#type-cdata

    You can use XHTML if you want to "cleanly" add </script> because XHTML is much more structured.


Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •