Cross-Site Scripting Attacks (XSS)

A cross-site scripting attack is one of the top 5 security attacks carried out on a daily basis across the Internet, and your PHP scripts may not be immune.

Also known as XSS, the attack is a type of code injection made possible by failing to validate or escape user data, which usually gets inserted into the page through a web form or an altered hyperlink. The injected code can be any malicious client-side code, such as JavaScript, VBScript, HTML, CSS, Flash, and others. The code is used to save harmful data on the server or perform a malicious action within the user’s browser.

Unfortunately, cross-site scripting attacks occur mostly because developers fail to deliver secure code. Every PHP programmer has the responsibility to understand how attacks can be carried out against their PHP scripts to exploit possible security vulnerabilities. In this article, you’ll find out more about cross-site scripting attacks and how to prevent them in your code.

Learning by Example

Let’s take the following code snippet.

<form action="post.php" method="post">
 <input type="text" name="comment" value="">
 <input type="submit" name="submit" value="Submit">
</form>

Here we have a simple form in which there is a text box for data input and a submit button. Once the form is submitted, it will submit the data to post.php for processing. Let’s say all post.php does is output the data like so:

<?php
echo $_POST["comment"];

Without any filtering, a hacker could submit the following through the form, which will generate a popup in the browser with the message “hacked”.

<script>alert("hacked")</script>

This example, despite being malicious in nature, does not seem to do much harm. But think about what could happen if the JavaScript code were written to steal a user’s cookie and extract sensitive information from it. There are far worse XSS attacks than a simple alert() call.

Cross-site scripting attacks can be grouped in two major categories, based on how they deliver the malicious payload: non-persistent XSS, and persistent XSS. Allow me to discuss each type in detail.

Non-persistent XSS

Also known as a reflected XSS attack, non-persistent XSS means that the malicious code is not stored on the server but rather gets passed through it and presented to the victim. It is the more popular of the two delivery methods. The attack is launched from an external source, such as an e-mail message or a third-party website.

Here’s an example of a portion of a simple search result script:

<?php
// Get search results based on the query
echo "You searched for: " . $_GET["query"];

// List search results
// ...

The example is a very insecure results page where the search query is displayed back to the user. The problem here is that the $_GET["query"] variable isn’t validated or escaped, so an attacker could send the following link to the victim:

http://example.com/search.php?query=<script>alert("hacked")</script>

Without validation, the page would contain:

You searched for: <script>alert("hacked")</script>

Persistent XSS

This type of attack happens when the malicious code has already slipped through the validation process and is stored in a data store. This could be a comment, log file, notification message, or any other section of the website that accepted user input at one time. Later, when this particular information is presented on the website, the malicious code gets executed.

Let’s use the following example for a rudimentary file-based comment system. Assuming the same form I presented earlier, let’s say the receiving script simply appends the comment to a data file.

<?php
file_put_contents("comments.txt", $_POST["comment"], FILE_APPEND);

Elsewhere the contents of comments.txt are shown to visitors:

<?php
echo file_get_contents("comments.txt");

When a user submits a comment, it gets saved to the data file. Then the entire file (thus the entire series of comments) is displayed to the readership. If malicious code is submitted, it will be saved and displayed as is, without any validation or escaping.
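
To make the round trip concrete, here is a minimal sketch that can be run locally. It assumes a temporary file stands in for comments.txt, and the $payload string is just the alert() example from earlier; neither is part of the original scripts.

```php
<?php
// A temporary file stands in for comments.txt (an assumption for this demo).
$file = tempnam(sys_get_temp_dir(), 'xss');

// The "comment" an attacker might submit through the form.
$payload = '<script>alert("hacked")</script>';

// Same storage call as the vulnerable script: nothing is validated or sanitized.
file_put_contents($file, $payload, FILE_APPEND);

// Same read as the display script: the markup comes back byte for byte.
$stored = file_get_contents($file);
var_dump($stored === $payload);

// Clean up the temporary file.
unlink($file);
```

Because the stored string is identical to what was submitted, echoing it into a page hands the script tag to every visitor's browser.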

Preventing Cross-Site Scripting Attacks

Fortunately, as easily as an XSS attack can be carried out against an unprotected website, protecting against one is just as easy. Prevention must always be in your thoughts, though, even before you write a single line of code.

The first rule which needs to be “enforced” in any web environment (be it development, staging, or production) is never trust data coming from the user or from any other third party sources. This can’t be emphasized enough. Every bit of data must be validated on input and escaped on output. This is the golden rule of preventing XSS.

In order to implement solid security measures that prevent XSS attacks, we should be mindful of data validation, data sanitization, and output escaping.

Data Validation

Data validation is the process of ensuring that your application is running with correct data. If your PHP script expects an integer for user input, then any other type of data should be discarded. Every piece of user data must be validated when it is received to ensure it is of the correct type, and discarded if it doesn’t pass the validation process.
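
For the integer case, one way to sketch this in PHP is with the built-in filter_var() function and the FILTER_VALIDATE_INT filter. The validate_age() helper and the 0 to 150 range are assumptions made up for this illustration.

```php
<?php
// Hypothetical helper: returns the validated integer, or null when the
// input is rejected. The 0-150 range is an assumption for the example.
function validate_age($raw): ?int
{
    $age = filter_var($raw, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 0, 'max_range' => 150],
    ]);

    // filter_var() returns false when the value is not a valid integer
    // or falls outside the allowed range.
    return $age === false ? null : $age;
}

var_dump(validate_age('42'));       // a real integer comes back
var_dump(validate_age('42abc'));    // rejected
var_dump(validate_age('<script>')); // rejected
```

Returning null for anything that fails lets the calling code discard the value and tell the user, rather than passing a half-valid string further into the application.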

If you wanted to validate a phone number, for example, you would discard any strings containing letters, because a phone number should consist of digits only. You should also take the length of the string into consideration. If you wanted to be more permissive, you could allow a limited set of special characters such as plus signs, parentheses, and dashes, which are often used in formatting phone numbers specific to your intended locale.

<?php
// validate a US phone number
if (preg_match('/^((1-)?\d{3}-)\d{3}-\d{4}$/', $phone)) {
    echo $phone . " is valid format.";
}
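
As a rough illustration of how such a pattern behaves, the sketch below runs a few sample strings through it. The is_us_phone() helper and the sample numbers are made up for this example, and the pattern assumes \d (a digit) is what the expression is meant to use.

```php
<?php
// Hypothetical helper: true only for dash-separated US numbers with a
// required area code and an optional leading "1-".
function is_us_phone(string $phone): bool
{
    // preg_match() returns 1 on a match and 0 otherwise.
    return preg_match('/^((1-)?\d{3}-)\d{3}-\d{4}$/', $phone) === 1;
}

// Made-up sample inputs.
$candidates = [
    '555-123-4567',
    '1-555-123-4567',
    '5551234567',
    '555-123-456a',
    '<script>alert("hacked")</script>',
];

foreach ($candidates as $candidate) {
    $verdict = is_us_phone($candidate) ? 'valid' : 'rejected';
    // Escape before echoing, since the candidates are untrusted strings.
    echo htmlspecialchars($candidate, ENT_QUOTES, 'UTF-8'), ': ', $verdict, PHP_EOL;
}
```

Only the first two samples pass. A strict pattern like this rejects anything that is not in the one expected format, which is the point of validation.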

Data Sanitization

Data sanitization focuses on manipulating the data to make sure it is safe by removing any unwanted bits from the data and normalizing it to the correct form. For example, if you are expecting a plain text string as user input, you may want to remove any HTML markup from it.

<?php
// sanitize HTML from the comment
$comment = strip_tags($_POST["comment"]);
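
It helps to see what strip_tags() actually leaves behind. The sketch below uses a made-up $input string: the tags are removed, but the text that sat between them stays.

```php
<?php
// Made-up input mixing harmless formatting with a script tag.
$input = '<b>Hello</b> <script>alert("hacked")</script>';

// strip_tags() removes the markup but keeps the text between the tags.
$clean = strip_tags($input);

echo $clean, PHP_EOL;
```

The result is plain text, quotes included, so it still needs to be escaped when it is later written into an HTML page.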

Sometimes, data validation and sanitization/normalization can go hand in hand.

<?php
// normalize and validate a US phone number
$phone = preg_replace('/[^\d]/', "", $phone);
$len = strlen($phone);
if ($len == 7 || $len == 10 || $len == 11) {
    echo $phone . " is valid format.";
}
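
The same idea can be wrapped in a small function so that the caller gets back either a normalized number or nothing at all. This is only a sketch: normalize_us_phone() is a hypothetical name, and the accepted lengths simply mirror the snippet above.

```php
<?php
// Hypothetical helper: strips everything except digits, then accepts
// only 7, 10, or 11 digit results. Returns null for anything else.
function normalize_us_phone(string $raw): ?string
{
    // \D matches any character that is not a digit.
    $digits = preg_replace('/\D/', '', $raw);
    $len = strlen($digits);

    if ($len === 7 || $len === 10 || $len === 11) {
        return $digits;
    }

    return null;
}

var_dump(normalize_us_phone('(555) 123-4567'));
var_dump(normalize_us_phone('1-555-123-4567'));
var_dump(normalize_us_phone('call me maybe'));
```

Since only digits survive the normalization step, whatever the function returns contains no markup.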

Output Escaping

In order to protect the integrity of displayed/output data, you should escape the data when presenting it to the user. This prevents the browser from applying any unintended meaning to any special sequence of characters that may be found.

<?php
// escape output sent to the browser
echo "You searched for: " . htmlspecialchars($_GET["query"]);
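
To see what the browser actually receives, here is a small sketch with a wrapper that passes the flags and encoding explicitly. The helper name e() is an assumption for this example, and UTF-8 is assumed to be the page's encoding.

```php
<?php
// Hypothetical helper: escapes a value for use in HTML text or in a
// quoted attribute. Assumes the page is served as UTF-8.
function e(string $value): string
{
    // ENT_QUOTES escapes both double and single quotes;
    // ENT_SUBSTITUTE replaces invalid byte sequences instead of
    // returning an empty string.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$query = '<script>alert("hacked")</script>';

echo 'You searched for: ' . e($query), PHP_EOL;
// Output: You searched for: &lt;script&gt;alert(&quot;hacked&quot;)&lt;/script&gt;
```

The browser renders the escaped string as visible text, so the visitor sees the script tag on the page instead of having it executed.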

All Together Now!

To better understand the three aspects of data processing, let’s take another look at the file-based comment system from earlier and modify it to make sure it’s secure. The potential vulnerabilities in the code stem from the fact that $_POST["comment"] is blindly appended to the comments.txt file which is then displayed directly to the user. To secure it, the $_POST["comment"] value should be validated and sanitized before it is added to the file, and the file’s contents should be escaped when displayed to the user.

<?php
// validate comment
$comment = trim($_POST["comment"]);
if (empty($comment)) {
    exit("must provide a comment");
}

// sanitize comment
$comment = strip_tags($comment);

// comment is now safe for storage
file_put_contents("comments.txt", $comment, FILE_APPEND);

// escape comments before display
$comments = file_get_contents("comments.txt");
echo htmlspecialchars($comments);

The script first validates the incoming comment to make sure a non-zero length string has been provided by the user. After all, a blank comment isn’t very interesting.

Data validation needs to happen within a well defined context, meaning that if I expect an integer back from the user, then I validate it accordingly by converting the data into an integer and handle it as an integer. If this results in invalid data, then simply discard it and let the user know about it.

Then the script sanitizes the comment by removing any HTML tags it may contain.

And finally, the comments are retrieved, escaped, and displayed.
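
The three steps can also be split into two small functions, which makes them easier to test. The following is only a sketch under a few assumptions: save_comment() and render_comments() are hypothetical names, a temporary file stands in for comments.txt, and each comment is written on its own line.

```php
<?php
// Hypothetical helper: validates and sanitizes a comment, then appends it.
// Returns false when the comment is rejected or cannot be written.
function save_comment(string $file, string $raw): bool
{
    // validate: reject blank comments
    $comment = trim($raw);
    if ($comment === '') {
        return false;
    }

    // sanitize: remove any HTML tags
    $comment = strip_tags($comment);

    // store one comment per line; LOCK_EX guards against concurrent writes
    return file_put_contents($file, $comment . PHP_EOL, FILE_APPEND | LOCK_EX) !== false;
}

// Hypothetical helper: returns the stored comments escaped for HTML output.
function render_comments(string $file): string
{
    $comments = file_get_contents($file);
    if ($comments === false) {
        $comments = '';
    }

    return htmlspecialchars($comments, ENT_QUOTES, 'UTF-8');
}

// A temporary file stands in for comments.txt.
$file = tempnam(sys_get_temp_dir(), 'comments');
save_comment($file, '  <script>alert("hacked")</script>Nice post  ');
$html = render_comments($file);
echo $html;
unlink($file);
```

Passing the file path in as an argument is a design choice for the sketch: it keeps the functions independent of where the data file lives.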

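Generally the htmlspecialchars() function is sufficient for escaping output intended for viewing in a browser. It accepts optional flags and a character encoding as its second and third arguments, so if your pages use an encoding other than UTF-8, pass it explicitly so that it matches the encoding the page is served with. htmlentities() takes the same arguments and additionally converts every character that has a named HTML entity. For more information on the two functions, read their respective write-ups in the official PHP documentation.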

Bear in mind that no single solution exists that is 100% secure on a constantly evolving medium like the Web. Test your validation code thoroughly with the most up-to-date XSS test vectors. Using the test data from the following sources should reveal if your code is still prone to XSS attacks.

Summary

Hopefully this article gave you a good explanation of what cross-site scripting attacks are and how you can prevent them from happening to your code. Never trust data coming from the user or from any other third party sources. You can protect yourself by validating the incoming values in a well defined context, sanitizing the data to protect your code, and escaping output to protect your users. After you’ve written your code, be sure your efforts work correctly by testing the code as thoroughly as you can.

Image via Inge Schepers / Shutterstock

  • Anonymous

    Line 09 in the final example is wrong. The htmlspecialchars indicates that you want to display text “as-is” in an HTML context and have special characters displayed, not interpreted by the browser.

    Your snippet would prevent anyone from talking _about_ HTML elements (or stuff that looks like it). Users cannot use angle brackets directly, nor can they use entities (those will be escaped again). Choose an input and output format and stick to it; Don’t mix contexts. So, no strip_tags here.

    Your example also assumes that webpage character encoding and the htmlspecialchars default encoding match.

  • http://www.chaoscontrol.org Chris

    Dude… It’s just example code ;)
    Nice XSS introduction though.

    • http://blah Wolf_22

      >>Dude… It’s just example code ;)

      No, it’s not “just example code.” It’s learning material that can potentially be used by thousands of web developers across the world. That’s not the kind of audience you want to be giving deficient examples to.

      • doub1ejack

        Dude. Really. That’s just sample code.

        • Tom

          Yes, it’s just example code, and a very valuable article that has helped me as I’m learning security, but without the guy’s comment above explaining the reason not to mix htmlspecialchars with strip_tags I would’ve missed a very valuable point. What is the point of putting up “example code” (with the purpose of teaching people something) if what you’re teaching is wrong, and then when someone corrects it, chiming in to say, “Hey, it’s just example code.” That’s kind of like explaining to someone how to bake a cake, and telling them to put the wrong ingredients in it so it comes out tasting terrible, and then when someone calls you on it, saying, “Hey, I was just giving them an example.”

      • hidden evil

        its because of these tutorials not explained correctly PHP gurus still have a hard time explaining to the other communities….there is nothing like sample code or example code plz…if you intend to write, plz do R&D and then spit out…

  • http://www.omaroid.com Omar Abdallah

    according to the manual: ISO-8859-1, ISO-8859-15, UTF-8, cp866, cp1251, cp1252, and KOI8-R are effectively equivalent for htmlspecialchars().
    however its recommended to set the encoding.

  • http://zaemis.blogspot.com Timothy Boronczyk

    Yes, it is recommended that you set the encoding of your pages. You can either do so with an HTTP Content-Type header or a <meta> tag. But I believe if either are missing, UTF-8 will generally be assumed. And in all honesty, UTF-8 *should* be used. So htmlspecialchars() will be sufficient for at least 99.9% of developers. For the remaining developers, the article mirrors the official PHP documentation in stating that if you’re using an encoding other than UTF-8 or ISO-8859-1 then you’ll want to use htmlentities().

  • Alex Gervasio

    George,
    I don’t want to sound too picky, as I’m assuming you did it that way just to keep the examples easier to follow, which is understandable. Even so, it’s fair to point out that “comments.txt” is a writable file exposed to the outside world and accessible via the browser. In all cases, it should be moved out of the web access path. As you know, there’s plenty of nifty options to achieve this.
    Nice writeup, though.

    • question

      Would it be good enough to write a .htaccess file along with the comments.txt with a deny from all statement?
      I often wonder what risks are involved in flat file storage. Also this is a good talk about XSS but this is also very general knowledge and one of the first things I read on php.net’s website when I started learning about PHP. That said good intro and you did point to the same resources I use for testing against XSS. Just wish you could have gone into some more advanced stuff like how to secure input that allows HTML etc… CSRF with hidden inputs etc….

  • http://www.primalskill.com George Fekete

    Hey Alex,
    You’re right, I did make the examples super easy to focus only on XSS. Making the file secure by restricting access / moving out of the document root is out of scope.

  • Vin

    Just a thought, is strip_tags() alone all that good? I usually clean the user input by passing them through a small function
    $cleanval=trim($userval);
    $cleanval=strip_tags($cleanval);
    $cleanval=stripslashes($cleanval);
    $cleanval=mysql_real_escape_string($cleanval);
    $cleanval=htmlspecialchars($cleanval);

    • codeguy

      I could be wrong, but I believe that the line
      $cleanval=mysql_real_escape_string($cleanval);
      will only work if a mysql connection has already been established.

  • http://primalskill.com George Fekete

    As I wrote in the article, it depends on the context in which you’re sanitizing the user input. If you’re not interested in preserving HTML or any other tags, then strip_tags does the job.

  • https://twitter.com/#!/p_kavanagh Philip Kavanagh

    There is no all-in-one solution to XSS. It really depends on the data you are trying to output. XSS should be treated before output not at input. Consider a CMS where an admin needs to add a snippet of CSS/JS. These need to be inserted/edited in their raw format. htmlspecialchars($var, ENT_QUOTES, 'UTF-8') should be used as the very basic XSS protection

  • http://www.frankforte.ca/blog/ @FrankForte

    I recently learned more about csrf (cross site request forgery) which seems as dangerous.

    the basic concern:
    Log into one website, call it bank.com and sign in
    Open another tab or click a link to a second website. If that second website has a malicious image or Javascript, it can post to bank.com using your cookie (since you are still logged in, the website might accept the post unless other security measures prevent this.)

    Make sure you set cookies to http only!

  • http://www.frankforte.ca/blog/ @FrankForte

    @ Philip, what if you store data as a serialized object then output as json for use in a html app? Escaping output becomes difficult… if you know data is only used in html, escaping on all user input can be a global safety net where output filtering might miss something. I agree escaping on output is better, but some cases it is not as practical.

  • Peter

    To display HTML code you can transform the > and < to &gt; and &lt;
    Furthermore, there is always a chance you missed a value to sanitize. I am not a fan of Joomla CMS, but they transform all sanitized user input to a safe array: $IN['your_values_can_be_a_multi_array_too']. This counts for all user input like $_POST, $_GET, $_COOKIE and so on.
    So, using only user input from $IN you won’t forget anything :)
    Just sanitize the right way, before putting it in the $IN array.

  • http://www.deathshadow.com Jason Knight

    It is a laugh how 99.99% of such exploits can be shut down by simply using prepared queries (so data is auto-sanitized) and htmlspecialchars on output… even more laughable how few people seem to know that.

    • http://primalskill.com George

      I’m with you on this, but bear in mind that you’re talking about two different things. Prepared statements are used by SQL when you want to save your data in a database.

      XSS attacks are not restricted to database only.

  • Jon

    Very good article. Thank you for helping me to better understanding XSS attacks and how to write better code to prevent it.

  • gern

    um…I think it’s “bear in mind” not “bare in mind”. :)

    • http://zaemis.blogspot.com Timothy Boronczyk

      Fixed… thanks!

  • moi

    Hello
    I submitted a forum and was redirected to a page saying xss attacked with my ip address
    What should I do now?

  • Garbage In, Garbage Out

    I would have referred others here if only you hadn’t advocated the use of strip_tags. Instead I have to add this article to the junk pile. The solution you promote would prevent anyone from talking about or citation styles .

  • PeeceBabs

    Hello. Do anyone know what is all about this cookie acceptation thing? Is it safe?

    Thanks for answer

  • Sevar

    A good article. However, I did not like the fact that you used only preg_replace() and preg_match() functions on a user inputted data and then echoed it out. I understand that this will work in the case you mentioned, but it is a bad practice to only use validation before echoing out user-inputted data. Think about how easy it is for people to mess up their regular expressions!
    Also you used htmlspecialchars($comments), again a very bad practice, you should always use htmlspecialchars($comments, ENT_QUOTES, 'UTF-8');
    However, it is a good article on it self but not a good explanation of XSS. I highly recommend you use this article on XSS: http://www.sunnytuts.com/article/preventing-cross-site-scripting-xss

  • http://techawake.com Mohammad

    Very useful. tnx