Cross-Site Scripting Attacks (XSS)

Key Takeaways

Cross-site scripting (XSS) attacks are a common type of code injection attack that occurs when user data, often inserted via a web form or a manipulated hyperlink, is not validated correctly. This can allow harmful client-side code to be saved on the server or executed within the user’s browser.
XSS attacks can be categorized into two types: non-persistent XSS, where the malicious code is passed through the server and presented to the victim, and persistent XSS, where the harmful code has bypassed validation and is stored in a data store on the website, executing when the information is presented on the site.
Preventing XSS attacks involves never trusting data from the user or third-party sources, validating all data on input, and escaping it on output. This includes implementing data validation, data sanitization, and output escaping measures.
Despite the built-in security measures provided by many PHP frameworks, it is crucial to continuously test your validation code with the most up-to-date XSS test vectors to ensure the code is not still susceptible to XSS attacks.

A cross-site scripting attack is one of the top 5 security attacks carried out on a daily basis across the Internet, and your PHP scripts may not be immune.

Also known as XSS, the attack is basically a type of code injection attack which is made possible by incorrectly validating user data, which usually gets inserted into the page through a web form or using an altered hyperlink. The code injected can be any malicious client-side code, such as JavaScript, VBScript, HTML, CSS, Flash, and others. The code is used to save harmful data on the server or perform a malicious action within the user’s browser.

Unfortunately, cross-site scripting attacks occur mostly because developers fail to deliver secure code. Every PHP programmer has the responsibility to understand how attacks can be carried out against their PHP scripts to exploit possible security vulnerabilities. In this article, you’ll find out more about cross-site scripting attacks and how to prevent them in your code.

Learning by Example

Let’s take the following code snippet.

<form action="post.php" method="post">
 <input type="text" name="comment" value="">
 <input type="submit" name="submit" value="Submit">
</form>

Here we have a simple form in which there is a text box for data input and a submit button. Once the form is submitted, it will submit the data to post.php for processing. Let’s say all post.php does is output the data like so:

<?php
echo $_POST["comment"];

Without any filtering, a hacker could submit the following through the form, which generates a popup in the browser with the message “hacked”.

<script>alert("hacked")</script>

Despite being malicious in nature, this example does not seem to do much harm. But think about what could happen if the JavaScript code were written to steal a user’s cookie and extract sensitive information from it. There are far worse XSS attacks than a simple alert() call.

Cross-site scripting attacks can be grouped in two major categories, based on how they deliver the malicious payload: non-persistent XSS, and persistent XSS. Allow me to discuss each type in detail.

Non-persistent XSS

Also known as reflected XSS, this is the more common of the two delivery methods. The malicious code is not stored on the server; instead, it is passed through the server and presented to the victim. The attack is launched from an external source, such as an e-mail message or a third-party website.

Here’s an example of a portion of a simple search result script:

<?php
// Get search results based on the query
echo "You searched for: " . $_GET["query"];
// List search results
...

This is a very insecure results page in which the search query is displayed back to the user. The problem is that the $_GET["query"] variable isn’t validated or escaped, so an attacker could send the following link to the victim:

http://example.com/search.php?query=<script>alert("hacked")</script>

Without validation, the page would contain:

You searched for: <script>alert("hacked")</script>

Persistent XSS

This type of attack happens when the malicious code has already slipped through the validation process and it is stored in a data store. This could be a comment, log file, notification message, or any other section on the website which required user input at one time. Later, when this particular information is presented on the website, the malicious code gets executed.

Let’s use the following example for a rudimentary file-based comment system. Assuming the same form I presented earlier, let’s say the receiving script simply appends the comment to a data file.

<?php
file_put_contents("comments.txt", $_POST["comment"], FILE_APPEND);

Elsewhere, the contents of comments.txt are shown to visitors:

<?php
echo file_get_contents("comments.txt");

When a user submits a comment, it gets saved to the data file. Then the entire file (and thus the entire series of comments) is displayed to the readership. If malicious code is submitted, it will be saved and displayed as is, without any validation or escaping.

Preventing Cross-Site Scripting Attacks

Fortunately, protecting a website against XSS attacks is just as easy as carrying one out against an unprotected site. Prevention must always be in your thoughts, though, even before you write a single line of code.

The first rule which needs to be “enforced” in any web environment (be it development, staging, or production) is never to trust data coming from the user or from any other third-party source. This can’t be emphasized enough. Every bit of data must be validated on input and escaped on output. This is the golden rule of preventing XSS.

To implement solid security measures that prevent XSS attacks, we should be mindful of data validation, data sanitization, and output escaping.

Data Validation

Data validation is the process of ensuring that your application is running with correct data. If your PHP script expects an integer for user input, then any other type of data should be discarded. Every piece of user data must be validated when it is received to ensure it is of the correct type, and discarded if it doesn’t pass the validation process.

If you wanted to validate a phone number, for example, you would discard any strings containing letters, because a phone number should consist of digits only. You should also take the length of the string into consideration. If you wanted to be more permissive, you could allow a limited set of special characters, such as the plus sign, parentheses, and dashes, which are often used in formatting phone numbers specific to your intended locale.

<?php
// validate a US phone number
if (preg_match('/^((1-)?\d{3}-)\d{3}-\d{4}$/', $phone)) {
    echo $phone . " is valid format.";
}
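
The same idea applies to the integer case mentioned above. The following is a minimal sketch, assuming a hypothetical helper named validate_int_in_range() (it is not part of PHP, and the range used is made up). It relies on PHP’s filter_var() with FILTER_VALIDATE_INT and returns null when the input should be discarded.

```php
<?php
// Hypothetical helper (the name and the range are assumptions for illustration).
// Returns the validated integer, or null if the input is not an integer in range.
function validate_int_in_range($input, int $min, int $max): ?int
{
    $value = filter_var($input, FILTER_VALIDATE_INT, [
        "options" => ["min_range" => $min, "max_range" => $max],
    ]);
    // filter_var() returns false on failure; normalize that to null
    return $value === false ? null : $value;
}

$age = validate_int_in_range("42", 0, 150);
if ($age === null) {
    echo "invalid age";
} else {
    echo "age is " . $age;
}
```

Returning null instead of false keeps a legitimate value of 0 distinguishable from a failed validation.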

Data Sanitization

Data sanitization focuses on manipulating the data to make sure it is safe by removing any unwanted bits from the data and normalizing it to the correct form. For example, if you are expecting a plain text string as user input, you may want to remove any HTML markup from it.

<?php
// sanitize HTML from the comment
$comment = strip_tags($_POST["comment"]);
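
Note that strip_tags() only removes markup; it does not make a string safe for every output context. The sketch below uses a made-up payload that contains no tags at all, so strip_tags() leaves it unchanged, yet it could still break out of an HTML attribute. Escaping the output is what neutralizes it.

```php
<?php
// A payload without any tags: strip_tags() has nothing to remove.
$payload = '" onmouseover="alert(1)';
$stripped = strip_tags($payload);

// Escaping the quotes is what keeps the value inside the attribute.
$escaped = htmlspecialchars($stripped, ENT_QUOTES, "UTF-8");

echo '<input type="text" name="comment" value="' . $escaped . '">';
```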

Sometimes, data validation and sanitization/normalization can go hand in hand.

<?php
// normalize and validate a US phone number
$phone = preg_replace('/[^\d]/', "", $phone);
$len = strlen($phone);
if ($len == 7 || $len == 10 || $len == 11) {
    echo $phone . " is valid format.";
}

Output Escaping

In order to protect the integrity of displayed/output data, you should escape the data when presenting it to the user. This prevents the browser from applying any unintended meaning to any special sequence of characters that may be found.

<?php
// escape output sent to the browser
echo "You searched for: " . htmlspecialchars($_GET["query"]);
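
In practice it can help to wrap the call in one small function so the flags and encoding are always the same. This is a sketch, assuming the pages are served as UTF-8; the helper name e() is hypothetical. ENT_QUOTES escapes both single and double quotes, and ENT_SUBSTITUTE replaces invalid byte sequences instead of returning an empty string.

```php
<?php
// Hypothetical helper; assumes the page is served as UTF-8.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, "UTF-8");
}

// The payload from the reflected XSS example, now rendered as inert text
$query = '<script>alert("hacked")</script>';
echo "You searched for: " . e($query);
```

The browser then displays the script tag as literal text rather than executing it.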

All Together Now!

To better understand the three aspects of data processing, let’s take another look at the file-based comment system from earlier and modify it to make sure it’s secure. The potential vulnerabilities in the code stem from the fact that $_POST["comment"] is blindly appended to the comments.txt file which is then displayed directly to the user. To secure it, the $_POST["comment"] value should be validated and sanitized before it is added to the file, and the file’s contents should be escaped when displayed to the user.

<?php
// validate comment
$comment = trim($_POST["comment"]);
if (empty($comment)) {
    exit("must provide a comment");
}
// sanitize comment
$comment = strip_tags($comment);
// comment is now safe for storage
file_put_contents("comments.txt", $comment, FILE_APPEND);
// escape comments before display
$comments = file_get_contents("comments.txt");
echo htmlspecialchars($comments);

The script first validates the incoming comment to make sure a non-zero length string has been provided by the user. After all, a blank comment isn’t very interesting.

Data validation needs to happen within a well defined context, meaning that if I expect an integer back from the user, then I validate it accordingly by converting the data into an integer and handle it as an integer. If this results in invalid data, then simply discard it and let the user know about it.

Then the script sanitizes the comment by removing any HTML tags it may contain.

And finally, the comments are retrieved, filtered, and displayed.
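
One way to restructure these steps, offered as a sketch rather than a definitive implementation, is to separate storing from rendering. The function names and the one-comment-per-line file layout are assumptions made for this example.

```php
<?php
// Hypothetical helpers; the names and the one-comment-per-line layout are assumptions.

// Validate and sanitize a comment, then append it as a single line.
// Returns false when the comment is rejected or cannot be written.
function save_comment(string $file, string $raw): bool
{
    $comment = trim(strip_tags($raw));
    if ($comment === "") {
        return false;
    }
    // Collapse whitespace so each stored comment occupies exactly one line
    $comment = preg_replace('/\s+/', " ", $comment);
    return file_put_contents($file, $comment . PHP_EOL, FILE_APPEND | LOCK_EX) !== false;
}

// Read the stored comments and escape each one for HTML output.
function render_comments(string $file): string
{
    if (!is_file($file)) {
        return "";
    }
    $html = "";
    foreach (file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        $html .= "<p>" . htmlspecialchars($line, ENT_QUOTES, "UTF-8") . "</p>\n";
    }
    return $html;
}
```

Escaping happens in render_comments() regardless of what save_comment() did, so the output stays safe even if the file was written by older, unsanitized code.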

Generally the htmlspecialchars() function is sufficient for escaping output intended for viewing in a browser. Both it and htmlentities() accept an encoding argument, so pass the character encoding your pages actually use rather than relying on the default. htmlentities() additionally converts every character that has a named HTML entity, which is rarely necessary when the page encoding is declared correctly. For more information on the two functions, read their respective write-ups in the official PHP documentation.
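
To see how htmlspecialchars() and htmlentities() differ, here is a short sketch on a made-up sample string, with the encoding passed explicitly as UTF-8.

```php
<?php
// "\u{e9}" is the UTF-8 encoded character é (this syntax requires PHP 7.0 or later)
$text = "caf\u{e9} <b>";

// Converts only the characters that are special in HTML: & < > " '
echo htmlspecialchars($text, ENT_QUOTES, "UTF-8"), "\n";   // café &lt;b&gt;

// Converts every character that has a named HTML entity
echo htmlentities($text, ENT_QUOTES, "UTF-8"), "\n";       // caf&eacute; &lt;b&gt;
```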

Bear in mind that no single solution is 100% secure on a constantly evolving medium like the Web. Test your validation code thoroughly with the most up-to-date XSS test vectors. Using the test data from the following sources should reveal whether your code is still prone to XSS attacks.

RSnake XSS cheatsheet (a pretty comprehensive list of XSS vectors you can use to test your code)
Zend Framework’s XSS test data
XSS cheatsheet (makes use of HTML5 features)
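
A test run over such vectors could be sketched as follows. The four vectors and the function name find_unescaped() are hypothetical stand-ins chosen for illustration; the lists above are far larger. The check is also deliberately narrow: it only confirms that no raw angle brackets or quotes survive escaping, which covers HTML body and attribute contexts but not, for example, output placed inside a script block or a URL.

```php
<?php
// Hypothetical sample vectors for illustration; real test suites are far larger.
$vectors = [
    '<script>alert("hacked")</script>',
    '<img src=x onerror=alert(1)>',
    '"><svg onload=alert(1)>',
    "'><body onload=alert(1)>",
];

// Returns the vectors that still contain raw markup characters after escaping.
function find_unescaped(array $vectors, callable $escape): array
{
    $failures = [];
    foreach ($vectors as $vector) {
        $output = $escape($vector);
        if (preg_match('/[<>"\']/', $output)) {
            $failures[] = $vector;
        }
    }
    return $failures;
}

// The escaping routine under test
$escape = function (string $s): string {
    return htmlspecialchars($s, ENT_QUOTES, "UTF-8");
};

$failures = find_unescaped($vectors, $escape);
echo count($failures) === 0 ? "all vectors escaped" : "unescaped: " . count($failures);
```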

Summary

Hopefully this article gave you a good explanation of what cross-site scripting attacks are and how you can prevent them from happening to your code. Never trust data coming from the user or from any other third-party source. You can protect yourself by validating the incoming values in a well-defined context, sanitizing the data to protect your code, and escaping output to protect your users. After you’ve written your code, verify that your efforts work correctly by testing the code as thoroughly as you can.

Image via Inge Schepers / Shutterstock

Frequently Asked Questions (FAQs) about PHP Security and Cross-Site Scripting Attacks (XSS)

What is the impact of Cross-Site Scripting (XSS) attacks on PHP applications?

Cross-Site Scripting (XSS) attacks can have a significant impact on PHP applications. They can lead to data theft, session hijacking, defacement of websites, and even distribution of malicious code to users. XSS attacks exploit vulnerabilities in web applications to inject malicious scripts, which are then executed by the user’s browser. This can compromise the user’s interaction with the application and potentially expose sensitive information.

How can I identify potential XSS vulnerabilities in my PHP application?

Identifying potential XSS vulnerabilities in your PHP application involves a combination of manual code review and automated testing. Look for areas in your code where user input is directly included in the output without proper sanitization or validation. Automated tools like XSS scanners can also help identify potential vulnerabilities by testing various XSS attack vectors.

What are some common methods used in XSS attacks?

XSS attacks typically involve the injection of malicious scripts into web pages viewed by other users. This can be done through various methods, such as embedding scripts in URL parameters, form inputs, or even in cookies. The malicious script can then perform actions on behalf of the user, such as stealing their session cookies or manipulating web page content.

How can I prevent XSS attacks in my PHP application?

Preventing XSS attacks in your PHP application involves validating and sanitizing user input, encoding output, and using appropriate HTTP headers. Always treat user input as untrusted and validate it against a whitelist of acceptable values. Sanitize input to remove any potentially harmful characters or code. Encode output to ensure that any potentially harmful characters are rendered harmless. Use HTTP headers like Content-Security-Policy to restrict the sources of scripts and other resources.
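
As an illustration of validating against a whitelist, here is a minimal sketch. The helper name allowed_or_default() and the sort parameter are hypothetical; the point is that anything outside the known set falls back to a safe default.

```php
<?php
// Hypothetical example: a "sort" parameter that may only take known values.
function allowed_or_default($value, array $allowed, string $default): string
{
    // Strict comparison (third argument) avoids PHP's loose type juggling
    return (is_string($value) && in_array($value, $allowed, true)) ? $value : $default;
}

$sort = allowed_or_default($_GET["sort"] ?? "", ["date", "title", "author"], "date");
echo "Sorting by: " . $sort;
```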

What is the role of Content-Security-Policy in preventing XSS attacks?

The Content-Security-Policy (CSP) HTTP header plays a crucial role in preventing XSS attacks. It allows you to specify the domains that the browser should consider as valid sources of executable scripts. This means that even if an attacker can inject a script into your web page, the browser will not run it unless the script’s source is whitelisted in your CSP.
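
A policy can be sent from PHP with the header() function. The sketch below assumes a hypothetical helper, build_csp(), and an example policy that allows scripts only from the site itself and from a placeholder CDN host; adapt the sources to your own site. With a policy like this, inline scripts are also blocked unless they are explicitly permitted.

```php
<?php
// Builds a Content-Security-Policy header value from a directive map.
// The policy below is an assumed example; cdn.example.com is a placeholder.
function build_csp(array $directives): string
{
    $parts = [];
    foreach ($directives as $name => $sources) {
        $parts[] = $name . " " . implode(" ", $sources);
    }
    return implode("; ", $parts);
}

$policy = build_csp([
    "default-src" => ["'self'"],
    "script-src"  => ["'self'", "https://cdn.example.com"],
]);

// Headers must be sent before any output
if (!headers_sent()) {
    header("Content-Security-Policy: " . $policy);
}
```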

What is the difference between Stored XSS and Reflected XSS attacks?

Stored XSS attacks involve the injection of a malicious script that is permanently stored on the target server. The script is then served to users when they view certain pages. On the other hand, Reflected XSS attacks involve the injection of a script through a URL or form input, which is then immediately returned by the server in the response and executed by the user’s browser.

How can I use PHP’s built-in functions to prevent XSS attacks?

PHP provides several built-in functions that can help prevent XSS attacks. For example, the htmlspecialchars() function can be used to encode special characters in user input, rendering potential scripts harmless. The filter_input() function can be used to sanitize user input, removing or encoding harmful characters.

What is the role of HttpOnly cookies in preventing XSS attacks?

HttpOnly cookies are a type of cookie that cannot be accessed through client-side scripts. This means that even if an attacker can inject a script into your web page, they cannot use that script to read or modify HttpOnly cookies. This can help protect sensitive information, such as session identifiers, from being stolen by XSS attacks.
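
In PHP the flag can be set through the options array that setcookie() accepts since PHP 7.3. This is a sketch only: the helper name, cookie name, value, and lifetime are made up for illustration. For the session cookie itself, the session.cookie_httponly ini setting serves the same purpose.

```php
<?php
// Cookie options as accepted by setcookie() since PHP 7.3.
// The helper name and the values used below are assumptions for illustration.
function secure_cookie_options(int $lifetime): array
{
    return [
        "expires"  => time() + $lifetime,
        "path"     => "/",
        "secure"   => true,   // only send the cookie over HTTPS
        "httponly" => true,   // hide the cookie from document.cookie
        "samesite" => "Lax",
    ];
}

// Cookies are sent as headers, so this must run before any output
if (!headers_sent()) {
    setcookie("example_token", "opaque-value", secure_cookie_options(3600));
}
```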

Can XSS attacks be used to bypass CSRF protections?

Yes, XSS attacks can potentially be used to bypass Cross-Site Request Forgery (CSRF) protections. If an attacker can inject a script into your web page, they can use that script to perform actions on behalf of the user, potentially bypassing any CSRF protections you have in place. This is why it’s important to protect against both XSS and CSRF attacks.

Are there any PHP frameworks that provide built-in protection against XSS attacks?

Yes, many PHP frameworks provide built-in protection against XSS attacks. For example, Laravel’s Blade templates escape output written with the {{ }} syntax by default. Other frameworks like Symfony and CodeIgniter also provide features for sanitizing user input and encoding output. However, it’s important to remember that no framework can provide complete protection, and you should still follow best practices for preventing XSS attacks.