Sanitizing, Escaping and Validating Data in WordPress
When creating WordPress plugins and themes, which may be used across thousands of websites, you need to be careful about how you handle both the data coming into WordPress and the data being presented to the user.
In this tutorial, we are going to look at the native functions that can secure, clean and check data that is coming into or going out of WordPress. This is necessary when creating a settings page or an HTML form, working with shortcodes, and so on.
What is Sanitizing?
In a nutshell, sanitizing is cleaning user input. It is the process of removing text, characters or code from input that is not allowed.
Gmail Example: Gmail removes <style> tags and their contents from HTML email messages before displaying them in its web client. This is done to prevent an email's CSS from overriding Gmail's own styles.
WordPress Example: Widget titles cannot contain HTML tags. If you add HTML tags to a title, they are automatically removed before the title is saved.
WordPress provides various functions for sanitizing different kinds of data. Here are some of them:
sanitize_email()
This function strips out all characters that are not allowed in an email address. Code example:
<?php
echo sanitize_email("narayan prusty@sitepoint.com"); //Output "narayanprusty@sitepoint.com"
Email addresses can't contain whitespace characters, so the space was removed from my email address.
sanitize_file_name()
This function strips characters from a filename that can cause issues when referencing the file on the command line. It is used by the WordPress Media Uploader to sanitize media file names. Code example:
<?php
echo sanitize_file_name("_profile pic--1_.png"); //Output "profile-pic-1_.png"
Here, the leading underscore was removed, the double dash was collapsed into a single dash, and the space was replaced by a dash.
sanitize_key()
Option, metadata and transient keys should contain only lowercase alphanumeric characters, dashes and underscores. This function sanitizes such keys. Code example:
<?php
echo sanitize_key("http://SitePoint.com"); //Output "httpsitepointcom"
Here, uppercase characters were converted to lowercase characters and other invalid characters were removed.
sanitize_text_field()
This function checks for invalid UTF-8 characters, converts single < characters to entities, strips all tags, removes line breaks, tabs and extra whitespace, and strips percent-encoded octets.
WordPress uses this to sanitize widget titles.
<?php
echo sanitize_text_field("<b>Bold<</b>"); //Output "Bold&lt;", which a browser displays as "Bold<"
sanitize_title()
This function removes PHP and HTML tags from a string, as well as removing accents. Whitespace characters are converted to dashes.
Note: Despite its name, this function is not meant for sanitizing titles; for that, use sanitize_text_field(). WordPress uses sanitize_title() to generate post and page slugs from their titles. Code example:
<?php
echo sanitize_title("Sanítizing, Escaping and Validating Data in WordPress"); //Output "sanitizing-escaping-and-validating-data-in-wordpress"
Here the í character was converted to i, the comma was removed, whitespace was replaced with the - character, and uppercase characters were converted to lowercase.
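To see how these functions fit together, here is a minimal sketch of sanitizing submitted form values before saving them. It assumes the code runs inside a WordPress plugin, so the sanitize_* functions are loaded; the function name myplugin_sanitize_profile() and the field names are hypothetical, chosen only for illustration.

```php
<?php
/**
 * Hypothetical helper: sanitize raw form input before it is stored.
 * Assumes WordPress is loaded, so the sanitize_* functions exist.
 *
 * @param array $input Raw values, e.g. wp_unslash($_POST).
 * @return array Sanitized values, keyed by (hypothetical) field name.
 */
function myplugin_sanitize_profile(array $input)
{
    return array(
        // Free text such as a display name: tags, line breaks and extra whitespace removed.
        'name'     => sanitize_text_field($input['name'] ?? ''),
        // Characters that are not allowed in an email address removed.
        'email'    => sanitize_email($input['email'] ?? ''),
        // A key for an option or meta field: lowercase alphanumerics, dashes, underscores.
        'key'      => sanitize_key($input['key'] ?? ''),
        // A file name that is safe to use on the file system.
        'filename' => sanitize_file_name($input['filename'] ?? ''),
    );
}
```

Sanitizing only cleans the values; it does not check, for example, that the email field actually contains a valid address. That is the job of validation, covered later in this tutorial.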
What is Escaping?
In a nutshell, escaping is securing output. This is done to prevent XSS attacks and also to make sure that the data is displayed the way the user expects it to be.
Escaping converts special HTML characters to HTML entities so that they are displayed as text instead of being interpreted by the browser.
Facebook Example: Facebook escapes chat messages when displaying them, so that users can't run code on each other's computers.
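To make the idea concrete without loading WordPress, the following plain-PHP sketch uses PHP's built-in htmlspecialchars() to show what escaping does to a malicious string. It is only an illustration of the concept, assuming UTF-8 input; WordPress's own escaping functions add further handling (such as checking for invalid UTF-8), so use them in themes and plugins.

```php
<?php
// A comment submitted by a user, containing a script tag.
$comment = '<script>alert("XSS")</script>';

// ENT_QUOTES converts both single and double quotes; the input is assumed to be UTF-8.
$escaped = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

// The browser shows this text literally instead of running the script.
echo $escaped; // &lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;
```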
WordPress provides some functions to escape different varieties of data.
esc_html()
This function escapes HTML special characters. Example code:
<?php
echo esc_html("<html>HTML</html>"); //Output "&lt;html&gt;HTML&lt;/html&gt;"
esc_textarea()
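Use esc_textarea() instead of esc_html() when displaying text inside a <textarea>, because esc_textarea() double-encodes entities. That way, text that already contains entities such as &amp; is shown in the textarea exactly as it was saved.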
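A minimal sketch of a settings field that prints a saved value back into a textarea, assuming it runs inside a WordPress plugin. The option name myplugin_notes and the function myplugin_render_notes_field() are hypothetical.

```php
<?php
/**
 * Hypothetical settings-field callback: prints a textarea pre-filled
 * with a saved option. Assumes WordPress is loaded.
 */
function myplugin_render_notes_field()
{
    // Hypothetical option; get_option() returns '' if it has not been saved yet.
    $notes = get_option('myplugin_notes', '');

    // esc_attr() for the attribute value, esc_textarea() for the textarea content.
    printf(
        '<textarea name="%s" rows="5" cols="50">%s</textarea>',
        esc_attr('myplugin_notes'),
        esc_textarea($notes)
    );
}
```

If the saved text contains the literal characters &amp;, esc_textarea() encodes them again, so the user sees &amp; in the box rather than a single &.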
esc_attr()
This function encodes the <, >, &, " and ' characters. It never double-encodes entities. Use it to escape the values of HTML tag attributes. Code example:
<?php
echo esc_attr('"Quotes" & <tags>'); //Output "&quot;Quotes&quot; &amp; &lt;tags&gt;"
esc_url()
URLs can also contain JavaScript code. So if you want to display a URL or output a complete <a> tag, you should escape the URL in the href attribute with esc_url(); otherwise it can lead to an XSS attack. Code example:
<?php
$url = "javascript:alert('Hello')";
// esc_url() returns an empty string here, because "javascript:" is not an allowed protocol
?>
<a href="<?php echo esc_url($url); ?>">Text</a>
esc_url_raw()
Use esc_url_raw() when you want to store a URL in the database or use it for a redirect. The difference between esc_url() and esc_url_raw() is that esc_url_raw() doesn't encode ampersands and single quotes as HTML entities, since the URL is not being printed into an HTML page.
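As a rough sketch, assuming a WordPress plugin that handles a form submission: the URL is cleaned with esc_url_raw() before it is stored and before it is used in a redirect, and esc_url() is used only when it is printed into HTML. The option name myplugin_website and both function names are hypothetical.

```php
<?php
/**
 * Hypothetical handler: save a submitted URL, then redirect back.
 * Assumes WordPress is loaded; other security checks are omitted for brevity.
 */
function myplugin_save_website()
{
    $raw = isset($_POST['website']) ? wp_unslash($_POST['website']) : '';

    // Database context: no HTML entities, so the stored URL stays usable as-is.
    update_option('myplugin_website', esc_url_raw($raw));

    // Redirect context: also not HTML output, so esc_url_raw() again.
    wp_redirect(esc_url_raw(admin_url('options-general.php?page=myplugin')));
    exit;
}

/**
 * Hypothetical template helper: print the stored URL as a link.
 */
function myplugin_print_website_link()
{
    // HTML context: esc_url() encodes & and ' for safe output.
    $url = esc_url(get_option('myplugin_website', ''));
    echo '<a href="' . $url . '">Website</a>';
}
```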
antispambot()
There are lots of bots constantly crawling the web for email addresses. We may want to display an email address to users without it being recognized by these bots, and antispambot() lets us do exactly that.
antispambot() randomly converts characters of the email address to HTML entities, which browsers display normally but which simple bots fail to recognize. Example code:
<?php
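echo antispambot("narayanprusty@sitepoint.com"); //Output varies: a random mix of plain characters and HTML entities (the @ always becomes &#64;), displayed by browsers as "narayanprusty@sitepoint.com"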
What is Validating?
In a nutshell, validating is checking user input to make sure the user has entered a valid value.
If the data is not valid, it is not processed or stored, and the user is asked to enter the value again.
Example: When creating an account on a site, we are often asked to enter the password twice. The two entries are validated by checking that they match.
You shouldn't rely only on HTML5 validation, as it can easily be bypassed. Server-side validation is required before processing or storing data.
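The password-confirmation example above could be checked on the server with something like the following plain-PHP sketch. The function name validate_password_confirmation() and the minimum length of 8 characters are hypothetical choices for illustration, not WordPress APIs or requirements.

```php
<?php
/**
 * Hypothetical validator for a "password" and "confirm password" pair.
 * Returns a list of error messages; an empty list means the input is valid.
 */
function validate_password_confirmation(string $password, string $confirm): array
{
    $errors = array();

    // Assumed policy for illustration only: at least 8 characters.
    if (strlen($password) < 8) {
        $errors[] = 'Password must be at least 8 characters long.';
    }

    // Both entries must be identical.
    if ($password !== $confirm) {
        $errors[] = 'Passwords do not match.';
    }

    return $errors;
}

// Mismatched input produces an error, so the form would be shown again.
$errors = validate_password_confirmation('s3cretpass', 's3cretpas');
echo count($errors); // 1
```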
WordPress provides validation functions for only a few types of data, so developers usually write their own functions to validate everything else. Let's look at some of the validation functions WordPress provides:
is_email()
Email validation is needed when submitting comments and contact forms, and when creating an account. WordPress provides the is_email() function to check whether a given string is a valid email address. Code example:
<?php
if(is_email("narayanprusty@sitepoint.com"))
{
echo "Valid Email";
}
else
{
echo "Invalid Email";
}
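On a settings page, validation and sanitization are often combined. The following sketch assumes a WordPress plugin whose email option is saved through the Settings API, with this function passed as the sanitize callback to register_setting(); the option name myplugin_email and the function name are hypothetical. If the submitted value is not a valid email address, the function reports an error and keeps the previously saved value, so nothing invalid is stored.

```php
<?php
/**
 * Hypothetical sanitize callback for an email option.
 * Assumes WordPress is loaded (sanitize_email, is_email, add_settings_error, get_option).
 */
function myplugin_validate_email_option($value)
{
    // Clean the input first, then check that what remains is a valid address.
    $email = sanitize_email($value);

    if (!is_email($email)) {
        // Show an error on the settings page and keep the previously saved value.
        add_settings_error('myplugin_email', 'invalid_email', 'Please enter a valid email address.');
        return get_option('myplugin_email', '');
    }

    return $email;
}
```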
is_serialized()
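is_serialized() checks whether the passed value is a serialized string, that is, a string produced by PHP's serialize(). WordPress relies on it when storing and reading options, metadata and transients: if a value is an array or an object, WordPress serializes it (using maybe_serialize()) before storing it in the database, and when the value is read back, maybe_unserialize() calls is_serialized() to decide whether it needs to be unserialized.
Here is an example of how you can use it: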
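<?php
$data = array("a", "b", "c");
//while storing: serialize the value only if it is not already a serialized string
if(!is_serialized($data))
{
    $data = maybe_serialize($data); //'a:3:{i:0;s:1:"a";i:1;s:1:"b";i:2;s:1:"c";}'
}
//while displaying: turn the stored string back into an array before using it
$values = maybe_unserialize($data);
echo implode(", ", $values); //Output "a, b, c"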
Conclusion
We saw what sanitizing, validating and escaping are, and why it is important for every developer to know the functions associated with them. You can find more reading on the topic on the Data Validation Codex page on WordPress.org. It is always a good idea to use these functions when developing a WordPress theme or plugin. Unfortunately, quite a lot of plugins are poorly developed and do not escape their output, which leaves websites open to potential XSS attacks. Please feel free to include any comments or helpful tips in the section below.