Input Validation Using Filter Functions

I’d like to start off this article by thanking you for making it even this far. I’m fully aware that “Input Validation Using Filter Functions” isn’t exactly the sexiest article title in the world! Filter functions in PHP might not be sexy, but they can improve the stability, security, and even maintainability of your code if you learn how to use them correctly. In this article I’ll explain why input validation is important and why using PHP’s built-in functions to perform it is important, then throw together some examples (namely using filter_input() and filter_var()), discuss some potential pitfalls, and finish with a nice, juicy call to action. Sound good? Let’s go!

Why Input Validation is Important

Input validation is one of the most important things you can do to ensure code security because input is often the one thing about your application you cannot directly control. Because you cannot control it, you cannot trust it. Unfortunately, as programmers we often write things thinking only of how we want them to work. We don’t consider how someone else might want to make them work, whether out of curiosity, ignorance, or malice. I am not going to go into too much detail about the trouble you can get into if you do not validate user input; there’s a really good article on this very site called PHP Security: Cross-Site Scripting Attacks if you want to read up on it. But I will say that validating your input is the first step to ensuring that the code you have written will be executed as intended. Maybe you are coming to PHP from another language and you are thinking, “this was never an issue before so why should I care?” Validation is an issue because PHP is loosely typed. This makes PHP great for some things, but it can make things like data validation a little bit trickier because you can pretty much pass anything to anything.

Why Using Built-in Methods is Important

To make validation a little bit easier, from PHP 5.2.0 onward we can use the filter_input() and filter_var() functions. I’ll talk about them in more detail soon, but first I want to talk about why we should be using PHP-provided functionality instead of relying on our own methods or third-party tools. When you roll your own validation methods, you generally fall into the same trap that you can fall into when designing other functionality: you think about the edge cases you want to think about, not necessarily all of the different vectors that could be used to disguise certain input. Another issue is, if you are anything like me, the first 10 minutes of any code review dealing with hand-rolled validation code is spent tutting because the programmer didn’t do exactly what you would have done. This can lead to programmers spending time learning the codebase and reading internal documentation that could instead be spent coding.

Some people don’t roll their own, but instead opt for a third-party solution. There are some good ones out there, and in the past I have used OWASP ESAPI for some extra validation. These are perhaps better than the hand-rolled solutions because more eyes have looked over them, but then you have the issue of introducing third-party code into your project. Again, this increases time spent learning a codebase and reading additional documentation instead of coding.

For these reasons, using native functions is better; moreover, because such functions are baked into the language, we have one place to go for all PHP documentation. New developers will have a greater chance of knowing what the code is and how best to use it, and it will be easier to support as a result. Hopefully by now I have you convinced that validation is important, and that it would be a good idea to use PHP functions to help you achieve your validation needs. If you are not convinced, leave a comment and let’s discuss it.

Some Examples

The filter_input() function was introduced in PHP 5.2.0 and allows you to get an external variable by name and filter it. This is incredibly useful when dealing with $_GET and $_POST data. Let’s take as an example a simple page that reads a value passed in from the URL and handles it. We know this value should be an integer between 15 and 20. One way of doing this would be something like:
<?php
if (isset($_GET["value"])) {
    $value = $_GET["value"];
}
else {
    $value = false;
}
if (is_numeric($value) && ($value >= 15 && $value <= 20)) {
    // run my code
}
else {
    // handle the issue
}
This is a really basic example and already we are writing more lines than I would like to see. First, because we can’t be sure $_GET["value"] is set, the code performs an appropriate check so that the script doesn’t fall over. Next is the fact that $value is now a “dirty” variable because it has been directly assigned from a $_GET value. We would need to take care not to use $value anywhere else in the code in case we break anything. Then there is the issue that 16.0 is valid because is_numeric() okays it. And finally, the if statement is a bit of a mouthful to take in and is an extra bit of logic to work through when you are tracing through the code. Compare the above example now to this:
<?php
$value = filter_input(INPUT_GET, "value", FILTER_VALIDATE_INT,
    array("options" => array("min_range" => 15, "max_range" => 20)));
if ($value) {
    // run my code
}
else {
    // handle the issue
}
Doesn’t that make you feel warm and fuzzy? filter_input() handles the $_GET value not being set, so you don’t have to stress over whether the script is receiving the correct information or not. You also don’t have to worry about $value being dirty because it has been validated before it has been assigned. Note now that 16.0 is no longer valid. And finally, our logic is no longer complicated. It’s just a quick check for a truthy value (filter_input() will return false if the validation fails and null if $_GET["value"] wasn’t set). Obviously in a real world setting you could extract the array out into a variable stored in a configuration file somewhere so things can get changed without even needing to go into business logic. Gorgeous!

Now you might be thinking that this is useful for simple scripts that grab a couple of $_GET or $_POST variables, but what about use inside of functions or classes? Luckily we have filter_var() for that. The filter_var() function was introduced at the same time as filter_input() and does much the same thing, except that it filters a value you pass to it rather than fetching one from the request.
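As a rough sketch of that idea, the same range check can be run through filter_var() with the options array pulled out into a variable. The names $allowedRange and validateValue() are made up for illustration, and the configuration file is imagined rather than shown.

```php
<?php
// Hypothetical configuration value; in a real project this might live in a
// config file rather than next to the business logic.
$allowedRange = array("options" => array("min_range" => 15, "max_range" => 20));

// Hypothetical helper: returns the validated integer, or false on failure.
function validateValue($input, array $filterOptions) {
    return filter_var($input, FILTER_VALIDATE_INT, $filterOptions);
}

$valid   = validateValue("16", $allowedRange);   // int 16
$decimal = validateValue("16.0", $allowedRange); // false: not an integer string
$tooHigh = validateValue("21", $allowedRange);   // false: outside the range

// For comparison, the hand-rolled check from earlier lets the decimal through.
$looseCheck = is_numeric("16.0");                // true
```

Note that a successful result comes back as an integer rather than the original string, which is one less conversion to think about later.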
Consider a function like this one:
<?php
// This is a sample function, do not use this to actually email,
// that would be silly.
function emailUser($email) {
    mail($email, "Here is my email", "Some Content");
}
The danger here is that there is nothing to stop the mail() function from attempting to send an email to literally any value that could be stored in $email. This could lead to emails not getting sent, or in a worst case scenario to something getting in that uses the function for malicious intent. I have seen people do a check on the result of mail(), which is fine to see if the function completed successfully, but by the time a value is returned the damage is done. Something like this is much more sane:
<?php
// This is a sample function, do not use this to actually email,
// that would be silly.
function emailUser($email) {
    $email = filter_var($email, FILTER_VALIDATE_EMAIL);
    if ($email !== false) {
        mail($email, "Here is my email", "Some Content");
    }
    else {
        // handle the issue: invalid email address
    }
}
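To see what the validation step buys you without actually sending mail, here is a small sketch. The function emailUserSafely() and its $send parameter are hypothetical stand-ins for the function above; $send takes the place of mail() so the behaviour can be observed.

```php
<?php
// Hypothetical variant of emailUser(): the sender is passed in as a callable
// so the example can run without sending real email.
function emailUserSafely($email, $send) {
    $email = filter_var($email, FILTER_VALIDATE_EMAIL);
    if ($email === false) {
        return false; // invalid address: nothing is sent
    }
    call_user_func($send, $email);
    return true;
}

// Stand-in for mail(): records the addresses it was asked to send to.
$sent = array();
$send = function ($address) use (&$sent) {
    $sent[] = $address;
};

$okay     = emailUserSafely("toby@example.com", $send);
$garbage  = emailUserSafely("not-an-email", $send);
// An attempt to smuggle an extra header in through the address.
$injected = emailUserSafely("toby@example.com\r\nBcc: victim@example.com", $send);
```

Only the first call reaches the sender; the other two are turned away before anything is sent.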
The problem with a lot of examples, the above included, is that they are basic. You might be thinking that filter_var() or filter_input() can’t be used for anything other than basic checking. The fine folks who introduced these functions considered that and provided a filter called FILTER_CALLBACK. FILTER_CALLBACK lets you name a function of your own, passed under the "options" key, that receives the variable being filtered and returns the result. This is where you can start to have a lot of fun because you can start applying your own business logic to your filtering.
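
Here is a minimal sketch of FILTER_CALLBACK in use. The business rule (an order reference made of “ORD-” followed by six digits) and the function name validateOrderReference() are invented for illustration.

```php
<?php
// Hypothetical business rule: an order reference is "ORD-" plus six digits.
// The callback receives the raw value and returns either a cleaned-up
// reference or false to signal failure.
function validateOrderReference($value) {
    $value = strtoupper(trim($value));
    return preg_match('/^ORD-\d{6}$/', $value) ? $value : false;
}

$good = filter_var(" ord-123456 ", FILTER_CALLBACK,
    array("options" => "validateOrderReference"));
$bad = filter_var("123456", FILTER_CALLBACK,
    array("options" => "validateOrderReference"));
// $good is "ORD-123456", $bad is false
```

Whatever the callback returns becomes the result of filter_var(), so returning false on failure keeps the calling code consistent with the built-in validation filters.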

Some Potential Pitfalls

These functions are pretty great, and they allow you to do some really powerful filtering, which as we have discussed can help improve the security and reliability of your code. There are some potential drawbacks, however, and I would be remiss if I didn’t point them out. The main pitfall is that the functions are only as good as the filters you apply with them. Take the last example using email validation: how FILTER_VALIDATE_EMAIL handles email addresses changed between 5.2.14 and 5.3.3, and even assuming all your applications run on the same version of PHP, there are email addresses that are technically valid that you might not expect. Be sure you know about the filters you are using. The second pitfall is that people think that if they put in some filters then their code is secure. Filtering your variables goes some way to helping, but it doesn’t make your code 100% safe from abuse. I would love to talk more about this, but that is out of the scope of this article and my word count is already pretty high!
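
As an illustration of knowing your filters, here are two behaviours that can catch people out. They are examples chosen for this sketch, not an exhaustive list, and the variable names are made up.

```php
<?php
// 1. A valid result can be falsy. "0" is a perfectly good integer, but a
//    plain truthiness check would treat it as a failure.
$zero = filter_var("0", FILTER_VALIDATE_INT);
$zeroLooksInvalid = !$zero;            // true: the loose check is misleading
$zeroIsValid = ($zero !== false);      // true: the strict check is accurate

// 2. FILTER_VALIDATE_BOOLEAN returns false both for a valid "no" and for
//    input it cannot interpret, unless FILTER_NULL_ON_FAILURE is set.
$no   = filter_var("no", FILTER_VALIDATE_BOOLEAN);
$junk = filter_var("maybe", FILTER_VALIDATE_BOOLEAN);

$noStrict   = filter_var("no", FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
$junkStrict = filter_var("maybe", FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
// $noStrict is false (a valid "no"), $junkStrict is null (could not interpret)
```

The truthy check in the earlier range example is safe only because 0 lies outside the 15 to 20 range; with a range that includes 0, comparing against false explicitly is the safer habit.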

Conclusion

Hopefully you have found this introduction to input validation in PHP useful. And now, time for a call to action! I want you to take one function in your code, just one, and see what happens to it when you pass in different data types and different values. Then I want you to apply some of the filtering methods discussed here and see if there is a difference in how your code performs. I would love to know how you got on in the comments.

Frequently Asked Questions about Input Validation Using Filter Functions

What are the benefits of using filter functions for input validation?

Filter functions provide a consistent, built-in way to validate and sanitize data supplied by users. By rejecting or cleaning unexpected input early, they reduce your exposure to problems such as SQL injection and cross-site scripting (XSS), although they do not replace defenses like parameterized queries and output escaping. By using filter functions, you can check that the data you’re working with meets specific criteria before your application relies on it.

How do filter functions work in PHP?

In PHP, filter functions are used to validate and sanitize data. The filter_var() function is commonly used; it takes the data to be filtered, the filter to apply, and an optional third argument carrying options or flags. There are many predefined filters in PHP for validating email addresses, URLs, integers, and more. You can also sanitize data to remove any illegal characters.

Can I create custom filter functions?

Yes, you can create custom filter functions if the predefined filters do not meet your requirements. Call filter_var() with the FILTER_CALLBACK filter (it is a filter, not a flag) and name your callback function under the "options" key of the third argument. The callback receives the value being filtered, and whatever it returns becomes the result.

What is the difference between validation and sanitization?

Validation is the process of checking if the data meets certain criteria, such as being a valid email address or URL. Sanitization, on the other hand, is the process of cleaning or scrubbing the data to remove any illegal or unwanted characters. Both are important for ensuring the security and integrity of your data.
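
A small sketch of the difference, using an intentionally malformed address made up for this example:

```php
<?php
$raw = "jo(hn)@exa mple.com";

// Validation answers yes or no: the input is returned unchanged, or false.
$validated = filter_var($raw, FILTER_VALIDATE_EMAIL);          // false

// Sanitization rewrites the input by stripping characters that are not
// allowed, here the parentheses and the space.
$sanitized = filter_var($raw, FILTER_SANITIZE_EMAIL);          // "john@example.com"

// Sanitized output is not automatically valid, so validate it afterwards.
$revalidated = filter_var($sanitized, FILTER_VALIDATE_EMAIL);  // "john@example.com"
```

Keep in mind that sanitizing silently changes what the user typed, so the cleaned address may not be the one they meant. Rejecting the input and asking again is often the safer choice.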

How can I use filter functions to prevent SQL injection attacks?

SQL injection attacks occur when an attacker is able to insert malicious SQL code into a query. Filter functions help by letting you reject input that is not in the shape you expect, for example by insisting that an ID really is an integer. On their own they are not a complete defense, though: values should still reach the database through parameterized queries (prepared statements) so that input can’t alter the structure of the SQL.
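
A minimal sketch of pairing validation with a parameterized query follows. It assumes the pdo_sqlite extension is available, and the users table, its contents, and the function findUserName() are hypothetical.

```php
<?php
// Assumption: pdo_sqlite is installed. An in-memory database keeps the
// example self-contained; the table and rows are made up.
$pdo = new PDO("sqlite::memory:");
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)");
$pdo->exec("INSERT INTO users (id, name) VALUES (1, 'Ada')");
$pdo->exec("INSERT INTO users (id, name) VALUES (2, 'Grace')");

// Hypothetical lookup: returns the user's name, or null if the ID is
// invalid or no such user exists.
function findUserName(PDO $pdo, $rawId) {
    // Step 1: validate. Anything that is not a positive integer is rejected.
    $id = filter_var($rawId, FILTER_VALIDATE_INT,
        array("options" => array("min_range" => 1)));
    if ($id === false) {
        return null;
    }
    // Step 2: bind. The value travels separately from the SQL text.
    $statement = $pdo->prepare("SELECT name FROM users WHERE id = :id");
    $statement->execute(array(":id" => $id));
    $name = $statement->fetchColumn();
    return $name === false ? null : $name;
}

$found    = findUserName($pdo, "2");          // "Grace"
$rejected = findUserName($pdo, "2 OR 1=1");   // null: fails validation
```

The filter stops obviously wrong input early, and the bound parameter means that even input which passes validation is never treated as SQL.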

Are filter functions available in other programming languages?

Yes, similar functionality is available in many other programming languages, although the implementation may vary. For example, in JavaScript, you can use the built-in methods for string and array objects to validate and sanitize data.

What are some common mistakes to avoid when using filter functions?

One common mistake is picking a filter that doesn’t match the data you expect. For example, FILTER_VALIDATE_INT rejects "16.0", so it is the wrong choice for a field that may legitimately contain decimals. Another mistake is treating filtered data as automatically safe to place in a SQL query; without parameterized queries, your application can still be vulnerable to SQL injection attacks.

Can filter functions be used to validate and sanitize data from other sources, not just user input?

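Yes. filter_var() works on any value you pass to it, so it can validate and sanitize data from databases, files, APIs, and more, not just user input. filter_input(), by contrast, only reads request-level data such as GET, POST, cookie, server, and environment variables.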

How can I test the effectiveness of my filter functions?

You can test the effectiveness of your filter functions by using different types of input data, including valid and invalid data, and checking the output. You should also test with potentially malicious data to ensure that your functions are effectively preventing security vulnerabilities.
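
One way to do this is a small table of inputs and expected results. The sketch below reuses the 15 to 20 integer range from the article; the particular cases are illustrative and you would swap in ones that matter to your own code.

```php
<?php
$options = array("options" => array("min_range" => 15, "max_range" => 20));

// Each case pairs an input with the result we expect from the filter.
$cases = array(
    array("15", 15),                          // lower boundary
    array("20", 20),                          // upper boundary
    array("14", false),                       // just below the range
    array("21", false),                       // just above the range
    array("", false),                         // empty input
    array("16abc", false),                    // trailing junk
    array("1e1", false),                      // exponent notation
    array("16; DROP TABLE users", false),     // hostile input
    array("<script>alert(1)</script>", false) // hostile input
);

// Collect every input whose actual result differs from the expectation.
$failures = array();
foreach ($cases as $case) {
    list($input, $expected) = $case;
    $actual = filter_var($input, FILTER_VALIDATE_INT, $options);
    if ($actual !== $expected) {
        $failures[] = $input;
    }
}
```

An empty $failures array means every case behaved as expected. In a real project the same table would fit naturally into whatever test framework you already use.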

Are there any limitations or drawbacks to using filter functions?

While filter functions are a powerful tool for validating and sanitizing data, they are not a silver bullet. They should be used as part of a comprehensive security strategy, not as a standalone solution. Additionally, they can sometimes be overly strict, rejecting valid data if it doesn’t meet the exact criteria specified by the filter.

Toby Osbourn

Toby Osbourn is a web developer specializing in fast and secure PHP driven websites who loves to dabble in the front end when he gets chance. You can catch up with him on Twitter and his personal blog.
