Input Validation Using Filter Functions

I’d like to start off this article by thanking you for making it even this far. I’m fully aware that “Input Validation Using Filter Functions” isn’t exactly the sexiest article title in the world!

Filter functions in PHP might not be sexy, but they can improve the stability, security, and even maintainability of your code if you learn how to use them correctly.

In this article I’ll explain why input validation is important and why it is worth using PHP’s built-in functions to do it, then throw together some examples (namely using filter_input() and filter_var()), discuss some potential pitfalls, and finish with a nice, juicy call to action. Sound good? Let’s go!

Why Input Validation is Important

Input validation is one of the most important things you can do to keep your code secure, because input is often the one part of your application you cannot directly control. Because you cannot control it, you cannot trust it.

Unfortunately, as programmers we often write things thinking only of how we want them to work. We don’t consider how someone else might want to make them work – either out of curiosity, ignorance, or malice.

I am not going to go into too much detail about the trouble you can get into if you do not validate user input; there’s a really good article on this very site called PHP Security: Cross-Site Scripting Attacks if you want to read up on it. But I will say that validating your input is the first step to ensuring that the code you have written will be executed as intended.

Maybe you are coming to PHP from another language and you might be thinking, “this was never an issue before, so why should I care?” Validation needs particular care in PHP because PHP is loosely typed. This makes PHP great for some things, but it can make data validation a little trickier because you can pass pretty much anything to anything.
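As a small illustration of that looseness (the values are arbitrary), PHP’s == operator converts numeric strings to numbers before comparing them, so inputs that look different can be treated as equal. Only === compares the types as well:

```php
<?php
// Loose comparison (==) reads numeric strings as numbers first.
var_dump("1e1" == "10");   // bool(true): both strings are read as the number 10
var_dump("16.0" == 16);    // bool(true): "16.0" is compared numerically with 16

// Strict comparison (===) also compares types, so these are false.
var_dump("1e1" === "10");  // bool(false)
var_dump("16.0" === 16);   // bool(false)
```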

Why Using Built-in Methods is Important

To make validation a little easier, PHP 5.2.0 introduced the filter_input() and filter_var() functions. I’ll talk about them in more detail soon, but first I want to explain why we should be using functionality PHP provides instead of relying on our own methods or third-party tools.

When you roll your own validation methods, you generally fall into the same trap you can fall into when designing other functionality: you think about the edge cases you want to think about, not necessarily all of the different vectors that could be used to disguise certain input. Another issue is that, if you are anything like me, the first 10 minutes of any code review involving hand-rolled validation code are spent tutting because the programmer didn’t do exactly what you would have done. Hand-rolled code also means programmers spend more time learning the codebase and reading internal documentation, time that could instead be spent coding.

Some people don’t roll their own, but instead opt for a third-party solution. There are some good ones out there, and in the past I have used OWASP ESAPI for some extra validation. These are perhaps better than hand-rolled solutions because more eyes have looked over them, but then you have the issue of introducing third-party code into your project. Again, this increases the time spent learning a codebase and reading additional documentation instead of coding.

For these reasons, using native functions is better. Moreover, because such functions are baked into the language, we have one place to go for all PHP documentation. New developers will have a greater chance of knowing what the code does and how best to use it, and the code will be easier to support as a result.

Hopefully by now I have you convinced that validation is important, and that it would be a good idea to use PHP functions to help you achieve your validation needs. If you are not convinced, leave a comment and let’s discuss it.

Some Examples

The filter_input() function was introduced in PHP 5.2.0 and allows you to get an external variable by name and filter it. This is incredibly useful when dealing with $_GET and $_POST data.

Let’s take as an example a simple page that reads a value passed in from the URL and handles it. We know this value should be an integer between 15 and 20.
One way of doing this would be something like:

<?php
if (isset($_GET["value"])) {
    $value = $_GET["value"];
}
else {
    $value = false;
}
if (is_numeric($value) && ($value >= 15 && $value <= 20)) {
    // run my code
}
else {
    // handle the issue
}

This is a really basic example, and already we are writing more lines than I would like to see.

First, because we can’t be sure $_GET["value"] is set, the code performs an appropriate check so that the script doesn’t fall over.

Next is the fact that $value is now a “dirty” variable because it has been directly assigned from a $_GET value. We would need to take care not to use $value anywhere else in the code in case we break anything.

Then there is the issue that 16.0 is valid because is_numeric() okays it.
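To make the is_numeric() problem concrete, here is the condition from the first example wrapped in a function (the name isInRangeLoose() is made up for this sketch) and fed a few strings that could arrive in a URL:

```php
<?php
// The hand-rolled condition from the first example, wrapped for testing.
function isInRangeLoose($value)
{
    return is_numeric($value) && ($value >= 15 && $value <= 20);
}

// is_numeric() accepts decimal and exponent notation, so all three pass.
var_dump(isInRangeLoose("16"));     // bool(true)
var_dump(isInRangeLoose("16.0"));   // bool(true)
var_dump(isInRangeLoose("1.6e1"));  // bool(true): 1.6e1 is 16 in exponent notation

// Out-of-range and non-numeric strings are rejected.
var_dump(isInRangeLoose("21"));     // bool(false)
var_dump(isInRangeLoose("abc"));    // bool(false)
```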

And finally, the if statement is a bit of a mouthful to take in and adds an extra bit of logic to work through when you are tracing through the code.

Compare the above example now to this:

<?php
$value = filter_input(INPUT_GET, "value", FILTER_VALIDATE_INT,
    array("options" => array("min_range" => 15, "max_range" => 20)));
if ($value) {
    // run my code
}
else {
    // handle the issue
}

Doesn’t that make you feel warm and fuzzy?

filter_input() handles the $_GET value not being set, so you don’t have to stress over whether the script is receiving the correct information or not.

You also don’t have to worry about $value being dirty because it has been validated before it has been assigned.

Note now that 16.0 is no longer valid.

And finally, our logic is no longer complicated. It’s just a quick check for a truthy value (filter_input() will return false if the validation fails and null if $_GET["value"] wasn’t set).
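filter_input() only sees data from the actual request, so it is awkward to experiment with from a command-line script. The sketch below uses filter_var() (introduced later in this article) with the same filter and options to show how the earlier inputs fare, and why the truthy check relies on 0 being outside the range. The $rangeOptions and $zeroOptions names are only for illustration.

```php
<?php
// Same filter and options as the filter_input() example above.
$rangeOptions = array("options" => array("min_range" => 15, "max_range" => 20));

var_dump(filter_var("16", FILTER_VALIDATE_INT, $rangeOptions));    // int(16): an int, not a string
var_dump(filter_var("16.0", FILTER_VALIDATE_INT, $rangeOptions));  // bool(false)
var_dump(filter_var("1.6e1", FILTER_VALIDATE_INT, $rangeOptions)); // bool(false)
var_dump(filter_var("21", FILTER_VALIDATE_INT, $rangeOptions));    // bool(false): out of range

// A truthy check only works because 0 is outside 15-20. With a range that
// includes 0, a valid result is falsy, so compare against false and null.
$zeroOptions = array("options" => array("min_range" => 0, "max_range" => 5));
$count = filter_var("0", FILTER_VALIDATE_INT, $zeroOptions); // int(0)

if ($count !== false && $count !== null) {
    echo "0 accepted by the strict check\n";
}
```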

Obviously in a real world setting you could extract the array out into a variable stored in a configuration file somewhere so things can get changed without even needing to go into business logic. Gorgeous!
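A sketch of that idea, assuming the rules are kept in a plain PHP array (the $validationRules name, and wherever it would be loaded from, are hypothetical). filter_var_array() accepts a whole set of named rules at once; its request-reading counterpart is filter_input_array():

```php
<?php
// Hypothetical rules, e.g. loaded from a configuration file.
$validationRules = array(
    "value" => array(
        "filter"  => FILTER_VALIDATE_INT,
        "options" => array("min_range" => 15, "max_range" => 20),
    ),
);

// In a real request this would be:
// $input = filter_input_array(INPUT_GET, $validationRules);
$input = filter_var_array(array("value" => "16"), $validationRules);
var_dump($input["value"]);   // int(16)

// Keys missing from the data come back as null (add_empty defaults to true).
$missing = filter_var_array(array(), $validationRules);
var_dump($missing["value"]); // NULL
```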

Now you might be thinking that this is useful for simple scripts that grab a couple of $_GET or $_POST variables, but what about inside functions or classes? Luckily we have filter_var() for that.

The filter_var() function was introduced at the same time as filter_input() and does much the same thing.

<?php
// This is a sample function, do not use this to actually email,
// that would be silly.
function emailUser($email) {
    mail($email, "Here is my email", "Some Content");
}

The danger here is that there is nothing to stop the mail() function from attempting to send an email to literally any value that could be stored in $email. This could lead to emails not getting sent or, in a worst-case scenario, to someone abusing the function for malicious purposes.

I have seen people do a check on the result of mail(), which is fine to see if the function completed successfully, but by the time a value is returned the damage is done.

Something like this is much more sane:

<?php
// This is a sample function, do not use this to actually email,
// that would be silly.
function emailUser($email) {
    $email = filter_var($email, FILTER_VALIDATE_EMAIL);
    if ($email !== false) {
        mail($email, "Here is my email", "Some Content");
    }
    else {
        // handle the invalid email address
    }
}

The problem with a lot of examples, the above included, is that they are basic. You might be thinking that filter_var() or filter_input() can’t be used for anything beyond basic checking. The fine folks who introduced these functions considered that, and let you pass them a filter called FILTER_CALLBACK.

FILTER_CALLBACK lets you pass in a function you have written; it receives the value being filtered, and whatever it returns becomes the result. This is where you can start to have a lot of fun, because you can apply your own business logic to your filtering.
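A minimal sketch of FILTER_CALLBACK with a made-up business rule; the validateProductCode() function and the product-code format are hypothetical, chosen only to show the mechanics:

```php
<?php
// Hypothetical business rule: a product code is two letters followed by
// four digits, e.g. "AB1234". Surrounding spaces and lower case are tidied up.
function validateProductCode($code)
{
    $code = strtoupper(trim($code));

    // Return the cleaned value on success, or false on failure.
    return preg_match('/^[A-Z]{2}[0-9]{4}$/', $code) === 1 ? $code : false;
}

// FILTER_CALLBACK passes the value to the function named in "options".
$code = filter_var(" ab1234 ", FILTER_CALLBACK, array("options" => "validateProductCode"));
var_dump($code); // string(6) "AB1234"

$bad = filter_var("ABC123", FILTER_CALLBACK, array("options" => "validateProductCode"));
var_dump($bad);  // bool(false)
```

Unlike the validation filters, FILTER_CALLBACK has no built-in notion of failure: returning false for a bad value is just this sketch’s convention, so the calling code still has to check for it.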

Some Potential Pitfalls

These functions are pretty great, and they allow you to do some really powerful filtering, which, as we have discussed, can help improve the security and reliability of your code. There are some potential drawbacks, however, and I would be remiss if I didn’t point them out.

The main pitfall is that the functions are only as good as the filter you apply to them. Take the last example using email validation: how FILTER_VALIDATE_EMAIL handles email addresses changed between 5.2.14 and 5.3.3, and even assuming all your applications run on the same version of PHP, there are technically valid email addresses that you might not expect. Be sure you know about the filters you are using.
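One way to act on that advice, sketched here, is a small check script that records the cases your application cares about and is run on every server and PHP version you deploy to. The addresses and expected results are illustrative choices, not a complete specification:

```php
<?php
// Expected outcome for each address: true = should validate, false = should not.
$expectations = array(
    "jane.doe@example.com"                         => true,
    "jane+news@example.com"                        => true,  // "+" is allowed in the local part
    "jane doe@example.com"                         => false, // unquoted space
    "jane.doe@"                                    => false, // missing domain
    "jane@example.com\r\nBcc: someone@example.org" => false, // extra text after a line break
);

$failures = 0;
foreach ($expectations as $address => $shouldPass) {
    // filter_var() returns the address on success and false on failure.
    $passed = filter_var($address, FILTER_VALIDATE_EMAIL) !== false;
    if ($passed !== $shouldPass) {
        echo "Unexpected result for ", addcslashes($address, "\r\n"), "\n";
        $failures++;
    }
}
echo $failures === 0 ? "All email checks behaved as expected\n" : "$failures check(s) differ\n";
```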

The second pitfall is that people think that if they put in some filters then their code is secure. Filtering your variables goes some way to helping, but it doesn’t make your code 100% safe from abuse. I would love to talk more about this, but that is out of the scope of this article and my word count is already pretty high!
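As a small illustration of that point: a value can pass FILTER_VALIDATE_EMAIL and still contain characters that matter in another context. An apostrophe is legal in the local part of an address, so it survives validation and still needs escaping when it is written into HTML. The address below is made up:

```php
<?php
$email = filter_var("o'reilly@example.com", FILTER_VALIDATE_EMAIL);
var_dump($email); // string(20) "o'reilly@example.com": valid, apostrophe included

// Output escaping is a separate job from input validation.
echo htmlspecialchars($email, ENT_QUOTES, "UTF-8"), "\n"; // o&#039;reilly@example.com
```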

Conclusion

Hopefully you have found this introduction to input validation in PHP useful. And now, time for a call to action!

I want you to take one function in your code, just one, and see what happens to it when you pass in different data types and different values. Then I want you to apply some of the filtering methods discussed here and see if there is a difference in how your code performs. I would love to know how you got on in the comments.

Image via Chance Agrella / Freerangestock.com


  • http://mattbragg.com Matt Bragg

    Excellent article, Toby. I had begun to use input filtering functions on a recent project. They worked just fine on my testing server and saved a lot of time. I did run into a problem when I began to test the project on my hosted server space. I received a PHP error related to filter_input() on all my form inputs. After changing back to the good old $nameFirst = $_POST["nameFirst"]; method, all worked as it should.

    I found this conflict odd as filter_input() is reportedly supported in PHP 5.2.0 and my host is using PHP 5.2.17 yet that install did not like those new functions.

    Hopefully my host (LunarPages) will iron out their issues soon and I can return to using these time saving tools again.

    Good advice here. Thanks

    • mario

      @Matt: If absent (in PHP <5.2 or if not compiled in) you can use the filter_* emulations from “upgradephp”.

      Personally I eschew the cumbersome filter_var interface. I’m using an implicit and lean API directly on the input globals. $_POST->int["field"] or $_GET->text["content"] or even $_REQUEST->ascii->html->sql["data"] is as simple as it gets. Remember: security follows simplicity.

      Case in point: this comment form lacks html escaping.

      • Michael

        Mario, is that lean API you speak of available somewhere? While I currently use some basic filter_inputs, beyond the basics is generally too cumbersome. Something faster to code and more explicit would be ideal in large projects.

        • mario

          It’s available from http://sourceforge.net/p/php7framework/wiki/input/ (albeit it might still be buggy, and possibly too much overhead for some projects). But it puts my mind at ease anyway.

          • http://tosbourn.com Toby Osbourn

            I agree that the syntax is more cumbersome than what you are describing (PHP has never been the prettiest language to work in), but I think your last comment reinforces my choice of using PHP’s built-in features for this. I am not willing to add a buggy codebase to someone’s project; why would I include code that I cannot stand behind?

    • http://tosbourn.com Toby Osbourn

      Hey Matt, cheers for the comment. I would be interested to know what the error was, even though filter_input() has been in since 5.2 I believe some of the filters were added later, so this might have been the issue.

      • http://mattbragg.com Matt Bragg

        The error I received was as follows:
        Fatal error: Call to undefined function: filter_input() in /home/addWorkersRecords.php on line 21

        I couldn’t remember the exact message so I had to recreate the condition triggering the error.
        Thanks for the reply.

        • http://tosbourn.com Toby Osbourn

          Hi again Matt,
          Thanks for getting back to me, that isn’t something I have come across before – I will take a look into why that might be next time I get a second and I will get back to you.

    • Pete

      Hey great article Toby. I had no idea PHP had this filtering built in: the wonders never cease with this language, they’ve seemingly picked the most useful timesavers and coded them right in natively!

      • http://tosbourn.com Toby Osbourn

        Haha, PHP isn’t perfect but it is perhaps more useful than a subset of developers would have you believe! Thanks for the kind words.

  • http://informationthreshold.blogspot.com Steve

    You may want to point out and briefly explain the use of !== on line 6 of the filter_var example, as it is not a comparison operator that you see that often (at least I don’t) and some people may think of it as a typo of “!=”.

    • http://tosbourn.com Toby Osbourn

      Thanks for the comment Steve – I don’t want to spend much time in the article talking about things that aren’t directly related to the filter functions, but essentially !== is the inverse of ===, which is == but also takes the type into account (so 1 == '1' but 1 !== '1').

      Hope that helps!

  • Arturo Hernandez

    Toby, I think you now owe your readers an article on exception handling. I am sure that an email function given an empty or invalid email address should throw an exception. That makes the code much easier to read.

    • http://tosbourn.com Toby Osbourn

      You are completely right, this is why I commented the email function the way I did. I figured if people didn’t use filter functions or exception handling, putting it all into one code example would be very confusing. Thanks for the feedback.

  • http://arts.gov arrienc

    I’m a basic PHP user. Can you say what the relationship of input validation is to Prepared Statements. Are they used instead of, in conjunction with…?

    • http://tosbourn.com Toby Osbourn

      Hi Arrienc, cheers for commenting.

      They would be used in conjunction. Prepared statements are really good at protecting the data you send into a query, but a lot of the time you will be dealing with input from unknown sources that you are not necessarily putting anywhere.

  • http://www.alabiansolutions.com Alabi

    Thanks for this article. I never knew about these validation functions before. What is the advantage of using !== over != for the comparison done in “if ($email !== false)”?

    • http://tosbourn.com Toby Osbourn

      Hey Alabi, thanks for your comment.

      It is habit more than anything; where possible you should enforce type checking by using !== or ===.

  • Viktor

    For knowledge of PHP +10
    For the article -100, well, very bad description.

    • http://tosbourn.com Toby Osbourn

      Hey Viktor,

      Thanks for your comment, I am sorry you didn’t like the description. Could you let me know what you thought was missing or what I could have done better?

  • Viktor

    Hi Toby Osbourn,
    The word combinations are scattered. The meaning of the sentences is lost. To understand what is written, you need to put the words together like a mosaic.

    • http://tosbourn.com Toby Osbourn

      Hi Viktor,

      Thanks for getting back to me and your comments – sorry you didn’t like my writing style. I will spend more time thinking about the flow of any future articles.

  • Alan Rew

    This is a very useful and easy-to-digest article on a PHP feature about which I knew nothing. Really useful. While learning PHP, it’s sometimes hard to know what’s important to learn & what isn’t, so you’ve saved me a lot of time. More articles please!

    • http://tosbourn.com Toby Osbourn

      Thanks for your kind words Alan, I have another article in the pipeline.

  • freedimension

    Hi Toby
    Inspiring article! Thank you very much.
    Though in the “Why using …” section I immediately missed what I think is the main reason to use built-ins: future bugs and loopholes are fixed by the PHP team. You don’t have to bother after delivery. No need to repair your custom code or upload a new version of a third-party library or framework. All it takes is an attentive administrator updating the server’s packages.
    Sure, security is a process, not a state. But that doesn’t mean you have to be part of the process all the time. :)

  • http://kmbweb.de KMB

    Damn, I learned something today. Thanks, Toby!

  • http://www.acewebguy.com Php expert

    Please upload a detailed post on SQL injections

  • Alex

    Good stuff, man, I have really learned something.

  • Jon

    I disagree with Viktor, the article is well written, and the examples are easy to follow.

    The only thing I didn’t like was the first example doesn’t make a strong case for using these functions. It’s an excellent example of showing how filter_input() works, but it is not an excellent example for showing filter_var() should be used instead of filter_var(). At least not with the reasons given in the article. One of the benefits cited for using filter_var() was less coding required. But once you strip out all the whitespace and comments, the second solution was a little longer! I have an issue with is_numeric() being used in the first case and noting it returns true for a float, when you could have used is_int() for an integer. Also, can you really say it improves readability? A new function has to be learned, parameter order, predefined constants. If I already know if-else, this isn’t motivating me to learn new functions. The second example could have been the example to show how filter_input() and filter_var() are better. Passing in FILTER_VALIDATE_EMAIL is so much easier than using a regular expression and so much shorter and easier to read too.

  • Jon

    In my previous comment I meant to say “It’s an excellent example of showing how filter_var() works, but it is not an excellent example for showing how using filter_input() or filter_var() is a better solution than using an if-else solution.”