Validate Your Input!

The previous blog post about MySQL and PostgreSQL ignited a discussion about validation of input in a database-driven web application.

In this entry, I’ll attempt to explain what validation is and why it is important.

I remember being taught the importance of input validation way back in high school, in an IT elective subject. We were taught that all user input must be validated prior to acting upon it. At that stage we were ‘programming’ in Microsoft Excel using VBscript, but validation is all the more relevant for web applications, which may be used – or exploited – by a wide range of people.

Data validation is the process of checking user input to see whether it is in the expected format and is within the range of allowed values for that input. It has a number of purposes, some of which are below.

  • If a user submits data that is not within the allowed values or it is in the wrong format, it may cause the application to exhibit unexpected behaviour – which may include a blank screen or a screen that doesn’t make sense. Validation allows for this to be prevented, and instead to present a human-readable error message back to the user. Allowing the user to see why the input wasn’t accepted greatly aids in usability of the application.
  • A malicious user of the application may attempt to exploit problems in the application by sending data that is not in the format that the application expects or outside the range of values that a user should be using. The value used, if not checked, may grant the user access to some aspect of the application otherwise hidden, due to an internal problem in the application.
  • Data validation can help to ensure that data stored is complete and that nothing is missing. For instance, ensuring that ‘required’ fields are indeed filled out by the user ensures that there won’t be gaps (or empty strings) in a database record, which may cause problems with the incomplete data is acted upon later, for instance to follow up with a customer.

So, validation has many different purposes.

Here are some examples of the validation that you might want to apply to input from your users.

Strings:

  • Check that the string is a valid length.
  • Check that the string contains characters within a certain range.
  • If the string is ‘required’, check that it isn’t blank.

Numeric values:

  • Check that the value is in fact numeric, and isn’t an arbitrary string.
  • If the value must be a whole number (integer), make sure it is.
  • Check that the value sits between the highest allowed value and the lowest allowed value.
  • If there are some values which are not allowed (such as zero), check that it is not one of those values.

Boolean (true/false) values:

  • Make sure the value is false or true (or 0 or 1).

Enumerated (select) values:

  • Check that the value is exactly equal to one of the allowed choices.

There are some circumstances when simply checking to see if a string is the right length or contains allowed characters is not enough, and for this, regular expressions can be used to ensure that the string adheres to a given format or pattern.

In my personal projects, I use a helper class to simply the task of validating input. The load_var method of this class loads a value from an input variable, and checks it against a given pattern. If the input is not valid, the method returns null.


$username = $context->load_var('username', 'POST', 'name');
if ($username === null)
{
// report an error
}

Sometimes, I use a PHP library for generating and processing web forms, such as a library we use at SitePoint (affectionately called RapidAdmin), or PEAR’s HTML_Quickform, or a library of my own. Such a library can usually validate your data for you. For instance, HTML_Quickform is capable of validating the data according to the patterns you specify when you create the form. If any form field input is not filled out correctly, HTML_Quickform will allow you to show the user an error message rather than passing the invalid field values into your application.

As mentioned in the last post’s comments, database servers such as MySQL and PostgreSQL also perform data validation to ensure that the values being entered will not cause problems when inserting them into database tables. This data validation is for the sake of the database, and not your (PHP) application, so it is not a replacement for your own validation. Also, as Simon pointed out, MySQL sometimes silently truncates or transforms data without validating it, which could lead to unexpected behaviour without an error message explaining it.

Therefore, you should always validate data before passing it to your database server. This gives your application the chance to treat the problem gracefully, returning a human-readable, friendly error message to the user. An end user doesn’t want to be confronted by a database query error, as the user didn’t even write the query.

Free book: Jump Start HTML5 Basics

Grab a free copy of one our latest ebooks! Packed with hints and tips on HTML5's most powerful new features.

  • MiiJaySung

    Oh, BTW, the C style casts can be also be extended to other data types, i.e….

    (bool), (float) etc…

    The (array) cast can be useful for non DB stuff where you might have a method that either accepts an array of strings, or a single string…

    function do_something_on_string_or_array($data) {
    $data = (array)$data;
    }

    This makes $data an array if you passed a string (or empty array of NULL is passed), or leaves $data alone if it’s an array. Ideal when you wanna foreach over something. Be warned it doesn’t work on objects though

  • http://www.primacognos.com bigduke

    Josh, could you throw some light on MySQL would achieve any such feat when its not even capable of all those features ?
    Its an interesting concept though, let the db do the db work while the script just acts as a transport layer.

  • Andrew Phillipo

    Also be sure to check for cross site scripting attacks: For exampe if you ever redisplay the input a user has entered into your site this could be used by someevilsite.com to post a form containing scripting to your pages.

    Take the example of an online shop -> say evil site has a link to a ‘special’ offer on your site; this link could be a post request which sent a script to the site that changed the ‘Buy It’ button into a post back to their site, stealing all the information such as credit card details in their post!

    Not sure I’ve explained it all that well but make sure you Google it!

  • WebDevGuy

    This is a good article and a book could be written about validating. However, most people validate common rules (required, string, numeric, text length). It would be great if someone would just post some common validation functions everyone could use.

    I’ve used MTHL_QuickForm with great success. HOWEVER, it took me weeks to install and learn it – there seems to be only one main guy who posts replies to quickform questions at pear.php.com forums.

  • Lars Olesen

    Could you elaborate a bit more on the class you are using to load variables into a variable?

  • lsmith

    MDB2 has quote methods specifically for text, time, date, timestamp, boolean, float, decimal, blob, clob … you can even extend the datatype module. in my framework I have added a serialize datatype that automatically calls (un)serialize.

  • myrdhrin

    What’s being said of User Input applies to anything that is outside the control of any application (input files, input stream, etc.).

  • http://www.scanit.ca nos

    Remember when doing validation, go most restrictive, not least, and default to a safe value.

    So, if you’re checking a boolean value (say a radio button), check if its true, if not assume its false and set it to false.

    If your checking a number and its a wholenumber, do something like:

    $number = intval($_REQUEST['number'];
    You’re always better off doing something like intval than a regular expression… if you can.

    If you do have to go to regular expressions, keep it simple. Don’t look for things you won’t allow, check only for those things you will allow.

    That’s what I do on my sites, like
    jointheweb.org[/URL]. You’re almost always going to need to accept forms, so make sure you checking your data. If its going to end up in the database, make sure you check for SQL injection attacks as well (for mysql use mysql_real_escape_string() ).

  • http://www.homeorchardsociety.org SRTech

    Would you be willing to share a little more information on “RapidAdmin”?

  • Josh Berkus

    BTW, when using full-featured databases such as PostgreSQL, Oracle, MSSQL, Firebird etc., you can perform additional validation using database programming. This would involve using views, functions, triggers and other tools to do more-exact-than-data-type validation.

    For example, in my PostgreSQL-PHP web apps, the PHP code *never* does “insert into table”, nor is it allowed (per database security) to do so. Instead, it calls a function like:
    df_add_new_company( user, session, name, address, no_employees );
    This function validates input including checking user permissions and session validity as well as checking for duplicates, formatting, etc. SQL injection attempts will simply cause a database error.

    Triggers and updatable views can be used in a similar way.

    –Josh Berkus
    PostgreSQL developer

  • http://www.assemblysys.com/dataServices/index.php mniessen

    To avoid SQL injections, is it best to use mysql_real_escape_string or is addslashes enough?

  • http://www.peterakkies.com/ Kilroy

    mysql_real_escape_string() is preferred above addslashes() when using MySQL.

  • http://www.assemblysys.com/dataServices/index.php mniessen

    Why is mysql_real_escape_string() preferred? Are there any risks using addslashes() instead of mysql_real_escape_string()?

  • MiiJaySung

    I can’t see any difference with addslashes. The only real reason for mysql_real_escape_string() is that should the encoding change for escaping, you won’t have to search and replace a load of addslashes commands.

    I personally don’t use either. I perfer to use str_replace(”’, ””, $string) as this is the real ANSI way to do things, and this works on all good DB’s.

    Also quite often people need to make sure the $_REQUEST var they use is a INT type. Instead of using some huge set of methods to clean up data, if I know the value is for an INT field, I just do something like

    $sql = ‘SELECT * FROM “users” where “user_id” = ‘.(int)$_REQUEST['uid'];

    This is much easier and MUCH quicker. if uid is not an INT, when it’s cast, it will evaluate to 0. Because 0 is not used for insert ids, no resultset data is returned, which means it’s safe, fast and easy to use.

  • fghgf

  • Anita

    Dear Josh

    I am new to postgres and have a query for u, I require to have a table
    in the below format

    Primary_Key Action_Name (Varchar) Action_Attributes (Text)
    1 Forward SrcMac:someValue;DstMac:someValue;

    Now the situation is like every record that comes in this table,has n number of Action_Attribute, so i am entering this has key-value pair , I would like to use a trigger wherein i can validate the values of the known attributes like Source Macaddress , Destination Macaddress etc…At the same time in case any attributes other than these come , i will enter them as text.

    Is it possible to have such a trigger defined fot a particular field in this fashion, Can u please guide and suggest, what could be the best way to handle such a scenario….