Validate Your Input!

The previous blog post about MySQL and PostgreSQL ignited a discussion about validation of input in a database-driven web application.

In this entry, I’ll attempt to explain what validation is and why it is important.

I remember being taught the importance of input validation way back in high school, in an IT elective subject. We were taught that all user input must be validated prior to acting upon it. At that stage we were ‘programming’ in Microsoft Excel using VBscript, but validation is all the more relevant for web applications, which may be used – or exploited – by a wide range of people.

Data validation is the process of checking user input to see whether it is in the expected format and is within the range of allowed values for that input. It has a number of purposes, some of which are below.

If a user submits data that is not within the allowed values or it is in the wrong format, it may cause the application to exhibit unexpected behaviour – which may include a blank screen or a screen that doesn’t make sense. Validation allows for this to be prevented, and instead to present a human-readable error message back to the user. Allowing the user to see why the input wasn’t accepted greatly aids in usability of the application.
A malicious user of the application may attempt to exploit problems in the application by sending data that is not in the format that the application expects or outside the range of values that a user should be using. The value used, if not checked, may grant the user access to some aspect of the application otherwise hidden, due to an internal problem in the application.
Data validation can help to ensure that data stored is complete and that nothing is missing. For instance, ensuring that ‘required’ fields are indeed filled out by the user ensures that there won’t be gaps (or empty strings) in a database record, which may cause problems with the incomplete data is acted upon later, for instance to follow up with a customer.

So, validation has many different purposes.

Here are some examples of the validation that you might want to apply to input from your users.

Strings:

Check that the string is a valid length.
Check that the string contains characters within a certain range.
If the string is ‘required’, check that it isn’t blank.

Numeric values:

Check that the value is in fact numeric, and isn’t an arbitrary string.
If the value must be a whole number (integer), make sure it is.
Check that the value sits between the highest allowed value and the lowest allowed value.
If there are some values which are not allowed (such as zero), check that it is not one of those values.

Boolean (true/false) values:

Make sure the value is false or true (or 0 or 1).

Enumerated (select) values:

Check that the value is exactly equal to one of the allowed choices.

There are some circumstances when simply checking to see if a string is the right length or contains allowed characters is not enough, and for this, regular expressions can be used to ensure that the string adheres to a given format or pattern.

In my personal projects, I use a helper class to simply the task of validating input. The load_var method of this class loads a value from an input variable, and checks it against a given pattern. If the input is not valid, the method returns null.

$username = $context->load_var('username', 'POST', 'name'); if ($username === null) { // report an error }

Sometimes, I use a PHP library for generating and processing web forms, such as a library we use at SitePoint (affectionately called RapidAdmin), or PEAR’s HTML_Quickform, or a library of my own. Such a library can usually validate your data for you. For instance, HTML_Quickform is capable of validating the data according to the patterns you specify when you create the form. If any form field input is not filled out correctly, HTML_Quickform will allow you to show the user an error message rather than passing the invalid field values into your application.

As mentioned in the last post’s comments, database servers such as MySQL and PostgreSQL also perform data validation to ensure that the values being entered will not cause problems when inserting them into database tables. This data validation is for the sake of the database, and not your (PHP) application, so it is not a replacement for your own validation. Also, as Simon pointed out, MySQL sometimes silently truncates or transforms data without validating it, which could lead to unexpected behaviour without an error message explaining it.

Therefore, you should always validate data before passing it to your database server. This gives your application the chance to treat the problem gracefully, returning a human-readable, friendly error message to the user. An end user doesn’t want to be confronted by a database query error, as the user didn’t even write the query.