Zend_Filter Reviewed, Blacklist / Whitelist Filters

I like Zend Framework’s Zend_Filter class. It’s basically a set of methods for validating untrusted data. Although arguably the two most important features, isEmail() and isUri(), are still missing (the latter can be worked around with Zend_Uri), the whole thing looks promising already. Here are a few thoughts on the package:

  • Remove isGreaterThan() and isLessThan(). That’s what we have the “<” and “>” operators for. I can understand the designer’s intention to deliver a complete set of tests, but these just bloat both Zend_Filter’s and the user’s code. There is no isEqualTo(), either.
  • isDate() looks like a stub. This should be replaced by something more sophisticated.
  • Clean up the code of isHostname().
  • The method name isRegex() makes me think that it checks whether the argument is a valid regular expression. Since pattern matching is a special way of filtering anyway, I’d just abandon the “is” prefix and call it match().
  • I don’t know whether isName() works accurately on exotic names. Besides, it can easily be left out, as it’s a job for whitelist filtering. See below.
  • International support for isPhone(). I can deliver a Swiss implementation for it, just let me know. By the way, apply self::getDigits() to the input instead of checking it with ctype_digit().
  • Let’s add three more class methods to Zend_Filter. The first one escapes a string for safe use in regular expressions:

public static function getRegexEscaped($input) {
  $output = '';
  // Emit every byte as a \xHH escape so that no character keeps a special meaning
  for ($i = 0; $i < strlen($input); $i++) {
    $output .= '\x'.bin2hex($input[$i]);
  }
  return $output;
}

  • The next one validates a string by a character whitelist:

public static function getWhitelisted($input, $allowed_chars = '', $allow_alpha = true, $allow_numeric = true) {
  // Negated character class: matches everything that is NOT whitelisted
  $regex = '%[^'.($allow_alpha ? '[:alpha:]' : '').($allow_numeric ? '\d' : '').self::getRegexEscaped($allowed_chars).']%';
  return preg_replace($regex, '', $input);
}

  • When there’s whitelisting, there should be blacklisting, too. On second thought, this should be implemented with str_replace() though.

public static function getBlacklisted($input, $forbidden_chars) {
  // Character class of forbidden characters; every match is removed
  $regex = '%['.self::getRegexEscaped($forbidden_chars).']%';
  return preg_replace($regex, '', $input);
}
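
As for the str_replace() idea, here is a minimal sketch of what that could look like. The function name getBlacklistedPlain() is made up for illustration and is not part of Zend_Filter; it works byte by byte, so it assumes single-byte forbidden characters.

```php
<?php
// Hypothetical helper (not part of Zend_Filter): removes every character in
// $forbidden_chars from $input without building a regular expression.
function getBlacklistedPlain($input, $forbidden_chars) {
  if ($forbidden_chars === '') {
    return $input; // nothing to remove
  }
  // str_split() turns the blacklist into an array of single bytes;
  // str_replace() then deletes each of them from the input.
  return str_replace(str_split($forbidden_chars), '', $input);
}

echo getBlacklistedPlain('<b>Hello</b>', '<>/'), "\n"; // bHellob
```

Since no pattern is involved, nothing has to be escaped at all, which is the main attraction of this variant.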

For example, we can use the more flexible whitelisting method instead of Zend_Filter::isName.


/* We only allow letters, spaces and dashes in names */
$name = Zend_Filter::getWhitelisted($name, " -", true, false);
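
To try this out without the framework, here is a standalone sketch of the two helpers as plain functions. The names regexEscaped() and whitelisted() are placeholders for this example only, and the sample name is invented.

```php
<?php
// Standalone copies of the two proposed helpers, written as plain functions
// so the snippet runs without Zend_Filter. The names are illustrative only.
function regexEscaped($input) {
  $output = '';
  for ($i = 0; $i < strlen($input); $i++) {
    $output .= '\x' . bin2hex($input[$i]); // e.g. "-" becomes \x2d
  }
  return $output;
}

function whitelisted($input, $allowed_chars = '', $allow_alpha = true, $allow_numeric = true) {
  // Negated character class: everything not whitelisted gets stripped
  $regex = '%[^' . ($allow_alpha ? '[:alpha:]' : '')
         . ($allow_numeric ? '\d' : '')
         . regexEscaped($allowed_chars) . ']%';
  return preg_replace($regex, '', $input);
}

echo regexEscaped(' -'), "\n";                                    // \x20\x2d
echo whitelisted("Jean-Luc O'Neil 3rd", ' -', true, false), "\n"; // Jean-Luc ONeil rd
```

The apostrophe and the digit are dropped because neither is on the whitelist. A real name filter would probably want to allow the apostrophe as well.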


  • http://www.deanclatworthy.com Dean C

    I’m going to have to get stuck into this Zend Framework over the summer. Thanks for the post Maarten :)

  • http://www.sudokumadness.com/ coffee_ninja

    Glad to see someone else diving headfirst into the Zend Framework. I’ve played with filters, routers, and various controllers in the framework and like what I’ve seen. Has anyone done anything interesting with the DB layer or the various web service libraries?

  • http://shiflett.org/ shiflett

    Hi Maarten,

    I provided a very brief response here:

    http://shiflett.org/archive/220

    The short answer is that I agree with almost everything you mention, but I want to take the time to elaborate a bit more. You caught me on a busy night. :-)

  • Christopher Thompson

    I really think the Zend_Filter class should be completely changed. The InputFilter class provides a handy shortcut way to do simple filtering, but for more advanced controllers you really need FilterChain/Validator style classes. To achieve that, the Zend_Filter class should be split into two groups: Filters and Rules. Filters modify the value passed to them; Rules return true/false. To allow extensibility each Filter and Rule should be its own class. This allows FilterChain and Validator classes to accept the polymorphic Filters and Rules (respectively).

    The InputFilter class could keep the same interface and would only need minor changes internally to use the Filters and Rules classes. But this change to Zend_Filter would open up many more filtering and validation possibilities.
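
    As a rough sketch of that split, and only under the assumption of made-up names (FilterInterface, RuleInterface, FilterChain and Validator below are not classes from the framework), it could look something like this:

```php
<?php
// All names here are hypothetical; none of them exist in Zend Framework.
interface FilterInterface { public function filter($value); }  // modifies a value
interface RuleInterface   { public function isValid($value); } // returns true/false

class TrimFilter implements FilterInterface {
  public function filter($value) { return trim($value); }
}

class DigitsRule implements RuleInterface {
  public function isValid($value) { return ctype_digit($value); }
}

// Applies each filter in order; accepts any class implementing FilterInterface.
class FilterChain {
  private $filters = array();
  public function add(FilterInterface $filter) { $this->filters[] = $filter; return $this; }
  public function run($value) {
    foreach ($this->filters as $filter) {
      $value = $filter->filter($value);
    }
    return $value;
  }
}

// Passes only if every rule accepts the value.
class Validator {
  private $rules = array();
  public function add(RuleInterface $rule) { $this->rules[] = $rule; return $this; }
  public function isValid($value) {
    foreach ($this->rules as $rule) {
      if (!$rule->isValid($value)) { return false; }
    }
    return true;
  }
}

$chain = new FilterChain();
$chain->add(new TrimFilter());
$validator = new Validator();
$validator->add(new DigitsRule());

$clean = $chain->run("  12345 ");
var_dump($validator->isValid($clean)); // bool(true)
```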

  • http://www.phpism.net Maarten Manders

    Or.. let’s use fluent interfaces ;)

    $filter->theShit()->outOf()->untrusted($garbage)

    Nah, seriously: Christopher, you do have a point, but your suggestion looks way overdesigned from a PHP perspective. I can’t say that I love the ongoing trend towards static methods, but it’s really simple this way and that’s what the Zend Framework is all about.

  • Arnaud

    coffee_ninja: the db part looks like the weakest point in the MVC part right now. I’ve started to use the ezcDb layer instead but haven’t got far enough to see what effect it has.

  • http://www.sudokumadness.com/ coffee_ninja

    Arnaud, from the bit I’ve looked at of the DB portion of the framework, I’d have to agree. I think ZFW needs it to “be complete,” but I don’t necessarily think I’m going to get much use out of it.

    I’ve been writing my own DB layers for my applications, writing an interface that specifies what methods I need, then implementing it for a specific DBMS. I usually write a Factory class to retrieve the right implementation. Personally I like doing things this way, as I can take advantage of features specific to any system.

    Correct me if I’m wrong, but if I choose to use Zend_DB, I’ll be trading in a lot of power for simplicity of coding, right?

  • Arnaud

    coffee_ninja: I understand the idea of the framework is to give you power through simplicity, so I would say you should get both :)

  • Christopher Thompson

    Maarten: Actually it’s not overdesigned; it’s simplified, modular, and pretty standard. And as I said, I see no problem with procedural style solutions like Zend_InputFilter and would make no changes to it.

    But the classes underneath Zend_InputFilter should allow for multiple solutions that are OO or procedural, and are simple or complex. The current Zend_Filter does not. They apparently just split out code from InputFilter without thinking about other use cases.

  • yosoyminero

    I don’t really know if it is already considered, but UTF-8/Unicode should also be in the scope…


  • Anonymous

    screw Zend’s framework, I’m using PHPonTrax