Dynamic global functions in PHP

Like many others, I prefer to use procedural PHP as a template language. While PHP’s syntax makes it a practical choice for this, there is a problem with embedding dynamic content. Most PHP applications produce HTML output, so with this technique you end up writing <?php echo htmlspecialchars($foo);?> a lot. Or you forget it, and leave your application open to all sorts of nasty XSS attacks.

Apart from the annoyance of superfluous typing, there is a danger of getting lazy, seeing that <?php echo $foo;?> is remarkably shorter to type. In some situations, it won’t manifest itself as a problem either, since some content types never contain HTML special characters (numbers, for example). This is particularly nasty, because errors in the view layer are notoriously hard to track down and, unlike SQL injection (a similar problem), the consequences tend to hurt the users of a site rather than the site directly.

KISS

Recently, I had a look at some code written for CakePHP. My eye caught a function e, which is shorthand for echo. A single-letter regular function is undoubtedly the simplest way to extend PHP’s syntax. Thinking about it, it’s fairly obvious, but it just never occurred to me.
Well, the CakePHP developers made a mistake, as it should have been shorthand for echo htmlspecialchars. Nonetheless, the syntax works well. So I began using a globally defined function which looks a bit like this:


function e($string) {
  echo htmlspecialchars($string);
}

And it works too. It saves me typing and keeps me from forgetting to escape output, because I have to do more work to output strings unescaped, than not. Simple, but powerful.
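
To make the effect concrete, here is a minimal sketch that captures what the helper prints by using output buffering. The variable $comment and its contents are made up for illustration.

```php
<?php
// Same helper as above: escape HTML special characters, then print.
function e($string) {
  echo htmlspecialchars($string);
}

// Hypothetical user input containing markup.
$comment = '<script>alert("xss")</script>';

// Capture what the template would print so it can be inspected.
ob_start();
e($comment);
$escaped = ob_get_clean();

// $escaped no longer contains raw angle brackets or double quotes;
// they have been replaced by HTML entities.
```

In a template, the call would simply read <?php e($comment); ?>.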

The clash

There is a problem though: since this is such a good name for a function, chances are that someone else would use it for something different, or perhaps even for the same thing. I already know of at least one potential name clash, since I got the idea from CakePHP.

The usual way of dealing with this would be namespaces but, alas, PHP doesn’t have namespaces (yet), and the common solution of pseudo-namespaces (e.g. prefixing names) doesn’t work here, since it would defeat the purpose of the function in the first place.

There’s another problem too. In the few cases where we aren’t rendering HTML/XML output, we don’t want to escape strings for embedding in HTML/XML; instead, we want to escape them for embedding in that target language. Even in HTML, it may be necessary to escape strings with htmlentities rather than htmlspecialchars if the text encoding isn’t ISO-8859-1, or to encode the string to UTF-8 if the template is in UTF-8.
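
As a rough illustration of why one fixed escaping rule is not enough, the sketch below escapes the same string for three different targets. The function names escape_html, escape_url and escape_js are hypothetical, and the explicit UTF-8 charset is an assumption about the template’s encoding.

```php
<?php
// Hypothetical helpers, one per output context.
function escape_html($string) {
  // Name the charset explicitly instead of relying on the default.
  return htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
}

function escape_url($string) {
  // For a value placed inside a URL query string.
  return rawurlencode($string);
}

function escape_js($string) {
  // For a value embedded as a JavaScript string literal.
  return json_encode($string, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
}

$value = 'Tom & "Jerry"';

$html = escape_html($value); // entities for & and "
$url  = escape_url($value);  // percent-encoded
$js   = escape_js($value);   // quoted literal with \uXXXX escapes
```

Each helper is correct only for its own context, which is why a single hard-wired e() cannot serve every template.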

Making the static dynamic

The problem with all this is that the function is static; that’s the nature of global functions in PHP. Other interpreted languages allow us to redefine functions at runtime, but no such luck with PHP (well, strictly speaking, runkit allows it, but no one in their right mind would use it in a production environment).
There is, however, a loophole. Using a callback, we can delegate to a dynamically defined handler:


if (!function_exists('e')) {
  // Forward all arguments to whichever handler is registered at call time.
  function e() {
    $args = func_get_args();
    return call_user_func_array($GLOBALS['_global_function_handler_e'], $args);
  }
}

Much better. Now I can define my own output handler as:


$GLOBALS['_global_function_handler_e'] = 'my_global_function_handler_e';
function my_global_function_handler_e($string) {
  echo htmlspecialchars($string);
}

And CakePHP can use:


$GLOBALS['_global_function_handler_e'] = 'cakephp_global_function_handler_e';
function cakephp_global_function_handler_e($string) {
  echo htmlspecialchars($string);
}

We’re still using a global symbol, but at least it’s dynamic, rather than static.
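
The following sketch puts the pieces together and swaps the handler at runtime. The handler names html_handler_e and raw_handler_e, and the capture_e helper, are hypothetical and exist only for this demonstration.

```php
<?php
// Dispatcher: forwards every call to the currently registered handler.
if (!function_exists('e')) {
  function e() {
    $args = func_get_args();
    return call_user_func_array($GLOBALS['_global_function_handler_e'], $args);
  }
}

// Hypothetical handler that escapes for HTML.
function html_handler_e($string) {
  echo htmlspecialchars($string);
}

// Hypothetical handler that prints the string untouched.
function raw_handler_e($string) {
  echo $string;
}

// Demo-only helper: run e() and return what it printed.
function capture_e($string) {
  ob_start();
  e($string);
  return ob_get_clean();
}

// Register the escaping handler, then call e().
$GLOBALS['_global_function_handler_e'] = 'html_handler_e';
$escaped = capture_e('<b>bold</b>');

// Swap the handler; the very same e() call now behaves differently.
$GLOBALS['_global_function_handler_e'] = 'raw_handler_e';
$raw = capture_e('<b>bold</b>');
```

Because call_user_func_array accepts any callable, the registered handler does not have to be a function name; an array such as array($object, 'method') should work as well.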

So here’s a plea to framework writers in the PHP world: if we could all agree to do this for any function defined in the global scope that has an obvious risk of name clashes, I think we would all be better off.

Just a humble suggestion of course.

  • php_penguin

    I use a similar function, which does stripslashes and various other nice cleaning tasks, with the name “eco”, although I see that simply “e” works just as well, with fewer characters.

    Defining a specific set of tasks for this function would be much better for the scripting world as a whole.

    Perhaps a PHP module? It might be implemented by default in the future

  • http://www.mellowmorning.com lajkonik86

    Output escaping is just a setting on or off, when you are using Symfony.

  • Anonymous

    I think it is brilliant. I hope all frameworks will apply this. This way it would be much easier to rely on different libraries and avoiding name clashes. It is not a discussion about which framework does what, it simply applies to all frameworks which might use a function in the global scope.

  • Anonymous

    Actually, it might even be worth while for the frameworks to come to some sort of agreement which functions might be candidates for use everywhere.

    These suggestions might be on the list

    e($sentence) – return escaped output
    __($string) – return a translation (should probably also escape output)
    url($relative_url) – return the url to a page based on the relative url
    email($email) – return a spam proof email address

  • Anonymous

    I think using global functions is a bad practice anyway. Even with the described improvement it may end up with code that is hard to maintain.

  • kyberfabrikken

    > I think using global functions is a bad practice anyway. Even with the described improvement it may end up with code that is hard to maintain.

    I wouldn’t recommend writing all procedural code just because of this, but PHP-style templates are inherently procedural in the first place, which is why a function feels natural.
    The problem with global functions is largely that they are statically bound, thus creating a very tight coupling. By using a dynamic function, this coupling is removed. Basically, this “trick” is just syntactic sugar on top of a callback.

  • Avi

    Why can’t you just escape the strings in your model or controller before it comes to the view so that all you need is echo?

  • logic_earth

    @Avi
    For different output formats.

  • Avi

    @Logic Earth
    > For different output formats.

    So I would say that the raw data would remain in the model as is, and the controller would modify it as necessary…
    For instance, if you would like to have output in XHTML, XML, and JSON: I always have my controllers either convert the data to XML and JSON, or just send it to an XHTML template, instead of creating the XML or JSON in a template itself. In other words, I see nothing wrong with having the controller format data as well…

  • Nate

    “Well, the CakePHP developers made a mistake, as it should have been shorthand for echo htmlspecialchars.”

    Nope, we didn’t.

    The e() function was put in place to replace a feature we deprecated called AUTO_OUTPUT, which would allow view helpers to directly output the data they return (the purpose was essentially shorthand). So, since the helper methods themselves actually output HTML, adding htmlentities() to e() would have defeated the purpose.

    We do, however, have another shorthand function called h(). I’m sure you can guess what that one does.

  • kyberfabrikken

    @nate
    Thanks for correcting me on the finer points of Cake; as mentioned, my experience with the framework is cursory.
    Historical reasons for the current design aside, there’s still a need to escape variables when outputting them. Less so if most output is generated in helpers, but nonetheless. In fact, the function I described in my post is a view helper, albeit a very simple one.

    However, I’m not interested in discussing where the best place to escape output is — there are pros and cons of different decisions — but for these different decisions to co-exist, we need to agree on a common infrastructure; I had a deeper look at cake’s source code after your post, and I see that there are a number of shortly named functions in there. This makes the risk of nameclashing very high. Agreeing on using a dynamic implementation as I described, seems imminent.

  • mrclay

    Even simpler, create a prefixed function and call it by name stored in a var:

    function MyPrefix_e($str) {
        echo htmlspecialchars($str);
    }
    $e = 'MyPrefix_e';
    
    // in template PHP block
    $e($escapeMe);

  • Dr Livingston

    > In other words, I see nothing wrong with having the controller
    > format data as well…

    But that isn’t the role that the controller has though; the responsibility that you are talking about belongs in the model first, to return the data in correct package, and the view, to present it as required.

    The view, chosen by the controller, would know how to present that given data be it XML, JSON et al.

  • michel

    If you’re capturing template output within object context, a stream wrapper to rewrite PHP could be a feasible option.
    The stream wrapper interface makes it very easy to rewrite shorthand to verbose statements. For example, becomes e(‘foo’)?>. All the advantages of encapsulation without the disadvantages of global scope.
    I may be stating the obvious of course.

  • michel

    Oops, example nearly stripped: becomes e('foo')?>

  • kyberfabrikken

    > Even simpler, create a prefixed function and call it by name stored in a var…

    Yes, that’s an even simpler way of implementing it. There are two reasons why I wanted to hide the callback behind a regular function.

    The first is, that people are generally uncomfortable with callbacks. Even if it is a valid syntax, it’s bound to create confusion. As templates are otherwise fairly simple code, it’s something that inexperienced programmers often deal with, and they wouldn’t have an easy time with a callback syntax.

    The other problem is, that you can only use regular functions. If you want to create a callback to an object method, you need to use the convoluted call_user_func().

    > If you’re capturing template output within object context, a stream wrapper to rewrite PHP could be a feasible option.

    stream wrappers are just a fancy way of doing file_get_contents() and eval() it. To use that, you would have to parse the content, in which case, I think you have moved away from the PHP-as-template language paradigm.

  • Avi

    > The view, chosen by the controller, would know how to present that given data be it XML, JSON et al.

    If we’re gonna deal with this in an object-oriented fashion (because you’re treating the view as an entity), why not encapsulate the view in an object which has a method for escaping HTML? This way we could avoid having global functions in the first place.

  • michel

    > stream wrappers are just a fancy way of doing file_get_contents() and eval() it. To use that, you would have to parse the content, in which case, I think you have moved away from the PHP-as-template language paradigm.

    I’d like to nuance that: yes, you’re moving away from ‘out-of-the-box’ PHP templating by depending on a parsing layer. You can still use PHP *syntax*, however. I think shortcuts can be very convenient in templates, especially if you have to produce a high volume of template code. To stick with the example, why type <?php echo (expression) ?> when you can support <?=(expression)?> with minimal effort?

  • micmath

    Can I suggest? escho()

  • Fritz


    inspires a static class e
    e::s($arg) // htmlspecialchars
    e::d($arg) // date format
    e::n($arg) // numeric format
    e::U($arg) // uppercase

    ...etc, you get the picture

  • ballyhoos

    > I think using global functions is a bad practice anyway. Even with the described improvement it may end up with code that is hard to maintain.

    How can you say that…that’s the stupidest thing I’ve heard. I don’t see how you think that using global functions would create code that is hard to maintain? Speaking from experience of creating a PHP framework, it allows for extremely quick reference, and using global functions for things like URL replacing within href attributes is extremely useful.

    All my global functions start with an underscore, thus differentiating them from PHP-defined global functions like __autoload etc… maybe that could solve the conflict?