OOP and Performance

    Harry Fuecks

    Object oriented programming, compared to procedural is typically seen as a trade off: increased “developer performance” through better modularity / re-use vs. slower processing, based on the extra runtime lookup overhead objects introduce (compared to the equivalent collection of functions + variables).

    Search Google for “php oop” and this is the first result, which is fairly well balanced but comes to the safe conclusion;

    the next time you are developing PHP code, you should consider whether you want faster execution times / less CPU load, or easier to maintain code

    The argument is based on benchmarking code. What it ignores though is the human aspect of writing code and that’s where a side-effect of OOP can result in improved performance.

    Bearing in mind I’m talking only in a general sense, I think the tendency with procedural code is to use “brute force” – the actual meaning and behaviour of the logic being masked by the spaghetti. If what the code was actually doing was transparent to the developer, they might see some of the “blinding” inefficiencies they’ve introduced.

    By way of anecdote, a while back I was asked to help with a statistical analysis tool, written in PHP, that performed calculations on a giant data set. PHP, the server and the Pope had already been blamed for its atrocious performance.

    Written procedurally, what no-one had got round to blaming was logic that basically boiled down to this (but it wasn’t so easy to see, being scattered across multiple files);

    $giantDataSet = getGiantDatasetFromSomewhere();
    $totalX = calculateTotalX($giantDataSet);
    $averageX = calculateAvergageX($giantDataSet);
    $totalY = calculateTotalY($giantDataSet);
    $averageY = calculateAverageY($giantDataSet);
    // etc.

    Inside all of those calculating functions was a loop, so every calculation was another walk through the entire dataset. Meanwhile each calculation was more like a filter, doing basic math and happy to work with a row at a time.

    My feeling was that if the author had been thinking in terms of useful abstractions, the inefficiency of looping through the entire dataset each time would have stood out. Instead they’d been stuck knee deep in the code too long to be able to see the bigger picture.

    The modified version became something like this, performing the complete analysis against a single loop;

    class Analyser {
        // stuff here...
        // Process the data something like this...
        function analyse($data) {
            foreach ( $data as $row ) {
                foreach ( array_keys($this->filters) as $key ) {
    // Usage something like this;
    $A = & new Analyser();
    $A->addFilter(new TotalXFilter());
    $A->addFilter(new AverageXFilter());
    $A->addFilter(new TotalYFilter());
    $A->addFilter(new AverageYFilter());

    You might argue that’s a developer problem but another example I ran into recently makes me think there are other categories of problems where procedural code, when written by a human (vs. generated), will always produce poor performing results.

    I basically started looking at what would be needed to get Dokuwiki’s parser to the point of being able to handle UTF-8 encoded text. In the process I have now more or less re-written the parser using the lexer from Simple Test (actually a slightly modified version to take UTF-8 into account).

    In short, Simple Test’s lexer acts as a tool to make regular expressions easy to manage – rather than giant regexes you write many small / simple ones. The lexer takes care of combining them efficiently then gives you a SAX-like callback API to allow you to write code to respond to matched “events”.

    The surprising result, to me, has been the dramatic performance increase. Parsing the wiki:syntax source with Dokuwiki’s native parser, on my box, takes anything between 5 to 7 seconds. Using the parser based on Simple Test’s lexer, it’s taking between 0.2 and 0.25 seconds.

    What seems to be causing this difference is Dokuwiki’s parser is scanning the complete raw text multiple times, replacing wiki syntax with HTML as it goes. There’s at least 18 scans happening on the entire source document. Meanwhile Simple Test’s Lexer scans the entire document only once.

    And this is by no means a criticism of the Dokuwiki’s author. I certainly couldn’t do any better using a similar approach. Being a mere human being, the easiest approach for me is writing code which performs multiple scans of the source – there’s no way my brain can scale to combining everything Dokuwiki’s parser does into a single regex.

    Extending parsing discussion further, a similar story seems to be told by Piccolo;

    Piccolo is a small, extremely fast XML parser for Java

    Piccolo was developed using modified versions of the parser generator tools JFlex and BYACC/J … I noticed that almost all Java XML parsers are hand-written

    What seems to be the bottom line is not that “those that GROK OOP are better coders”. Rather there are situations where for a human (any human) to write code that performs well, abstractions from raw procedures are required. OOP is one possible way to achieve abstraction and it may help you develop more efficient solutions. Certainly the notion “It’s OOP so must be slower” is superficial.

    Anyway – returning to the Dokuwiki parser, there’s still some work to do.

    One downside of Simple Test’s Lexer (unless I change it) is you can’t use subpatterns inside the regexes you provide (it escapes them). Right now that’s making finding the end of a list difficult and I haven’t got the “leading space” nonparsed blocks figured out yet.

    I also need to check that it’s really handling UTF-8: at the moment I have added the /u modifier to the preg_match() call it makes but there’s also some use of the str* functions in there, for extract sub strings – I may be able to work around that using preg_split() and the /u modifier.

    It should all go well though, aside from performance, another positive result will be that the output format is well separated from the parsing. That means it should be possible to render alternative output formats from the current HTML – “simply” plugging in a class containing the right method names – something like;

    class Plaintext_Renderer {
        var $indent = 0;
        // Called on dokuwiki headers
        function header($level, $text) {
            $this->indent = $level;
            // Never mind the UTF-8...
            fwrite(STDOUT, strtoupper($text));
        // Normal text
        function cdata($text) {
                str_pad($text, $this->indent, " ", STR_PAD_LEFT)
        // etc.

    And a nice coincidence is wikipedia uses almost exactly the same markup as Dokuwiki (I believe Dokuwiki is based, in parts, on Mediawiki).

    Source (in progress and not great) is here along with tests.