OOP and Performance

Object oriented programming, compared to procedural, is typically seen as a trade-off: increased “developer performance” through better modularity and re-use versus slower processing, thanks to the extra runtime lookup overhead objects introduce (compared to the equivalent collection of functions and variables).

Search Google for “php oop” and this is the first result, which is fairly well balanced but comes to the safe conclusion;

the next time you are developing PHP code, you should consider whether you want faster execution times / less CPU load, or easier to maintain code

The argument is based on benchmarking code. What it ignores, though, is the human aspect of writing code, and that’s where a side effect of OOP can result in improved performance.

Bearing in mind I’m talking only in a general sense, I think the tendency with procedural code is to use “brute force” – the actual meaning and behaviour of the logic being masked by the spaghetti. If what the code was actually doing was transparent to the developer, they might see some of the “blinding” inefficiencies they’ve introduced.

By way of anecdote, a while back I was asked to help with a statistical analysis tool, written in PHP, that performed calculations on a giant data set. PHP, the server and the Pope had already been blamed for its atrocious performance.

Written procedurally, what no-one had got round to blaming was logic that basically boiled down to this (though it wasn’t so easy to see, being scattered across multiple files):


$giantDataSet = getGiantDatasetFromSomewhere();

$totalX   = calculateTotalX($giantDataSet);
$averageX = calculateAverageX($giantDataSet);
$totalY   = calculateTotalY($giantDataSet);
$averageY = calculateAverageY($giantDataSet);

// etc. - every calculateXxx() function loops over the whole dataset again

Inside all of those calculating functions was a loop, so every calculation was another walk through the entire dataset. Meanwhile each calculation was more like a filter, doing basic math and happy to work with a row at a time.

My feeling was that if the author had been thinking in terms of useful abstractions, the inefficiency of looping through the entire dataset each time would have stood out. Instead they’d been stuck knee deep in the code too long to be able to see the bigger picture.

The modified version became something like this, performing the complete analysis in a single pass:


class Analyser {

    // Registered filter objects, each expected to provide a filter($row) method
    var $filters = array();

    function addFilter($filter) {
        $this->filters[] = $filter;
    }

    // Process the data something like this: a single pass over the dataset,
    // handing each row to every registered filter
    function analyse($data) {
        foreach ( $data as $row ) {
            foreach ( array_keys($this->filters) as $key ) {
                $this->filters[$key]->filter($row);
            }
        }
    }

}

// Usage something like this;
$A = & new Analyser();

$A->addFilter(new TotalXFilter());
$A->addFilter(new AverageXFilter());
$A->addFilter(new TotalYFilter());
$A->addFilter(new AverageYFilter());

$A->analyse(getGiantDatasetFromSomewhere());
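
The filter classes themselves aren’t shown above. A minimal sketch of one, assuming each filter exposes a filter($row) method plus an accessor for its result (those names are my invention, not the original code), might be:


class AverageXFilter {

    var $sum = 0;
    var $count = 0;

    // Sees one row at a time and updates its own running state
    function filter($row) {
        $this->sum += $row['x'];
        $this->count++;
    }

    // Hypothetical accessor for the final figure
    function result() {
        return $this->count ? $this->sum / $this->count : 0;
    }

}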

You might argue that’s a developer problem, but another example I ran into recently makes me think there are categories of problem where procedural code, when written by a human (as opposed to generated), will always produce poorly performing results.

I basically started looking at what would be needed to get Dokuwiki’s parser to the point of being able to handle UTF-8 encoded text. In the process I have now more or less re-written the parser using the lexer from Simple Test (actually a slightly modified version to take UTF-8 into account).

In short, Simple Test’s lexer acts as a tool to make regular expressions easy to manage – rather than giant regexes you write many small / simple ones. The lexer takes care of combining them efficiently then gives you a SAX-like callback API to allow you to write code to respond to matched “events”.
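
To give a flavour (this is only a rough sketch from memory of the Simple Test lexer API, so method names and details may well differ), registering wiki syntax ends up looking something like this:


// Rough sketch only: the lexer / handler method names here are recalled from
// memory of Simple Test's lexer, so treat the exact API as an assumption.
class Handler_Sketch {

    // Default mode: plain text between matches
    function accept($match, $state) {
        print htmlspecialchars($match);
        return true;
    }

    // Called when the "strong" mode is entered, matched within, or exited
    function strong($match, $state) {
        if ($state == LEXER_ENTER) {
            print '<strong>';
        } elseif ($state == LEXER_EXIT) {
            print '</strong>';
        } else {
            print htmlspecialchars($match);
        }
        return true;
    }

}

$handler = & new Handler_Sketch();
$lexer = & new SimpleLexer($handler, 'accept');

// Many small, simple patterns instead of one giant regex...
$lexer->addEntryPattern('\*\*', 'accept', 'strong');
$lexer->addExitPattern('\*\*', 'strong');

$lexer->parse('Some **bold** wiki text');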

The surprising result, to me, has been the dramatic performance increase. Parsing the wiki:syntax source with Dokuwiki’s native parser, on my box, takes anything between 5 to 7 seconds. Using the parser based on Simple Test’s lexer, it’s taking between 0.2 and 0.25 seconds.

What seems to be causing the difference is that Dokuwiki’s parser scans the complete raw text multiple times, replacing wiki syntax with HTML as it goes; there are at least 18 scans of the entire source document. Simple Test’s lexer, meanwhile, scans the document only once.
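
In other words (and this is only the shape of the approach, not Dokuwiki’s actual code), the multi-pass style amounts to something like:


// Illustrative only - not Dokuwiki's real code. Each rule walks the
// complete document again, so N rules means N full scans.
$text = preg_replace('/\*\*(.*?)\*\*/s', '<strong>$1</strong>', $text);
$text = preg_replace('#//(.*?)//#s', '<em>$1</em>', $text);
$text = preg_replace('/__(.*?)__/s', '<u>$1</u>', $text);
// ... and so on, one full pass per piece of syntax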

And this is by no means a criticism of Dokuwiki’s author; I certainly couldn’t do any better using a similar approach. Being a mere human, the easiest approach for me is writing code which performs multiple scans of the source, because there’s no way my brain can scale to combining everything Dokuwiki’s parser does into a single regex.

Extending the parsing discussion further, a similar story seems to be told by Piccolo:

Piccolo is a small, extremely fast XML parser for Java

Piccolo was developed using modified versions of the parser generator tools JFlex and BYACC/J … I noticed that almost all Java XML parsers are hand-written

The bottom line seems to be not that “those who grok OOP are better coders”. Rather, there are situations where, for a human (any human) to write code that performs well, abstractions above raw procedures are required. OOP is one possible way to achieve that abstraction, and it may help you develop more efficient solutions. Certainly the notion “it’s OOP so it must be slower” is superficial.

Anyway – returning to the Dokuwiki parser, there’s still some work to do.

One downside of Simple Test’s lexer (unless I change it) is that you can’t use subpatterns inside the regexes you provide (it escapes them). Right now that’s making it difficult to find the end of a list, and I haven’t got the “leading space” non-parsed blocks figured out yet.

I also need to check that it’s really handling UTF-8: at the moment I have added the /u modifier to the preg_match() call it makes, but there’s also some use of the str* functions in there for extracting substrings – I may be able to work around that using preg_split() and the /u modifier.
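
For example (this is an assumption on my part, not something the parser does yet), substr()-style extraction could probably be replaced with something along these lines:


// Split a UTF-8 string into characters (not bytes) via the /u modifier,
// then slice the character array; hypothetical helper, untested here.
function utf8_substr($str, $offset, $length) {
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    return join('', array_slice($chars, $offset, $length));
}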

If it all goes well, then aside from performance, another positive result will be that the output format is well separated from the parsing. That means it should be possible to render alternative output formats besides the current HTML, “simply” by plugging in a class containing the right method names, something like this:


class Plaintext_Renderer {

    var $indent = 0;

    // Called on Dokuwiki headers
    function header($level, $text) {
        $this->indent = $level;

        // Never mind the UTF-8...
        fwrite(STDOUT, strtoupper($text));
    }

    // Normal text: indented by the current header level
    function cdata($text) {
        fwrite(
            STDOUT,
            str_pad($text, strlen($text) + $this->indent, " ", STR_PAD_LEFT)
            );
    }

    // etc.
}

And a nice coincidence is that Wikipedia uses almost exactly the same markup as Dokuwiki (I believe Dokuwiki’s syntax is based, in parts, on MediaWiki’s).

Source (in progress and not great) is here along with tests.


  • Ren

    It would be nice to have a PHP-flavoured version of re2c. Unfortunately that’s made more difficult by the continued absence of goto in PHP, which I personally don’t understand the fuss about, as goto makes code generators a little simpler to implement/port.

  • http://www.phppatterns.com HarryF

    You sparked an interesting trail of searching by mentioning re2c, which ended up on a different tack: ASPA – this is another ASP (VBScript / JScript) to PHP converter, but it looks to be pretty powerful; see the example the author provides here.

  • http://www.hands-solutions.com wont

    When you talked about the whole-dataset example, I was thinking that if I were to do it procedurally, I would have passed all the desired result variables by reference as parameters to one function call, and done the calculations in one loop, similarly to how you did it in the OOP code. This is what I did on one project I was working on a few years back that had to handle multiple similar calculations on a large dataset. My original code had broken it out, but performance was horrible. After refactoring it, I got a sizable increase in processing speed.

  • Christopher Thompson

    I think articles like this almost do OOP a disservice. You are comparing algorithms to methodologies in most of your cases, which makes any conclusion impossible. Issues like abstraction, single vs. multi-pass, code comprehensibility, etc. are all issues of programming in general. But they often seem to slip into pro-OOP writing as if unique to OOP.

    Somebody writes an informative article that basically says “even though OOP code executes slower you should use it anyway” and the OOP-philes get all bent out of shape.

    Otherwise this was an interesting post on performance tuning.

  • http://www.phppatterns.com HarryF

    You are comparing algorithms to methodologies in most of your cases which makes any conclusion impossible.

    Agreed, especially with “makes any conclusion impossible”. Keeping focused on PHP, the assumption that “OO => slow app” is particularly acute, and that’s what I was attempting to question. Or perhaps what I was challenging was the reverse: procedural “PHP => fast app”.

    Talking purely in terms of end results, the untestable experiment I’m trying to suggest is this: if we gave two equally talented developers a problem of significant complexity to solve using PHP, the developer that applies OO is more likely to deliver something that performs well and scales consistently. And that’s not a result of OO directly, but rather of “side effects” that arise from the human side of development.

  • arborint

    I’m trying to suggest is if we gave two, equally talented, developers a problem to solve of significant complexity using PHP, the developer that applies OO is more likely to deliver something that performs well and scales consistently

    First, I seriously doubt that there would be much difference in performance or scalability in either direction, given two equally talented developers.

    However, there is one thing about those two apps that the pro-OOP crowd never likes to admit: if you gave the code to a sampling of PHP programmers, the vast majority would go for the procedural code, because it is simply easier for most PHP programmers to understand. And that’s why there are so many successful PHP apps with procedural code.

    I think the one problem with OOP that never gets mentioned is that it is a methodology driven by consultants who make their living on “the new”. You only need to read the SitePoint forums to follow the fads: use inheritance, no it’s bad now; use get/set, no don’t use them any more; nouns or verbs, who knows which is best yet? Not clear on MVC? That’s OK, because it is so vague that it can be neither accepted nor refuted, only discussed endlessly.

    The main problem that those few who chose the OOP version of your

  • http://mgaps.highsidecafe.com BDKR

    In fact, I think most non-beginner PHP programmers seem to understand that it is things like algorithm choice…

    I agree and really think this is more the heart of the issue. Your two talented developers may in fact end up using a very similar algorithm.

    So what is it about OO that may cause one to come up with a better algorithm (single-pass as opposed to multi-pass in this case)?

  • arborint

    Well, you would have to prove it was true before you could find a cause.

  • RyanW

    Two equally talented developers are unlikely to deliver greatly different results. This is assuming they are both aware of performance issues and write good, intelligent algorithms.

    In general, OOP may run better, usually because the people who take the time to become good at OOP are more interested in smart software development practices, re-use (for themselves and others) and simply intelligent coding practices and methods. However, a person may choose procedural programming even if they are fully capable of writing object oriented applications, and most likely they will write a faster procedural application than an object oriented one. Procedural programming is not invalid and can in many cases suit a particular project better than object orientation; it all really depends on the size of the application, its purpose and its need to be maintained and updated.

  • jgchristopher

    I have to be upfront and say that I advocate the use of OOP. I do use procedural style programming in PHP though, when I am either testing out an idea or doing a quick and dirty prototype. I have found in my own experiences, that OOP designs/implementations are easier to follow and extend when the time comes. Of course, OOP can be overkill for small projects that are unlikely to need to scale or change in the future.

  • Buddha443556

    IMHO, neither procedural nor OOP methodologies will perform well if the programmer ignores the underlying architecture. A web application coded in PHP operates in a very different environment than a standalone desktop application coded in any language.

    Personally, I find OOP overkill when it comes to PHP web programming. However, that probably comes from my one-page-at-a-time approach to PHP web programming.

  • Anonymous

    The big data example could be written as one loop procedurally:

    while ($row = getNext($recordSet)) {
        doSomething(/* ... */);
        doSomethingElse(/* ... */);
        // etc.
    }

    If you want dynamic function assignment, then perhaps put function pointers into an array. But at this point it is hard to say without seeing the requirements.

  • Anonymous

    Here is another procedural variation of the “big data” example:

    $options = array();
    $options['foo'] = true;
    $options['bar'] = false;
    // etc., or get the options from a database

    function bigLoop($recordSet, $options) {
        while ($row = getNext($recordSet)) {
            if ($options['foo']) { /* ...do foo stuff... */ }
            if ($options['bar']) { /* ...do bar stuff... */ }
            // etc.
        }
    }

  • Pingback: SitePoint Blogs » Brion Vibber on Wikipedia and Mediawiki

  • Martin

    Good input!

    I think the biggest problem with the procedural way is that it can get so messy: where’s that variable, did I declare that, where was that again… Maybe I am just a mess, but OOP provides nice abstraction and handling of data AND is neat and tidy. You can say that you have an elephant’s memory and all, keeping multiple pages in your head, but come back a week or a month later and it will all be scrambled for you. OOP, on the other hand, will get you working in no time. Basically, what it boils down to is that human beings are OOP, not procedural, and so it goes.

    Of course there’s the bad messy programmer and good tidy programmer and then there’s Murphy’s Law.