OOP and Performance

Object oriented programming, compared to procedural is typically seen as a trade off: increased “developer performance” through better modularity / re-use vs. slower processing, based on the extra runtime lookup overhead objects introduce (compared to the equivalent collection of functions + variables).

Search Google for “php oop” and this is the first result, which is fairly well balanced but comes to the safe conclusion;

the next time you are developing PHP code, you should consider whether you want faster execution times / less CPU load, or easier to maintain code

The argument is based on benchmarking code. What it ignores though is the human aspect of writing code and that’s where a side-effect of OOP can result in improved performance.

Bearing in mind I’m talking only in a general sense, I think the tendency with procedural code is to use “brute force” – the actual meaning and behaviour of the logic being masked by the spaghetti. If what the code was actually doing was transparent to the developer, they might see some of the “blinding” inefficiencies they’ve introduced.

By way of anecdote, a while back I was asked to help with a statistical analysis tool, written in PHP, that performed calculations on a giant data set. PHP, the server and the Pope had already been blamed for its atrocious performance.

Written procedurally, what no-one had got round to blaming was logic that basically boiled down to this (but it wasn’t so easy to see, being scattered across multiple files);


$giantDataSet = getGiantDatasetFromSomewhere();

$totalX = calculateTotalX($giantDataSet);

$averageX = calculateAvergageX($giantDataSet);

$totalY = calculateTotalY($giantDataSet);

$averageY = calculateAverageY($giantDataSet);

// etc.

Inside all of those calculating functions was a loop, so every calculation was another walk through the entire dataset. Meanwhile each calculation was more like a filter, doing basic math and happy to work with a row at a time.

My feeling was that if the author had been thinking in terms of useful abstractions, the inefficiency of looping through the entire dataset each time would have stood out. Instead they’d been stuck knee deep in the code too long to be able to see the bigger picture.

The modified version became something like this, performing the complete analysis against a single loop;


class Analyser {

    // stuff here...
    
    // Process the data something like this...
    function analyse($data) {
        foreach ( $data as $row ) {
            
            foreach ( array_keys($this->filters) as $key ) {
            
                $this->filters[$key]->filter($row);
                
            }
            
        }
    }
    
}

// Usage something like this;
$A = & new Analyser();

$A->addFilter(new TotalXFilter());
$A->addFilter(new AverageXFilter());
$A->addFilter(new TotalYFilter());
$A->addFilter(new AverageYFilter());

$A->analyse(getGiantDatasetFromSomewhere());

You might argue that’s a developer problem but another example I ran into recently makes me think there are other categories of problems where procedural code, when written by a human (vs. generated), will always produce poor performing results.

I basically started looking at what would be needed to get Dokuwiki’s parser to the point of being able to handle UTF-8 encoded text. In the process I have now more or less re-written the parser using the lexer from Simple Test (actually a slightly modified version to take UTF-8 into account).

In short, Simple Test’s lexer acts as a tool to make regular expressions easy to manage – rather than giant regexes you write many small / simple ones. The lexer takes care of combining them efficiently then gives you a SAX-like callback API to allow you to write code to respond to matched “events”.

The surprising result, to me, has been the dramatic performance increase. Parsing the wiki:syntax source with Dokuwiki’s native parser, on my box, takes anything between 5 to 7 seconds. Using the parser based on Simple Test’s lexer, it’s taking between 0.2 and 0.25 seconds.

What seems to be causing this difference is Dokuwiki’s parser is scanning the complete raw text multiple times, replacing wiki syntax with HTML as it goes. There’s at least 18 scans happening on the entire source document. Meanwhile Simple Test’s Lexer scans the entire document only once.

And this is by no means a criticism of the Dokuwiki’s author. I certainly couldn’t do any better using a similar approach. Being a mere human being, the easiest approach for me is writing code which performs multiple scans of the source – there’s no way my brain can scale to combining everything Dokuwiki’s parser does into a single regex.

Extending parsing discussion further, a similar story seems to be told by Piccolo;

Piccolo is a small, extremely fast XML parser for Java

Piccolo was developed using modified versions of the parser generator tools JFlex and BYACC/J … I noticed that almost all Java XML parsers are hand-written

What seems to be the bottom line is not that “those that GROK OOP are better coders”. Rather there are situations where for a human (any human) to write code that performs well, abstractions from raw procedures are required. OOP is one possible way to achieve abstraction and it may help you develop more efficient solutions. Certainly the notion “It’s OOP so must be slower” is superficial.

Anyway – returning to the Dokuwiki parser, there’s still some work to do.

One downside of Simple Test’s Lexer (unless I change it) is you can’t use subpatterns inside the regexes you provide (it escapes them). Right now that’s making finding the end of a list difficult and I haven’t got the “leading space” nonparsed blocks figured out yet.

I also need to check that it’s really handling UTF-8: at the moment I have added the /u modifier to the preg_match() call it makes but there’s also some use of the str* functions in there, for extract sub strings – I may be able to work around that using preg_split() and the /u modifier.

It should all go well though, aside from performance, another positive result will be that the output format is well separated from the parsing. That means it should be possible to render alternative output formats from the current HTML – “simply” plugging in a class containing the right method names – something like;


class Plaintext_Renderer {

    var $indent = 0;
    
    // Called on dokuwiki headers
    function header($level, $text) {
        $this->indent = $level;
        
        // Never mind the UTF-8...
        fwrite(STDOUT, strtoupper($text));
    }
    
    // Normal text
    function cdata($text) {
        fwrite(
            STDOUT,
            str_pad($text, $this->indent, " ", STR_PAD_LEFT)
            );
    }
    
    // etc.
}

And a nice coincidence is wikipedia uses almost exactly the same markup as Dokuwiki (I believe Dokuwiki is based, in parts, on Mediawiki).

Source (in progress and not great) is here along with tests.

Frequently Asked Questions about Object-Oriented Programming and Performance

What is the main difference between Object-Oriented Programming (OOP) and Procedural Programming?

Object-Oriented Programming (OOP) and Procedural Programming are two different programming paradigms. The main difference between them lies in the way they structure and manipulate data. In Procedural Programming, the focus is on the sequence of actions or procedures to be performed. On the other hand, OOP is centered around objects, which are instances of classes, and these objects interact with each other to perform operations.

When should I use OOP over Procedural Programming?

OOP is generally used when you have complex systems that require a high level of abstraction. It’s ideal for large software development projects where multiple developers are working simultaneously. OOP allows for better data encapsulation, inheritance, and polymorphism, which can make the code more readable, reusable, and easier to maintain. However, for simpler, smaller-scale projects, Procedural Programming might be more suitable.

Does OOP affect the performance of my code?

The performance of your code can be influenced by many factors, and the programming paradigm is just one of them. While OOP can sometimes lead to slower performance due to the overhead of objects, the difference is usually negligible in modern computing environments. The key to good performance is often more about good design and efficient algorithms rather than the choice between OOP and Procedural Programming.

What are the benefits of using OOP?

OOP offers several benefits. It provides a clear structure for the code, making it easier to understand and maintain. It also promotes code reusability through inheritance and polymorphism. Moreover, OOP allows for data encapsulation, which enhances security by hiding the internal state of objects.

Can I mix OOP and Procedural Programming in the same project?

Yes, it’s possible to mix OOP and Procedural Programming in the same project. This is known as multi-paradigm programming. The choice between OOP and Procedural Programming doesn’t have to be an all-or-nothing decision. You can use the best aspects of both paradigms to suit your specific needs.

Is OOP more difficult to learn than Procedural Programming?

OOP can be more challenging to learn initially because it requires a different way of thinking about how to structure and manipulate data. However, once you understand the concepts of classes, objects, inheritance, and polymorphism, you might find that OOP makes your code more intuitive and easier to manage.

What are some common misconceptions about OOP?

Some common misconceptions about OOP include the belief that it’s always the best choice for every project, that it always leads to better code organization, or that it always improves performance. While OOP has many benefits, it’s not always the best fit for every situation. The choice between OOP and Procedural Programming should be based on the specific needs of your project.

How does OOP contribute to code reusability?

OOP promotes code reusability through the concept of inheritance, where a new class can inherit the properties and methods of an existing class. This means that you can create a general class with common functionality and then create more specific classes that inherit from it, reducing the need to write the same code multiple times.

How does OOP enhance the security of my code?

OOP enhances the security of your code through the concept of data encapsulation. This means that the internal state of an object is hidden from the outside world, and it can only be accessed or modified through the object’s methods. This prevents unauthorized access and manipulation of the data.

What are some good practices when using OOP?

Some good practices when using OOP include designing classes around real-world objects or concepts, keeping classes and methods small and focused, using inheritance and polymorphism to promote code reusability, and using data encapsulation to protect the internal state of objects.