Relational databases and OOP

In PHP & MySQL: Novice to Ninja, the authors say (at the beginning of Chapter 13):

“… JOINs do not work well with object-oriented programming. The relational approach used by databases is generally incompatible with the nested structure of object-oriented programming. In object-oriented programming, objects are stored in a hierarchical structure.”

I get the difference between relation and encapsulation at a semantic level. But, I have a coding task currently that stretches the idea that encapsulation is better and I’m hoping someone could offer some code examples to show what the authors mean.

I need to pull data from a database at four levels. It’s complicated but, in a simplified, fictional scenario, this will do to explain it.

Imagine we are working with films. I have:

Genres - Level 1
Films - Level 2
Actors - Level 3
Awards - Level 4

So we could say that “Horror” is a genre which encapsulates the film, “The Shining”, which encapsulates the actor, “Jack Nicholson”, who encapsulates the award “Oscar”.

With relation programming I could just have a JOIN-fest and deal with this quite easily. And, using OOP, if I only had two levels, I could, say, have a “Genre” entity containing a getFilm() method. Again fairly easy. But, what if I want to start displaying all four levels on a page using OOP?

Could somebody visualise this in concepts and code for me in an OOP way, please? Some examples of the foreach logic?

Cheers.
Mike

Things like these can’t be captured in OOP, that’s the whole point the book is trying to make.

What Doctrine (a well known ORM for PHP) does is it uses so called proxies, so for example your Genre class might have a method getFilms, but the films aren’t loaded in the class yet. However, when you call that method it will do a database call and retrieve the films. Those films will then have method getActors which, you guessed it, will load all actors when the method is called. Etc.

This works quite well, but doesn’t scale. When a project gets big enough this kind of magic will start to work against you instead of help you. What you should do instead is don’t nest the objects, but just keep the ID in the object, and then request the related object from the database again. Sure, it’s more queries, but you get much more fine grained control over what’s happening and there is no magic involved.

Thank you for your response.

I don’t agree that…

… the book is simply making the point that OOP works differently, objects “contain” rather than “relate to”, but it doesn’t mention “levels” of encapsulation beyond two. And this is my problem: the three, four…

Could you provide some simple code examples of what you mean here, please?

The trap that many developers fall into is the desire to create a “one size fits all” solution. They see the problem domain and immediately start to create objects for Genre, Film,Actor,Award and then try to link them together. And it often works. For one request. They come up with a page the displays a particular set of information.

But then they move onto a second request which needs to display different information. And their wonderful model and queries start to break down. The relationship linkage is reversed. Maybe they only need parts of data from one object. Maybe they need to do some calculations. Very sad.

And even if you manage to someone process the second request using your original objects, then along comes the third request and things just blow up again.

For me, the approach is to decide exactly what information is needed to satisfy a given request and write a query accordingly. A join fest specific to the particular request. And then, if necessary, convert the results of the query into request specific custom objects to make manipulating results easier.

And repeat for each type pf request.

1 Like

ahundiak,

Thank you for your response.

This makes sense to me. The last point in particular:

… it is definitely useful to be able just to pass results into templates, for which we do need objects not arrays.

The topic is really interesting to me. I’m beginning to evaluate the worth of frameworks: I still think they’re worthwhile (especially when you know them inside out and don’t just use a packaged one) but, yes, major limitations, it seems.

Mike

Hi Mike,

Two levels, ten levels, it’s the same because a relationship exists between any of the two levels.

If you follow the process outlined in the book you can do something like:


foreach ($genres as $genre) {
   echo $genre->name;
   foreach ($genre->getFilms() as $film) {
       echo $film->title;
        foreach ($film->getActors() as $actor) {
             echo $actor->name;
             foreach ($actor->getAwards() as $award) {
                    echo $award->title;
             }
       }
    }

}

This is more verbose than a join and potentially slower because you are sending 4 queries to the database (though in some situations such as sorting, multiple queries can be faster than joins). However, the benefit of OOP is code reuse and flexibility. Given any Film object you can get a list of actors using the code $film->getActors() and then get a list of that actor’s awards. If you’re manually writing queries, you have to manually write a query each time you wish to.

edit: As an example, you can easily extend this system to allow:


//Get all films starring an actor
$actor->getFilms();

//Or get all the actors in a film
$film->getActors();


// Get all films that an  anctor won an award for:
	foreach ($actor->getAwards() as $award) {
		echo $award->getFilm()->title;
	}

//Which could even be abstracted into the `Actor` class:
$actor->getAwardWinningFilms();

///--------------
class Actor {
	public $name;
	public $birthDate;

	private $filmTable;
	private $awardsTable;

	public function __construct(DatabaseTable $filmTable, DatabaseTable $awardsTable) {
		$this->filmTable = $filmTable;
		$this->awardsTable;
	}

	public function getAwardWinningFilms() {
		$awardFilms = [];
		foreach ($this->getAwards() as $award) {
			$awardFilms[] = $award->getFilm();
		}

		return $awardFilms;
	}

	public function getAwards() {
		return $this->awardsTable->find('actorId', $this->id);
	}
}

//--------------

//Get all the genres that an actor has worked in:

foreach ($actor->getFilms() as $film) {
	echo $film->getGenre()->name;
	//Or store them in an array to reduce duplicates...
}

This approach gives a level of code reuse that’s just not possible using queries all over the place. You can define the relationship once in the relevant entity class and use it anywhere in your application.

The other benefit is that your relations don’t have to be between database tables:

$actor->getTwitterFeed()->getLatestTweet()

Where getTwitterFeed uses the Twitter API to get the actor’s latest tweets.

1 Like

Hey, Tom.
Thank you for your response. Really useful, as ever.

… so, basically,

  1. your advice is to not use SQL JOINS… even if it means creating entity classes for every table involved, classes that only have one method in each as, in the long run, code re-use will compensate for this initial coding? And
  2. just a question: you’ve instilled in me (rightly so, and thank you) a belief that “repetition is bad”. The quoted code above shouts out repetition (if in “logic” only). I haven’t used “Doctrine”, but if you look at ScallioXTX’s comment in this thread, are these “proxies” basically an answer to this repetition and, if “Yes”, what do YOU think about these “proxies”? Any good?

Mike

Yes. There are ways around that. Really the book is just to teach the underlying concepts. You may prefer to use a proper framework or ORM. The ORM I get you to build in Novice To Ninja is essentially a heavily stripped down version of Maphper https://github.com/Level-2/Maphper which doesn’t require entity classes for each table.

Proxy objects exists for performance reasons. (Though depending on use-case can be slower) In the code I gave you above if you just wanted a list of awards won for the film (and didn’t care who won them you could use the code:

$film = $filmTable->find('id', 123);
foreach ($film->getActors() as $actor) {
             foreach ($actor->getAwards() as $award) {
                    echo $award->title;
             }
}

Here, you’re selecting all the actors and all the awards for those actors. You’re never actually using any information from the actors table. A proxy object would mitigate this by only fetching data from the database when it’s required.

Of course that can be slower.

If you run the query

foreach ($actor->getAwards() as $award) {

}

Whatever creates the $award variable has no way to know whether you’ll need the data or not so can run SELECT * FROM award WHERE id = $this->id on the instruction echo $award->title.

Of course this can be too smart for its own good, this causes a problem because if it’s in a loop each time you loop over an award it sends a new SELECT query. This can be mitigated (As soon as one title is fetched, fetch all of them that might be needed) but the complexity keeps increasing.

2 Likes

Thanks for your reply, Tom.

I’ve got you. I will look at the other framework you based this on. Right now though, as I’m still learning, I’ll stick with yours and build it out as I learn more, rather than starting with something complete but alien to me.

Avoiding proxies for the time being.

Mike

My current take is a bit different from those presented here. While I used to like having hierarchical data in objects with methods like $genre->getFilms(), $film->getActors(), etc. I found out that as projects increase in complexity then then these methods are too simple so I need to add complexity to them like options regarding sorting, limits & offsets, conditions, sometimes additional columns, sql functions, etc. Then this starts to look like a more and more complicated ORM so I might just as well use a huge monster like Doctrine ORM, which has its own quirks and disadvantages. I find it much simpler to just write plain SQL query that gets the data I need and encapsulate it in a class. Then adding conditions to SQL or some other tweaks is incredibly simple. As an added bonus I get the ease of tweaking my queries for best performance.

For instance, if I have an info page where I want to display genres with films with actors with awards then I’d have a class method getGenres() in an InfoPage class and the method would return an array of arrays or objects, 4-level deep with the data required for the info page. In the implementation I’d possibly use a single query or maybe multiple queries depending on the database, possibilities, performance, etc.

In other words I don’t find much value in mixing data with behaviour in objects. I find it simpler to have dumb data structures and use OOP for behaviour. Then I just use the data like this:

foreach ($genres as $genre) {
   echo $genre->name;
   foreach ($genre->films as $film) {
       echo $film->title;
        foreach ($film->actors as $actor) {
             echo $actor->name;
             foreach ($actor->awards as $award) {
                    echo $award->title;
             }
        }
    }
}

or even the equivalent of the above with array syntax. I fetch all of the above data in one place - one method or one class so it’s easy to do it and to apply modifications at any time later on.

I’m sure it depends a lot on individual preferences, mine are probably biased because most often I don’t deal with simple web sites but with systems based on complex database relations and I need to use a lot of custom and elaborated SQL queries to get the job done so in such cases trying to translate all the relational logic into objects is too much hassle.

So that’s how I deal with object-relational mismatch: I don’t even try to translate one to the other, I simply use SQL to fetch and update data so that I use the full power of relational database and then I use OOP for my application and deal with the data in OO code - but the data itself is not OO.

I think this is similar to what @ahundiak had in mind in his comment here, if I understood him correctly.

I’ve found an interesting article related to this: There is no Such Thing as Object-Relational Impedance Mismatch. Interestingly, at the end it mentions a poorly supported MULTISET in RDBMS but there is a pretty good equivalent in PostgreSQL - I now realise why I like working with PostgreSQL: to retrieve a 4-level deep data structure like this can be easily done in one efficient query - the database just returns a nested result structure in JSON format. Then I don’t have to deal with doing it with OOP, ORM, whatever - I just get the data I need straight from the database.

2 Likes

Lemon_Juice,

Thank you so much for your detailed reply.

Can you elaborate on this. Currently, I am using quite a basic ORM in which I have functions that would normally return arrays but which I am forcing to return classes like so:

public function findById($value)
    {
        $query = 'SELECT * FROM `'.$this->table.'` WHERE `'.$this->primaryKey.'` = :value';
        $parameters = [
            'value' => $value,
        ];
        $query = $this->query($query, $parameters);

        return $query->fetchObject($this->className, $this->constructorArgs);
    }

Here, “className” is the name of an entity class and “constructorArgs” will be table-specific instances of the ORM, like so:

<?php

namespace Ibm\Entity;

use Framework\DatabaseTable;

class Category
{
    public $id;
    public $name;

    private $jokesTable;
    private $jokeCategoriesTable;

    public function __construct(DatabaseTable $jokesTable, DatabaseTable $jokeCategoriesTable)
    {
        $this->jokesTable = $jokesTable;
        $this->jokeCategoriesTable = $jokeCategoriesTable;
    }

How, exactly, are you bridging the gap between array and object? I can see that your nested foreach is using object notation.

A most basic example, I have a class that I use for fetching data for one of information web pages:

class InfoPage {
    private $db;
    
    public function __construct($db) {
        $this->db = $db;
    }
    
    public function getPage($slug) {
        return $this->db->fetchAssoc("SELECT title, description, content
            FROM pages
            WHERE id=?",
            [$slug]);
    }
}

In this example I get data of the page in an array form. I don’t have an ORM nor an entity class. I just have a specific class that deals with fetching data for a web page. This is very simple but of course I can make it much more elaborate, if required, I use complex SQL with joins, functions, etc. depending on my needs.

As I mentioned earlier, object notation was only an example, I could do the same thing using arrays and then the nested loop would be identical except for array syntax.

As I see it there are 3 major forms of fetching data by data access objects:

  1. As array.
  2. As dumb object (of type stdClass).
  3. As specific objects, i.e. entities.

In many cases there is not much difference between 1 and 2 except for different syntax. In some cases I even prefer arrays because I can use a whole set of PHP array functions on them while not on objects. Arrays are really fine for dumb data structures.

Form 3 is also a good choice but you have to write your entity class, of course. This is more work but the bonus is that your entity class can be documented well and you can give it a ‘signature’ by specifying all fields (properties) and perhaps getters and setters. Then when you pass an entity object like this in your code you know what kind of data it encapsulates, what fields are available, you get IDE hints with your PhpDoc blocks, etc. Whereas when passing arrays or stdClass objects you don’t know what kind of data they hold, which can make the code less clear.

To be honest, I often use arrays because it takes less effort - but I’m aware that in case of debugging I need to carefully track down where the data comes from and what’s in there. This is a trade-off. I use entities mostly in cases where I write a module (a class, or a collection of classes) that I expect to be separate and to (be able to) re-use it later on and I want it to have a well defined interface. When you see that a class method requires a PageInfo entity object then you can easily know what data it needs by looking at PageInfo class, whereas when an array is required then you don’t know what the array is supposed to hold without looking into the module’s code (or without reading the documentation, if there is any). The same applies to data returned by the class methods.

When a collection of data is returned then I return an array of one of the same data types:

  1. an array of arrays
  2. an array of stdClass objects
  3. an array of entity objects

You can also have a variation of the array by returning an object implementing ArrayIterator - but we don’t have to go into such details here. Such an object will behave almost like an array.

If there are nested structures like in your example then there will be multidimensional array of arrays or objects.

The difference between your method and mine is that you are using generic methods like findById() that are used for all your tables/entities. I’ve found this works well for simple CRUD applications because you can cover almost everything by using those generic methods. When the requirements increase those methods often will not be enough and we will have to code around them.

For example, imagine you are using findById() to fetch page data for display. But one day you want to fetch just one more value - number of comments for that page and the most efficient way is to use a subquery in your SQL. I just add a subquery in half a minute and I’m done while you have to completely change your approach because findById() can’t do it for you. This may not sound like an ORM solution would be very hard to find just to fetch comment count - I’m aware there are ways to do it - but there may be a hundred other non-CRUD-standard requirements and in some cases this may become tedious.

And BTW, I don’t mean to say which way is better, I just want to present a different take on the matter. Some people simply enjoy as much OOP as possible and enjoy abstracting things like database access, etc. while others feel more comfortable with SQL, and of course depending on project type one approach may be more reasonable than the other. But personally, one day I decided to go ORM-less for my new projects and I really can’t look back - I just feel relieved :slight_smile:

1 Like

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.