Large applications without ORM?

I’m wondering: is it problematic to have a large application written in OOP without any ORM? I need to decide what kind of database access to implement in my next project and I’m undecided like never before :smile: I used to use my own home-made ORM, which I liked a lot. It had a code generator that worked from the database schema, so there was almost no need for any configuration and I got all IDE hints for field names along with column comments from the database. However, it followed the ActiveRecord pattern, so the entities were mixed with database access methods, all of that was hard to test, and I often ended up with huge objects that did far too much.

I am considering a data mapper now but none of the available ones fits my needs. I want something simple and lightweight that lets me develop starting from the database design (as opposed to a PHP model that is later ‘migrated’), works easily with plain SQL and does not add too much overhead. I don’t care about database independence and don’t need too much abstraction. I even thought a minimalist mapping like the one in mikron would be enough for me…

Which led me to a conclusion that perhaps I might as well not use any object mapping and simply fetch data as arrays or plain objects. And if I structure my code properly and follow good separation of concerns then not having an ORM would not be such a bad thing since I would strive to make my application code ORM independent (and entity independent). Simply put, my modules would take certain data as input, do their thing and output whatever is needed so an ORM would be irrelevant.

Is my reasoning sound? I recently made a small web application based on Silex without any ORM - just plain SQL with DBAL. I created a CMS for two tables with forms and validation and didn’t really miss an ORM. Sure, ORM makes simple queries easier to write but this is not a deal breaker and I can treat it as syntactic sugar. The only problem to solve would be how to organize more complicated queries - but I think this can be easily done in classes that would be the equivalent of mapper classes in an ORM.
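To make the "classes that would be the equivalent of mapper classes" idea concrete, here is a minimal sketch. It assumes plain PDO with an in-memory SQLite database instead of Doctrine DBAL, purely so that it runs on its own; the class name `ProductQueries` and the `product`/`opinion` tables are hypothetical.

```php
<?php
// Hypothetical query class: one home for the more complicated SQL.
// Table and column names are invented for illustration.
final class ProductQueries
{
    public function __construct(private PDO $db) {}

    /** @return array<int, array<string, mixed>> products with their opinion counts */
    public function findWithOpinionCounts(int $minOpinions): array
    {
        $stmt = $this->db->prepare(
            'SELECT p.id, p.name, COUNT(o.id) AS opinion_count
               FROM product p
               LEFT JOIN opinion o ON o.product_id = p.id
              GROUP BY p.id, p.name
             HAVING COUNT(o.id) >= :min
              ORDER BY p.id'
        );
        // Bind as an integer so the comparison in HAVING is numeric.
        $stmt->bindValue(':min', $minOpinions, PDO::PARAM_INT);
        $stmt->execute();

        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }
}

// Usage with an in-memory SQLite database so the sketch is self-contained.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)');
$db->exec('CREATE TABLE opinion (id INTEGER PRIMARY KEY, product_id INTEGER, body TEXT)');
$db->exec("INSERT INTO product (id, name) VALUES (1, 'Lamp'), (2, 'Desk')");
$db->exec("INSERT INTO opinion (product_id, body) VALUES (1, 'Good'), (1, 'Bright')");

$rows = (new ProductQueries($db))->findWithOpinionCounts(1);
```

The calling code only sees a method name and plain arrays, so the SQL can change without touching the modules that consume the data.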

Generally, when I develop larger applications I tend to follow relational-centric design where I first develop database structure (which often gets large) and then base my code around that - in which case I don’t feel a need for object-relational mapping apart from perhaps cleaner access to data in simple cases. Any opinions welcome!

I’m interested in the reasoning behind this. Generally speaking you’ll want to use OOP in PHP so a customer object with an address property is a lot nicer to work with than having to query the customers table, then the addresses table to get both records.

You mean my reasoning behind doing without ORM generally or wanting to start developing with the database design?

Yes, you are right, such objects are nicer to work with. But they are not indispensable, especially since entity objects are supposed to be only plain data objects. Having to query another table instead of having the ORM do it for me is one more step, but then again nothing big. As a bonus I don’t have to worry about how the ORM will get this data and whether it will do it efficiently: will it construct a join beforehand, or maybe issue separate queries, or maybe no queries at all (fetching stale data from a cache?), etc.

I would also want an ORM with little overhead. I have tried Spot ORM - it’s supposed to be simple and efficient but fetching a single row by ID is roughly 65 times slower than with SQL via Doctrine DBAL layer. I’m not a micro-optimization junkie but this is way too much for me :smile:

I mean the fact you seem to discard ORMs out of hand. They all work in different ways, but I’d rather work with an ORM than pure SQL in general because it means writing less code… and at what point is “efficiency” an issue? If it takes you 10% less time to write code then surely that’s better than it executing in 0.00001 instead of 0.00002 seconds?

Obviously this depends entirely on your choice of ORM, the efficiency is going to differ but it’s worth keeping in mind that a SELECT/JOIN query can sometimes be slower than two selects.

[shameless plug] take a look at Maphper: https://github.com/level-2/Maphper :slight_smile:

Well, I’m not really discarding them out of hand because in fact I would like to use one, except I can’t find any that would suit me well. I’m just weighing pros and cons, that’s why I’m considering going without an ORM.

In case of Spot ORM it’s more like between 0.00014 and 0.01500 seconds so this is not such a tiny difference. BTW, I more often reach a conclusion that less code is not always better - I value clarity more, in the sense of being obvious what the code does. An ORM reduces clarity because it does some magic to simplify things and we need to learn and understand it. 10% more code - probably yes - but at the same time clearer and more explicit about what it exactly does.

Besides, I like SQL :smile:

Correct, but this also depends on the database vendor - PostgreSQL handles joins better than MySQL so the situation may be reversed.

Actually, I’ve already looked at Maphper - and I like its interface. But the showstopper for me was that it has mechanisms that alter the structure of the database behind the scenes, like adding indexes or doing some database optimizations (changing column types). For me an ORM has no right to alter my tables, as that can have a disastrous effect on a system - unless it’s on demand. I understand the reasoning behind abstracting the datastore completely, and for simple projects it may work well. And AFAIK Maphper doesn’t work with PostgreSQL yet :). I need an ORM that will:

  • work with an existing database structure, preferably creating all entities from the database as a starting point (if I create a large db structure in a db modelling tool then why would I have to manually define the same stuff in my entities and add the same relationship definitions by hand?)
  • if I do ALTER TABLE then I want the ORM to easily adapt to the changes with least amount of work
  • preferably IDE friendly: code completion for both the mappers and entities
  • comfortable with plain SQL for more unusual queries
  • reasonably fast

Doctrine might somehow provide these things except it’s huge and has some ugly solutions sometimes.

FYI, this feature is not required and disabled by default :slight_smile:

If your entity objects are nothing other than plain data objects, you are doing it very, very wrong. It is the infamous anemic domain model anti-pattern, and you should avoid it. Use rich models and put logic into your models; they aren’t meant to be pure data holders. Read the link below from Martin Fowler to learn more about this:

The question might be better posed like, when should I use the object persistence, which an ORM offers, and when shouldn’t I?

Have you heard about object relational impedance mismatch?

Scott

Good to know, it wasn’t clear to me from the documentation!

Well, but who is to say that the entity objects must constitute the whole model? I may have rich models in other specialized objects, which use the entity objects as source of data. I may treat the entity objects like arrays with data for any reason - for example I may like the syntax better. I still keep rich models except I have organized them into separate parts.

But anyway, you raised a valid point because I’ve been thinking about how much and what kind of behaviour I should put into the entities. Consider this:

class Product {
    private $id;
    private $name;
    private $net_price;
    private $tax;
    private $category;

    public function getFinalPrice() {
        return $this->net_price * $this->tax;
    }
}

I would think this is fine - a simple calculation based on other fields, it looks good. However, I can see a problem with this - what if the product table grows and has more than 50 columns, each resulting in its own field:

class Product {
    private $id;
    private $name;
    private $net_price;
    private $tax;
    private $category;
    private $field1;
    private $field2;
    // ...
    private $field50;

    public function getFinalPrice() {
        return $this->net_price * $this->tax;
    }
}

Our entity class may grow pretty large because we may have many more calculation methods like this. Moreover, how easy is it to test methods in such an object? Dependencies for getFinalPrice() are not clear from the outside because we don’t know what is needed to calculate the final price. In this case we use the tax field but it is not clear from the calling code - the method has access to all the other fields and it could be changed any time to calculate the price in a different way, for example based on the category or name or any other combination. getFinalPrice() needs the object to be populated with all fields to work correctly, which means I’ll have to do this when unit testing getFinalPrice()?

I may also want to have getFinalPriceForNewCustomers(), getFinalPriceForReturningCustomers(), getFinalPriceForBusinessPartners() and the entity class will grow to huge sizes.

I haven’t yet figured out a working solution to this problem but I suspect there must be a way to decouple entities from the rest of the model so that the actual model classes that perform business logic are independent from the storage type - and then I might choose an ORM, use dumb entity objects or simply retrieve data as arrays by SQL and the business model layer will not care about that.

As a starting point I have found an answer at SO: How should a model be structured in MVC?, which explains how to split the model layer into separate parts: Domain Objects, Data Mappers and Services. The crucial part is this:

Domain Objects are completely unaware of storage.

If I keep business methods like calculating the price in the entity objects then my business behaviour is mixed with storage (even though slightly abstracted behind the ORM layer).

I’ve read that article already but it doesn’t give a definite solution apart from switching to NoSQL maybe…

Sure, and my goal is not to solve the mismatch completely but rather to work with it in an elegant way :). I am aware that often the relational database design is the central starting point of an application and I’m not trying to translate everything to objects - hence my original consideration to go without an ORM.

I think if your class gets that huge, it is an indication that you are already violating SRP. Note that not all domain logic must go into domain models; having a fat model doesn’t mean having a god class. SRP simply tells you that each class should have one responsibility, but it doesn’t define what a ‘responsibility’ is. For a smaller model all of its domain logic can be considered one single responsibility, but once the model grows it can be further decomposed into sub-responsibilities. At some point, you may need to move certain domain logic out of your domain models, especially if it involves interaction with other domain models, which fits well with service classes acting as mediators. The question is, when and how should you do this?

A rule of thumb is: always keep domain logic that strictly uses data only from your model’s properties/fields inside your domain model. This is what I call ‘pure domain logic’, since it doesn’t depend on any external objects/environments. Domain logic that depends on the state of another object, such as interaction with other domain objects, can be stripped out and moved to the service layer if your domain model becomes too fat and almost like a god class.

For instance, Product::getFinalPrice() is pure domain logic as it does not take external parameters. If you use a service class for this domain logic, you are violating good OO design principles, even breaking encapsulation, as you have to expose $net_price and $tax to the service class. You should never move such domain logic to the service layer, or you will eventually end up with an anemic domain model.

In your other example, Product::getFinalPriceForNewCustomers(), Product::getFinalPriceForReturningCustomers(), and Product::getFinalPriceForBusinessPartners() are not pure domain logic, as they depend on the state of another object/variable, likely the customer/client type, which will be used to determine the appropriate method to dispatch. These methods actually involve interactions with another domain object, the customer domain object (well, you have to get the customer type from somewhere, and I think it’s from the user/customer model). In this case, you can refactor these methods into the service layer, given that your model already contains so much other domain logic.

However, be careful when you extract domain logic away from the models. It can be dangerous, as many people go way too far and strip all domain logic from the model, which is detrimental as well. Remember the S in SRP stands for ‘single’: your domain models should have one and only one responsibility. Having 2-3 responsibilities is wrong, which calls for extracting some of your methods to service classes. Having no responsibility at all is also wrong, as you end up with an anemic domain model.
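As a rough sketch of where that line could be drawn: the entity keeps the calculation that needs only its own fields, while a service handles the part that depends on the customer type. The `CustomerPricing` class and the discount values are hypothetical, and the tax field is assumed to be a multiplier (such as 1.2) so that the thread’s net_price * tax formula yields a gross price.

```php
<?php
// Entity: keeps the logic that needs nothing but its own fields.
final class Product
{
    public function __construct(
        private float $netPrice,
        private float $taxMultiplier
    ) {}

    public function getFinalPrice(): float
    {
        return $this->netPrice * $this->taxMultiplier;
    }
}

// Hypothetical service for logic that also depends on another object's state.
final class CustomerPricing
{
    /** @param array<string, float> $discounts discount fraction per customer type */
    public function __construct(private array $discounts) {}

    public function priceFor(Product $product, string $customerType): float
    {
        // Unknown customer types simply get no discount.
        $discount = $this->discounts[$customerType] ?? 0.0;

        return round($product->getFinalPrice() * (1.0 - $discount), 2);
    }
}

$pricing = new CustomerPricing(['new' => 0.10, 'partner' => 0.25]);
$product = new Product(100.0, 1.2);

$newPrice = $pricing->priceFor($product, 'new');
$partnerPrice = $pricing->priceFor($product, 'partner');
$defaultPrice = $pricing->priceFor($product, 'returning');
```

Adding another customer type then means changing the discount table passed to the service, not adding another method to the entity.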

I read the message as being, don’t expect an ORM to be the 100% solution for all of your persistence problems. When it doesn’t match, you have to fall back to other methods.

Scott

And the problem is I find it hard to define a responsibility of a product entity object - is it to store and provide data about a product? If yes then what data? Plain data as coming from the database? Or also data calculated on the fly? Only from the same entity (e.g. database row) or related rows too? If an accessor method contains any arguments does it mean it violates SRP?

OK, let’s consider this approach - how do I strip it out? For example, I have getFinalPriceForCustomer(), which depends on an external value. One solution I came up with:

class PriceCalculator {
    public function __construct($customer_type) {
        // ...
    }
    
    public function calculate($net_price) {
        // ...
    }
}

class Product {
    public function getFinalPriceForCustomer($customer_type) {
        $c = new PriceCalculator($customer_type);
        return $c->calculate($this->net_price);
    }
}

The problem is the Product entity still remains littered with the same methods, except the implementation has been moved outside. I could get rid of getFinalPriceForCustomer() entirely but then maintainability of the calling code becomes harder. Imagine I have many instances of this throughout my code:

// there's just one way of calculating the final price, no dependencies
$price = $product->getFinalPrice();

Later I decide to implement different prices for different customers so I can’t just add a customer type argument into the method calls but I need to change them into something like this:

$c = new PriceCalculator($customer->type);
$price = $c->calculate($product->net_price);

And one more example, very common among data mapper implementations - what about getting related entities? For example:

$product->getOpinions();

This is a very convenient way of grabbing data from a related table, but doesn’t it violate SRP in the same way as $product->getFinalPrice($customer_type)? If we eager-load all related data with the mapper we may say it doesn’t, since the opinions can be thought of as a collection of data belonging to the product. But if we lazy-load then getOpinions() needs to fetch data from a database (with a mapper) so we would need to have this:

$product->getOpinions($mapper);

and in this way we add persistence as another responsibility to the entity, leading to an Active Record pattern, which violates SRP.

As you can see I find it hard to specify what a single responsibility is for an entity object and there seems to be no clear cut rule where one responsibility ends and another begins. This needs a very disciplined programmer not to cross the lines easily.

SRP means that a class should have a single reason for changing. The reason for changing is determined by people, who have to deal with the objects the class creates. The end users. The business people.

So, if you have a product class, who deals with that object? Why would they want it to be changed? If there is only one reason and the request can only come from the same group of people at the same time, your class is following SRP.

An entity shouldn’t really have direct access to the database. If you are using an ORM like Doctrine, you’d have an entity manager for the database access. The entity stays “dumb”, as in, it has no idea there is even a database.

Scott

Going back to the original question, I have been using Doctrine 2’s ORM since it was in alpha. Quite powerful and solves a number of problems nicely. Over the last year or so I have been moving towards array based queries using Doctrine 2’s DBAL connection object.

  1. One of the initial attractions of using an ORM was the lazy loading aspect. Being able to kick off an initial query with a minimum of information and then automatically making queries is quite a nice concept. Unfortunately, there can be a significant performance hit. For many of my applications, using lazy loading slowed the page rendering by a noticeable amount. Investing a bit more time in building the initial query resulted in a better user experience.

  2. Doctrine 2’s ORM query language (DQL) does eliminate the need to spell out individual database column names as well as join conditions, which is a developer time saver. However, you often end up pulling in more information than you really need (by default the entire table will be loaded). DQL also has limited support for many SQL-specific functions such as DATE_FORMAT. You can extend DQL but that’s a bit of a pain. Dropping down to plain SQL is often a bit of a relief. And Doctrine 2’s DBAL SQL query builder makes developing complex queries less painful than building SQL strings by hand.

  3. More and more of my server work involves returning JSON responses. So the idea of extracting database information as arrays, mapping the information into objects and then almost immediately serializing the objects back into JSON arrays just seems wrong somehow.

  4. Direct support for immutable value objects has always been a bit lacking. The latest versions of Doctrine 2’s ORM do have some support but it feels “bolted on”.

  5. And finally we get to the notion of anemic domain models. I have tried many times to avoid anemic models, and ran into the same sort of problems that are being discussed in this thread. It was just plain difficult for me to add real behavior, especially within the context of request/response applications which are often mostly CRUD based. Basically, I gave up on the whole notion of adding useful behavior to my data objects. I just use services. I would love to see some actual production application source code in which domain objects have real behavior.

To summarize, dropping back to using sql queries and arrays may feel like a step backwards but, when properly implemented, can end up being easier to maintain and faster than a more sophisticated approach.
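Point 3 in the list above might look roughly like this in practice. The sketch assumes plain PDO with an in-memory SQLite database rather than Doctrine’s DBAL connection, only to keep it self-contained; the `product` table is hypothetical.

```php
<?php
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)');
$db->exec("INSERT INTO product (id, name) VALUES (1, 'Lamp'), (2, 'Desk')");

// Select only the columns the response needs and fetch them as plain arrays.
$rows = $db->query('SELECT id, name FROM product ORDER BY id')
           ->fetchAll(PDO::FETCH_ASSOC);

// No intermediate objects: the rows are serialized as they are.
$json = json_encode(['products' => $rows], JSON_THROW_ON_ERROR);
```

There is no hydration step to pay for, and the query names exactly the columns that end up in the response.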

Here is a recent slide show written by one of the main architects of Doctrine 2, Benjamin Eberlei.

This is just what I meant! But many ORMs provide convenient access methods like $product->opinions[0]->author->name (to get the name of the author of the first product opinion) and this is not going to work in a Data Mapper ORM - unless all the related objects are loaded beforehand by the mapper or some cheating is going on behind the scenes.
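One form that "cheating" could take, as a hedged sketch: the mapper hands the entity a closure that loads the related rows on first access, so the entity never receives the mapper itself. This is not how any particular ORM does it; the classes are hypothetical and an array stands in for the database table.

```php
<?php
// Entity: knows nothing about storage, it only holds a callable that yields opinions.
final class Product
{
    /** @var array<int, string>|null */
    private ?array $opinions = null;

    public function __construct(
        private int $id,
        private Closure $opinionLoader
    ) {}

    /** @return array<int, string> */
    public function getOpinions(): array
    {
        // Load on first access only, then reuse the cached result.
        if ($this->opinions === null) {
            $this->opinions = ($this->opinionLoader)($this->id);
        }

        return $this->opinions;
    }
}

// Hypothetical mapper: the only class that touches storage.
final class ProductMapper
{
    public int $queries = 0;

    /** @param array<int, array<int, string>> $opinionsByProduct stands in for a table */
    public function __construct(private array $opinionsByProduct) {}

    public function find(int $id): Product
    {
        return new Product($id, function (int $productId): array {
            $this->queries++; // stands in for a real SELECT
            return $this->opinionsByProduct[$productId] ?? [];
        });
    }
}

$mapper = new ProductMapper([1 => ['Good', 'Bright']]);
$product = $mapper->find(1);
$before = $mapper->queries;          // nothing has been loaded yet
$opinions = $product->getOpinions(); // triggers the loader
$product->getOpinions();             // served from the cached array
```

The entity still carries a hidden path to storage, so whether this counts as keeping persistence out of the entity is debatable, but the calling code keeps the convenient $product->getOpinions() form.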

Ahundiak, thanks for sharing your experience and the slide show - this is getting interesting! I think that with good structuring of the model layer and putting stuff into services a well designed system can be made. I don’t think that anemic domain models are a problem if they are treated as what they really are - pure data holding objects - just like arrays. No one would complain that an array is anemic - it’s just what it is!

At the moment I need to find information about how to structure the model layer properly with code examples. Not much for now but there seem to be some good pointers here.

I’ve found that putting behaviour into entity objects has a high risk of ending up with large and hard to test objects. Hall_of_Famer has some good points on how to draw the line but still to me this doesn’t provide good enough separation of concerns.

I think Facebook agrees with you. :wink:

Scott

Well, the responsibility of your domain models does not always have to stay the same; it can change with time as your application grows. Take an employee as an example: at first he is just an entry level worker and has the responsibilities assigned to him by his boss. As he gets promoted, he gains more and more responsibilities, and at some point he becomes a senior level worker, or a manager, and delegates work to his subordinates, other entry level workers. In this way, he doesn’t have to do all these things himself; many of his responsibilities have been reassigned to his subordinates. But he still does something, and he does what he does best, or what other employees cannot do. So he’s neither doing too much, nor being completely anemic.

Do you see the analogy here? Yeah, I know a major difference is that an employee or manager does not need to have a single responsibility; he can have a lot more, but there is still a limit beyond which he can no longer fulfill them. For classes, we strive for single responsibility, one and only one. But what is this responsibility? I’d say it depends on your business requirements. If you find it hard to define the responsibility for your product domain model, it likely means that you don’t understand your business requirements yet. It’s not a big problem, since you may want to go back to your domain model class, adding/removing methods on it from time to time. But whatever you do, always keep the model ‘fat’, but not ‘obese’. The price calculation logic may just stay in your model if it’s the only business logic you have for your product. Otherwise, it’s a good candidate to move to the service layer as your model starts to get ‘obese’.

In your example, I’d say both your initial solution and the later modification are okay in certain circumstances. Initially, you use Product::getFinalPriceForCustomer() inside the product model. Although it does not fully strip this responsibility out of your model class, it does make your model much thinner, as you no longer have N methods for all customer types, only one generic method. Later, when you strip it out completely into a service class (PriceCalculator), it makes your model even thinner. In this way you gradually move domain logic from your domain model to your service layer. In real applications you want to do this for some domain logic, but not all.

And as other posters have pointed out, persistence is NOT a responsibility of the domain model, and it never should be. You do raise a very good question about lazy loading in your product model, as it may just violate the single responsibility principle. In this case, a possible solution is to provide a setOptions method, load your options from the mapper, then assign them to your product model. It makes your code more tedious though. A better solution is to allow your mapper to set the fetch mode to be either lazy or eager. In the case of lazy loading, you will never need to get and use the product options anyway (I can’t say if it’s still lazy loading at all, as you won’t load additional data at all). In the case of eager loading, you will be sure that the options are already loaded for you before you use them. It is then your controller’s responsibility to decide whether to lazy or eager load an object, by setting the flag in the mapper.

I was not asking about the responsibility of the domain models but of the entity objects - they may not be the same. Initially, in case of a data mapper, an entity object only contains properties that correspond to database fields - so it is anaemic before we add any behaviour. What can be the responsibility of an object without any behaviour? Is there any? Maybe to hold data?

Let’s take the simple example again:

class Product {
    private $id;
    private $name;
    private $net_price;
    private $tax;
    private $category;
    private $field1;
    private $field2;
    // ...
    private $field50;

    public function getFinalPrice() {
        return $this->net_price * $this->tax;
    }
}

Now the responsibility is to hold product data and calculate the final price? We could consider this as a violation of SRP. Or maybe not if we generally state that the object’s responsibility is to provide data of a product.

Anyway, I’m also wondering if method getFinalPrice() is not problematic in its own way because it doesn’t accept any arguments and still calculates the price based on private properties, which are global within this large object. This smells somewhat like global variables but only limited to one object. The method’s dependencies are hidden and therefore it is not reusable - we can’t take it out and use it elsewhere without modifications - but we might want to, and this is not such a rare case in my experience. Testing is also a bit more difficult because we can’t simply pass arguments for calculation - we must inject the input values into the proper semi-global object properties first.

What I’m getting at is that adding any behaviour to entities smells bad to me because the methods are not reusable, harder to test, the responsibility of the classes is hard to specify and the classes may grow large. The solutions you gave here help keep the organization sane but only to a certain extent. I think I would prefer to leave entities anaemic and move all behaviour to the model’s service layer as independent components. Anyway, entities are part of the data access layer - if they are anaemic we can still build all the behaviour into the domain model (which we should do) so there is no risk of ending up with anaemic domain objects.
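As a small illustration of what I mean by an independent component, here is a sketch of a stateless calculator where every input is an explicit argument. The class name is hypothetical, and the tax value is assumed to be a multiplier (such as 1.2), which is what the net_price * tax formula above implies.

```php
<?php
// Hypothetical stateless service: every input is an explicit argument.
final class FinalPriceCalculator
{
    /**
     * $taxMultiplier follows the net_price * tax formula used in the thread,
     * so it is assumed to be a multiplier such as 1.2, not a rate such as 0.2.
     */
    public function calculate(float $netPrice, float $taxMultiplier): float
    {
        return round($netPrice * $taxMultiplier, 2);
    }
}

$calculator = new FinalPriceCalculator();

// Works with a row fetched as an array...
$row = ['id' => 1, 'name' => 'Lamp', 'net_price' => 100.0, 'tax' => 1.2];
$fromArray = $calculator->calculate($row['net_price'], $row['tax']);

// ...and equally with a dumb entity object.
$entity = new stdClass();
$entity->net_price = 100.0;
$entity->tax = 1.2;
$fromEntity = $calculator->calculate($entity->net_price, $entity->tax);
```

A unit test only has to pass two numbers, and the calculator does not care whether they came from an ORM entity, a plain object or an SQL result array.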

Well, domain model = entity object; they are the same thing.

Actually, holding product data is not considered a responsibility. Objects by default contain properties (data) and methods (behavior); if holding data were ever a responsibility, then no class could ever satisfy SRP unless it only contained methods (i.e. no data fields at all). Your mistake is that you consider holding data a responsibility; it is not. Therefore, in your case only calculating the final price is the responsibility of your product class.

Well, keeping data fields hidden within the holder object is not global state. What we mean by global is something that can be accessed and altered from anywhere, such as PHP’s request variables ($_REQUEST, $_GET, $_POST, etc). In your case the properties are private and well hidden from the outside client code, so they are not global state at all. An object is supposed to depend and operate on its private data; that is encapsulation, a fundamental OO design principle. The way your domain model calculates prices is good OO design, even better than using getters/setters, since you manipulate object data through the object’s API methods, and how the data field is used and changed is hidden from client code.

When we talk about hidden dependencies, we talk about external variables/objects that come from the global scope rather than the object’s own scope. A private property is not a hidden dependency, unless you assign to it from a global variable or singleton. Note that the basic idea of object-oriented design is to combine data and processes together in objects. If you only store data in your entity but move behaviors to services, it’s the anemic domain model anti-pattern and you aren’t actually doing OOP.

The reason why it seems global in your class is that your class is large and contains 50 properties. This means you have made your class a god class, clearly not following SRP. In a proper domain layer design, no model/entity class should contain 50 data fields (I think even 15 is a bit too much). In fact, no class should have 50 properties in any context; you clearly violate SRP if you have such a huge class. Think about it: if you turn your entire program into a god class, it may have 50-100 properties (such as router, dispatcher, request, response, controller, model, view and all kinds of things in your framework), but then you are actually programming in the old procedural way; it’s not an OO approach. Once your class grows, always refactor it into smaller components.

When I look at forum software such as MyBB, which has something like 60-70 columns in its user table, I immediately realize it’s not following good OO design. You can easily factor things out into smaller entities and components. For instance, once your user model grows you may want to group certain fields into sub-entities, such as a user profile domain model (note I use entity and domain model interchangeably, since they are the same thing). In this way, your User model will contain another model called UserProfile, rather than 10-15 data fields. It’s always possible to make your classes smaller and more modular.
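A minimal sketch of that kind of split, assuming a wide user row; the field names (signature, website) are invented for illustration and are not taken from MyBB.

```php
<?php
// Hypothetical sub-entity grouping the profile columns of a wide user table.
final class UserProfile
{
    public function __construct(
        private string $signature,
        private string $website
    ) {}

    public function hasWebsite(): bool
    {
        return $this->website !== '';
    }
}

final class User
{
    public function __construct(
        private int $id,
        private string $name,
        private UserProfile $profile
    ) {}

    public function getProfile(): UserProfile
    {
        return $this->profile;
    }
}

// A mapper could build both objects from one wide row.
$row = ['id' => 7, 'name' => 'alice', 'signature' => 'Cheers', 'website' => ''];
$user = new User(
    (int) $row['id'],
    $row['name'],
    new UserProfile($row['signature'], $row['website'])
);
```

The table can stay as wide as it is; only the object graph is split, so this does not require changing the database design.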

Nope, you are wrong again. Entities are not part of the data access layer, they are part of the domain/business layer. What you are talking about is more like data transfer objects (DTOs), which only hold and transfer the data from data access objects (DAOs) such as a data mapper to domain models/entities. Also, DTOs usually contain logic such as serializing data in different formats, so even they are not truly anemic. DTOs are common in Java, but in PHP I doubt they are necessary, since you can use PDO’s FETCH_CLASS and FETCH_INTO modes to pass data to the domain model directly. But still, even DTOs are not truly a part of the data access layer, more like an intermediate between the data access layer and the business layer.
http://martinfowler.com/eaaCatalog/dataTransferObject.html
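For reference, the FETCH_CLASS route could look something like the sketch below. It assumes the pdo_sqlite driver and a hypothetical `product` table whose column names match the property names; the tax value is treated as a multiplier.

```php
<?php
final class Product
{
    // Property names match the column names so PDO can fill them directly.
    private $id;
    private $name;
    private $net_price;
    private $tax;

    public function getName(): string
    {
        return (string) $this->name;
    }

    public function getFinalPrice(): float
    {
        // Casts guard against drivers that return numeric columns as strings.
        return (float) $this->net_price * (float) $this->tax;
    }
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, net_price REAL, tax REAL)');
$db->exec("INSERT INTO product (id, name, net_price, tax) VALUES (1, 'Lamp', 100.0, 1.5)");

// PDO instantiates Product and assigns each column to the property of the same name.
$stmt = $db->query('SELECT id, name, net_price, tax FROM product WHERE id = 1');
$stmt->setFetchMode(PDO::FETCH_CLASS, Product::class);
$product = $stmt->fetch();
```

Note that this ties property names to column names, which is its own form of coupling between the class and the table.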

Not at all. The domain model contains all the business logic, that’s data mappers, entities and all the logic that processes them.

I’m not so sure: For this purpose we can ignore private properties as part of the responsibility, so for public properties, $user->name is not really any different than $user->getName()