SitePoint Sponsor

User Tag List

Page 1 of 2 12 LastLast
Results 1 to 25 of 37
  1. #1
    SitePoint Evangelist hessodreamy's Avatar
    Join Date
    Apr 2005
    Location
    uk
    Posts
    524
    Mentioned
    3 Post(s)
    Tagged
    0 Thread(s)

    Separation of database access between objects and their subordinates

    Here's something I often ponder over: If an object contains other objects, which is responsible for grabbing database information for the child objects?

    eg we have a Product class, which contains a $supplier of type Supplier class. If we instantiate the Product from an ID, would we also grab the details for its Supplier, or would we wait and see if the Supplier needs to be got? I'm talking in general here, as my system (and most system) are obviously much larger than indicated by this code snippet:

    First off, the Supplier does need to be able to get it's own information from the database:
    Code:
    class Supplier
    {
    	private $supplierID;
    	public $name;//probably define some setters & getters instead of using public
    	public $emailAddress;
    	
    	public __construct($supplierID = null)
    	{
    		if(is_numeric($supplierID))
    		{
    			$this->supplierID = $supplierID;
    			$this->load();
    		}
    	}
    	
    	private function load()
    	{
    		$database_result = something_database_function("SELECT s.supplierID, s.supplierName, s.emailAddress FROM suppliers s  WHERE s.supplierID=".$this->supplierID);
    		
    		$this->supplierName = $database_result->supplierName;
    		$this->emailAddress = $database_result->emailAddress;
    	}
    }
    When we instantiate the Product from an ID, we could also grab the Supplier:
    Code:
    class Product
    {
    	private $productID;
    	private $productName;
    	private $supplier; //instance of Supplier class
    	
    	public __construct($arg = null)
    	{
    		if(is_numeric($arg))
    		{
    			$this->productID = $arg;
    			$this->load();
    		}
    	}
    	
    	private function load()
    	{
    		$database_result = something_database_function("SELECT p.productID, p.supplierName, s.supplierID, s.supplierName, s.emailAddress FROM products p JOIN suppliers s ON p.supplierID = s.supplierID WHERE p.productID=".$this->productID);
    		
    		$this->productName = $database_result->productName;
    		
    		//create supplier
    		//not that we could also allow the Supplier construct to accept these values. Or failing that, make some setter methods in the supplier
    		$this->supplier = new Supplier();
    		$this->supplier->supplierName = $database_result->supplierName;
    		$this->supplier->emailAddress = $database_result->emailAddress;
    	}
    
    	public function getSupplier()
    	{
    		return $this->supplier;
    	}
    }
    Or we could just load the product information and only grab the supplier information when needed:
    Code:
    class Product
    {
    	private $productID;
    	private $name;
    	private $supplier;
    	private $_loaded_supplier = false;
    	
    	public __construct($arg = null)
    	{
    		if(is_numeric($arg))
    		{
    			$this->productID = $arg;
    			$this->load();
    		}
    	}
    	
    	private function load()
    	{
    		$database_result = something_database_function("SELECT p.productID, p.supplierName, p.supplierID FROM products p WHERE p.productID=".$this->productID);
    		
    		$this->productName = $database_result->productName;
    		$this->supplierID = $database_result->supplierID;
    	}
    	
    	public function getSupplier()
    	{
    		if($!this->_loaded_supplier){
    			$this->supplier = new Supplier($this->supplierID);
    			$this->_loaded_supplier = true;
    		}
    		return $this->supplier;
    	}
    }
    These classes are used in many parts of the system, so it's hard to know now how they will be used. There's also the situations where a list of (eg) products will be grabbed from the database with a single query, and Products being passed their instance data directly, rather than each one grabbing from the database. How would you handle a reference to a missing class property $product->some_property_we_havent_populated? Throw an error? return null? detect that this hasn't been given the data and grab it on the fly?

  2. #2
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    Join Date
    Jun 2002
    Location
    Ohio
    Posts
    5,146
    Mentioned
    152 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by hessodreamy View Post
    Here's something I often ponder over: If an object contains other objects, which is responsible for grabbing database information for the child objects?
    Either is acceptable, or an entirely different approach can be used (see below).

    Quote Originally Posted by hessodreamy View Post
    How would you handle a reference to a missing class property $product->some_property_we_havent_populated? Throw an error? return null? detect that this hasn't been given the data and grab it on the fly?
    All of what you listed is acceptable.

    Now, I've always taken a different approach, is it right or wrong? Maybe. But I like it for a few reasons. First, my data access does not exist in my objects. I have separate classes for data access. Why? Because I like to try and make only 1 query to the database for a given action. Maybe this is my .NET background, maybe I'm just crazy (well at least a lot of us were crazy, as this is how we did it at my old job).

    The first thing I did was identify the ways in which I needed to retrieve data for my objects. Viewing a Product page, viewing Search results, the Shopping Cart page, etc. What was easily obvious about all of these retrieval processes and each one differed from the other (just slightly, but they were definitely different). So I'd write a query for each, store them in their own Data Access class (or if they were similar enough, store them in the same class as two different functions/methods).

    Then I would consider what indexes needed to be associated to the table based on these queries. To this end, I never had to worry about an object getting its data for a second time because the constructor went out and queried the data, or a property would go out and query the data. I knew exactly what data was being sent back for the given page/action/request because it had a dedicated data access for it.

    Again, I'm not going to say one way is wrong and another is right, I and many others took a different approach and it worked and scaled very well for us. We also required that all connections be closed as soon as possible, so we didn't permit passing connections from method to method or object to object, so you can see why the approach we took was favored over other ones. Connections are expensive, the queries once connected usually are not expensive (unless you have a poorly indexed table; or poorly written query).

  3. #3
    SitePoint Wizard silver trophybronze trophy Cups's Avatar
    Join Date
    Oct 2006
    Location
    France, deep rural.
    Posts
    6,869
    Mentioned
    17 Post(s)
    Tagged
    1 Thread(s)
    Quote Originally Posted by hessodreamy
    $product->some_property_we_havent_populated
    If all the products are stored in the same table, and you add a new field, say "colour", then someone tries to access $product->colour then if it is "not populated" but does exist, you will get a null value, or whatever the default value is in your table.colour

    OR do you mean :

    If all the products are stored in the same table, and someone tries to access $product->shape which does not exist you have to decide how to handle that, silently ignore it and carry on (returning those fields which do exist)? Throw an err/exception etc.

    If your products can potentially be stored in multiple tables perhaps depending on product types, then the challenge is going to be slightly different.

    If your "end-user" is another developer then the exception/error might be a suitable response, if the end-user is a non-developer via a website who may be filtering on data depending on a GET string (which can be fiddled with, or be an old url link) then you might want to silently ignore and carry on displaying what you can.

    Quote Originally Posted by hessodreamy
    There's also the situations where a list of (eg) products will be grabbed from the database with a single query, and Products being passed their instance data directly
    This seems to suggest that load() be a bit smarter and respond to $ids being an array:
    eg.
    PHP Code:
    $arr = array(23,99,876);
    $ilelds = array('colour''shape');

    $kettles = new Product($arr$fields);
    $kettles->load(); 
    Or, create a loadMulti() method.

    Then you are then possibly heading into the realms of creating an SQL query string builder.

    Once you crack that, you can then refactor that code out into its own class which you can possibly use for both Supplier and Product and others.

    Then, when you switch to using PDO, or sqlite or any other rdbms you only alter one class.

    Making a generic SQL query string builder class which interacts with one single table is quite easy to do it probably doing the work of a Data Mapper.

    The problem arises when you want to do JOINS -- which you will be doing, one trick is to leave the door open so that you can create complex sql statements and just passThru($sql) to get the data you want.

    If you know you are going to be doing a lot of JOINS then either forget about a query builder, or face the fact that your Product class will contain lots of methods like this:

    PHP Code:
    function loadProductWithSupplier($ids$fields null){} 
    The best solution for your problem will, as ever, turn on your situation such as depending whether you are engineering a CRUD operation for managing data, or, whether you are simply creating simple and fast read-only data views.

  4. #4
    SitePoint Guru bronze trophy
    Join Date
    Dec 2003
    Location
    Poland
    Posts
    930
    Mentioned
    7 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by cpradio View Post
    Now, I've always taken a different approach, is it right or wrong? Maybe. But I like it for a few reasons. First, my data access does not exist in my objects. I have separate classes for data access. Why? Because I like to try and make only 1 query to the database for a given action. Maybe this is my .NET background, maybe I'm just crazy (well at least a lot of us were crazy, as this is how we did it at my old job).

    The first thing I did was identify the ways in which I needed to retrieve data for my objects. Viewing a Product page, viewing Search results, the Shopping Cart page, etc. What was easily obvious about all of these retrieval processes and each one differed from the other (just slightly, but they were definitely different). So I'd write a query for each, store them in their own Data Access class (or if they were similar enough, store them in the same class as two different functions/methods).

    Then I would consider what indexes needed to be associated to the table based on these queries. To this end, I never had to worry about an object getting its data for a second time because the constructor went out and queried the data, or a property would go out and query the data. I knew exactly what data was being sent back for the given page/action/request because it had a dedicated data access for it.

    Again, I'm not going to say one way is wrong and another is right, I and many others took a different approach and it worked and scaled very well for us. We also required that all connections be closed as soon as possible, so we didn't permit passing connections from method to method or object to object, so you can see why the approach we took was favored over other ones. Connections are expensive, the queries once connected usually are not expensive (unless you have a poorly indexed table; or poorly written query).
    cpradio, this sounds interesting, could you post a short example of the Data Access class you are talking about and how you would use it? Let's take an example of data similar to what the OP presented: we have a Product class and each product has a Supplier and also has Accessories. Now I understand you say that the Product constructor fetches the data using the Data Access class, right? Let's say we need to fetch Product data from the database in two different scenarios:

    1. Public web page, where we only need Product and its Accessories.
    2. Admin page where we need Product, Accessories and Supplier.

    (also in each case different fields need to be fetched from the db)

    How do you differentiate between these two cases so that you use sql queries optimized for the given task? Do you pass a parameter to the Product constructor? It would be great if you could illustrate that with sample code!

  5. #5
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    Join Date
    Jun 2002
    Location
    Ohio
    Posts
    5,146
    Mentioned
    152 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Lemon Juice View Post
    Now I understand you say that the Product constructor fetches the data using the Data Access class, right? Let's say we need to fetch Product data from the database in two different scenarios:
    Sorry, that didn't come out exactly the way I wanted it to (or at least my original intention is easily mis-understood). I meant to say, my objects constructors would not try and access the data access, though I guess they could if they knew which scenario you needed...

    Quote Originally Posted by Lemon Juice View Post
    1. Public web page, where we only need Product and its Accessories.
    2. Admin page where we need Product, Accessories and Supplier.

    (also in each case different fields need to be fetched from the db)

    How do you differentiate between these two cases so that you use sql queries optimized for the given task? Do you pass a parameter to the Product constructor? It would be great if you could illustrate that with sample code!
    I can build something later today (hopefully), but here is the gist

    Typically I would have three layers, My Objects, Business Layer, and Data Access Layer.

    So given the above, the Data Access Layer would have either two classes or 1 class with two methods to retrieve the two scenarios described.
    Scenario 1, would select the product and its accessories as a data set (result whatever) back to the Business Layer which would perform any validation and convert that data set to the appropriate objects and return them to the callee.

    Scenario 2, would do the same calling a different method/class for both Business Layer and Data Access Layer resulting in the Product, Accessories and all Supplier information.

    Granted given how closely similar these two are (assuming Supplier is a limited amount of data, name, website, description), I would tend to have Scenario 1 use the same methods/classes as Scenario 2 and drop Scenario 1 altogether, yes it would return more data than it needed, but it is more maintainable that way.

    Now when the data is significantly different, Cart Products versus Production and Accessories, then the separation is necessary, as the Cart may only need the name, url, price, and stock of the product, and wouldn't need the description, accessories, etc that may make a larger result set returned (or cause paging on the table).

  6. #6
    SitePoint Guru bronze trophy
    Join Date
    Dec 2003
    Location
    Poland
    Posts
    930
    Mentioned
    7 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by cpradio View Post
    I can build something later today (hopefully), but here is the gist
    Ok, there's no hurry and thanks for the explanation . What I'm most interested in is where and how exactly you call the Data Access methods depending on various scenarios. For example, if you have a getSupplier() method in your Product object (or maybe you don't have and it's something different?) then how does the getter know which scenario (eventual sql code) needs to be executed. You also mentioned you don't have to worry about object getting its data for the second time - how is that achieved? I do it with a cache but your approach seems to not need one.

    Thanks.

  7. #7
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    Join Date
    Jun 2002
    Location
    Ohio
    Posts
    5,146
    Mentioned
    152 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Lemon Juice View Post
    Ok, there's no hurry and thanks for the explanation . What I'm most interested in is where and how exactly you call the Data Access methods depending on various scenarios. For example, if you have a getSupplier() method in your Product object (or maybe you don't have and it's something different?) then how does the getter know which scenario (eventual sql code) needs to be executed. You also mentioned you don't have to worry about object getting its data for the second time - how is that achieved? I do it with a cache but your approach seems to not need one.

    Thanks.
    Yeah, I'll work on that. In the mean time another helping answer, my objects don't have methods that run any Data Access which is why I don't have to worry about it. Hopefully in a few hours I can post some code that resembles this technique (granted I primarily have only used it in .NET, but I'm sure I can work up a PHP example).

  8. #8
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    Join Date
    Jun 2002
    Location
    Ohio
    Posts
    5,146
    Mentioned
    152 Post(s)
    Tagged
    0 Thread(s)
    Okay, here is the general idea (there is a lot of work I didn't do, such as establishing a real connection or a lot of validation/refactoring for efficiency type stuff. Don't focus on the fact that the methods/functions are static (that was just to reduce my coding a slight bit).

    In essence, your Objects are more like DTOs (Data Transfer Objects). They may contain some useful utility functions for the item, but for the most part they are just a way to transfer that data from a controller to a view, or whatever path you may need to take.

    Objects:
    PHP Code:
    <?php
    namespace Objects;

    class 
    Product
    {
        private 
    $productId;
        public function 
    GetProductId()
        {
            return 
    $this->productId;
        }
        public function 
    SetProductId($value)
        {
            if (
    is_int($value))
            {
                
    $this->productId $value;
            }
        }

        private 
    $name;
        public function 
    GetName()
        {
            return 
    $this->name;
        }
        public function 
    SetName($value)
        {
            
    $this->name $value;
        }

        private 
    $supplier;
        public function 
    GetSupplier()
        {
            return 
    $this->supplier;
        }
        public function 
    SetSupplier($value)
        {
            
    $this->supplier $value;
        }

        private 
    $accessories;
        public function 
    GetAccessories()
        {
            return 
    $this->accessories;
        }
        public function 
    SetAccessories($value)
        {
            
    $this->accessories $value;
        }
        public function 
    AddAccessory($accessoryId$accessoryName)
        {
            if (!
    is_array($this->accessories))
            {
                
    $this->accessories = array();
            }

            
    $this->accessories[] = new Accessory($accessoryId$accessoryName);
        }
    }

    class 
    Accessory
    {
        public 
    $accessory_id;
        public 
    $name;

        public function 
    __construct($accessory_id$name)
        {
            
    $this->accessory_id $accessory_id;
            
    $this->name $name;
        }
    }

    class 
    Supplier
    {
        public 
    $supplier_id;
        public 
    $name;

        public function 
    __construct($supplier_id$name)
        {
            
    $this->supplier_id $supplier_id;
            
    $this->name $name;
        }
    }
    ?>
    Business Layer: (you will notice I am constantly overwriting the product id, name, supplier id and name, this is because I was lazy in my example)
    PHP Code:
    <?php
    namespace Business;

    class 
    Product
    {
        public static function 
    GetProductInfoAndAccessories($productId)
        {
            if (
    is_int($productId))
            {
                
    $results = \DataAccess\Product::GetProductInfoAndAccessories($productId);
                if (
    $results != null)
                {
                    
    // process results into object
                    
    $product = new \Objects\Product();
                    foreach (
    $results as $row)
                    {
                        
    $product->SetProductId($row['product_id']);
                        
    $product->SetName($row['product_name']);
                        
    $product->AddAccessory($row['accessory_id'], $row['accessory_name']);
                    }

                    return 
    $product;
                }

                return 
    null// or throw an exception
            
    }
            else
            {
                throw new 
    Exception("Must provide a valid product id");
            }
        }

        public static function 
    GetProductInfoAccessoriesAndSupplier($productId)
        {
            if (
    is_int($productId))
            {
                
    $results = \DataAccess\Product::GetProductInfoAccessoriesAndSupplier($productId);
                if (
    $results != null)
                {
                    
    // process results into object
                    
    $product = new \Objects\Product();
                    foreach (
    $results as $row)
                    {
                        
    $product->SetProductId($row['product_id']);
                        
    $product->SetName($row['product_name']);
                        
    $product->AddAccessory($row['accessory_id'], $row['accessory_name']);
                        
    $product->SetSupplier(new \Objects\Supplier($row['supplier_id'], $row['supplier_name']));
                    }

                    return 
    $product;
                }

                return 
    null// or throw an exception
            
    }
            else
            {
                throw new 
    Exception("Must provide a valid product id");
            }
        }
    }
    ?>
    Data Access: (the Base would typically have a Close function as well to destroy/unset the connection variable)
    PHP Code:
    <?php
    namespace DataAccess;

    class 
    Product extends Base
    {
        public static function 
    GetProductInfoAndAccessories($productId)
        {
            
    self::GetConnection()->prepare("SELECT p.product_id, p.name AS product_name, a.accessory_id, a.name AS accessory_name 
                      FROM Products AS p LEFT JOIN Accessories AS A ON p.product_id = a.product_id WHERE p.product_id = ?"
    );
            return 
    self::GetConnection()->execute(array($productId));
        }

        public static function 
    GetProductInfoAccessoriesAndSupplier($productId)
        {
            
    self::GetConnection()->prepare("SELECT p.product_id, p.name AS product_name, a.accessory_id, a.name AS accessory_name, s.supplier_id, s.name AS supplier_name 
                      FROM Products AS p LEFT JOIN Accessories AS A ON p.product_id = a.product_id LEFT JOIN Suppliers AS S on p.product_id = s.product_id WHERE p.product_id = ?"
    );
            return 
    self::GetConnection()->execute(array($productId));
        }
    }

    class 
    Base
    {
        private static 
    $connection;
        public static function 
    GetConnection()
        {
            if (
    self::$connection === null)
            {
                
    self::$connection = ... // define a new connection
            
    }

            return 
    self::$connection;
        }
    }
    ?>
    Page needing Product and Accessories (don't forget, Business\Product validates/filters the $_GET item
    PHP Code:
    $productInfo = \Business\Product::GetProductInfoAndAccessories($_GET['product_id']);
    // Do whatever you need to do with $productInfo; pass it to your view, whatever, it is your DTO 
    Page needing Product, Accessories, and Suppliers
    PHP Code:
    $productInfo = \Business\Product::GetProductInfoAccessoriesAndSuppliers($_GET['product_id']);
    // Do whatever you need to do with $productInfo; pass it to your view, whatever, it is your DTO 
    Again, not saying this is a right or wrong way, just a different approach that can/did work well for the applications we maintained at my old job.

  9. #9
    SitePoint Guru bronze trophy
    Join Date
    Dec 2003
    Location
    Poland
    Posts
    930
    Mentioned
    7 Post(s)
    Tagged
    0 Thread(s)
    cpradio, thanks for your explanation, that was a very detailed and clear description. I can see similarities in your method to what I often do - my object structure is based on Propel ORM (some older version, now they may have refactored their code) where each object (\Object) has a corresponding 'peer' object (\Business) with static methods similar to yours. So the Product object is like your \Objects\Product and the ProductPeer object is like \Business\Product and there I can put methods like GetProductInfoAccessoriesAndSuppliers(). The major difference is I don't use a separate Data Access class, I perform all of that in the peer (\Business) methods. This is convenient when writing the code because all fetching code is in one place but there is less clean separation of tasks - I don't feel I'm missing this separation, however I haven't tried it yet so I can't say if that's wouldn't be beneficial. Certainly having a separate Data Access class would allow me to use the same methods for fetching data in different scenarios, for example when I don't need to feed the data into objects and plain array would suffice.

    The thing is that I can actually do the same thing as you in that I do not have to worry about fetching data for the second time except I didn't always take advantage of this possibility probably because I wanted to do it too properly, in other words fetching as little as possible and then fetching more chunks of data on demand. For example, I first fetch the Product object that would contain data from the product table only and I don't think at this stage what other related data I might need later. So whenever a method like GetSupplier() is called that method has to worry about fetching the data from the database. My reasoning was that if an object has GetSupplier() method then it needs to always work, no matter how the object was instantiated. In your case GetSupplier() will not work if you don't call the appropriate method in your Business class so you need to think beforehand what data you will need and call the right business method. In an ideal world I wouldn't need to think about it beforehand but I can see in practice it's a good idea to do so at least from performance point of view - fitting relational database access into objects is not possible if you want both the [full] convenience of the objects and optimised database access.

    Thanks, that was inspiring and I may now lean more towards the way you fetch objects since it makes sense and I can already do it in my current setup, it's just a question of being aware of it and putting it to use.

    BTW, are your objects writable? Can you fetch a Product object, change something in it and then call $product->save() to save the changes in the database? I'm asking because when you have specialised methods for fetching in your Business Layer and some of them don't fetch all of the data then save() might not work properly because some fields in the object are empty and may be unintentionally deleted on saving. Sure, there are ways to deal with it but it makes the whole system more complicated.

  10. #10
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    Join Date
    Jun 2002
    Location
    Ohio
    Posts
    5,146
    Mentioned
    152 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Lemon Juice View Post
    The thing is that I can actually do the same thing as you in that I do not have to worry about fetching data for the second time except I didn't always take advantage of this possibility probably because I wanted to do it too properly, in other words fetching as little as possible and then fetching more chunks of data on demand.
    And that is a valid point, I just dislike how chatty that scenario is, constantly trying to get data that could have been established once earlier. Think about it, you are not trying to get the data across pages, you are still trying to load the same freakin' page but just realized you need data you didn't have, so you go back out to the database to get it. In my opinion, this style lacks requirements/clarity as to what the application and each page in the application is supposed to do. It isn't "wrong", it just doesn't seem efficient and could lead to a lot of chattiness. Just my very strong opinion

    Quote Originally Posted by Lemon Juice View Post
    Thanks, that was inspiring and I may now lean more towards the way you fetch objects since it makes sense and I can already do it in my current setup, it's just a question of being aware of it and putting it to use.

    BTW, are your objects writable? Can you fetch a Product object, change something in it and then call $product->save() to save the changes in the database?
    Yes. You would could update the product properties and then pass it to a \Business\Products::Save($product) method (actually you can have a $product->save() that then calls \Business\Products::Save($product) too), the idea is your data goes to the Business layer first, the business layer validates all properties, etc.

    You can also take it a step further and develop a "isDirty" indicator for each property (or a framework really that the \Objects\Product class could extend) that would track what properties were changed. Then you know what properties need to be updated/inserted/added/deleted, etc. (I haven't developed this framework yet; though now it seems like a really neat challenge, so maybe in my spare time I might try it).

  11. #11
    SitePoint Guru bronze trophy TomB's Avatar
    Join Date
    Oct 2005
    Location
    Milton Keynes, UK
    Posts
    989
    Mentioned
    9 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by Cups View Post

    The problem arises when you want to do JOINS -- which you will be doing, one trick is to leave the door open so that you can create complex sql statements and just passThru($sql) to get the data you want.

    If you know you are going to be doing a lot of JOINS then either forget about a query builder, or face the fact that your Product class will contain lots of methods like this:

    PHP Code:
    function loadProductWithSupplier($ids$fields null){} 
    The best solution for your problem will, as ever, turn on your situation such as depending whether you are engineering a CRUD operation for managing data, or, whether you are simply creating simple and fast read-only data views.

    This is the biggest problem. The original question of Should a product fetch its own data? Probably not is the answer but that's a case of ActiveRecord vs DataMapper which has been discussed many times before. Separation of concerns implies DataMapper is the better pattern (and I agree).

    The problem as you say, comes with joins but your solution presents even more problems: The SoC benefits from DataMapper are lost as soon as you start giving products/supplies access to the database as it tightly couples them to it. Once a product is coupled to the database, any flexibility it once had is lost.

    The answer is to handle joins outside the data objects. On the face of it, the solution is:

    PHP Code:
    $product $productMapper->findById(123);
    $supplier $supplierMapper->findById($product->supplierId); 
    However, this creates very brittle code due to lack of encapsulation and the need of repetition. Anywhere this code is reused it needs to be repeated. What if "supplierId" becomes supplier_id. Messy! This code is clearly an OO layer over the top of relational data.

    In an ideal world you should be able to call $product->getSupplier() from anywhere and it work automatically, but at the same time keeping it so that $product does not have a direct dependency on the database.

    I won't go over old ground as there was a very good and in-depth discussion on how this can be achieved here: http://www.sitepoint.com/forums/show...Mapper-Library

  12. #12
    SitePoint Guru bronze trophy
    Join Date
    Dec 2003
    Location
    Poland
    Posts
    930
    Mentioned
    7 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by TomB View Post
    In an ideal world you should be able to call $product->getSupplier() from anywhere and it work automatically, but at the same time keeping it so that $product does not have a direct dependency on the database.

    I won't go over old ground as there was a very good and in-depth discussion on how this can be achieved here: http://www.sitepoint.com/forums/show...Mapper-Library
    In this other thread you present this example:
    PHP Code:
    echo $user->orders[0]->items[0]->product->manufacturer->name
    It's very convenient OOP but how many sql statements are executed under the hood? Also, how many orders are retrieved from the db and how many items of the first order are retrieved? Here we just need the first order and the first item of the first order.

  13. #13
    SitePoint Guru bronze trophy TomB's Avatar
    Join Date
    Oct 2005
    Location
    Milton Keynes, UK
    Posts
    989
    Mentioned
    9 Post(s)
    Tagged
    2 Thread(s)
    Because it's using offsetGet the implementation can do anything. In this case (assuming that it's using a database) it would be able to run a query that used LIMIT 1 OFFSET $offset.

    One SQL statement is used per mapper. However, this is often faster than joins.

    The main point here, though, is that they don't have to be rows in the DB, they could be coming from a web service. The users could be in a database and the products could be in an XML file or totally different database/server for instance. This offers far greater flexibility than tying all your data to a single source.

  14. #14
    Non-Member
    Join Date
    Oct 2007
    Posts
    363
    Mentioned
    11 Post(s)
    Tagged
    0 Thread(s)
    Aren't you trying to solve a problem that has already been solved by the Doctrine 2 ORM? I would highly recommend reading the following articles:

    http://ihaveabackup.net/2012/02/27/w...trine2-anyway/
    http://symfony.com/doc/current/book/doctrine.html

  15. #15
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2006
    Location
    Augusta, Georgia, United States
    Posts
    4,151
    Mentioned
    16 Post(s)
    Tagged
    3 Thread(s)
    The thing you really need to ask yourself is whether you are looking for a generic framework or just a series of patterns to follow. If you are looking to build a generic framework for handling this layer than you are probably better off using something like Doctrine. However, if you would like the code to be as straight-forward and efficient but as possible than a solution like cpradio presented is a good option. Really that solution is nothing more than a data mapper/orm with an extra layer for mapping the the result set to entities. Nothing is automated though like it would be in doctrine, which can be a huge advantage or disadvantage depending on different circumstances.

    Quote Originally Posted by aaarrrggh
    Aren't you trying to solve a problem that has already been solved by the Doctrine 2 ORM? I would highly recommend reading the following articles:
    The problem with doctrine is it is very complex and a beast to understand and maintain. Powerful of course but it is huge. Not creating a generic automated mapper will always be more code efficient and readable though it requires writing everything from scratch or just about. I have nothing against doctrine myself just find it large and unnecessary at times.
    The only code I hate more than my own is everyone else's.

  16. #16
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    Join Date
    Jun 2002
    Location
    Ohio
    Posts
    5,146
    Mentioned
    152 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by aaarrrggh View Post
    Aren't you trying to solve a problem that has already been solved by the Doctrine 2 ORM? I would highly recommend reading the following articles:

    http://ihaveabackup.net/2012/02/27/w...trine2-anyway/
    http://symfony.com/doc/current/book/doctrine.html
    Hey @aaarrrggh ;, have you had any personal experience with Doctrine 2 ORM? I'd like to hear about your personal experience with it (if any). This thread has a lot of potential to be a good resource for many different users, a personal story can easily help sell an ORM.

    My personal experience with ORM's and Frameworks is limited. I have never had the need to use them in PHP, and in the .NET world our company had a strict rule of not using third-party products unless your team had time to vet them (so we would end up creating our own without any bells and whistles, like I did in my prior example -- first time I wrote that in PHP too).

    When it comes down to it, I still like to roll my own versus using a third party solution for two reasons, 1) I get to learn a lot by having to re-invent the wheel, and 2) Most of the third party solutions I've researched using have a lot more to them than what I would need, so why bundle it with an already growing codebase? I still believe ORMs and third party Frameworks are good, for some, but not necessarily for everyone.

    @hessodreamy ; Lots of food for thought here, if there is any one aspect that sounds interesting to you, please ask more questions on how it may be suitable to your project (or not suitable). Out of all of the replies here, there is a LOT of good information, different tactics, etc.

  17. #17
    Non-Member
    Join Date
    Oct 2007
    Posts
    363
    Mentioned
    11 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by oddz View Post
    The thing you really need to ask yourself is whether you are looking for a generic framework or just a series of patterns to follow. If you are looking to build a generic framework for handling this layer than you are probably better off using something like Doctrine. However, if you would like the code to be as straight-forward and efficient but as possible than a solution like cpradio presented is a good option. Really that solution is nothing more than a data mapper/orm with an extra layer for mapping the the result set to entities. Nothing is automated though like it would be in doctrine, which can be a huge advantage or disadvantage depending on different circumstances.



    The problem with doctrine is it is very complex and a beast to understand and maintain. Powerful of course but it is huge. Not creating a generic automated mapper will always be more code efficient and readable though it requires writing everything from scratch or just about. I have nothing against doctrine myself just find it large and unnecessary at times.

    Have you checked out Doctrine 2? It's very different to Doctrine 1. You actually use plain old php objects to map out your domain logic and these classes end up looking very similar to the classes we see in this thread. About the only significant difference is that you don't have any sql in them (generally, although you can do this if you want to for some reason - although this kinda defeats the purpose of a DBAL). You use meta information in the class to map out the persistance layer, and from there it is dealt with for you. You can end up doing stuff like this:

    PHP Code:
    $comments $blogPost->getComments();

    foreach (
    $comments as $comment){
     
    $comment->getText();
     
    $comment->getUsername();

    Based on the relations you've set up in your own objects.

    On top of this you don't have complex issues to worry about, such as worrying about what will happen if say a comment is updated while you're working on a blog post with related comments (all tracked automatically for you). It also does lazy loading, so it won't load the comments in the above instance automatically unless you explicitly need to call the getComments() method.

    It also has some exceptionally good stuff for database migrations, and is the first ORM I've ever used that works so well with any data source that I've actually used this facility in pracitice. In my current application, all of the unit testing is done in a sqlite database, while the production and staging server and my development environment all use mysql - I have had to change literally nothing in my objects at all to make this work (it's literally a one line thing in a config file that basically says "if in the testing environment, load through sqlite instead).

    Overall I've found it excellent, not particularly hard to get your head around and it produces massive benefits if you want to map your business domain logic to a persistance layer of any kind.

    Here's another article about the principle behind domain driven design: http://net.tutsplus.com/tutorials/domain-driven-design/

    I would highly recommend learning this, because it will prevent you from re-inventing the wheel time and time again. Don't mistake Doctrine 2 for Doctrine 1 or for Active Record based ORMS. This is on a whole other level, and I believe it solves the original poster's problems.

  18. #18
    Non-Member
    Join Date
    Oct 2007
    Posts
    363
    Mentioned
    11 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by cpradio View Post
    Hey @aaarrrggh ;, have you had any personal experience with Doctrine 2 ORM? I'd like to hear about your personal experience with it (if any). This thread has a lot of potential to be a good resource for many different users, a personal story can easily help sell an ORM.

    My personal experience with ORM's and Frameworks is limited. I have never had the need to use them in PHP, and in the .NET world our company had a strict rule of not using third-party products unless your team had time to vet them (so we would end up creating our own without any bells and whistles, like I did in my prior example -- first time I wrote that in PHP too).

    When it comes down to it, I still like to roll my own versus using a third party solution for two reasons, 1) I get to learn a lot by having to re-invent the wheel, and 2) Most of the third party solutions I've researched using have a lot more to them than what I would need, so why bundle it with an already growing codebase? I still believe ORMs and third party Frameworks are good, for some, but not necessarily for everyone.

    @hessodreamy ; Lots of food for thought here, if there is any one aspect that sounds interesting to you, please ask more questions on how it may be suitable to your project (or not suitable). Out of all of the replies here, there is a LOT of good information, different tactics, etc.
    Hey CP - yeah, I was using Doctrine 2 just today actually.

    As I said in the above post - it's really great, because you end up simply writing plain old php objects. The only thing that makes them different to standard php objects in anyway is that you attach certain meta tags to fields that helps the Doctrine 2 orm do it's thing.

    I may write something up here later to explain this more fully. Perhaps I could give you some examples from the stuff I've been doing recently. I can tell you I've been using Doctrine 2 inside the Zend Framework 1 based on this tutorial video that I found: http://www.zendcasts.com/unit-testin...ities/2011/02/ (Even though the video is Entitled "Unit Testing Doctrine 2 Entities", it goes beyond that - shows you how to set it up in ZF1, and also shows you how to set it up so you end up using a sqlite db for unit testing while maintaining a mysql database for production/development and staging).

    I'm currently doing some work (as always), but in a few weeks I'll have 2 weeks off due to paternity, so perhaps I could write up an example of something here for people to look at?

    It's really a great tool. It saves you so many headaches when it comes to setting up the legwork for your domain level logic.

  19. #19
    SitePoint Wizard TheRedDevil's Avatar
    Join Date
    Sep 2004
    Location
    Norway
    Posts
    1,196
    Mentioned
    4 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by cpradio View Post
    Again, I'm not going to say one way is wrong and another is right, I and many others took a different approach and it worked and scaled very well for us. We also required that all connections be closed as soon as possible, so we didn't permit passing connections from method to method or object to object, so you can see why the approach we took was favored over other ones. Connections are expensive, the queries once connected usually are not expensive (unless you have a poorly indexed table; or poorly written query).
    That is a strange approach when you use SQL Server, as you say each connection is very expensive. But perhaps it worked well for the type of application you guys worked on.

    On a side note, connections in MySQL is very cheap (compared to SQL Server) so this is not a worry here.

    Quote Originally Posted by aaarrrggh View Post
    It also has some exceptionally good stuff for database migrations, and is the first ORM I've ever used that works so well with any data source that I've actually used this facility in pracitice. In my current application, all of the unit testing is done in a sqlite database, while the production and staging server and my development environment all use mysql - I have had to change literally nothing in my objects at all to make this work (it's literally a one line thing in a config file that basically says "if in the testing environment, load through sqlite instead).
    The problem with an ORM is that you sacrifice database/server speed/efficiency for the ability to write faster code. Since PHP also compile this data on each run, it also make an ORM less effective in this language compared to Java and C#.

    Personally we prefer a plain DOA approach, where we have a class for each table in the database, and then separate methods depending on what data that is required and if it pull or insert, as this allow us to optimize the queries to the fullest. It also makes it very easy to pull a cache solution down on the queries as well, or change to a different database type.

  20. #20
    Non-Member
    Join Date
    Oct 2007
    Posts
    363
    Mentioned
    11 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by TheRedDevil View Post
    That is a strange approach when you use SQL Server, as you say each connection is very expensive. But perhaps it worked well for the type of application you guys worked on.

    On a side note, connections in MySQL is very cheap (compared to SQL Server) so this is not a worry here.


    The problem with an ORM is that you sacrifice database/server speed/efficiency for the ability to write faster code. Since PHP also compile this data on each run, it also make an ORM less effective in this language compared to Java and C#.

    Personally we prefer a plain DOA approach, where we have a class for each table in the database, and then separate methods depending on what data that is required and if it pull or insert, as this allow us to optimize the queries to the fullest. It also makes it very easy to pull a cache solution down on the queries as well, or change to a different database type.
    I think the speed argument with regards to the ORM issue is overstated in many cases. Also, Doctrine 2 has methods for dealing with caching built in, so it can actually be far more efficient than writing standard mysql code.

    I haven't really experienced too many issues with regard to speed in practice. In general, when people try to write their own implementations for this sort of thing they tend to end up hitting brick walls and writing all kinds of messy hacks to deal with issues further down the line.

    I think I may write up a mini tutorial at some point if I can find the time...

  21. #21
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    Join Date
    Jun 2002
    Location
    Ohio
    Posts
    5,146
    Mentioned
    152 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by TheRedDevil View Post
    That is a strange approach when you use SQL Server, as you say each connection is very expensive. But perhaps it worked well for the type of application you guys worked on.
    Can you elaborate on why it is strange to pull all of the data you need upfront and not have to go back time and time again for more data? So you ultimately, make 1 connection, do your query (or queries), disconnect and never connect again until the next page request/submission. I just curious as to why you consider it strange (call it my curiosity).

  22. #22
    SitePoint Guru bronze trophy TomB's Avatar
    Join Date
    Oct 2005
    Location
    Milton Keynes, UK
    Posts
    989
    Mentioned
    9 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by aaarrrggh View Post
    I think the speed argument with regards to the ORM issue is overstated in many cases. Also, Doctrine 2 has methods for dealing with caching built in, so it can actually be far more efficient than writing standard mysql code.

    I haven't really experienced too many issues with regard to speed in practice. In general, when people try to write their own implementations for this sort of thing they tend to end up hitting brick walls and writing all kinds of messy hacks to deal with issues further down the line.

    I think I may write up a mini tutorial at some point if I can find the time...
    Well yes, however, writing an ORM as a learning exercise is not a bad idea at all. When you hit those brick walls it gives you an understanding of why things are done how they are.

    When I looked into Doctrine (admittedly, 2 years ago) the learning curve was huge and it wasn't obvious why they'd made certain design decisions. It also took a hell of a lot of configuration to do something simple. I want something that simply works without needing to configure everything up front.

    For instance, in my own mapper I can run echo $datamapper->user->findById(123)->name; without defining a single class and only configuring the data mapper to default to the database and the database connection.


    edit: I just looked at the manual for doctrine 2.... metadata in comments?! From a design perspective that is insanity. The idea of code which can break if comments are changed, frankly, is shocking. There's no way to syntax check it, other programmers looking at the code have a whole new syntax to learn at least XML is vaguely readable by anyone and not only that, storing metadata with data is always bad practice as it tightly couples them preventing them from being interchangeable as well as highly reducing portability. It genuinely worries me that they're teaching people that this is a good practice and encouraging others to do it.

  23. #23
    SitePoint Wizard TheRedDevil's Avatar
    Join Date
    Sep 2004
    Location
    Norway
    Posts
    1,196
    Mentioned
    4 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by cpradio View Post
    Can you elaborate on why it is strange to pull all of the data you need upfront and not have to go back time and time again for more data? So you ultimately, make 1 connection, do your query (or queries), disconnect and never connect again until the next page request/submission. I just curious as to why you consider it strange (call it my curiosity).
    I think I have possibly misunderstood your previous sentence, what I assumed from it was that the database connection was opened and then closed per separated query. I.e. that it would set up the connection to the database several times per load, each time you run a separate query. If that is not the case, then it is just a misunderstanding from my side, and please ignore my comment.

    Quote Originally Posted by aaarrrggh View Post
    I think the speed argument with regards to the ORM issue is overstated in many cases. Also, Doctrine 2 has methods for dealing with caching built in, so it can actually be far more efficient than writing standard mysql code.

    I haven't really experienced too many issues with regard to speed in practice. In general, when people try to write their own implementations for this sort of thing they tend to end up hitting brick walls and writing all kinds of messy hacks to deal with issues further down the line.
    Claiming that using an ORM will get you more effective queries than writing them yourself just show ignorance. The only case that will be true is if the programmer does not know SQL.

    Have you tried using it on a table with a larger record set?
    Have you tried using it in an enviroment requiring database sharding?
    Have you tried using it in a master/slave database server setup?

    A ORM can have its place, and I have used it many times when creating windows applications, but I still believe using it in a PHP application is a bad idea. It can work on smaller websites, but the second you need to expand and scale, you have problems, not to mention you would need to scale earlier as well.

  24. #24
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    Join Date
    Jun 2002
    Location
    Ohio
    Posts
    5,146
    Mentioned
    152 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by TheRedDevil View Post
    I think I have possibly misunderstood your previous sentence, what I assumed from it was that the database connection was opened and then closed per separated query. I.e. that it would set up the connection to the database several times per load, each time you run a separate query. If that is not the case, then it is just a misunderstanding from my side, and please ignore my comment
    Thanks. In our prior company, we were required to re-connect (but we would run multiple queries within a single connection to try and only use 1 connection for the entire page), as leaving a SQL connection open ended up causing us to max out connections (we had fairly high traffic at all hours). In PHP, I'd either pass in the connection or make it a base variable so it can be persisted across queries.

  25. #25
    SitePoint Guru bronze trophy
    Join Date
    Dec 2003
    Location
    Poland
    Posts
    930
    Mentioned
    7 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by TomB View Post
    Because it's using offsetGet the implementation can do anything. In this case (assuming that it's using a database) it would be able to run a query that used LIMIT 1 OFFSET $offset.
    Considering there are many more cases apart from pulling the first item from the data source you need to have a very intelligent translator of object access properties and methods to sql so that it remains efficient. Still, even the most intelligent translator cannot optimize everything if you don't tell it up front what data you will need to request later because often it's much faster to get more data in bigger chunks than to request many smaller ones. Therefore from performance point of view what cpradio suggests makes more sense - fetch all data beforehand in one go.

    Quote Originally Posted by TomB View Post
    The main point here, though, is that they don't have to be rows in the DB, they could be coming from a web service. The users could be in a database and the products could be in an XML file or totally different database/server for instance. This offers far greater flexibility than tying all your data to a single source.
    It looks like you have built a very flexible system with almost ideal OO data access. However, I don't think it doesn't sacrifice some portion of performance which can be important for any large system. For anything small it's sweet to have so much data source abstraction and independence but when a site gets large and datasets expand then it becomes important to tweak or rewrite individual sql queries and at that stage this abstraction becomes a hindrance. And I don't think being able to substitute a relational database to XML files or whatever else you may think of is important (unless you have a specific requirement for this in a project). In a small system you can play with it but with a large database this would be almost insane.

    This is an ongoing question of how far we are willing to go with implementing good and flexible OOP at the expense of practical usefulness (performance). I don't think in PHP I would go as far as you but certainly that would be a good exercise in learning OOP techniques. Everyone chooses their own balance.

    Quote Originally Posted by TomB View Post
    Well yes, however, writing an ORM as a learning exercise is not a bad idea at all. When you hit those brick walls it gives you an understanding of why things are done how they are.
    Yes, I agree, it's worth the effort. I did this some 3 years ago and it wasn't just a learning exercise because now I use my ORM in almost all of my projects and it seems to work very well and fast. I still make small improvements to it from time to time but generally it's done. This way I am able to work with an ORM that suits my requirements:

    - simple - as few magical things happening under the hood as possible, support only for the simplest relations, I don't want an ORM to do everything for me because then I spend more time maintaining my ORM (or learning a third-party ORM) than actually using it.
    - fast even by sacrificing database independence - when I choose a database for a project I don't intend to change it ever.
    - provide convenience for basic CRUD operations and provide easy ways to execute my own SQL - I like to write SQL and I like having total control of optimizing queries, I just need my ORM to put the result set into objects.
    - no need to define metadata, no XML or other model/relationship configuration files, no other maintenance chores (in my case it's just a question of running one script that will automatically reverse-engineer all my database structure into proper objects)
    - the major goal of ORM for me is to provide me with structure of objects which I can use to manipulate data and add my convenience methods, etc.

    I have achieved all these goal knowing that I have sacrificed a few things like database independence and a few good OOP practices but the system works, it's fairly easy to use and my sites load very fast. I understand this is a not full-features ORM but I don't mind.

    Everyone will have different priorities but if you have time to play with writing your own ORM then by all means go for it. However, I think that novices in OOP should first get some experience with an existing system just to gain some basic knowledge.

    Quote Originally Posted by TomB View Post
    When I looked into Doctrine (admittedly, 2 years ago) the learning curve was huge and it wasn't obvious why they'd made certain design decisions. It also took a hell of a lot of configuration to do something simple. I want something that simply works without needing to configure everything up front.

    For instance, in my own mapper I can run echo $datamapper->user->findById(123)->name; without defining a single class and only configuring the data mapper to default to the database and the database connection.


    edit: I just looked at the manual for doctrine 2.... metadata in comments?! From a design perspective that is insanity. The idea of code which can break if comments are changed, frankly, is shocking. There's no way to syntax check it, other programmers looking at the code have a whole new syntax to learn at least XML is vaguely readable by anyone and not only that, storing metadata with data is always bad practice as it tightly couples them preventing them from being interchangeable as well as highly reducing portability. It genuinely worries me that they're teaching people that this is a good practice and encouraging others to do it.
    yes, I also don't like systems where I have to define all kinds of configurations in xml, comments, etc. There is a point where learning to use a system and configuring it takes so much time that the development time savings by actually using the system are not enough to be profitable.


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •