
  1. #1 hessodreamy (SitePoint Evangelist)

    Separation of database access between objects and their subordinates

    Here's something I often ponder over: If an object contains other objects, which is responsible for grabbing database information for the child objects?

    eg we have a Product class, which contains a $supplier of type Supplier. If we instantiate the Product from an ID, would we also grab the details for its Supplier, or would we wait and see whether the Supplier is needed at all? I'm talking in general here, as my system (and most systems) are obviously much larger than indicated by these code snippets:

    First off, the Supplier does need to be able to get its own information from the database:
    Code:
    class Supplier
    {
    	private $supplierID;
    	public $name; //probably define some setters & getters instead of using public
    	public $emailAddress;
    	
    	public function __construct($supplierID = null)
    	{
    		if(is_numeric($supplierID))
    		{
    			$this->supplierID = $supplierID;
    			$this->load();
    		}
    	}
    	
    	private function load()
    	{
    		//note: real code would use a parameterised query rather than concatenating the ID
    		$database_result = something_database_function("SELECT s.supplierID, s.supplierName, s.emailAddress FROM suppliers s WHERE s.supplierID=".$this->supplierID);
    		
    		$this->name = $database_result->supplierName;
    		$this->emailAddress = $database_result->emailAddress;
    	}
    }
    When we instantiate the Product from an ID, we could also grab the Supplier:
    Code:
    class Product
    {
    	private $productID;
    	private $productName;
    	private $supplier; //instance of Supplier class
    	
    	public function __construct($arg = null)
    	{
    		if(is_numeric($arg))
    		{
    			$this->productID = $arg;
    			$this->load();
    		}
    	}
    	
    	private function load()
    	{
    		$database_result = something_database_function("SELECT p.productID, p.productName, s.supplierID, s.supplierName, s.emailAddress FROM products p JOIN suppliers s ON p.supplierID = s.supplierID WHERE p.productID=".$this->productID);
    		
    		$this->productName = $database_result->productName;
    		
    		//create supplier
    		//note that we could also allow the Supplier constructor to accept these values. Or failing that, make some setter methods in the supplier
    		$this->supplier = new Supplier();
    		$this->supplier->name = $database_result->supplierName;
    		$this->supplier->emailAddress = $database_result->emailAddress;
    	}
    
    	public function getSupplier()
    	{
    		return $this->supplier;
    	}
    }
    Or we could just load the product information and only grab the supplier information when needed:
    Code:
    class Product
    {
    	private $productID;
    	private $productName;
    	private $supplierID;
    	private $supplier;
    	private $_loaded_supplier = false;
    	
    	public function __construct($arg = null)
    	{
    		if(is_numeric($arg))
    		{
    			$this->productID = $arg;
    			$this->load();
    		}
    	}
    	
    	private function load()
    	{
    		$database_result = something_database_function("SELECT p.productID, p.productName, p.supplierID FROM products p WHERE p.productID=".$this->productID);
    		
    		$this->productName = $database_result->productName;
    		$this->supplierID = $database_result->supplierID;
    	}
    	
    	public function getSupplier()
    	{
    		if(!$this->_loaded_supplier){
    			$this->supplier = new Supplier($this->supplierID);
    			$this->_loaded_supplier = true;
    		}
    		return $this->supplier;
    	}
    }
    These classes are used in many parts of the system, so it's hard to know now how they will be used. There's also the situation where a list of (eg) products is grabbed from the database with a single query, and the Products are passed their instance data directly, rather than each one grabbing it from the database. How would you handle a reference to a missing class property, $product->some_property_we_havent_populated? Throw an error? Return null? Detect that this hasn't been given the data and grab it on the fly?
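    For illustration, one middle ground between "throw" and "grab it on the fly" is a magic __get() that lazy-loads on the first miss and only throws for genuinely unknown columns - a sketch only, reusing the hypothetical something_database_function() helper from above:
    PHP Code:
    class Product
    {
    	private $productID;
    	private $data = array(); //whatever columns we have been given so far
    
    	public function __construct($productID, $row = array())
    	{
    		$this->productID = $productID;
    		$this->data = (array)$row; //may be partial, e.g. from a list query
    	}
    
    	//called only when $product->foo is not a declared property
    	public function __get($name)
    	{
    		if(!array_key_exists($name, $this->data))
    		{
    			//first miss: lazy-load the full row
    			$row = something_database_function("SELECT * FROM products WHERE productID=".(int)$this->productID);
    			$this->data = array_merge((array)$row, $this->data);
    		}
    		if(!array_key_exists($name, $this->data))
    		{
    			//the column really doesn't exist anywhere: fail loudly
    			throw new Exception("Unknown product property: $name");
    		}
    		return $this->data[$name];
    	}
    }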

  2. #2 cpradio (Hosting Team Leader)
    Quote Originally Posted by hessodreamy View Post
    Here's something I often ponder over: If an object contains other objects, which is responsible for grabbing database information for the child objects?
    Either is acceptable, or an entirely different approach can be used (see below).

    Quote Originally Posted by hessodreamy View Post
    How would you handle a reference to a missing class property $product->some_property_we_havent_populated? Throw an error? return null? detect that this hasn't been given the data and grab it on the fly?
    All of what you listed is acceptable.

    Now, I've always taken a different approach. Is it right or wrong? Maybe. But I like it for a few reasons. First, my data access does not exist in my objects; I have separate classes for data access. Why? Because I like to try to make only one query to the database for a given action. Maybe this is my .NET background, maybe I'm just crazy (well, at least a lot of us were crazy, as this is how we did it at my old job).

    The first thing I did was identify the ways in which I needed to retrieve data for my objects: viewing a Product page, viewing Search results, the Shopping Cart page, etc. What was easily obvious was that each of these retrieval processes differed from the others (just slightly, but they were definitely different). So I'd write a query for each and store them in their own Data Access class (or, if they were similar enough, store them in the same class as two different functions/methods).

    Then I would consider what indexes needed to be associated with the tables based on these queries. To this end, I never had to worry about an object getting its data a second time because the constructor went out and queried the data, or a property went out and queried the data. I knew exactly what data was being sent back for the given page/action/request because it had a dedicated data access for it.

    Again, I'm not going to say one way is wrong and another is right, I and many others took a different approach and it worked and scaled very well for us. We also required that all connections be closed as soon as possible, so we didn't permit passing connections from method to method or object to object, so you can see why the approach we took was favored over other ones. Connections are expensive, the queries once connected usually are not expensive (unless you have a poorly indexed table; or poorly written query).

  3. #3 Lemon Juice (SitePoint Guru)
    Quote Originally Posted by cpradio View Post
    Now, I've always taken a different approach. Is it right or wrong? Maybe. But I like it for a few reasons. First, my data access does not exist in my objects; I have separate classes for data access. Why? Because I like to try to make only one query to the database for a given action. [...]
    cpradio, this sounds interesting, could you post a short example of the Data Access class you are talking about and how you would use it? Let's take an example of data similar to what the OP presented: we have a Product class and each product has a Supplier and also has Accessories. Now I understand you say that the Product constructor fetches the data using the Data Access class, right? Let's say we need to fetch Product data from the database in two different scenarios:

    1. Public web page, where we only need Product and its Accessories.
    2. Admin page where we need Product, Accessories and Supplier.

    (also in each case different fields need to be fetched from the db)

    How do you differentiate between these two cases so that you use sql queries optimized for the given task? Do you pass a parameter to the Product constructor? It would be great if you could illustrate that with sample code!

  4. #4 cpradio (Hosting Team Leader)
    Quote Originally Posted by Lemon Juice View Post
    Now I understand you say that the Product constructor fetches the data using the Data Access class, right? Let's say we need to fetch Product data from the database in two different scenarios:
    Sorry, that didn't come out exactly the way I wanted it to (or at least my original intention is easily misunderstood). I meant to say that my objects' constructors would not try to access the data access layer, though I guess they could if they knew which scenario you needed...

    Quote Originally Posted by Lemon Juice View Post
    1. Public web page, where we only need Product and its Accessories.
    2. Admin page where we need Product, Accessories and Supplier.

    (also in each case different fields need to be fetched from the db)

    How do you differentiate between these two cases so that you use sql queries optimized for the given task? Do you pass a parameter to the Product constructor? It would be great if you could illustrate that with sample code!
    I can build something later today (hopefully), but here is the gist

    Typically I would have three layers: Objects, Business Layer, and Data Access Layer.

    So given the above, the Data Access Layer would have either two classes or one class with two methods to retrieve the two scenarios described.
    Scenario 1 would select the product and its accessories as a data set (result set, whatever) and pass it back to the Business Layer, which would perform any validation, convert that data set to the appropriate objects and return them to the caller.

    Scenario 2 would do the same, calling a different method/class in both the Business Layer and Data Access Layer, resulting in the Product, Accessories and all Supplier information.

    Granted, given how similar these two are (assuming Supplier is a limited amount of data: name, website, description), I would tend to have Scenario 1 use the same methods/classes as Scenario 2 and drop Scenario 1 altogether; yes, it would return more data than needed, but it is more maintainable that way.

    Now when the data is significantly different, Cart Products versus Product and Accessories, then the separation is necessary, as the Cart may only need the name, url, price, and stock of the product, and wouldn't need the description, accessories, etc. that would make for a larger result set (or cause paging on the table).

  5. #5 Lemon Juice (SitePoint Guru)
    Quote Originally Posted by cpradio View Post
    I can build something later today (hopefully), but here is the gist
    Ok, there's no hurry, and thanks for the explanation. What I'm most interested in is where and how exactly you call the Data Access methods depending on various scenarios. For example, if you have a getSupplier() method in your Product object (or maybe you don't and it's something different?) then how does the getter know which scenario (eventual sql code) needs to be executed? You also mentioned you don't have to worry about an object getting its data a second time - how is that achieved? I do it with a cache, but your approach seems not to need one.

    Thanks.

  6. #6 cpradio (Hosting Team Leader)
    Quote Originally Posted by Lemon Juice View Post
    Ok, there's no hurry, and thanks for the explanation. What I'm most interested in is where and how exactly you call the Data Access methods depending on various scenarios. For example, if you have a getSupplier() method in your Product object (or maybe you don't and it's something different?) then how does the getter know which scenario (eventual sql code) needs to be executed? You also mentioned you don't have to worry about an object getting its data a second time - how is that achieved? I do it with a cache, but your approach seems not to need one.
    Yeah, I'll work on that. In the meantime, another quick answer: my objects don't have methods that run any Data Access, which is why I don't have to worry about it. Hopefully in a few hours I can post some code that resembles this technique (granted, I have primarily used it only in .NET, but I'm sure I can work up a PHP example).

  7. #7 Cups (SitePoint Wizard)
    Quote Originally Posted by hessodreamy
    $product->some_property_we_havent_populated
    If all the products are stored in the same table, and you add a new field, say "colour", then when someone tries to access $product->colour and it is "not populated" but does exist, you will get a null value, or whatever the default value of your colour column is.

    OR do you mean:

    If all the products are stored in the same table, and someone tries to access $product->shape, which does not exist, you have to decide how to handle that: silently ignore it and carry on (returning those fields which do exist)? Throw an error/exception? Etc.

    If your products can potentially be stored in multiple tables perhaps depending on product types, then the challenge is going to be slightly different.

    If your "end-user" is another developer then the exception/error might be a suitable response, if the end-user is a non-developer via a website who may be filtering on data depending on a GET string (which can be fiddled with, or be an old url link) then you might want to silently ignore and carry on displaying what you can.

    Quote Originally Posted by hessodreamy
    There's also the situations where a list of (eg) products will be grabbed from the database with a single query, and Products being passed their instance data directly
    This seems to suggest that load() could be a bit smarter and respond to $ids being an array:
    eg.
    PHP Code:
    $arr = array(23, 99, 876);
    $fields = array('colour', 'shape');

    $kettles = new Product($arr, $fields);
    $kettles->load(); 
    Or, create a loadMulti() method.
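    For illustration, a loadMulti() along those lines might look something like this - a sketch only, reusing the hypothetical something_database_function() helper from the first post and assuming it can return multiple rows:
    PHP Code:
    class Product
    {
    	//...properties and load() as in the earlier snippets...

    	public static function loadMulti(array $ids)
    	{
    		$ids = array_map('intval', $ids); //crude sanitisation, fine for a sketch
    		$rows = something_database_function("SELECT p.productID, p.productName, p.supplierID FROM products p WHERE p.productID IN (".implode(',', $ids).")");

    		$products = array();
    		foreach($rows as $row)
    		{
    			$product = new Product(); //no ID passed, so the constructor won't query
    			//a static method of Product may set its private properties directly
    			$product->productID = $row->productID;
    			$product->productName = $row->productName;
    			$product->supplierID = $row->supplierID;
    			$products[] = $product;
    		}
    		return $products;
    	}
    }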

    Then you are possibly heading into the realms of creating an SQL query string builder.

    Once you crack that, you can then refactor that code out into its own class which you can possibly use for both Supplier and Product and others.

    Then, when you switch to using PDO, or sqlite or any other rdbms you only alter one class.

    Making a generic SQL query string builder class which interacts with one single table is quite easy to do; it is probably doing the work of a Data Mapper.

    The problem arises when you want to do JOINs -- which you will be doing. One trick is to leave the door open so that you can create complex sql statements and just passThru($sql) to get the data you want.

    If you know you are going to be doing a lot of JOINS then either forget about a query builder, or face the fact that your Product class will contain lots of methods like this:

    PHP Code:
    function loadProductWithSupplier($ids, $fields = null){} 
    The best solution for your problem will, as ever, depend on your situation, such as whether you are engineering CRUD operations for managing data or simply creating simple and fast read-only data views.
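    For illustration, the single-table builder plus escape hatch described above could start out as small as this (a sketch assuming a PDO connection; TableGateway is just a hypothetical name):
    PHP Code:
    class TableGateway
    {
    	private $pdo;
    	private $table;

    	public function __construct(PDO $pdo, $table)
    	{
    		$this->pdo = $pdo;
    		$this->table = $table;
    	}

    	//builds "SELECT <fields> FROM <table> WHERE id IN (...)" for one table
    	public function find(array $ids, array $fields = array('*'))
    	{
    		$placeholders = implode(',', array_fill(0, count($ids), '?'));
    		$sql = "SELECT ".implode(',', $fields)." FROM {$this->table} WHERE id IN ($placeholders)";
    		$stmt = $this->pdo->prepare($sql);
    		$stmt->execute(array_values($ids));
    		return $stmt->fetchAll(PDO::FETCH_ASSOC);
    	}

    	//the escape hatch for JOINs and anything else the builder can't express
    	public function passThru($sql, array $params = array())
    	{
    		$stmt = $this->pdo->prepare($sql);
    		$stmt->execute($params);
    		return $stmt->fetchAll(PDO::FETCH_ASSOC);
    	}
    }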

  8. #8 TomB (SitePoint Guru)
    Quote Originally Posted by Cups View Post

    The problem arises when you want to do JOINs -- which you will be doing. One trick is to leave the door open so that you can create complex sql statements and just passThru($sql) to get the data you want.

    If you know you are going to be doing a lot of JOINS then either forget about a query builder, or face the fact that your Product class will contain lots of methods like this:

    PHP Code:
    function loadProductWithSupplier($ids, $fields = null){} 
    The best solution for your problem will, as ever, depend on your situation, such as whether you are engineering CRUD operations for managing data or simply creating simple and fast read-only data views.

    This is the biggest problem. The original question of "should a product fetch its own data?" - probably not, is the answer, but that's a case of ActiveRecord vs DataMapper, which has been discussed many times before. Separation of concerns implies DataMapper is the better pattern (and I agree).

    The problem, as you say, comes with joins, but your solution presents even more problems: the SoC benefits from DataMapper are lost as soon as you start giving products/suppliers access to the database, as it tightly couples them to it. Once a product is coupled to the database, any flexibility it once had is lost.

    The answer is to handle joins outside the data objects. On the face of it, the solution is:

    PHP Code:
    $product = $productMapper->findById(123);
    $supplier = $supplierMapper->findById($product->supplierId); 
    However, this creates very brittle code due to the lack of encapsulation and the need for repetition. Anywhere this code is reused it needs to be repeated. What if "supplierId" becomes "supplier_id"? Messy! This code is clearly an OO layer over the top of relational data.

    In an ideal world you should be able to call $product->getSupplier() from anywhere and have it work automatically, while at the same time keeping it so that $product does not have a direct dependency on the database.

    I won't go over old ground as there was a very good and in-depth discussion on how this can be achieved here: http://www.sitepoint.com/forums/show...Mapper-Library
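    As a rough illustration of the idea (a sketch only; the linked thread covers a much fuller treatment), the product can be handed a loader by the mapper instead of a database connection:
    PHP Code:
    class Product
    {
    	public $supplierId;
    	private $supplier;
    	private $supplierLoader; //injected by the mapper; could be a closure or a mapper object

    	public function __construct($supplierLoader)
    	{
    		$this->supplierLoader = $supplierLoader;
    	}

    	public function getSupplier()
    	{
    		if($this->supplier === null)
    		{
    			//the product knows nothing about SQL; the mapper decides whether
    			//this hits a database, a web service or a test stub
    			$loader = $this->supplierLoader;
    			$this->supplier = $loader($this->supplierId);
    		}
    		return $this->supplier;
    	}
    }

    //the mapper wires it up when it builds the product:
    $product = new Product(function($id) use ($supplierMapper) {
    	return $supplierMapper->findById($id);
    });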

  9. #9 Lemon Juice (SitePoint Guru)
    Quote Originally Posted by TomB View Post
    In an ideal world you should be able to call $product->getSupplier() from anywhere and it work automatically, but at the same time keeping it so that $product does not have a direct dependency on the database.

    I won't go over old ground as there was a very good and in-depth discussion on how this can be achieved here: http://www.sitepoint.com/forums/show...Mapper-Library
    In this other thread you present this example:
    PHP Code:
    echo $user->orders[0]->items[0]->product->manufacturer->name; 
    It's very convenient OOP but how many sql statements are executed under the hood? Also, how many orders are retrieved from the db and how many items of the first order are retrieved? Here we just need the first order and the first item of the first order.

  10. #10 TomB (SitePoint Guru)
    Because it's using offsetGet the implementation can do anything. In this case (assuming that it's using a database) it would be able to run a query that used LIMIT 1 OFFSET $offset.

    One SQL statement is used per mapper. However, this is often faster than joins.

    The main point here, though, is that they don't have to be rows in the DB, they could be coming from a web service. The users could be in a database and the products could be in an XML file or totally different database/server for instance. This offers far greater flexibility than tying all your data to a single source.
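    For illustration, the relevant corner of such a lazy collection might look like this (a sketch assuming PDO and an orders table; TomB's actual library is in the thread linked earlier):
    PHP Code:
    class LazyOrderCollection implements ArrayAccess
    {
    	private $pdo;
    	private $userId;

    	public function __construct(PDO $pdo, $userId)
    	{
    		$this->pdo = $pdo;
    		$this->userId = $userId;
    	}

    	//$user->orders[0] triggers exactly one LIMIT 1 OFFSET 0 query
    	public function offsetGet($offset)
    	{
    		$stmt = $this->pdo->prepare("SELECT * FROM orders WHERE user_id = ? LIMIT 1 OFFSET ".(int)$offset);
    		$stmt->execute(array($this->userId));
    		return $stmt->fetchObject();
    	}

    	public function offsetExists($offset) { return $this->offsetGet($offset) !== false; }
    	public function offsetSet($offset, $value) { /* read-only in this sketch */ }
    	public function offsetUnset($offset) { /* read-only in this sketch */ }
    }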

  11. #11 cpradio (Hosting Team Leader)
    Okay, here is the general idea (there is a lot of work I didn't do, such as establishing a real connection, or a lot of validation/refactoring for efficiency-type stuff). Don't focus on the fact that the methods/functions are static (that was just to reduce my coding a slight bit).

    In essence, your Objects are more like DTOs (Data Transfer Objects). They may contain some useful utility functions for the item, but for the most part they are just a way to transfer that data from a controller to a view, or whatever path you may need to take.

    Objects:
    PHP Code:
    <?php
    namespace Objects;

    class Product
    {
        private $productId;
        public function GetProductId()
        {
            return $this->productId;
        }
        public function SetProductId($value)
        {
            if (is_int($value))
            {
                $this->productId = $value;
            }
        }

        private $name;
        public function GetName()
        {
            return $this->name;
        }
        public function SetName($value)
        {
            $this->name = $value;
        }

        private $supplier;
        public function GetSupplier()
        {
            return $this->supplier;
        }
        public function SetSupplier($value)
        {
            $this->supplier = $value;
        }

        private $accessories;
        public function GetAccessories()
        {
            return $this->accessories;
        }
        public function SetAccessories($value)
        {
            $this->accessories = $value;
        }
        public function AddAccessory($accessoryId, $accessoryName)
        {
            if (!is_array($this->accessories))
            {
                $this->accessories = array();
            }

            $this->accessories[] = new Accessory($accessoryId, $accessoryName);
        }
    }

    class Accessory
    {
        public $accessory_id;
        public $name;

        public function __construct($accessory_id, $name)
        {
            $this->accessory_id = $accessory_id;
            $this->name = $name;
        }
    }

    class Supplier
    {
        public $supplier_id;
        public $name;

        public function __construct($supplier_id, $name)
        {
            $this->supplier_id = $supplier_id;
            $this->name = $name;
        }
    }
    ?>
    Business Layer: (you will notice I am constantly overwriting the product id, name, supplier id and name, this is because I was lazy in my example)
    PHP Code:
    <?php
    namespace Business;

    class Product
    {
        public static function GetProductInfoAndAccessories($productId)
        {
            if (is_int($productId))
            {
                $results = \DataAccess\Product::GetProductInfoAndAccessories($productId);
                if ($results != null)
                {
                    // process results into an object
                    $product = new \Objects\Product();
                    foreach ($results as $row)
                    {
                        $product->SetProductId($row['product_id']);
                        $product->SetName($row['product_name']);
                        $product->AddAccessory($row['accessory_id'], $row['accessory_name']);
                    }

                    return $product;
                }

                return null; // or throw an exception
            }
            else
            {
                throw new \Exception("Must provide a valid product id");
            }
        }

        public static function GetProductInfoAccessoriesAndSupplier($productId)
        {
            if (is_int($productId))
            {
                $results = \DataAccess\Product::GetProductInfoAccessoriesAndSupplier($productId);
                if ($results != null)
                {
                    // process results into an object
                    $product = new \Objects\Product();
                    foreach ($results as $row)
                    {
                        $product->SetProductId($row['product_id']);
                        $product->SetName($row['product_name']);
                        $product->AddAccessory($row['accessory_id'], $row['accessory_name']);
                        $product->SetSupplier(new \Objects\Supplier($row['supplier_id'], $row['supplier_name']));
                    }

                    return $product;
                }

                return null; // or throw an exception
            }
            else
            {
                throw new \Exception("Must provide a valid product id");
            }
        }
    }
    ?>
    Data Access: (the Base would typically have a Close function as well to destroy/unset the connection variable)
    PHP Code:
    <?php
    namespace DataAccess;

    // Base is defined first so that Product can extend it within the same file
    class Base
    {
        private static $connection;
        public static function GetConnection()
        {
            if (self::$connection === null)
            {
                self::$connection = ... // define a new connection, e.g. new \PDO(...)
            }

            return self::$connection;
        }
    }

    class Product extends Base
    {
        public static function GetProductInfoAndAccessories($productId)
        {
            $statement = self::GetConnection()->prepare(
                "SELECT p.product_id, p.name AS product_name, a.accessory_id, a.name AS accessory_name
                 FROM Products AS p
                 LEFT JOIN Accessories AS a ON p.product_id = a.product_id
                 WHERE p.product_id = ?");
            $statement->execute(array($productId));
            return $statement->fetchAll();
        }

        public static function GetProductInfoAccessoriesAndSupplier($productId)
        {
            $statement = self::GetConnection()->prepare(
                "SELECT p.product_id, p.name AS product_name, a.accessory_id, a.name AS accessory_name,
                        s.supplier_id, s.name AS supplier_name
                 FROM Products AS p
                 LEFT JOIN Accessories AS a ON p.product_id = a.product_id
                 LEFT JOIN Suppliers AS s ON p.supplier_id = s.supplier_id
                 WHERE p.product_id = ?");
            $statement->execute(array($productId));
            return $statement->fetchAll();
        }
    }
    ?>
    Page needing Product and Accessories (don't forget, Business\Product validates/filters the $_GET item):
    PHP Code:
    $productInfo = \Business\Product::GetProductInfoAndAccessories((int)$_GET['product_id']);
    // Do whatever you need to do with $productInfo; pass it to your view, whatever, it is your DTO 
    Page needing Product, Accessories, and Suppliers
    PHP Code:
    $productInfo = \Business\Product::GetProductInfoAccessoriesAndSupplier((int)$_GET['product_id']);
    // Do whatever you need to do with $productInfo; pass it to your view, whatever, it is your DTO 
    Again, not saying this is a right or wrong way, just a different approach that can/did work well for the applications we maintained at my old job.

  12. #12 Lemon Juice (SitePoint Guru)
    cpradio, thanks for your explanation, that was a very detailed and clear description. I can see similarities between your method and what I often do - my object structure is based on Propel ORM (some older version; they may have refactored their code since) where each object (\Object) has a corresponding 'peer' object (\Business) with static methods similar to yours. So the Product object is like your \Objects\Product and the ProductPeer object is like \Business\Product, and there I can put methods like GetProductInfoAccessoriesAndSuppliers(). The major difference is I don't use a separate Data Access class; I perform all of that in the peer (\Business) methods. This is convenient when writing the code because all fetching code is in one place, but there is less clean separation of tasks - I don't feel I'm missing this separation, however I haven't tried the alternative yet so I can't say whether it would be beneficial. Certainly having a separate Data Access class would allow me to use the same methods for fetching data in different scenarios, for example when I don't need to feed the data into objects and a plain array would suffice.

    The thing is that I can actually do the same thing as you, in that I do not have to worry about fetching data a second time - except I didn't always take advantage of this possibility, probably because I wanted to do it too 'properly', in other words fetching as little as possible and then fetching more chunks of data on demand. For example, I first fetch the Product object that would contain data from the product table only, and I don't think at this stage about what other related data I might need later. So whenever a method like GetSupplier() is called, that method has to worry about fetching the data from the database. My reasoning was that if an object has a GetSupplier() method then it needs to always work, no matter how the object was instantiated. In your case GetSupplier() will not work if you don't call the appropriate method in your Business class, so you need to think beforehand about what data you will need and call the right business method. In an ideal world I wouldn't need to think about it beforehand, but I can see that in practice it's a good idea to do so, at least from a performance point of view - fitting relational database access into objects is not possible if you want both the [full] convenience of the objects and optimised database access.

    Thanks, that was inspiring and I may now lean more towards the way you fetch objects since it makes sense and I can already do it in my current setup, it's just a question of being aware of it and putting it to use.

    BTW, are your objects writable? Can you fetch a Product object, change something in it and then call $product->save() to save the changes in the database? I'm asking because when you have specialised methods for fetching in your Business Layer and some of them don't fetch all of the data then save() might not work properly because some fields in the object are empty and may be unintentionally deleted on saving. Sure, there are ways to deal with it but it makes the whole system more complicated.

  13. #13 cpradio (Hosting Team Leader)
    Quote Originally Posted by Lemon Juice View Post
    The thing is that I can actually do the same thing as you in that I do not have to worry about fetching data for the second time except I didn't always take advantage of this possibility probably because I wanted to do it too properly, in other words fetching as little as possible and then fetching more chunks of data on demand.
    And that is a valid point. I just dislike how chatty that scenario is, constantly trying to get data that could have been established once earlier. Think about it: you are not trying to get the data across pages, you are still trying to load the same freakin' page but just realized you need data you didn't have, so you go back out to the database to get it. In my opinion, this style lacks requirements/clarity as to what the application and each page in the application is supposed to do. It isn't "wrong", it just doesn't seem efficient and could lead to a lot of chattiness. Just my very strong opinion.

    Quote Originally Posted by Lemon Juice View Post
    Thanks, that was inspiring and I may now lean more towards the way you fetch objects since it makes sense and I can already do it in my current setup, it's just a question of being aware of it and putting it to use.

    BTW, are your objects writable? Can you fetch a Product object, change something in it and then call $product->save() to save the changes in the database?
    Yes. You could update the product properties and then pass the product to a \Business\Products::Save($product) method (actually you can have a $product->save() that then calls \Business\Products::Save($product) too); the idea is that your data goes to the Business layer first, and the Business layer validates all properties, etc.

    You can also take it a step further and develop an "isDirty" indicator for each property (or really a framework that the \Objects\Product class could extend) that would track which properties were changed. Then you know what properties need to be updated/inserted/added/deleted, etc. (I haven't developed this framework yet, though now it seems like a really neat challenge, so maybe in my spare time I might try it.)
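    For illustration, the bare bones of such a base class might look like this (a sketch of the idea, not an existing framework; it assumes the child class declares its tracked fields as protected and routes its setters through setProperty()):
    PHP Code:
    abstract class TrackedObject
    {
    	private $dirty = array(); //property name => true

    	//setters call this instead of assigning directly
    	protected function setProperty($name, $value)
    	{
    		if($this->$name !== $value)
    		{
    			$this->$name = $value;
    			$this->dirty[$name] = true;
    		}
    	}

    	public function isDirty($name = null)
    	{
    		return $name === null ? !empty($this->dirty) : isset($this->dirty[$name]);
    	}

    	//the Business layer can then update only the changed columns on Save()
    	public function getDirtyProperties()
    	{
    		return array_keys($this->dirty);
    	}
    }
    \Objects\Product would then extend TrackedObject and have SetName($value) call $this->setProperty('name', $value).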

  14. #14 aaarrrggh (Non-Member)
    Aren't you trying to solve a problem that has already been solved by the Doctrine 2 ORM? I would highly recommend reading the following articles:

    http://ihaveabackup.net/2012/02/27/w...trine2-anyway/
    http://symfony.com/doc/current/book/doctrine.html

  15. #15 cpradio (Hosting Team Leader)
    Quote Originally Posted by aaarrrggh View Post
    Aren't you trying to solve a problem that has already been solved by the Doctrine 2 ORM? I would highly recommend reading the following articles:

    http://ihaveabackup.net/2012/02/27/w...trine2-anyway/
    http://symfony.com/doc/current/book/doctrine.html
    Hey @aaarrrggh, have you had any personal experience with Doctrine 2 ORM? I'd like to hear about your personal experience with it (if any). This thread has a lot of potential to be a good resource for many different users; a personal story can easily help sell an ORM.

    My personal experience with ORMs and Frameworks is limited. I have never had the need to use them in PHP, and in the .NET world our company had a strict rule of not using third-party products unless your team had time to vet them (so we would end up creating our own without any bells and whistles, like I did in my prior example -- first time I wrote that in PHP too).

    When it comes down to it, I still like to roll my own versus using a third-party solution, for two reasons: 1) I get to learn a lot by having to re-invent the wheel, and 2) most of the third-party solutions I've researched have a lot more to them than what I would need, so why bundle it with an already growing codebase? I still believe ORMs and third-party Frameworks are good for some, but not necessarily for everyone.

    @hessodreamy: Lots of food for thought here; if there is any one aspect that sounds interesting to you, please ask more questions on how it may be suitable to your project (or not). Out of all of the replies here, there is a LOT of good information, different tactics, etc.

  16. #16 aaarrrggh (Non-Member)
    Quote Originally Posted by cpradio View Post
    Hey @aaarrrggh, have you had any personal experience with Doctrine 2 ORM? I'd like to hear about your personal experience with it (if any). This thread has a lot of potential to be a good resource for many different users; a personal story can easily help sell an ORM. [...]
    Hey CP - yeah, I was using Doctrine 2 just today actually.

    As I said in the above post - it's really great, because you end up simply writing plain old PHP objects. The only thing that makes them different to standard PHP objects in any way is that you attach certain meta tags to fields that help the Doctrine 2 ORM do its thing.

    I may write something up here later to explain this more fully. Perhaps I could give you some examples from the stuff I've been doing recently. I can tell you I've been using Doctrine 2 inside Zend Framework 1 based on this tutorial video that I found: http://www.zendcasts.com/unit-testin...ities/2011/02/ (even though the video is entitled "Unit Testing Doctrine 2 Entities", it goes beyond that: it shows you how to set it up in ZF1, and also how to set it up so you end up using a sqlite db for unit testing while maintaining a mysql database for production/development and staging).

    I'm currently doing some work (as always), but in a few weeks I'll have two weeks off on paternity leave, so perhaps I could write up an example of something here for people to look at?

    It's really a great tool. It saves you so many headaches when it comes to setting up the legwork for your domain level logic.

  17. #17 TheRedDevil (SitePoint Wizard)
    Quote Originally Posted by cpradio View Post
    Again, I'm not going to say one way is wrong and another is right, I and many others took a different approach and it worked and scaled very well for us. We also required that all connections be closed as soon as possible, so we didn't permit passing connections from method to method or object to object, so you can see why the approach we took was favored over other ones. Connections are expensive, the queries once connected usually are not expensive (unless you have a poorly indexed table; or poorly written query).
    That is a strange approach when you use SQL Server since, as you say, each connection is very expensive. But perhaps it worked well for the type of application you guys worked on.

    On a side note, connections in MySQL are very cheap (compared to SQL Server), so this is not a worry here.

    Quote Originally Posted by aaarrrggh View Post
    It also has some exceptionally good stuff for database migrations, and it is the first ORM I've ever used that works so well with any data source that I've actually used this facility in practice. In my current application, all of the unit testing is done in a sqlite database, while the production and staging servers and my development environment all use mysql - I have had to change literally nothing in my objects at all to make this work (it's literally a one-line thing in a config file that basically says "if in the testing environment, load through sqlite instead").
    The problem with an ORM is that you sacrifice database/server speed/efficiency for the ability to write faster code. Since PHP also compiles this data on each run, it makes an ORM less effective in this language compared to Java and C#.

    Personally we prefer a plain DAO approach, where we have a class for each table in the database and then separate methods depending on what data is required and whether it pulls or inserts, as this allows us to optimize the queries to the fullest. It also makes it very easy to drop a cache solution on top of the queries, or to change to a different database type.
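    For illustration, that per-table layout might look roughly like this (a sketch; the PDO wiring and table/column names are assumptions):
    PHP Code:
    class ProductsTable
    {
    	private $pdo;

    	public function __construct(PDO $pdo)
    	{
    		$this->pdo = $pdo;
    	}

    	//one method per access pattern, so each query can be tuned (and cached) individually
    	public function fetchForListing($limit)
    	{
    		$stmt = $this->pdo->prepare("SELECT product_id, name, price FROM products WHERE hidden = 0 ORDER BY price LIMIT ?");
    		$stmt->bindValue(1, (int)$limit, PDO::PARAM_INT);
    		$stmt->execute();
    		return $stmt->fetchAll(PDO::FETCH_ASSOC);
    	}

    	public function insert($name, $price)
    	{
    		$stmt = $this->pdo->prepare("INSERT INTO products (name, price) VALUES (?, ?)");
    		$stmt->execute(array($name, $price));
    		return $this->pdo->lastInsertId();
    	}
    }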

  18. #18 aaarrrggh (Non-Member)
    Quote Originally Posted by TheRedDevil View Post
    The problem with an ORM is that you sacrifice database/server speed/efficiency for the ability to write faster code. Since PHP also compiles this data on each run, it makes an ORM less effective in this language compared to Java and C#. [...]
    I think the speed argument with regards to the ORM issue is overstated in many cases. Also, Doctrine 2 has methods for dealing with caching built in, so it can actually be far more efficient than writing standard mysql code.

    I haven't really experienced too many issues with regard to speed in practice. In general, when people try to write their own implementations for this sort of thing they tend to end up hitting brick walls and writing all kinds of messy hacks to deal with issues further down the line.

    I think I may write up a mini tutorial at some point if I can find the time...

  19. #19 cpradio (Hosting Team Leader)
    Quote Originally Posted by TheRedDevil View Post
    That is a strange approach when you use SQL Server, as you say each connection is very expensive. But perhaps it worked well for the type of application you guys worked on.
    Can you elaborate on why it is strange to pull all of the data you need upfront and not have to go back time and time again for more data? So you ultimately make one connection, do your query (or queries), disconnect and never connect again until the next page request/submission. I'm just curious as to why you consider it strange (call it my curiosity).

  20. #20 oddz (SitePoint Wizard)
    The thing you really need to ask yourself is whether you are looking for a generic framework or just a series of patterns to follow. If you are looking to build a generic framework for handling this layer then you are probably better off using something like Doctrine. However, if you would like the code to be as straightforward and efficient as possible then a solution like the one cpradio presented is a good option. Really, that solution is nothing more than a data mapper/ORM with an extra layer for mapping the result set to entities. Nothing is automated, though, like it would be in Doctrine, which can be a huge advantage or disadvantage depending on the circumstances.

    Quote Originally Posted by aaarrrggh
    Aren't you trying to solve a problem that has already been solved by the Doctrine 2 ORM? I would highly recommend reading the following articles:
    The problem with Doctrine is that it is very complex and a beast to understand and maintain. Powerful, of course, but it is huge. Not using a generic automated mapper will always be more code-efficient and readable, though it requires writing everything (or just about everything) from scratch. I have nothing against Doctrine myself; I just find it large and unnecessary at times.
    The only code I hate more than my own is everyone else's.

  21. #21 aaarrrggh (Non-Member)
    Quote Originally Posted by oddz View Post
    The problem with Doctrine is that it is very complex and a beast to understand and maintain. Powerful, of course, but it is huge. [...]

    Have you checked out Doctrine 2? It's very different to Doctrine 1. You actually use plain old PHP objects to map out your domain logic, and these classes end up looking very similar to the classes we see in this thread. About the only significant difference is that you don't have any SQL in them (generally - you can do this if you want to for some reason, although that kinda defeats the purpose of a DBAL). You use meta information in the class to map out the persistence layer, and from there it is dealt with for you. You can end up doing stuff like this:

    PHP Code:
    $comments = $blogPost->getComments();

    foreach ($comments as $comment) {
        $comment->getText();
        $comment->getUsername();
    } 
    Based on the relations you've set up in your own objects.
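    For a rough idea of what that meta information looks like, the mapping is typically docblock annotations on otherwise plain classes - an abbreviated sketch, not a complete entity definition:
    PHP Code:
    /** @Entity */
    class BlogPost
    {
        /** @Id @Column(type="integer") @GeneratedValue */
        private $id;

        /** @Column(type="string") */
        private $title;

        /** @OneToMany(targetEntity="Comment", mappedBy="blogPost") */
        private $comments; //lazy-loaded: no query runs until first access

        public function getComments()
        {
            return $this->comments;
        }
    }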

    On top of this you don't have complex issues to worry about, such as what will happen if, say, a comment is updated while you're working on a blog post with related comments (all tracked automatically for you). It also does lazy loading, so it won't load the comments in the above instance unless you actually call the getComments() method.

    It also has some exceptionally good stuff for database migrations, and it is the first ORM I've ever used that works so well with any data source that I've actually used this facility in practice. In my current application, all of the unit testing is done in a sqlite database, while the production and staging servers and my development environment all use mysql - I have had to change literally nothing in my objects at all to make this work (it's literally a one-line thing in a config file that basically says "if in the testing environment, load through sqlite instead").

    Overall I've found it excellent, not particularly hard to get your head around, and it produces massive benefits if you want to map your business domain logic to a persistence layer of any kind.

    Here's another article about the principle behind domain driven design: http://net.tutsplus.com/tutorials/domain-driven-design/

    I would highly recommend learning this, because it will prevent you from re-inventing the wheel time and time again. Don't mistake Doctrine 2 for Doctrine 1 or for Active Record based ORMS. This is on a whole other level, and I believe it solves the original poster's problems.

  22. #22 TomB (SitePoint Guru)
    Quote Originally Posted by Lemon Juice View Post
    Considering there are many more cases apart from pulling the first item from the data source you need to have a very intelligent translator of object access properties and methods to sql so that it remains efficient. Still, even the most intelligent translator cannot optimize everything if you don't tell it up front what data you will need to request later because often it's much faster to get more data in bigger chunks than to request many smaller ones. Therefore from a performance point of view what cpradio suggests makes more sense - fetch all data beforehand in one go.
    No ORM can optimise everything. You don't want to tell it what data to fetch. Part of the job of the ORM is to ensure consistency. From an OOP perspective objects should always be complete. Admittedly, in the name of performance it's usually better to specify certain columns (such as blob ones) which are lazy loaded, but as a rule, a record is a data set and needs to have integrity. By fetching part of the data, you lose this.

    Quote Originally Posted by Lemon Juice View Post
    Whether you're using joins or separate queries, the DB is going to be the slowest part of the system.

    Have you benchmarked that? Most of the time additional queries are faster than joins, especially when sorting is involved or you take prepared statements into account. The number of queries has little effect; if you're running 10 queries that do a PK lookup and all run in a matter of milliseconds, that's better than running a query which does 10 joins and a sort and takes 2 seconds to run.

    If you are indeed "fetching all the data beforehand" that is potentially far worse performance! You're fetching data you may never need or want.

    Quote Originally Posted by Lemon Juice View Post
    However, I do think it sacrifices some portion of performance which can be important for any large system. For anything small it's sweet to have so much data source abstraction and independence but when a site gets large and datasets expand then it becomes important to tweak or rewrite individual sql queries and at that stage this abstraction becomes a hindrance. And I don't think being able to substitute a relational database with XML files or whatever else you may think of is important (unless you have a specific requirement for this in a project). In a small system you can play with it but with a large database this would be almost insane.
    Again, you're making assumptions about performance problems that don't exist because additional queries are often faster!

    Funnily enough, it's those larger systems where being able to join from multiple sources has the largest benefit and practical use, because they often do need to connect to external data sources, multiple databases, etc. In fact, it's only in small self-contained systems that you'd want to use a DB-specific data mapper, because you can be fairly sure no other data sources will be needed. The larger the system, the more features there are, and it becomes increasingly more likely that external/unknown data sources are needed.

    Consider:

    PHP Code:
    //Show user's latest tweet
    echo $this->mapper->user->findById(123)->twitter[0]->content; 
    Which would use a mapper that connected to the user's twitter feed and found the tweets. That is, of course, the tip of the iceberg.


    Quote Originally Posted by Lemon Juice View Post
    This is an ongoing question of how far we are willing to go with implementing good and flexible OOP at the expense of practical usefulness (performance). I don't think in PHP I would go as far as you but certainly that would be a good exercise in learning OOP techniques. Everyone chooses their own balance.
    Again though, this is a strawman argument. There is no performance issue. Using inner joins can be slower!

    Quote Originally Posted by Lemon Juice View Post
    - no need to define metadata, no XML or other model/relationship configuration files, no other maintenance chores (in my case it's just a question of running one script that will automatically reverse-engineer all my database structure into proper objects)
    You do need some metadata somewhere. The fact that you have expressed it inside queries doesn't mean that it isn't there. Those joins in your queries ARE the metadata. They are defining the relationships between your entities. Whether they're stored in an SQL query, an XML file or wherever is irrelevant, any ORM which handles relationships will need some knowledge of it.


    You are too hung up on a performance issue (which doesn't even exist!). Ignoring that, there is always a cost-to-benefit ratio: here the cost to performance is nil or almost nil whereas the benefit is huge. Consider testing. Being able to quickly and easily substitute any mapper to use an XML file instead of the database immediately makes testing far easier by removing the database dependency.

    A practical example: On one site I worked on, a client had a stock control system used at their physical store which already contained almost all the information we'd ever need about the products. This was a desktop application running on their network that could be connected to as a web service to retrieve the data. By simply altering a couple of lines in the mapper definition, all the data could be used in real time and work with our existing shopping cart system with no changes, and with no need to store a copy of the data in the database (which would potentially create syncing issues). Then the data is simply saved back into the stock control system transparently. All using the existing code that we have used on several entirely database-driven sites.
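    For illustration, that swap only requires the two mappers to share an interface (a sketch, not the actual production code; the endpoint is hypothetical):
    PHP Code:
    interface ProductMapper
    {
    	public function findById($id);
    }

    //the normal case: the mapper reads from the site's own database
    class DatabaseProductMapper implements ProductMapper
    {
    	private $pdo;
    	public function __construct(PDO $pdo) { $this->pdo = $pdo; }

    	public function findById($id)
    	{
    		$stmt = $this->pdo->prepare("SELECT * FROM products WHERE product_id = ?");
    		$stmt->execute(array($id));
    		return $stmt->fetchObject();
    	}
    }

    //the swap: same interface, but the data comes from the client's
    //stock control system exposed as a web service
    class StockControlProductMapper implements ProductMapper
    {
    	private $serviceUrl;
    	public function __construct($serviceUrl) { $this->serviceUrl = $serviceUrl; }

    	public function findById($id)
    	{
    		$json = file_get_contents($this->serviceUrl.'/products/'.(int)$id);
    		return json_decode($json);
    	}
    }

    //"altering a couple of lines in the mapper definition" then just means
    //constructing a different class; the cart code only ever sees ProductMapper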

  23. #23 Lemon Juice (SitePoint Guru)
    Quote Originally Posted by TomB View Post
    No ORM can optimise everything. You don't want to tell it what data to fetch. Part of the job of the ORM is to ensure consistency. From an OOP perspective objects should always be complete. Admittedly, in the name of performance it's usually better to specify certain columns (such as blob ones) which are lazy loaded, but as a rule, a record is a data set and needs to have integrity. By fetching part of the data, you lose this.
    Yes, I agree, objects should always be complete. That's why I said that loading data in advance is a compromise we make for better performance - though whether performance is actually gained that way is, as far as I can see, debatable right now.

    Quote Originally Posted by TomB View Post
    Have you benchmarked that? Most of the time additional queries are faster than joins, especially when sorting is involved or you take prepared statements into account. The number of queries has little effect, if you're running 10 queries that do a PK lookup and all run in a matter of milliseconds it's better than running a query which does 10 joins, a sort and takes 2 seconds to run.

    If you are indeed "fetching all the data beforehand" that is potentially far worse performance! You're fetching data you may never need or want.

    Again, you're making assumptions about performance problems that don't exist because additional queries are often faster!
    I can't agree with that statement entirely. It all depends on what kind of query is run, and we can't generalise like that. In my experience it *sometimes* happens that separate queries are faster than joins, but I've had very few cases like that - most of my joins perform very fast. Just for the sake of curiosity I've made a benchmark of a common join: a simple case where I produce a listing of products and need to fetch the manufacturer name from a separate table:
    Code:
    SELECT p.*, m.*
        FROM products p
        LEFT JOIN manufacturer m
            ON m.id = p.manufacturer_id  -- assuming manufacturer_id is the FK on products
        WHERE p.hidden = 0
        ORDER BY p.price
        LIMIT 1000;
    The table has about 6,000 records, with no indexes except on the ids. I did this without any ORM - I just loaded the entire result set into an array and timed it. Then I did the same without the join, loading each manufacturer's data from the database in a loop. The whole operation was about 2.5 times faster with the join. The interesting thing was that when I repeated the benchmark loading the data into objects through my ORM, the join performed about 2.5 times *slower* - but that was down to the way I fetch data: when I fetch data into a single object (without joins) I load the whole row with a single statement, whereas for joins there is a loop that traverses all the fetched fields so that the appropriate ones are populated into the related objects. Taking this into consideration, the join will be slower in my case because I can't optimise fetching the data (I can't find a way to load the related row without looping over each column and doing some logic to determine where the data should go).
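    For anyone who wants to reproduce it, the harness was essentially this (sketched here with PDO, and assuming products.manufacturer_id is the FK column):

    PHP Code:
    // Version 1: single joined query
    $start = microtime(true);
    $rows = $pdo->query(
        "SELECT p.*, m.* FROM products p
         LEFT JOIN manufacturer m ON m.id = p.manufacturer_id
         WHERE p.hidden = 0 ORDER BY p.price LIMIT 1000"
    )->fetchAll(PDO::FETCH_ASSOC);
    $joinTime = microtime(true) - $start;

    // Version 2: one query for the products, then one per manufacturer
    $start = microtime(true);
    $products = $pdo->query(
        "SELECT * FROM products WHERE hidden = 0 ORDER BY price LIMIT 1000"
    )->fetchAll(PDO::FETCH_ASSOC);
    $stmt = $pdo->prepare("SELECT * FROM manufacturer WHERE id = ?");
    foreach ($products as &$product) {
        $stmt->execute(array($product['manufacturer_id']));
        $product['manufacturer'] = $stmt->fetch(PDO::FETCH_ASSOC);
    }
    unset($product);
    $separateTime = microtime(true) - $start;

    printf("join: %.4fs, separate queries: %.4fs\n", $joinTime, $separateTime);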

    So it's not so much a question of joins vs separate queries, but rather of the overall overhead of the mapper/ORM. Loading everything properly into objects can be 2 or 3 times slower than a more direct method - that may not be much for most sites, but for busy ones it may mean needing a dedicated server sooner. If I allow myself some inconsistency in my objects then I can, for example, load the manufacturer name with a JOIN straight into the Product object, and the speed will be better than any other solution. Sure, that inconsistency will have its price.


    Quote Originally Posted by TomB View Post
    Funnily enough, it's those larger systems where being able to join from multiple sources has the largest benefit and practical use, because they often do need to connect to external data sources, multiple databases, etc. In fact, it's only in small self-contained systems that you'd want a DB-specific data mapper, because there you can be fairly sure no other data sources will be needed. The larger the system, the more features there are, and it becomes increasingly likely that external/unknown data sources will be needed.

    Consider:

    PHP Code:
    //Show user's latest tweet
    echo $this->mapper->user->findById(123)->twitter[0]->content;
    Which would use a mapper that connected to the user's Twitter feed and found the tweets. That is, of course, just the tip of the iceberg.
    You may have a point here with the larger systems, but I don't think the Twitter example is the best one, because even in a DB-specific system I can easily do the above if I write my own method for connecting to Twitter. But then, yes, I need to do it all properly and not try to fetch Twitter content with SQL - I need to stick to the objects. The problem would arise if I first had the Twitter data in the DB and queried it with SQL, and then switched to some remote feed.

    Quote Originally Posted by TomB View Post
    You do need some metadata somewhere. The fact that you have expressed it inside queries doesn't mean that it isn't there: those joins in your queries ARE the metadata. They are defining the relationships between your entities. Whether they're stored in an SQL query, an XML file or wherever is irrelevant - any ORM which handles relationships will need some knowledge of it.
    Yes, I fully agree that I need metadata somewhere. I meant that I don't have to define it anywhere because it is generated automatically - the point is I don't need to spend time on it.

    Quote Originally Posted by TomB View Post
    You are too hung up on a performance issue (which doesn't even exist!). Even ignoring that, there is always a cost-to-benefit ratio, and here the cost to performance is nil or almost nil whereas the benefit is huge. Consider testing: being able to quickly and easily substitute any mapper for one that reads an XML file instead of the database immediately makes testing far easier by removing the database dependency.

    A practical example: on one site I worked on, the client had a stock control system at their physical store which already contained almost all the information we'd ever need about the products. It was a desktop application running on their network that could be connected to as a web service to retrieve the data. By altering a couple of lines in the mapper definition, all that data could be used in real time by our existing shopping cart system - with no code changes, and with no need to store a copy of the data in the database and potentially create syncing issues - and then saved back into the stock control system transparently. All using the existing code we had used on several entirely database-driven sites.
    Yep, I can see benefit in that, point taken!

    BTW, do you implement some form of caching? When you run a chain like $this->mapper->user->findById(123)->twitter[0]->content a few times, does your mapper fetch the data from the DB again each time?

  24. #24
    SitePoint Guru bronze trophy TomB's Avatar
    Join Date
    Oct 2005
    Location
    Milton Keynes, UK
    Posts
    989
    Mentioned
    9 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by Lemon Juice View Post
    I can't agree with that statement entirely. It all depends on what kind of query is run, and we can't generalise like that. In my experience it *sometimes* happens that separate queries are faster than joins, but I've had very few cases like that - most of my joins perform very fast. Just for the sake of curiosity I've made a benchmark of a common join: a simple case where I produce a listing of products and need to fetch the manufacturer name from a separate table:
    Code:
    SELECT p.*, m.*
        FROM products p
        LEFT JOIN manufacturer m
            ON m.id = p.manufacturer_id  -- assuming manufacturer_id is the FK on products
        WHERE p.hidden = 0
        ORDER BY p.price
        LIMIT 1000;
    The table has about 6,000 records, with no indexes except on the ids. I did this without any ORM - I just loaded the entire result set into an array and timed it. Then I did the same without the join, loading each manufacturer's data from the database in a loop. The whole operation was about 2.5 times faster with the join. The interesting thing was that when I repeated the benchmark loading the data into objects through my ORM, the join performed about 2.5 times *slower* - but that was down to the way I fetch data: when I fetch data into a single object (without joins) I load the whole row with a single statement, whereas for joins there is a loop that traverses all the fetched fields so that the appropriate ones are populated into the related objects. Taking this into consideration, the join will be slower in my case because I can't optimise fetching the data (I can't find a way to load the related row without looping over each column and doing some logic to determine where the data should go).
    How many queries did you run? You should have run one query to fetch the products and one for the manufacturer of each product - N+1 queries in total. Now imagine you want to sort the initial product list: add ORDER BY product.name, or ORDER BY RAND(), and your single joined query becomes immensely slower because it's sorting the entire data set, whereas moving the sort into PHP with shuffle($products) is suddenly much faster. This effect is scaled by the number of tables you are joining.
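    To illustrate the difference (a sketch assuming a PDO connection in $pdo):

    PHP Code:
    // Sorting randomly in SQL forces the server to sort the whole data set:
    $slow = $pdo->query("SELECT * FROM products ORDER BY RAND() LIMIT 20")
                ->fetchAll(PDO::FETCH_ASSOC);

    // Fetching unsorted and shuffling in PHP avoids that work entirely:
    $products = $pdo->query("SELECT * FROM products")
                    ->fetchAll(PDO::FETCH_ASSOC);
    shuffle($products);                       // PHP's built-in shuffle
    $sample = array_slice($products, 0, 20);  // take 20 random rows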

    Quote Originally Posted by Lemon Juice View Post
    So it's not so much a question of joins vs separate queries, but rather of the overall overhead of the mapper/ORM. Loading everything properly into objects can be 2 or 3 times slower than a more direct method - that may not be much for most sites, but for busy ones it may mean needing a dedicated server sooner. If I allow myself some inconsistency in my objects then I can, for example, load the manufacturer name with a JOIN straight into the Product object, and the speed will be better than any other solution. Sure, that inconsistency will have its price.
    But it depends what your query is doing. Once you start adding more joins, sorts and grouping (which you'll need if you're using joins in this way), the single-query method very quickly loses any performance benefit it had. Simple is always faster, and once you start using prepared statements to fetch the manufacturers the difference will become even smaller in your very simplistic example.

    That's fair enough, but we recently had an interesting topic on premature optimisation, and that's exactly what this sounds like to me. The problem is you're sacrificing flexibility in the name of performance. Performance problems can easily be fixed after the fact with a host of different methods; fixing flexibility problems generally involves rewriting a lot of code, because the code isn't flexible enough to handle anything outside what it was initially designed to do - something you discover the moment the client moves the goalposts.

    Quote Originally Posted by Lemon Juice View Post
    Yes, I fully agree that I need metadata somewhere. I meant that I don't have to define it anywhere because it is generated automatically - the point is I don't need to spend time on it.
    How are your joins generated automatically? Somewhere you need to store data about the primary key/foreign key relationships. The query itself can be generated, but which fields to join on needs to be stored somewhere, and you'll need to spend time defining those if your ORM is to support joins. It will also need to know the kind of relationship: 1:1, 1:n. This is all metadata, and it needs to be configured for your queries to be generated.
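    Somewhere, something roughly like this has to exist (a hypothetical sketch - the array format is invented for illustration):

    PHP Code:
    // Relationship metadata the ORM must know about:
    $relation = array(
        'table'      => 'manufacturer',
        'type'       => 'n:1',
        'localKey'   => 'manufacturer_id',
        'foreignKey' => 'id',
    );

    // Generating the join clause from that metadata:
    function buildJoin(array $relation, $localTable)
    {
        return sprintf(
            'LEFT JOIN %s ON %s.%s = %s.%s',
            $relation['table'],
            $relation['table'], $relation['foreignKey'],
            $localTable, $relation['localKey']
        );
    }

    echo buildJoin($relation, 'products');
    // LEFT JOIN manufacturer ON manufacturer.id = products.manufacturer_id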

    Quote Originally Posted by Lemon Juice View Post
    BTW, do you implement some form of caching? When you run a chain like $this->mapper->user->findById(123)->twitter[0]->content a few times, does your mapper fetch the data from the DB again each time?
    Yes. Any time an individual record is fetched, it's stored in the mapper. Individual mappers can turn this off, but it's something I rarely need to do. Instead, I have it so that certain fields (e.g. longblob ones) can be lazy loaded.

    I did play around with caching results from queries that returned multiple records, but I found that in the real world these are only used once 99% of the time, whereas a fetch by ID - besides being a lot less data to hold in memory - often ends up being used multiple times. If I want a product's manufacturer name, it's likely I'll also want that manufacturer's logo, so in that example the manufacturer record would be cached in memory for the duration of the script.
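    In outline, it works like an identity map (a rough sketch - the method names here are illustrative, not my actual code):

    PHP Code:
    class CachingMapper
    {
        private $identityMap = array();

        public function findById($id)
        {
            // Repeat calls for the same ID reuse the in-memory record
            if (!isset($this->identityMap[$id])) {
                $this->identityMap[$id] = $this->loadFromStorage($id);
            }
            return $this->identityMap[$id];
        }

        private function loadFromStorage($id)
        {
            // run the query / web service call and map the row to an object
        }
    }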


    Quote:
    Just wanted to point this out, as this is where you can gain significant performance improvements. If you ever deal with a SQL Server or MySQL server (or whatever database) that is installed on a separate server or is controlled/monitored by third parties, you can get significant savings by only needing to go to that server once. It is important to consider that you will have network latency on your connections and when fetching results; paying that cost repeatedly is going to add up.

    Let me put that into a real-life example. I worked on a project where the average page load time was 5-7 seconds (not terrible for the amount of data they were loading, but still slow). Ultimately, they were performing 600+ queries/fetches for the data needed on the page. Changing those 600 queries (they were run in a loop) down to 2 queries dropped the entire page load time to under 1 second. Their SQL Server was located on a separate server, so they did have network latency to consider, but they never realised that as their product grew, the loop was going to run more and more queries.

    The problem I see is that some ORMs could have this effect if not closely watched, especially with remote sources. I'm not saying ORMs are bad - just do your due diligence and test them in a variety of situations before going all in on one.
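    The usual fix for that kind of loop is to collapse it into a single batched query - roughly like this (a sketch assuming PDO and illustrative table/column names):

    PHP Code:
    // Before: one round trip per ID (the 600-query loop)
    // foreach ($ids as $id) {
    //     $row = $pdo->query("SELECT * FROM products WHERE id = $id")->fetch();
    // }

    // After: a single round trip for the whole batch
    $placeholders = implode(',', array_fill(0, count($ids), '?'));
    $stmt = $pdo->prepare("SELECT * FROM products WHERE id IN ($placeholders)");
    $stmt->execute($ids);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);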
    Of course 600+ queries is too much. At the same time, and equally anecdotally, I've fixed a very slow page by breaking it up into separate queries: it was a reporting query dealing with a hell of a lot of data, joining around 15 tables, sorting and grouping, and it was taking around 45 seconds to run. In both our examples, the problem was the original developer neglecting to think about the consequences.

    However, slow single queries are substantially worse than a page that is slow because it's executing too many queries. That reporting query that took 45+ seconds to run (actually that's generous - some of them took more than double that!) joined a table that stored questionnaire answers. Questionnaires were being filled in constantly (probably every 5-10 seconds a write would happen to one of the answers tables). Whenever an UPDATE/INSERT was issued while one of those SELECTs was still running, the answers table got locked, and suddenly nobody across the entire site could view a questionnaire or any kind of results page - all because someone was viewing the results of an old questionnaire! Joyous! Once the reporting pages were changed to use separate queries they still took a while to run (significantly less, though - roughly half the time), but whether they were running or not, they never again broke functionality across other parts of the site.

  25. #25
    From space with love silver trophy
    SpacePhoenix's Avatar
    Join Date
    May 2007
    Location
    Poole, UK
    Posts
    5,014
    Mentioned
    103 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by TomB View Post
    However, slow single queries are substantially worse than a page that is slow because it's executing too many queries. That reporting query that took 45+ seconds to run (actually that's generous - some of them took more than double that!) joined a table that stored questionnaire answers. Questionnaires were being filled in constantly (probably every 5-10 seconds a write would happen to one of the answers tables). Whenever an UPDATE/INSERT was issued while one of those SELECTs was still running, the answers table got locked, and suddenly nobody across the entire site could view a questionnaire or any kind of results page - all because someone was viewing the results of an old questionnaire! Joyous! Once the reporting pages were changed to use separate queries they still took a while to run (significantly less, though - roughly half the time), but whether they were running or not, they never again broke functionality across other parts of the site.
    How many queries did you run? You should have run one query to fetch the products and one for the manufacturer of each product - N+1 queries in total. Now imagine you want to sort the initial product list: add ORDER BY product.name, or ORDER BY RAND(), and your single joined query becomes immensely slower because it's sorting the entire data set, whereas moving the sort into PHP with shuffle($products) is suddenly much faster. This effect is scaled by the number of tables you are joining.

    What DB engine was in use at the time? Assuming it was a MySQL server, my gut feeling is that it was a MyISAM table, since MyISAM uses table-level locking. The table(s) could be migrated over to InnoDB, but keep in mind that InnoDB had no support for full-text search prior to MySQL 5.6.4 - it has only been available in any form from that version onwards (http://blogs.innodb.com/wp/2011/12/i...n-mysql-5-6-4/).

    In this thread, various methods of getting the results of a large result set in a random order were compared, and the most efficient turned out to be the one where all the sorting was offloaded to the language processing the data set (PHP).

    Code MySQL:
    SELECT p.*, m.*
        FROM products p
        LEFT JOIN manufacturer m
            ON m.id = p.manufacturer_id  -- assuming manufacturer_id is the FK on products
        WHERE p.hidden = 0
        ORDER BY p.price
        LIMIT 1000;

    You're selecting all the fields from both tables - how many of those fields do you actually need? You should list only the required columns in the SELECT clause; otherwise you're transferring data for fields you don't need, which is a waste of bandwidth, especially if MySQL and PHP aren't on the same server box.
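    For example, if the listing only needs a name, price and manufacturer name, something like this (the column names are only a guess) transfers just the data the page uses:

    Code MySQL:
    SELECT p.id, p.name, p.price, m.name AS manufacturer_name
        FROM products p
        LEFT JOIN manufacturer m
            ON m.id = p.manufacturer_id
        WHERE p.hidden = 0
        ORDER BY p.price
        LIMIT 1000;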

    It would be interesting to hear @r937's take on this thread from a database POV.