SitePoint Sponsor

User Tag List

Results 1 to 7 of 7
  1. #1
    SitePoint Evangelist hessodreamy's Avatar
    Join Date
    Apr 2005
    Location
    uk
    Posts
    524
    Mentioned
    3 Post(s)
    Tagged
    0 Thread(s)

    Trying to optimize large query

    I've got this large query that's bogging things down on my site and I think I've exhausted my optimization abilities on it!

    What I was trying to achieve was to fetch the products for a category, along with ALL of the information that is required for the on-page display, so as to get it all in one and avoid any iterative queries. One-to-many relationships were resolved by liberal use of GROUP_CONCAT. The goal was for overall speed but I'm not sure I've managed that!

    The query is below, with the EXPLAIN statement below that. Do you think there's any improvements I can make? Or am I trying to do too much in this one query? It takes about 4 seconds usually.

    All the tables are MyISAM (a couple of of them have FULLTEXT indexes). I've tried changing them to INNODB but didn't see any performance increase. In terms of table size the products table has about 100,000 rows, a similar amount for the images table. They're the main big ones.

    Not sure if the query and the explain are sufficient to allow understanding of what I'm doing. Let me know if I need to post more info.

    Code:
    SELECT SQL_CALC_FOUND_ROWS
    			p.productID, p.prodName, p.price, 
    			GROUP_CONCAT(DISTINCT pic.fileName ORDER BY pic.index) pictureList,
    			COUNT(DISTINCT pic.picID) pic_count,
    			GROUP_CONCAT(DISTINCT 'caption_',pic.index,':',pic.caption ORDER BY pic.index) captionList,
    			s.deliveryTime,
    			tax.taxRate,
    			COUNT(v.videoID) videoCount,
    			COUNT(DISTINCT review.reviewId) AS review_count, AVG(review.rating) AS avg_review,
    			
    			GROUP_CONCAT(DISTINCT 'pid:',freeProd.productID,';name:',freeProd.name,';price:',freeProd.price, ';image:',freeProductPic.fileName,';' SEPARATOR '#') freeProds, 	
    			GROUP_CONCAT(DISTINCT doc.docFile) docFiles
    			
    			FROM 
    			products AS p 
    			JOIN product_category AS pcat ON p.productID=pcat.productID
    			LEFT JOIN tax ON p.taxCode=tax.taxCode
    			LEFT JOIN productSupplier AS s ON p.supplierID=s.supplierID
    			LEFT JOIN images AS pic ON p.productID=pic.productID 
    			LEFT JOIN videos AS v ON p.productID=v.productID
    			
    			LEFT JOIN productReviews AS rev ON p.productID=rev.productID AND review.status=1
    
    			LEFT JOIN freeProduct AS fp ON p.productID = fp.productID
    			LEFT JOIN tProducts AS freeProd ON fp.freeProductID=freeProd.productID
    			LEFT JOIN images AS freeProductPic ON freeProd.productID=freeProductPic.productID AND freeProductPic.index=1
    
    			LEFT JOIN product_productDocuments AS pdoc ON p.productID=pdoc.productID
    			LEFT JOIN productDocuments AS doc ON pdoc.docID=doc.docID
    			
    			WHERE 
    				p.isForSale=1  
    				AND 
    				pcat.catId=6615
    			GROUP BY p.productID
    		
    			ORDER BY price ASC
    			LIMIT 0, 100
    Code:
    id  select_type         table      type    possible_keys                           key            key_len  ref                          rows  Extra                                         
    1   PRIMARY             pcat       ref     unique_index,catId,productID            unique_index   4        const                        35    Using index; Using temporary; Using filesort  
    1   PRIMARY             p          eq_ref  PRIMARY,isForSale,prodID_isForSale      PRIMARY        4        mydb.pcat.productID       1     Using where                                   
    1   PRIMARY             tax        eq_ref  PRIMARY                                 PRIMARY        7        mydb.p.taxCode            1                                                   
    1   PRIMARY             s          ref     supplierId                              supplierId     13       mydb.p.supplierID         1                                                   
    1   PRIMARY             pic        ref     productID,prodID_index                  productID      4        mydb.p.productID          1                                                   
    1   PRIMARY             v          ref     productID                               productID      5        mydb.p.productID          3                                                   
    1   PRIMARY             review     ref     status,productID,prodID_status          prodID_status  7        mydb.pcat.productID,const 5                                                   
    1   PRIMARY             fp         ref     unique_index,productID                  unique_index   8        mydb.p.productID          1                                                   
    1   PRIMARY             freeProd   eq_ref  PRIMARY,prodID_isForSale                PRIMARY        4        mydb.fp.freeProductID     1                                                   
    1   PRIMARY             freePic    ref     relProdId,prodID_index                  productID      4        mydb.freeProd.productID   1                                                   
    1   PRIMARY             pdoc       ref     unique_index,productID                  unique_index   9        mydb.p.productID          1     Using index                                   
    1   PRIMARY             doc        eq_ref  PRIMARY                                 PRIMARY        4        mydb.pdoc.docID           1

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,270
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    you realize, of course, why you have to use DISTINCT in all your GROUP_CONCATs, right?

    it's because you have massive cross join effects produced by the various one-to-many relationships compounding each other

    to see what i mean, just remove the GROUP BY clause (and of course take off the GROUP_CONCAT functions, but leave a column in the SELECT clause so that table is still represented) to see how many intermediate rows your joins produce

    then i'll explain how to change the query
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    SitePoint Evangelist hessodreamy's Avatar
    Join Date
    Apr 2005
    Location
    uk
    Posts
    524
    Mentioned
    3 Post(s)
    Tagged
    0 Thread(s)
    Yup, there are 54436 rows in the intermediate set. The final result is 37 rows. That's a lot of cross-joining.

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,270
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by hessodreamy View Post
    Yup, there are 54436 rows in the intermediate set. The final result is 37 rows. That's a lot of cross-joining.
    and that's where the processing time goes, to GROUP_CONCAT all the distinct values, since each DISTINCT requires a sort

    okay, let's reconstruct the query

    i always start the FROM clause with the "driving" table, the table that drives the logic of the query, i.e. the table that has the important WHERE criterion

    in this case you want all products for a specific category, so product_category is the driving table
    Code:
      FROM product_category AS pcat
    INNER
      JOIN products AS p 
        ON p.productID = pcat.productID
       AND p.isForSale = 1
     WHERE pcat.catId = 6615
    note how the only condition left in the WHERE clause is the driving condition, the other one has been moved up to the join's ON clause

    now i start joining all the many-to-one tables using inner joins
    Code:
      FROM product_category AS pcat
    INNER
      JOIN products AS p 
        ON p.productID = pcat.productID
       AND p.isForSale = 1
    INNER
      JOIN tax 
        ON tax.taxCode = p.taxCode
     WHERE pcat.catId = 6615
    note it's an INNER JOIN because you wouldn't have a tax.taxcode for a tax.taxcode that doesn't exist (except if p.taxcode can be NULL, in which case, yes, it would have to be a LEFT OUTER JOIN)

    i'm going to skip a couple of tables and do the videos and reviews

    these are one-to-many relationships (as opposed to the many-to-one relationship we saw with the tax table) so these get "pushed down" into subqueries
    Code:
      FROM product_category AS pcat
    INNER
      JOIN products AS p 
        ON p.productID = pcat.productID
       AND p.isForSale = 1
    INNER
      JOIN tax 
        ON tax.taxCode = p.taxCode
    LEFT OUTER 
      JOIN ( SELECT productID
                  , COUNT(*) AS videoCount
               FROM videos 
             GROUP
                 BY productID ) AS v 
        ON v.productID = p.productID
    LEFT OUTER 
      JOIN ( SELECT productID
                  , COUNT(*) AS review_count
                  , AVG(rating) AS avg_review
               FROM productReviews
              WHERE status = 1
             GROUP
                 BY productID ) AS rev 
        ON rev.productID = p.productID
     WHERE pcat.catId = 6615
    so the strategy is to do the groupings in the subqueries

    the videos subquery was easy, but the reviews subquery was a bit confusing, as you used "rev" as the table alias but "review" elsewhere (this would've been a syntax error in the query you posted), but i wanted to show how more than one aggregate function is done, and also the use of the WHERE clause in the subquery

    make sense so far?

    so the final main query does not have a GROUP BY clause, all grouping is done in subqueries

    think you can finish the rest of the query?

    if you have trouble, post how far you got...
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    SitePoint Evangelist hessodreamy's Avatar
    Join Date
    Apr 2005
    Location
    uk
    Posts
    524
    Mentioned
    3 Post(s)
    Tagged
    0 Thread(s)
    Thanks for your help. As always, I realise there's so much more to learn!

    I get what you're saying about keeping the GROUP BYs in the subqueries. However I have re-written half of the query and it's now running slower.

    Below is what I've written so far, and I've left out the remaining parts until I get my head around this bit. Is the problem that the subqueries are returning too large amounts of data, and matching it against the "driving" table in an un-indexed way?

    Code:
    SELECT SQL_CALC_FOUND_ROWS
    	p.productID, p.prodName, p.price, 
    	s.deliveryTime,
    	tax.taxRate,
    
    	pic.pic_count, pic.picture_list, pic.captionList			
    	, v.videoCount
    	, review.review_count, review.avg_review
    
    			
    	FROM 
    	product_category AS pcat
    	INNER JOIN products AS p ON p.productID=pcat.productID AND p.isForSale=1  
    	INNER JOIN tax ON p.taxCode=tax.taxCode
    	INNER JOIN productSupplier AS s ON p.supplierID=s.supplierID
    	
    	LEFT OUTER JOIN 
    	(
    		SELECT productID, 
    			COUNT(DISTINCT pic.picID) pic_count,
    			GROUP_CONCAT(DISTINCT pic.fileName ORDER BY pic.ordering) picture_list,
    			GROUP_CONCAT(DISTINCT 'caption_',pic.index,':',pic.caption ORDER BY pic.ordering) captionList
    		FROM images pic 
    		GROUP 
    			BY pic.relProdID
    		#THIS SUBQUERY RETURNS 84000 ROWS
    	) AS pic
    		ON pic.productID=p.productID
    		
    	LEFT OUTER JOIN 
    	(
    		SELECT vid.productID
    		, COUNT(*) AS videoCount
    		FROM videos vid
    		GROUP BY vid.productID
    		#THIS SUBQUERY RETURNS 400 ROWS
    	) AS v 
    		ON v.productID=p.productID
    		
    	LEFT OUTER JOIN
    	(
    		SELECT review.productID
    		, COUNT(DISTINCT review.reviewId) AS review_count
    		, AVG(review.rating) AS avg_review
    		FROM productReviews review 
    		WHERE review.status=1
    		GROUP BY review.productID
    		#THIS SUBQUERY RETURNS 2500 ROWS
    	) AS review 
    		ON review.productID=p.productID
    	
    	WHERE 
    		pcat.catId=6615
    	
    	ORDER BY price ASC
    Code:
    id  select_type  table       type    possible_keys                              key           key_len  ref                    rows   Extra                                         
    1   PRIMARY      pth         ref     unique_index,catId,productID               unique_index  4        const                  35     Using index; Using temporary; Using filesort  
    1   PRIMARY      p           eq_ref  PRIMARY,suppid,isForSake,prodID_isForSale  PRIMARY       4        mydb.pth.productID     1      Using where                                   
    1   PRIMARY      s           ref     supplierId                                 supplierId    13       mydb.p.supplierID      1      Using where; Using index                      
    1   PRIMARY      tax         eq_ref  PRIMARY                                    PRIMARY       7        mydb.p.taxCode         1      Using index                                   
    1   PRIMARY      <derived2>  ALL                                                                                              83498                                                
    1   PRIMARY      <derived3>  ALL                                                                                              372                                                  
    1   PRIMARY      <derived4>  ALL                                                                                              2243                                                 
    4   DERIVED      review      ALL     status                                                                                   14033  Using filesort                                
    3   DERIVED      vid         index                                              productID     5                               1166   Using index                                   
    2   DERIVED      pic         ALL                                                                                              89994  Using filesort

  6. #6
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,270
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    i'm sorry, that EXPLAIN is just a little bit beyond my skill set

    if you're really only after (approx) 37 products at a time, perhaps you could try breaking up the query and running each of the one-to-many relationship queries separately, using WHERE product_id IN ( list of ids ) where the list of product id values is obtained by the first query
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  7. #7
    SitePoint Evangelist hessodreamy's Avatar
    Join Date
    Apr 2005
    Location
    uk
    Posts
    524
    Mentioned
    3 Post(s)
    Tagged
    0 Thread(s)
    I suspected that I'd have to re-do the process using separate queries, but was trying to avoid it! But it's probably the best way. There was a time I thought that 1 query would be the best way, but you live and learn.
    Thanks for your help, it's appreciated, as always.


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •