Group By + Order By without a temp table?

I have a query simplified as:

SELECT prod.id AS product_id
     , prod.name AS product_name
     , prod.price AS price
     , prod.thumbnail AS product_thumbnail
     , prod.active AS product_active
  FROM Products prod
LEFT OUTER
  JOIN CollectionsXProducts collxprod
    ON collxprod.prod_id = prod.id
LEFT OUTER
  JOIN Collections coll
    ON coll.coll_id = collxprod.coll_id
   AND coll.active = 1
 WHERE prod.active = 1
   AND coll.designer_id = 4
 GROUP
    BY prod.id
 ORDER
    BY prod.sku
 LIMIT 0, 20

This particular query uses both GROUP BY and ORDER BY clauses. The GROUP BY is there because a product could possibly be assigned to multiple collections. In cases like that, I only want to ensure that the product only shows up once. What needs to be done to make sure that this query stays optimized? Explain shows:

id  	 select_type  	 table  	 type  	 possible_keys  	 key  	 key_len  	 ref  	 rows  	 Extra
1 	SIMPLE 	prod 	ref 	active 	active 	2 	const 	1142 	Using where; Using temporary; Using filesort
1 	SIMPLE 	collxprod 	ref 	prod_id 	prod_id 	4 	mudev.prod.id 	2 	
1 	SIMPLE 	coll 	eq_ref 	PRIMARY,active 	PRIMARY 	1 	mudev.collxprod.coll_id 	1 	Using where

(notice “Using where; Using temporary; Using filesort” on table prod if it comes across garbled).

Even though prod.sku and prod.id both having unique indexes (prod.id is surrogate, prod.sku can change), the groupby/orderby combo causes a temporary table. Is there a way around this? How can I further optimize this query? It’s used on practically every page on our website :frowning:

Thanks

how many prod.ids per prod.sku? how many prod.skus per prod.id?

ids and skus have a 1:1 relationship. The only difference between them is SKUs can change (although very rarely). ids will never change.

EDIT:
SHOW CREATE TABLE for Products

CREATE TABLE `Products` (
`id` int(11) default NULL,
`sku` varchar(50) default NULL,
`name` varchar(100) default NULL,
`thumbnail` varchar(254) default NULL,
`image` varchar(254) default NULL,
`price` decimal(10,2) default NULL,
`cost` decimal(10,2) default NULL,
`descrip` mediumtext,
`weight` decimal(10,2) default NULL,
`taxable` tinyint(1) default NULL,
`active` tinyint(1) default NULL,
`date_added` timestamp NULL default CURRENT_TIMESTAMP,
UNIQUE KEY `s01_Products_1` (`id`),
KEY `sku` (`sku`),
KEY `date_added` (`date_added`),
KEY `active` (`active`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 PACK_KEYS=1;

(not my schema, so I cannot change much about it

okay, two suggested changes for your query

since you require

AND coll.designer_id = 4

therefore they should be INNER JOINs, not LEFT OUTER JOINs

and since sku and id are 1:1, drop the ORDER BY clause

now try your EXPLAIN again

:slight_smile:

  1. Left outer is used because this is a dynamic query (generated in PHP). The designer_id=4 is optional, and not always there. If a designer is not selected, all of the products are shown (20 at a time). The application layer handles products without collections differently than those with. In this particular query, yes, an inner join would help, but that would greatly change how the query is generated in PHP.

  2. ORDER BY prod.sku is important because we want consistent ordering of products. Also, the sku always starts with a vendor code which helps in the display of the products. Really, it should be ORDER BY designer_code, sku, but since the designer_code is at the start of the sku, I drop it.

Making those changes though (despite a different return) only changes the EXPLAIN slightly:

id  	 select_type  	 table  	 type  	 possible_keys  	 key  	 key_len  	 ref  	 rows  	 Extra
1 	SIMPLE 	prod 	ref 	id,active 	active 	2 	const 	1790 	Using where; Using temporary; Using filesort
1 	SIMPLE 	collxprod 	eq_ref 	prod_id,coll_id 	prod_id 	4 	mudev.prod.id 	1 	Using where
1 	SIMPLE 	coll 	eq_ref 	PRIMARY,active 	PRIMARY 	4 	mudev.collxprod.coll_id 	1 	Using where

BUMP Any takers?

OOOF what was the question?

Is there anything I can do to further optimize this query? It creates a temp table and my host told me that it is causing the majority of the load on our server.

The INNER JOIN did not relieve the problem, and wouldn’t work in the app since the WHERE coll.designer_id clause is a coincidence and not always there.

pretty hard for us to offer assistance with those criteria :slight_smile:

Here’s a more common usage I guess:

SELECT prod.id AS product_id
     , prod.sku AS product_sku
     , prod.name AS product_name
     , prod.price AS price
     , prod.thumbnail AS product_thumbnail
     , prod.active AS product_active
     , coll.coll_id
     , coll.name
  FROM Products prod
LEFT OUTER
  JOIN CollectionsXProducts collxprod
    ON collxprod.prod_id = prod.id
LEFT OUTER
  JOIN Collections coll
    ON coll.coll_id = collxprod.coll_id
   AND coll.active = 1
 WHERE prod.active = 1
 GROUP
    BY prod.id
 ORDER
    BY prod.sku
 LIMIT 0, 20

Even when collections aren’t used in the WHERE clause, I want to know which collection a product belongs to as it may change how the product is displayed.

since prod.sku and prod_id are 1:1, if you wrote GROUP BY prod.sku then you should get the same results, no?

this would avoid the temp table for the ORDER BY, and in fact, mysql has a (non-standard) proprietary extension to SQL where you can omit the ORDER BY and get results sequenced by the GROUP BY column

Hmm, replacing id with sku didn’t seem to help:

SELECT prod.id AS product_id
     , prod.sku AS product_sku
     , prod.name AS product_name
     , prod.price AS price
     , prod.thumbnail AS product_thumbnail
     , prod.active AS product_active
     , coll.coll_id
     , coll.name
  FROM Products prod
LEFT OUTER
  JOIN CollectionsXProducts collxprod
    ON collxprod.prod_id = prod.id
LEFT OUTER
  JOIN Collections coll
    ON coll.coll_id = collxprod.coll_id
   AND coll.active = 1
 WHERE prod.active = 1
 GROUP
    BY prod.sku
 ORDER
    BY prod.sku
 LIMIT 0, 20
id  	select_type  	table  		type  	possible_keys  	key  		key_len  	ref  	rows  	Extra
1 	SIMPLE 		prod 		ref 	active 		active 		2 		const 	1802 	Using where; Using temporary; Using filesort
1 	SIMPLE 		collxprod 	eq_ref 	prod_id 	prod_id 	4 		testing_sitepoint.prod.id 	1 	
1 	SIMPLE 		coll 		eq_ref 	PRIMARY,active 	PRIMARY 	4 		testing_sitepoint.collxprod.coll_id 	1 	 

Removing the ORDER BY does not change the query (good) or the EXPLAIN :frowning:

what percentage of your products are active?

what happens (both logically and in the EXPLAIN) if you omit WHERE prod.active = 1

Active is a field that means that the product is in stock (more specifically, means that it is available in one or more size/color combinations, but this query does not care about what combinations are available, just if any are).

Currently 42.7% of the products are active (= 1). I’m actually kinda surprised it is so low.

Removing the prod.active = 1 changes the EXPLAIN to be:

id  	select_type  	table  		type  	possible_keys  	key  		key_len  	ref  	rows  	Extra
1  	SIMPLE  	prod  		index  	NULL  		code  		51  		NULL  	3943  	
1 	SIMPLE 		collxprod 	eq_ref 	prod_id 	prod_id 	4 		testing_sitepoint.prod.id 	1 	
1 	SIMPLE 		coll 		eq_ref 	PRIMARY,active 	PRIMARY 	4 		testing_sitepoint.collxprod.coll_id 	1

Now how can I get these results without breaking the query :smiley:

well, here’s the deal, now that your query is optimized

you only wanted 20 rows, right? so pull in, say, 60 of them, and ignore the inactive ones

:cool:

Haha, so what you’re saying is my active field doesn’t have enough variance to be of any use so I can’t optimize my query anymore than it is?

LIMIT 0,60 might work for this, but when I want to go on to page two of the results, it gets much more complicated

Well if you change what you are after each time the prior request has been properly optimised then you are not really getting any closer to a solution. Why not specify exactly what it is you are trying to do fully rather than just part of it and then it will be possible to look at how to optimise that rather than just the part you told us about.

stephen, actually, i don’t think there’s anything missing from the problem description

brandon, what indexes are on the table currently?

try a composite index on (prod_sku,active) instead of just the active column

trumpet sounds

SELECT prod.id AS product_id
     , prod.sku AS product_sku
     , prod.name AS product_name
     , prod.price AS product_price
     , prod.thumbnail AS product_thumbnail
     , prod.active AS product_active
     , coll.coll_id AS collection_id
     , coll.name AS collection_name
  FROM Products prod
LEFT OUTER
  JOIN CollectionsXProducts collxprod
    ON collxprod.prod_id = prod.id
LEFT OUTER
  JOIN Collections coll
    ON coll.coll_id = collxprod.coll_id
   AND coll.active = 1
 WHERE prod.active = 1
 GROUP
    BY prod.sku
 ORDER
    BY prod.sku
 LIMIT 0, 20
+----+-------------+-----------+--------+----------------+------------+---------+-------------------------------------+------+-------------+
| id | select_type | table     | type   | possible_keys  | key        | key_len | ref                                 | rows | Extra       |
+----+-------------+-----------+--------+----------------+------------+---------+-------------------------------------+------+-------------+
|  1 | SIMPLE      | prod      | index  | NULL           | sku+active |      53 | NULL                                | 3943 | Using where |
|  1 | SIMPLE      | collxprod | eq_ref | prod_id        | prod_id    |       4 | testing_sitepoint.prod.id           |    1 |             |
|  1 | SIMPLE      | coll      | eq_ref | PRIMARY,active | PRIMARY    |       4 | testing_sitepoint.collxprod.coll_id |    1 |             |
+----+-------------+-----------+--------+----------------+------------+---------+-------------------------------------+------+-------------+

You rock Rudy. I’m going to merge this change into our dev site later today and see how the change of indexes affects the other apps and let you know what I find.

good thing you have a dev site, it’s always dicey seeing how a change “affects the other apps” in production, eh

:cool: