SELECT prod.id AS product_id
, prod.name AS product_name
, prod.price AS price
, prod.thumbnail AS product_thumbnail
, prod.active AS product_active
FROM Products prod
LEFT OUTER
JOIN CollectionsXProducts collxprod
ON collxprod.prod_id = prod.id
LEFT OUTER
JOIN Collections coll
ON coll.coll_id = collxprod.coll_id
AND coll.active = 1
WHERE prod.active = 1
AND coll.designer_id = 4
GROUP
BY prod.id
ORDER
BY prod.sku
LIMIT 0, 20
This particular query uses both GROUP BY and ORDER BY clauses. The GROUP BY is there because a product could possibly be assigned to multiple collections. In cases like that, I only want to ensure that the product only shows up once. What needs to be done to make sure that this query stays optimized? Explain shows:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE prod ref active active 2 const 1142 Using where; Using temporary; Using filesort
1 SIMPLE collxprod ref prod_id prod_id 4 mudev.prod.id 2
1 SIMPLE coll eq_ref PRIMARY,active PRIMARY 1 mudev.collxprod.coll_id 1 Using where
(notice “Using where; Using temporary; Using filesort” on table prod if it comes across garbled).
Even though prod.sku and prod.id both having unique indexes (prod.id is surrogate, prod.sku can change), the groupby/orderby combo causes a temporary table. Is there a way around this? How can I further optimize this query? It’s used on practically every page on our website
Left outer is used because this is a dynamic query (generated in PHP). The designer_id=4 is optional, and not always there. If a designer is not selected, all of the products are shown (20 at a time). The application layer handles products without collections differently than those with. In this particular query, yes, an inner join would help, but that would greatly change how the query is generated in PHP.
ORDER BY prod.sku is important because we want consistent ordering of products. Also, the sku always starts with a vendor code which helps in the display of the products. Really, it should be ORDER BY designer_code, sku, but since the designer_code is at the start of the sku, I drop it.
Making those changes though (despite a different return) only changes the EXPLAIN slightly:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE prod ref id,active active 2 const 1790 Using where; Using temporary; Using filesort
1 SIMPLE collxprod eq_ref prod_id,coll_id prod_id 4 mudev.prod.id 1 Using where
1 SIMPLE coll eq_ref PRIMARY,active PRIMARY 4 mudev.collxprod.coll_id 1 Using where
Is there anything I can do to further optimize this query? It creates a temp table and my host told me that it is causing the majority of the load on our server.
The INNER JOIN did not relieve the problem, and wouldn’t work in the app since the WHERE coll.designer_id clause is a coincidence and not always there.
SELECT prod.id AS product_id
, prod.sku AS product_sku
, prod.name AS product_name
, prod.price AS price
, prod.thumbnail AS product_thumbnail
, prod.active AS product_active
, coll.coll_id
, coll.name
FROM Products prod
LEFT OUTER
JOIN CollectionsXProducts collxprod
ON collxprod.prod_id = prod.id
LEFT OUTER
JOIN Collections coll
ON coll.coll_id = collxprod.coll_id
AND coll.active = 1
WHERE prod.active = 1
GROUP
BY prod.id
ORDER
BY prod.sku
LIMIT 0, 20
Even when collections aren’t used in the WHERE clause, I want to know which collection a product belongs to as it may change how the product is displayed.
since prod.sku and prod_id are 1:1, if you wrote GROUP BY prod.sku then you should get the same results, no?
this would avoid the temp table for the ORDER BY, and in fact, mysql has a (non-standard) proprietary extension to SQL where you can omit the ORDER BY and get results sequenced by the GROUP BY column
SELECT prod.id AS product_id
, prod.sku AS product_sku
, prod.name AS product_name
, prod.price AS price
, prod.thumbnail AS product_thumbnail
, prod.active AS product_active
, coll.coll_id
, coll.name
FROM Products prod
LEFT OUTER
JOIN CollectionsXProducts collxprod
ON collxprod.prod_id = prod.id
LEFT OUTER
JOIN Collections coll
ON coll.coll_id = collxprod.coll_id
AND coll.active = 1
WHERE prod.active = 1
GROUP
BY prod.sku
ORDER
BY prod.sku
LIMIT 0, 20
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE prod ref active active 2 const 1802 Using where; Using temporary; Using filesort
1 SIMPLE collxprod eq_ref prod_id prod_id 4 testing_sitepoint.prod.id 1
1 SIMPLE coll eq_ref PRIMARY,active PRIMARY 4 testing_sitepoint.collxprod.coll_id 1
Removing the ORDER BY does not change the query (good) or the EXPLAIN
Active is a field that means that the product is in stock (more specifically, means that it is available in one or more size/color combinations, but this query does not care about what combinations are available, just if any are).
Currently 42.7% of the products are active (= 1). I’m actually kinda surprised it is so low.
Removing the prod.active = 1 changes the EXPLAIN to be:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE prod index NULL code 51 NULL 3943
1 SIMPLE collxprod eq_ref prod_id prod_id 4 testing_sitepoint.prod.id 1
1 SIMPLE coll eq_ref PRIMARY,active PRIMARY 4 testing_sitepoint.collxprod.coll_id 1
Now how can I get these results without breaking the query
Well if you change what you are after each time the prior request has been properly optimised then you are not really getting any closer to a solution. Why not specify exactly what it is you are trying to do fully rather than just part of it and then it will be possible to look at how to optimise that rather than just the part you told us about.
SELECT prod.id AS product_id
, prod.sku AS product_sku
, prod.name AS product_name
, prod.price AS product_price
, prod.thumbnail AS product_thumbnail
, prod.active AS product_active
, coll.coll_id AS collection_id
, coll.name AS collection_name
FROM Products prod
LEFT OUTER
JOIN CollectionsXProducts collxprod
ON collxprod.prod_id = prod.id
LEFT OUTER
JOIN Collections coll
ON coll.coll_id = collxprod.coll_id
AND coll.active = 1
WHERE prod.active = 1
GROUP
BY prod.sku
ORDER
BY prod.sku
LIMIT 0, 20
You rock Rudy. I’m going to merge this change into our dev site later today and see how the change of indexes affects the other apps and let you know what I find.