Our Most Common Join Continues To Be Slow

So I’ve posted about this query join a few times in the past and we’ve tweaked it a lot, but it still just isn’t as fast as I need it to be. This join is used almost every time we select products on our ecommerce site, so it is important that it is fast.

What this join does is simply select the designer of the product. I’ve done some tweaks to try to improve it and I can shave a good amount of time off, but it really screws up some application logic elsewhere.

Here’s the query as it stands (in an overly simple version):

SELECT *
  FROM s01_Products products
LEFT OUTER
  JOIN s01_MUS_DesignerLookup mus_designerlookup
    ON mus_designerlookup.id = products.designer_id
    OR (
            products.designer_id IS NULL
        AND mus_designerlookup.code = LEFT(products.code, 2)
       )
   AND mus_designerlookup.active = 1
 WHERE products.active = 1

The way product codes (or SKUs) are set up is the first two characters signal which designer the product belongs to (AB-WIDGET would belong to ABC Company, where VA-WIDGET belongs to Vandelay Industries). This can be overridden for some products that don’t fit this form by manually assigning the product to a designer (stored in s01_Products.designer_id). Because of this (we have about 40 / 3000 using this) I have to use a conditional join.

I can tweak the query join to remove the conditional join and the query is 50% faster! The problem, is it changes application logic and our current workflow. Since the LEFT match won’t work on its own, the only option we have for tweaking is manually defining the relationship for every product in our store – which sucks!

So before I figure out how to go that route, can anyone help me with this query? Here’s the EXPLAIN:

id   select_type   table                type    possible_keys   key           key_len   ref     rows    Extra
1    SIMPLE        products             ref     active+code     active+code   2         const   1048    Using where
1    SIMPLE        mus_designerlookup   ALL     PRIMARY,code    NULL          NULL      NULL    41       

I have an index on products.designer_id but it doesn’t seem to help any since 99% of the fields are NULL

Unfortunately, mus_designerlookup.code = LEFT(products.code, 2) could match the wrong designer if the product’s code doesn’t obey the proper naming. That’s why it is important to only use that match if products.designer_id is null. I would love to go through and clean up the non-compliant product codes, but we have one designer that would be impossible to clean up right now.

Off Topic:

Backstory, we actually use to do a three conditional join because this one designer actually uses a 4 character designer code. But we converted that set of products to a manual relationship for the sake of performance.

I think that if I were work to remove one of the conditionals on the join, it seems as if manually defining each relationship would be better, but I just can’t figure out how to get that automated in our product creation workflow right now.

are the two data sets mutually exclusive?

i mean, if a given row would never have mus_designerlookup.id = products.designer_id as well as mus_designerlookup.code = LEFT(products.code, 2) AND mus_designerlookup.active = 1 at the same time, then you can use UNION ALL instead of UNION and skip the (expensive) sort for dupes

I ran that query on my dataset and it appears to be a lot slower (0.0043 vs 0.1087)

take the OR in the WHERE clause, and split it off into a separate query that you UNION together

SELECT *
  FROM s01_Products products
LEFT OUTER
  JOIN s01_MUS_DesignerLookup mus_designerlookup
    ON mus_designerlookup.id = products.designer_id
 WHERE products.active = 1

UNION 

SELECT *
  FROM s01_Products products
LEFT OUTER
  JOIN s01_MUS_DesignerLookup mus_designerlookup
    ON mus_designerlookup.code = LEFT(products.code, 2)
   AND mus_designerlookup.active = 1
 WHERE products.designer_id IS NULL
   AND products.active = 1