I have a list of product SKUs in the following format:
Brand Code (hyphen) Model Number (hyphen) Variant
So for example:
NIKE-SHOEA-WHITE
etc.
I have a query that attempts to get 5 of the most recent Model Numbers for a given brand.
So here’s some sample data:
sku date_added
NIKE-SHOEA-WHITE 2009-08-23
NIKE-SHOEA-BLACK 2009-08-23
NIKE-SHOEA-GREEN 2010-09-23
NIKE-SHOEA-BLUE 2010-09-23
NIKE-SHOEB-WHITE 2010-09-23
NIKE-SHOEB-BLACK 2010-09-23
NIKE-SHOEB-RED 2010-09-23
NIKE-SHOEB-PURPLE 2010-09-23
Here’s my current query (overly simplified):
SELECT p.sku AS product_sku
FROM Products p
WHERE p.active = 1
GROUP
BY SUBSTRING_INDEX(p.sku, '-', 2)
ORDER
BY p.date_added DESC
LIMIT 5
My expected result would be:
NIKE-SHOEA-GREEN
NIKE-SHOEB-WHITE
My actual result is:
NIKE-SHOEA-BLACK
NIKE-SHOEB-BLACK
(notice how NIKE-SHOEA-BLACK is actually over a year older than NIKE-SHOEA-BLUE and NIKE-SHOEA-GREEN)
What appears to be happening is that the order by is sorting the results by date, and then the group by is applied. When the group by is applied (presumably for speed), each sub-group (meaning all NIKE-SHOEA) are then sorted by code (so since -BLACK comes first alphabetically, it is aways returned first regardless of the date_added field).
So in technically if I say the query returns the “5 most recent models”, the query is working, but what I am truly after is the “5 of the newest products, while showing no more than 1 of each model”.
I don’t want to select a bunch of products and filter the results via the application if I don’t have to. Can anyone think of a clever solution?