I’m writing a query where I’m pulling total sales by a customer and in addition to the total sales, I’d like to pull the date of the customer’s first purchase and the data of the customer’s last purchase.
Tables:
users
orders
order_items
The current query:
SELECT users.name AS NAME
, SUM(oi.quantity * oi.price) AS TOTAL
, [FIRST ORDER DATE]
, [LAST ORDER DATE]
FROM order_items oi
LEFT JOIN orders o ON oi.order_id = o.id
LEFT JOIN users u ON o.user_id = u.id
WHERE o.order_date >= DATE_ADD( NOW(), INTERVAL -6 MONTH)
AND o.order_date <= NOW()
How do I pull the first order date and last order date of a customer. Note - the query is pulling sales from the last 6 months only, however the customer’s first purchase may have been years ago. I think I need to use a subquery but I can’t figure out the appropriate way to do this.
I figured out one way to do this, but it seems inefficient. It took 106 seconds to run on my data. Is there a better way to do this?
SELECT users.name AS NAME
, SUM(oi.quantity * oi.price) AS TOTAL
, min.min_order_date AS MINDATE
FROM order_items oi
LEFT JOIN orders o ON oi.order_id = o.id
LEFT JOIN users u ON o.user_id = u.id
LEFT JOIN (
SELECT users.id AS u_id, MIN(orders.order_date) FROM users JOIN orders ON users.id = orders.user_id
) AS min ON min.u_id = u.id
WHERE o.order_date >= DATE_ADD( NOW(), INTERVAL -6 MONTH)
AND o.order_date <= NOW()
r937, whoops, I left off the GROUP BY when I pasted it into the forum code, sorry. I read your reply and thought, “WTF my query runs how is it invalid lol” Apparently I cannot edit my other posts but rest assured my query had the GROUP BY before I pasted it here!!
Any comments on if there’s a more efficient way of doing the query in my 2nd post?
SELECT u.name AS NAME
, SUM(oi.quantity * oi.price) AS TOTAL
, min.min_order_date AS MINDATE
FROM order_items oi
LEFT JOIN orders o ON oi.order_id = o.id
LEFT JOIN users u ON o.user_id = u.id
LEFT JOIN (
SELECT users.id AS u_id, MIN(orders.order_date) FROM users JOIN orders ON users.id = orders.user_id
) AS min ON min.u_id = u.id
WHERE o.order_date >= DATE_ADD( NOW(), INTERVAL -6 MONTH)
AND o.order_date <= NOW()
GROUP BY u.name
If you put WHERE conditions on a left joined table (different from IS [NOT] NULL) then the join effectively becomes an INNER JOIN.
And I really don’t think you want all order items regardless if they have relative rows in the orders table?
I think the table to start with here is users (at least from a logical point of view).
And are there users without orders in the last 6 months? And do you want to extract them?
Let’s assume there are, and let’s assume you want them. The query would be something like
SELECT
u.name AS NAME
, SUM(oi.quantity * oi.price) AS TOTAL
, min.min_order_date AS MINDATE
FROM users AS u
LEFT OUTER JOIN orders
ON o.user_id = u.id
AND o.order_date >= DATE_ADD( NOW(), INTERVAL -6 MONTH)
AND o.order_date <= NOW()
LEFT OUTER JOIN order_items oi
ON oi.order_id = o.id
LEFT OUTER JOIN (
SELECT
user_id
, MIN(order_date) AS min_order_date
FROM orders
GROUP BY user_id
) AS min
ON min.user_id = u.id
GROUP BY u.name, min.min_order_date
Didn’t test it, so there might be some errors in it.
And as you can see, there’s no need for a join in the subquery, the min date for each userid can be taken from the orders table.
guido2004, thanks for the response. Your query works and it took about 50 seconds to run. I actually do not want to select ALL customers at this time, only customers who have ordered in the past 6 months, however this still is very useful, so thanks again!
I have another question that relates to this same query, but again I’m confused on. So I’m summing the sales of the customer here, but I’d also like to subtract returns from the SUM. Returns are in a separate table and I would only want to select returns that appeared in the date range, not returns that associated with an order of the date range. Does that make sense? Example - I’m selecting orders from 6/1/2011 to 12/1/2011 currently. Now I want to include any returns that were processed between 6/1/2011 and 12/1/2011. The order of a return processed during that range may not have originally been placed between 6/1/2011 and 12/1/2011.
I’m confused on how’d you’d include this into one single query.
Here’s some table definitions if it helps? If anyone can help, just assume I want to select ALL customers - using the query that guido2004 posted above
orders
-----
id
user_id
order_date
order_items
----------
id
order_id
quantity
price
returns
------
id
order_id
total_amount
return_date
return_line_items
---------------
id
return_id
order_id
order_items_id
quantity
price
That’s still a lot. Do you have indexes on all columns used in the joins?
Then you can INNER JOIN the orders table, instead of LEFT JOINing.
To prevent duplication of rows (a user can have multiple orders AND multiple returns), you’ll have to use two more subqueries in this query (one that will return the sales total, and one the returns total per user).
Something like:
SELECT
u.name AS NAME
, ot.ORDERTOTAL
, rt.RETURNTOTAL
, min.min_order_date AS MINDATE
FROM users AS u
INNER JOIN (
SELECT
o.user_id
, SUM(oi.quantity * oi.price) AS ORDERTOTAL
FROM orders
LEFT OUTER JOIN order_items oi
ON oi.order_id = o.id
WHERE o.order_date >= DATE_ADD( NOW(), INTERVAL -6 MONTH)
AND o.order_date <= NOW()
GROUP BY o.user_id
) as ot
ON ot.user_id = u.id
LEFT OUTER JOIN (
SELECT
o.user_id
, SUM(total_amount) AS RETURNTOTAL
FROM returns r
INNER JOIN orders o
ON r.order_id = o.id
WHERE r.return_date >= DATE_ADD( NOW(), INTERVAL -6 MONTH)
AND r.return_date <= NOW()
GROUP BY o.user_id
) as rt
ON rt.user_id = u.id
LEFT OUTER JOIN (
SELECT
user_id
, MIN(order_date) AS min_order_date
FROM orders
GROUP BY user_id
) AS min
ON min.user_id = u.id