I have table named “people_data” and “people_booked” here I want to get people who are not booked. I know two ways to do this:
SELECT a.`people_id` FROM `people_data` a WHERE (SELECT count(b.`people_id`) FROM `people_booked` b WHERE b.`people_id` = a.`people_id`) = 0 ORDER BY a.`index_order` LIMIT 1
SELECT a.`people_id` FROM `people_data` a WHERE a.`people_id` NOT IN (SELECT b.`people_id` FROM `people_booked`) ORDER BY a.`index_order` LIMIT 1
Which of these are better to use? also are there any more efficient ways to do this? Thanks!
SELECT people_data.people_id
FROM people_data
LEFT OUTER
JOIN people_booked
ON people_booked.people_id = people_data.people_id
WHERE people_booked.people_id IS NULL
SELECT people_data.people_id FROM people_data LEFT OUTER JOIN people_booked ON people_booked.people_id = people_data.people_id WHERE people_booked.people_id IS NULL LIMIT 1
Average: 0.0032 sec, 0.0424 sec, 0.0032 sec
SELECT a.people_id FROM people_data a WHERE (SELECT count(b.people_id) FROM people_booked b WHERE b.people_id = a.people_id) = 0 ORDER BY a.index_order LIMIT 1
Average: 0.0039 sec, 0.0037 sec, 0.0039 sec
SELECT a.people_id FROM people_data a WHERE a.people_id NOT IN (SELECT b.people_id FROM people_booked) ORDER BY a.index_order LIMIT 1
Average: 0.0041 sec, 0.0069 sec, 0.0041 sec
Ah, apparently the NOT IN method is the slowest. The JOIN works great on two occasions but spiked once, but still it is obvious that it has an advantage. The question is what exactly is the JOIN query here is doing. I can’t seem to read it or modify it without braking it.
For example what would I do to check people_booked.status=‘PAID’ along with that? Surely not adding an AND in the WHERE clause as I tried it. Aw ^_^’
Okay… gets all dizzy from reading that that kind of proves I’m not quite ready for that. I didn’t understand anything. :o BUT I’ll keep reading that up until I do though >.<’
Okay tried it with 32,144 records. I’m pretty sure I’d never have to go over 30k for this one. So the results were…
NOT IN Subquery: 0.0591 sec, 0.0585 sec, 0.0629 sec [JOIN Difference: +0.0099 sec, +0.0091 sec, +0.0127 sec]
Well its obvious now surely. Even so correct me if I’m wrong, I’m just trying to understand what the JOIN query is exactly doing here. I’ll probably understand better once the mtocker’s page begins to make sense. Until then is this what the query is doing?
SELECT people_data.people_id
FROM people_data
LEFT OUTER
JOIN people_booked
ON people_booked.people_id = people_data.people_id
WHERE people_booked.people_id IS NULL
It selects the field people_id from the left table people_data
It then joins the right able people_booked to the left table.
The join is based on condition where the people_id on both tables match. More test conditions can be added on the ON clause if needed.
currently the resulting visual resulting table should be each record from the left table people_data followed by columns of the right table people_booked where non-matching results are returned with NULL value
the WHERE clause picks out the ones where people_id on the joined people_booked table column with value NULL.
Now, I’ve tested each one out with a LIMIT 1. Each turns up with the same average in execution time though…
SELECT people_data.people_id FROM people_data LEFT OUTER JOIN people_booked ON people_booked.people_id = people_data.people_id WHERE people_booked.people_id IS NULL LIMIT 1
Average: 0.0005 sec, 0.0004 sec, 0.0004 sec
SELECT a.people_id FROM people_data a WHERE (SELECT count(b.people_id) FROM people_booked b WHERE b.people_id = a.people_id) = 0 ORDER BY a.index_order LIMIT 1
Average: 0.0005 sec, 0.0004 sec, 0.0004 sec
SELECT a.people_id FROM people_data a WHERE a.people_id NOT IN (SELECT b.people_id FROM people_booked) ORDER BY a.index_order LIMIT 1
Average: 0.0005 sec, 0.0004 sec, 0.0004 sec
All of them have the same speed it seems, except I can somewhat understand subqueries whereas I’m pretty much limited when it comes to joins (logic wise). Also my 2nd one i.e. the third one shown here is the shortest to read. Why is using a JOIN better in this situation? is there an advantage if the table becomes quite populated?