Get data of someone that doesnt exist in another table

Hi there,

I have table named “people_data” and “people_booked” here I want to get people who are not booked. I know two ways to do this:


SELECT a.`people_id` FROM `people_data` a WHERE (SELECT count(b.`people_id`) FROM `people_booked` b WHERE b.`people_id` = a.`people_id`) = 0 ORDER BY a.`index_order` LIMIT 1
SELECT a.`people_id` FROM `people_data` a WHERE a.`people_id` NOT IN (SELECT b.`people_id` FROM `people_booked`) ORDER BY a.`index_order` LIMIT 1

Which of these are better to use? also are there any more efficient ways to do this? Thanks! :slight_smile:

No, indeed. Put it in the ON clause instead :slight_smile:

I’ll try populating the tables with test data, guess a few hundreds will do. Anyway… how do I flush the mysql buffers again? ^_^‘’

SELECT people_data.people_id 
  FROM people_data
LEFT OUTER
  JOIN people_booked 
    ON people_booked.people_id = people_data.people_id
 WHERE people_booked.people_id IS NULL

Okay a test with 2064 records…

SELECT people_data.people_id FROM people_data LEFT OUTER JOIN people_booked ON people_booked.people_id = people_data.people_id WHERE people_booked.people_id IS NULL LIMIT 1

Average: 0.0032 sec, 0.0424 sec, 0.0032 sec

SELECT a.people_id FROM people_data a WHERE (SELECT count(b.people_id) FROM people_booked b WHERE b.people_id = a.people_id) = 0 ORDER BY a.index_order LIMIT 1

Average: 0.0039 sec, 0.0037 sec, 0.0039 sec

SELECT a.people_id FROM people_data a WHERE a.people_id NOT IN (SELECT b.people_id FROM people_booked) ORDER BY a.index_order LIMIT 1

Average: 0.0041 sec, 0.0069 sec, 0.0041 sec

Ah, apparently the NOT IN method is the slowest. The JOIN works great on two occasions but spiked once, but still it is obvious that it has an advantage. The question is what exactly is the JOIN query here is doing. I can’t seem to read it or modify it without braking it.

For example what would I do to check people_booked.status=‘PAID’ along with that? Surely not adding an AND in the WHERE clause as I tried it. Aw ^_^’

yes that’s right :slight_smile:

Okay… gets all dizzy from reading that that kind of proves I’m not quite ready for that. I didn’t understand anything. :o BUT I’ll keep reading that up until I do though >.<’

Okay tried it with 32,144 records. I’m pretty sure I’d never have to go over 30k for this one. So the results were…

  1. JOIN: 0.0492 sec, 0.0494 sec, 0.0502 sec

  2. Subquery Count Test: 0.0572 sec / 0.0569 sec / 0.0561 sec [JOIN Difference: +0.008 sec, +0.0075 sec, +0.0059 sec]

  3. NOT IN Subquery: 0.0591 sec, 0.0585 sec, 0.0629 sec [JOIN Difference: +0.0099 sec, +0.0091 sec, +0.0127 sec]

Well its obvious now surely. Even so correct me if I’m wrong, I’m just trying to understand what the JOIN query is exactly doing here. I’ll probably understand better once the mtocker’s page begins to make sense. Until then is this what the query is doing?

SELECT people_data.people_id 
  FROM people_data
LEFT OUTER
  JOIN people_booked 
    ON people_booked.people_id = people_data.people_id
 WHERE people_booked.people_id IS NULL
  1. It selects the field people_id from the left table people_data

  2. It then joins the right able people_booked to the left table.

  3. The join is based on condition where the people_id on both tables match. More test conditions can be added on the ON clause if needed.

  4. currently the resulting visual resulting table should be each record from the left table people_data followed by columns of the right table people_booked where non-matching results are returned with NULL value

  5. the WHERE clause picks out the ones where people_id on the joined people_booked table column with value NULL.

Is that right?

that’s still a puny table – try it with a substantially larger one

and in the meantime, check this – http://mtocker.livejournal.com/52399.html

it’s using an INNER JOIN versus an IN subquery, whereas you’re using a LEFT OUTER JOIN versus a NOT IN subquery

That works too, thanks~

Now, I’ve tested each one out with a LIMIT 1. Each turns up with the same average in execution time though…

SELECT people_data.people_id FROM people_data LEFT OUTER JOIN people_booked ON people_booked.people_id = people_data.people_id WHERE people_booked.people_id IS NULL LIMIT 1

Average: 0.0005 sec, 0.0004 sec, 0.0004 sec

SELECT a.people_id FROM people_data a WHERE (SELECT count(b.people_id) FROM people_booked b WHERE b.people_id = a.people_id) = 0 ORDER BY a.index_order LIMIT 1

Average: 0.0005 sec, 0.0004 sec, 0.0004 sec

SELECT a.people_id FROM people_data a WHERE a.people_id NOT IN (SELECT b.people_id FROM people_booked) ORDER BY a.index_order LIMIT 1

Average: 0.0005 sec, 0.0004 sec, 0.0004 sec

All of them have the same speed it seems, except I can somewhat understand subqueries whereas I’m pretty much limited when it comes to joins (logic wise). Also my 2nd one i.e. the third one shown here is the shortest to read. Why is using a JOIN better in this situation? is there an advantage if the table becomes quite populated?

Thanks again :slight_smile:

True. BUT I’m on a shared host with phpmyadmin. :blush: The only thing I can do at best is optimize. >.<

Okay going to try with 2064 records (previously had only three).

that’s a good question

why don’t you come back later and tell us what your timings reveal after the tables get a few hundred thousand rows

oh, and don’t forget to flush mysql’s buffers in between tests :slight_smile:

My mistake, edited. There is no account_id, all are people_id :lol:

neither of them is very good

why are they built on different relationships? one is on people_id, the other is on account_id

one way is to stop and restart the mysql server :slight_smile: