Hi,
I wrote the following subquery to return the names of people that have a phone number in common with someone else:
SELECT Name
FROM People
WHERE Phone IN (SELECT Phone
FROM People
GROUP BY Phone
HAVING COUNT(*) > 1)
The table has about 100K rows and this takes about 20 seconds which does not make sense because running the subquery by itself takes only about 0.20 secs and the main query about that much if I enter the result of the subquesry manually (10 rows returned). So I would have expected this to run in 0.40 seconds and not 20!
So MySQL must be doing something wrong here, interpreting this the wrong way or something or somehow running the subquery multiple times? Though I can’t see why.
Playing around with this a little I then (almost accidentally) found that if I rewrite the above as:
SELECT Name
FROM People
WHERE Phone IN (SELECT Phone FROM (SELECT Phone
FROM People
GROUP BY Phone
HAVING COUNT(*) > 1) InnerTable)
Does indeed run in only 0.40 secs!!! Now I can’t figure out why these two forms should behave so differently. I can’t understand why MySQL would interpret the first one in any other way than the obvious and even more puzzling (to me) that the second form fixes the problem.
I am happy to use the second form but I would also like to understand why this difference. The only other problem is that the second (and faster) form is not allowed to be written as a View, it complains that the View’s SELECT contains a subquery in the FROM clause.
Here are the explain plans for the two queries (I left out the columns that were NULL for all rows):
First Form (SLOW):
+----+--------------------+--------+------+--------+----------------------------------------------+
| id | select_type | table | type | rows | Extra |
+----+--------------------+------- +------+--------+----------------------------------------------+
| 1 | PRIMARY | People | ALL | 108423 | Using where |
| 2 | DEPENDENT SUBQUERY | People | ALL | 108423 | Using where; Using temporary; Using filesort |
+----+--------------------+--------+------+--------+----------------------------------------------+
Second Form (FAST):
+----+--------------------+------------+------+--------+----------------------------------------------+
| id | select_type | table | type | rows | Extra |
+----+--------------------+------------+------+--------+----------------------------------------------+
| 1 | PRIMARY | People | ALL | 108431 | Using where |
| 2 | DEPENDENT SUBQUERY | <derived3> | ALL | 3 | Using where |
| 3 | DERIVED | People | ALL | 108431 | Using where; Using temporary; Using filesort |
+----+--------------------+------------+------+--------+----------------------------------------------+
Note that there is also a time difference to come up with each plan, the first one (SLOW) takes no time (0.00 sec) while the second one takes 0.20 sec, not sure if this matters or is in any way significant.
Also welcome are any suggestions for a better way to write this query.
Thanks
John P.