Why is this query taking so long to execute?

T_Mac · November 19, 2013, 8:21am

Hi all,

I have a stored procedure that runs this query …

SELECT * FROM tbl_owners WHERE  OwnerPostcode IN (SELECT DISTINCT Postcode FROM tbl_postcodes WHERE TerritoryID = _TerritoryID) ORDER BY OwnerLastName;

So, I am passing it a variable (_TerritoryID) to use in a subquery to get a list of postcodes. The subquery returns a list of about 20 or so Distinct postcodes from the subquery. The tbl_owners table has about 12000 owners.

The query takes a minute or more to run.

If I supply a list of postcodes e.g. … WHERE OwnerPostcode in (“2144”,“2156”,“2142”) … the query is very fast.

This makes me think that maybe the query is running the subquery ever time. Is that the case? And if so, is there a better way to write this query so that it only does the subquery once?

TIA

gk53 · November 19, 2013, 1:06pm

Try

select o.*
FROM tbl_owners o
join (SELECT Postcode 
		FROM tbl_postcodes 
		WHERE TerritoryID = _TerritoryID
		GROUP BY Postcode) p
	on o.OwnerPostcode = P.Postcode
ORDER BY OwnerLastName;

this use different execution plan and should be faster

vinny42 · November 19, 2013, 7:36pm

The query takes a minute or more to run.

If I supply a list of postcodes e.g. … WHERE OwnerPostcode in (“2144”,“2156”,“2142”) … the query is very fast.

Run the query through EXPLAIN to see what is going on. The subquery may be slow, or there may be so many results that the IN becomes too slow (you have a DISTINCT in the subquery which suggests that you have lots and lots of results that include duplicates (which is a whole other level of badness that you should fix)).
This query is basically an INNER JOIN between the two tables, have you tried writing it as such?

T_Mac · November 19, 2013, 8:14pm

Thank you gk53. An excellent solution that takes less than a second to run. I had no idea I could join a subquery.

vinny42 · November 19, 2013, 8:20pm

I had no idea I could join a subquery.

You can do allmost everyhing with a subquery that you can do with a regular table.

But the database cannot always use indexes to solve these joins so run an EXPAIN to seee what it’s doing, otherwise you may find that it works quickly for just the queries that you are testing now, and it may give serioues issues lateron.

T_Mac · November 19, 2013, 8:21pm

Thanks vinny42. I’m learning a lot today - I had no idea about EXPLAIN either.

T_Mac · November 19, 2013, 8:36pm

The EXPLAIN returned …

id  select_type         table          type    possible_keys  key     key_len  ref       rows  Extra

 1  PRIMARY             tbl_owners     ALL     (NULL)         (NULL)  (NULL)   (NULL)   11401  Using where; Using filesort   
 2  DEPENDENT SUBQUERY  tbl_postcodes  ALL     (NULL)         (NULL)  (NULL)   (NULL)   15372  Using where; Using temporary

Unfortunately, that doesn’t explain anything to me. I have no idea what it means

The weird thing is that the subquery runs in a flash when run by itself.

I’m not sure how I can fix the badness you mention. My tbl_postcodes table has fields Postcodes, Suburb, TerritoryID. Suburbs can have the same Postcode. I am assigning Postcodes to Territories by storing the TerritoryID in each record. e.g. The Hills District is in territory 12 so 2153 Baulkham Hills, 2153 Winston Hills, 2153 Bella Vista all have TerritoryID 12. If I don’t do the DISTINCT, I’ll get three copies of 2153 in my subquery.

vinny42 · November 19, 2013, 8:36pm

Learning is good. EXPLAIN can really be your friend, take some time to properly understand how it works.

Good luck!

vinny42 · November 19, 2013, 8:51pm

2 DEPENDENT SUBQUERY tbl_postcodes ALL (NULL) (NULL) (NULL) (NULL) 15372 Using where; Using temporary

The list of NULL’s normally contain the names of the indexes that the database can use to solve the query, and a list of indexes that it has actually decided to use.
The fact that there are all NULL’s suggests that there are no indexes that can be used to find, and the filesort and temporary mean that the database is basically just loading all the records, sticking them in a tempporary table on disk and sorting them there (filesort).
Adding separate indexes to ownerpostcode, _TerritoryID field and “Postcode” may help.

The weird thing is that the subquery runs in a flash when run by itself.

That’s because the subquery just plows through all the records and remembers the unique postcodes. That’s easy.
Your outer query searches through all the owners once for every postcode and because there probably is no index on the ownerpostcode the database will be forced to loop through all owners once for every postcode.

Suburbs can have the same Postcode.

Hmmm… that’s to bad, but it does mean that your DISTINCT is justified. You could try normalizing the postcodes into a separate table but I doubt that that will make much difference (and it’s a lot more work for your application) so try an index and see if that helps.

T_Mac · November 19, 2013, 11:14pm

Thank you so much for all your help on this.

Adding an index to TerritoryID in tbl_postcodes sped it up.

vinny42 · November 19, 2013, 11:19pm

yw!

Lemon_Juice · November 20, 2013, 3:28pm

This problem is caused by the imperfect query optimizer in MySQL. Apparently, things have improved and from MySQL 5.6.5 your original query should perform much better. I haven’t used 5.6 yet so I can’t confirm it for sure, but certainly there is hope:

Optimizing Subqueries with Semi-Join Transformations