SitePoint Sponsor

User Tag List

Results 1 to 12 of 12

Hybrid View

  1. #1
    SitePoint Enthusiast
    Join Date
    Jun 2005
    Location
    Sydney, Australia
    Posts
    31
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Why is this query taking so long to execute?

    Hi all,

    I have a stored procedure that runs this query ...

    Code:
    SELECT * FROM tbl_owners WHERE  OwnerPostcode IN (SELECT DISTINCT Postcode FROM tbl_postcodes WHERE TerritoryID = _TerritoryID) ORDER BY OwnerLastName;
    So, I am passing it a variable (_TerritoryID) to use in a subquery to get a list of postcodes. The subquery returns a list of about 20 or so Distinct postcodes from the subquery. The tbl_owners table has about 12000 owners.

    The query takes a minute or more to run.

    If I supply a list of postcodes e.g. ... WHERE OwnerPostcode in ("2144","2156","2142") ... the query is very fast.

    This makes me think that maybe the query is running the subquery ever time. Is that the case? And if so, is there a better way to write this query so that it only does the subquery once?

    TIA

  2. #2
    SitePoint Enthusiast
    Join Date
    Aug 2011
    Location
    OH, USA
    Posts
    72
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Try

    Code:
    select o.*
    FROM tbl_owners o
    join (SELECT Postcode 
    		FROM tbl_postcodes 
    		WHERE TerritoryID = _TerritoryID
    		GROUP BY Postcode) p
    	on o.OwnerPostcode = P.Postcode
    ORDER BY OwnerLastName;
    this use different execution plan and should be faster

  3. #3
    SitePoint Enthusiast
    Join Date
    Nov 2013
    Posts
    26
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    The query takes a minute or more to run.

    If I supply a list of postcodes e.g. ... WHERE OwnerPostcode in ("2144","2156","2142") ... the query is very fast.
    Run the query through EXPLAIN to see what is going on. The subquery may be slow, or there may be so many results that the IN becomes too slow (you have a DISTINCT in the subquery which suggests that you have lots and lots of results that include duplicates (which is a whole other level of badness that you should fix)).
    This query is basically an INNER JOIN between the two tables, have you tried writing it as such?

  4. #4
    SitePoint Enthusiast
    Join Date
    Jun 2005
    Location
    Sydney, Australia
    Posts
    31
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks vinny42. I'm learning a lot today - I had no idea about EXPLAIN either.

  5. #5
    SitePoint Enthusiast
    Join Date
    Jun 2005
    Location
    Sydney, Australia
    Posts
    31
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thank you gk53. An excellent solution that takes less than a second to run. I had no idea I could join a subquery.

  6. #6
    SitePoint Enthusiast
    Join Date
    Nov 2013
    Posts
    26
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I had no idea I could join a subquery.
    You can do allmost everyhing with a subquery that you can do with a regular table.

    But the database cannot always use indexes to solve these joins so run an EXPAIN to seee what it's doing, otherwise you may find that it works quickly for just the queries that you are testing now, and it may give serioues issues lateron.

  7. #7
    SitePoint Enthusiast
    Join Date
    Jun 2005
    Location
    Sydney, Australia
    Posts
    31
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    The EXPLAIN returned ...

    id select_type table type possible_keys key key_len ref rows Extra
    ------ ------------------ ------------- ------ ------------- ------ ------- ------ ------ ------------------------------
    1 PRIMARY tbl_owners ALL (NULL) (NULL) (NULL) (NULL) 11401 Using where; Using filesort
    2 DEPENDENT SUBQUERY tbl_postcodes ALL (NULL) (NULL) (NULL) (NULL) 15372 Using where; Using temporary

    Unfortunately, that doesn't explain anything to me. I have no idea what it means

    The weird thing is that the subquery runs in a flash when run by itself.

    I'm not sure how I can fix the badness you mention. My tbl_postcodes table has fields Postcodes, Suburb, TerritoryID. Suburbs can have the same Postcode. I am assigning Postcodes to Territories by storing the TerritoryID in each record. e.g. The Hills District is in territory 12 so 2153 Baulkham Hills, 2153 Winston Hills, 2153 Bella Vista all have TerritoryID 12. If I don't do the DISTINCT, I'll get three copies of 2153 in my subquery.

  8. #8
    SitePoint Enthusiast
    Join Date
    Nov 2013
    Posts
    26
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Learning is good. EXPLAIN can really be your friend, take some time to properly understand how it works.

    Good luck!

  9. #9
    SitePoint Enthusiast
    Join Date
    Nov 2013
    Posts
    26
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    2 DEPENDENT SUBQUERY tbl_postcodes ALL (NULL) (NULL) (NULL) (NULL) 15372 Using where; Using temporary
    The list of NULL's normally contain the names of the indexes that the database can use to solve the query, and a list of indexes that it has actually decided to use.
    The fact that there are all NULL's suggests that there are no indexes that can be used to find, and the filesort and temporary mean that the database is basically just loading all the records, sticking them in a tempporary table on disk and sorting them there (filesort).
    Adding separate indexes to ownerpostcode, _TerritoryID field and "Postcode" may help.


    The weird thing is that the subquery runs in a flash when run by itself.
    That's because the subquery just plows through all the records and remembers the unique postcodes. That's easy.
    Your outer query searches through all the owners once for every postcode and because there probably is no index on the ownerpostcode the database will be forced to loop through all owners once for every postcode.

    Suburbs can have the same Postcode.
    Hmmm.. that's to bad, but it does mean that your DISTINCT is justified. You could try normalizing the postcodes into a separate table but I doubt that that will make much difference (and it's a lot more work for your application) so try an index and see if that helps.

  10. #10
    SitePoint Enthusiast
    Join Date
    Jun 2005
    Location
    Sydney, Australia
    Posts
    31
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thank you so much for all your help on this.

    Adding an index to TerritoryID in tbl_postcodes sped it up.

  11. #11
    SitePoint Enthusiast
    Join Date
    Nov 2013
    Posts
    26
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    yw!

  12. #12
    SitePoint Guru bronze trophy
    Join Date
    Dec 2003
    Location
    Poland
    Posts
    930
    Mentioned
    7 Post(s)
    Tagged
    0 Thread(s)
    This problem is caused by the imperfect query optimizer in MySQL. Apparently, things have improved and from MySQL 5.6.5 your original query should perform much better. I haven't used 5.6 yet so I can't confirm it for sure, but certainly there is hope:

    Optimizing Subqueries with Semi-Join Transformations


Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •