SitePoint Sponsor

User Tag List

Results 1 to 9 of 9
  1. #1
    SitePoint Enthusiast
    Join Date
    Aug 2008
    Posts
    96
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Subquery vs Join Question

    I'm under the assumption it's usually better tho run a join query than a subquery. Though I've read today that it's preferential to use a subquery over join queries that will create a temporay table.

    So is it a good rule of thumb to always write subqueries in place of joins that require the DISTINCT or ORDER BY commands?

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,340
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by maestro360 View Post
    Though I've read today that it's preferential to use a subquery over join queries that will create a temporay table.
    wha???

    Quote Originally Posted by maestro360 View Post
    So is it a good rule of thumb to always write subqueries in place of joins that require the DISTINCT or ORDER BY commands?
    no it isn't

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    SitePoint Enthusiast
    Join Date
    Aug 2008
    Posts
    96
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks for weighing in. I read this in a book called High Performance MySQL.

    The example they gave references two tables `film` which has 1,000 rows and `film_actor` which has 5,642 total rows.

    Join Query:
    SELECT DISTINCT film.film_if FROM film
    INNER JOIN film_actor USING(film_id)

    Subquery:
    SELECT film_id FROM film
    WHERE EXISTS(SELECT * FROM film_actor
    WHERE film.film_id=film_actor.film_id)

    The queries return 997 results. On my localhost they process at:

    Join: .008 - .0015 secs
    Subquery: .007 - .0011 secs

    The book's benchmarks show results in Queries per Sec

    Join: 185 qps
    Subquery: 325 qps

    They say when you want to find rows in table1 that match rows in table2 and eliminate any duplicate info a subquery can sometimes perform better

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,340
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    that's a highly contrived example, and i am at a loss to understand why they would use that example to illustrate a benchmark

    ask yourself what the query is supposed to return, whether the subquery version or the join version (hint: they produce the same result)

    i can easily explain why the subquery version is faster -- it's because an EXISTS subquery can be optimized, whereas the join, coupled with the use of DISTINCT, retrieves tons of extraneous rows, only to throw away most of them

    still wondering what the purpose of the query is?

    another type of subquery is called a correlated subquery, and it is notoriously inefficient as compared to a join, but this example with the films and the actors doesn't use a correlated subquery

    you should not draw any general conclusions about subqueries versus joins, as you did in your first post, such as "preferential to use a subquery over join queries that will create a temporay table" (aside: there is no temporary table involved here)

    okay, let's get back to the purpose of the query

    there are 1000 films in the films table, and 5642 actors in the film_actors table, right?

    have you figured out what the purpose of the query is yet?

    find all films that have at least one actor

    how useless is that?? we are told that the query returns 997 rows, which isn't at all surprising (it means there are only 3 films that have no actor at all)

    however, i'm sure you will agree that this is quite the contrived example -- in real life, it would be much more likely that you would want to find the films that don't have an actor, rather than to list all the ones that do

    please, don't draw any conclusions from it -- other than, yes, an EXISTS subquery is quite efficient

    bear in mind, though, that EXISTS subqueries are rare, and not always this contrived

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    SitePoint Wizard guelphdad's Avatar
    Join Date
    Oct 2003
    Location
    St. Catharines, ON Canada
    Posts
    1,708
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Also to consider, how old is the book? what version of mysql are they using in the book? Have advances been made since that time in how mysql performs?

  6. #6
    SitePoint Enthusiast
    Join Date
    Aug 2008
    Posts
    96
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Also to consider, how old is the book? what version of mysql are they using in the book? Have advances been made since that time in how mysql performs?
    This could be the issue. The book was written in 2008, using MySQL 5.1. Though I thought at least my local host was using the latest version of MySQL as I just downloaded the Wamp server pack I was testing with.

    Though I've just found Wamp's download for 64 bit operating systems, which is the one I'm using, only includes MySQL 5.1.53.

    Have there been any upgrades to MySQL's query optimizer since these versions?

  7. #7
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,340
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    mysql 5.1 is a very mature database engine

    any comments on the post i made explaining how contrived that example is?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  8. #8
    SitePoint Enthusiast
    Join Date
    Aug 2008
    Posts
    96
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Sorry, didn't see your post before my last reply.

    Sure, I can see how this can be a manipulated example. There is no need for a join or a subquery. The example could've been just as easily written:

    SELECT DISTINCT film_id FROM film_actor

    Though performance even between this and the original subquery is neck and neck.

    But from your explanation, I understand the only reason the subquery is efficient is because of the EXIST clause.

    So I guess I should revise my earlier statement:

    So is it a good rule of thumb to always write subqueries in place of joins that require the DISTINCT or ORDER BY commands?
    to:

    So is it a good rule of thumb to always write EXIST() subqueries in place of joins that require the DISTINCT command?

    Also you said the DISTINCT join query in this example didn't create a temporary table.

    Was I wrong also in my assumption that ORDER BY and DISTINCT clauses usually create temp tables?

  9. #9
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,340
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by maestro360 View Post
    So is it a good rule of thumb to always write EXIST() subqueries in place of joins that require the DISTINCT command?
    no, it depends on the purpose of the query

    Quote Originally Posted by maestro360 View Post
    Was I wrong also in my assumption that ORDER BY and DISTINCT clauses usually create temp tables?
    usually is such a loaded word

    again, it depends on the query, and on the indexes that are available

    my advice: don't keep looking for generalities
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •