SitePoint Sponsor

User Tag List

Results 1 to 9 of 9
  1. #1
    I solve practical problems. bronze trophy
    Michael Morris's Avatar
    Join Date
    Jan 2008
    Location
    Knoxville TN
    Posts
    2,034
    Mentioned
    65 Post(s)
    Tagged
    0 Thread(s)

    How to Join most recent record.

    Solved this problem yesterday after about 45 minutes of experimentation.

    Consider the problem of a query against table A that we are pulling records from. Table B is going to be joined and it has related data, but we want the most recent row of B for each A.

    (Real life example for those that parse them better: Consider a table of bills, and a table of appraisals that affect those bills - you need to pull the most recent appraisal).

    To complicate matters Table C is also going to be joined to table A. We need an aggregate pulled against one of C's columns. the group by clause will be on A's primary key.

    (Real life continued - here our third table was 'transactions' and I needed to sum the amount column).

    The problem is this - when you do a group by on A you'll only get 1 of B's records. Summing B's information is non-sensical. But you have to have the group by to get the sum of C's column correctly.

    My solution - an alias derived table.

    Code sql:
    SELECT 
      A.id AS id, 
      A.title AS title,
      B.appraised AS appraised
      SUM(C.amount) AS paid
    FROM A
      INNER JOIN C ON C.aid = A.id
      LEFT JOIN (
        SELECT appraised FROM B ORDER BY appraisalDate
    ) AS B ON B.aid = A.id
    GROUP BY A.id

    Now the most recent appraisal on table B will be included in the report.

    Posted in the hopes someone finds it useful, comments welcome.

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,275
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    there's a problem with your solution -- the value of B.appraised that is returned is ~not~ necessarily the most recent appraisal

    this is the "hidden column" problem outlined here: http://dev.mysql.com/doc/refman/5.0/...n-columns.html

    the correct way to select the latest B row would be as follows...
    Code:
    SELECT A.id
         , A.title
         , B.appraised
         , SUM(C.amount) AS paid
      FROM A  
    INNER 
      JOIN C 
        ON C.aid = A.id  
    LEFT OUTER
      JOIN ( SELECT aid
                  , MAX(appraisalDate) AS latest
               FROM B
             GROUP
                 BY aid ) AS Bm
        ON Bm.aid = A.id
    LEFT OUTER
      JOIN B
        ON B.aid = Bm.aid
       AND B.appraisalDate = Bm.latest
    GROUP 
        BY A.id
    joining to a MAX subquery as a derived table ensures that only one row from B is joined to each A, and thus the GROUP BY on A.id is sufficient
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    I solve practical problems. bronze trophy
    Michael Morris's Avatar
    Join Date
    Jan 2008
    Location
    Knoxville TN
    Posts
    2,034
    Mentioned
    65 Post(s)
    Tagged
    0 Thread(s)
    Thanks! From the sounds of it if that ever came up it would have been a PITA to track down.

    BTW, the actual query restricts both the outer and aliased query to a range of bills - possibly only on. Here's the relevant snippet (the whole query is nearly 200 lines).

    Code sql:
    LEFT JOIN (
      SELECT * 
      FROM ppt_land_appraisals ai 
      WHERE ai.pptbill_id IN ({$bills}) 
      ORDER BY ai.created_on
    ) AS a ON b.id = a.pptbill_id
    WHERE b.id IN ({$bills})

    So with what you're suggesting that should be...

    Code sql:
    LEFT JOIN (
      SELECT *, MAX(created_on) AS latest 
      FROM ppt_land_appraisals ai 
      WHERE ai.pptbill_id IN ({$bills}) 
      GROUP BY ai.pptbill_id
    ) AS a ON b.id = a.pptbill_id
    WHERE b.id IN ({$bills})

    Right?

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,275
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    not quite right, as the use of the dreaded, evil "select star" in incompatible with GROUP BY

    notice how i did it -- the derived table returns only the join column and the MAX, and this is then used to join to the B table

    also, you wouldn't need the WHERE clause for the bills list in the subquery, as this is taken care of when the derived table is joined
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    I solve practical problems. bronze trophy
    Michael Morris's Avatar
    Join Date
    Jan 2008
    Location
    Knoxville TN
    Posts
    2,034
    Mentioned
    65 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    not quite right, as the use of the dreaded, evil "select star" in incompatible with GROUP BY

    notice how i did it -- the derived table returns only the join column and the MAX, and this is then used to join to the B table

    also, you wouldn't need the WHERE clause for the bills list in the subquery, as this is taken care of when the derived table is joined
    Before the join though wouldn't it still consider all 170,000 rows on that table? That's what I'm trying to avoid.

    And yeah, I need to get rid of the select star before I push to testing. I put it in there for now cause I'm still working out which fields from that particular table I need.

  6. #6
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,275
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by Michael Morris View Post
    Before the join though wouldn't it still consider all 170,000 rows on that table? That's what I'm trying to avoid.
    not sure i understand this question...
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  7. #7
    I solve practical problems. bronze trophy
    Michael Morris's Avatar
    Join Date
    Jan 2008
    Location
    Knoxville TN
    Posts
    2,034
    Mentioned
    65 Post(s)
    Tagged
    0 Thread(s)
    The appraisal table against which the alias table query runs has in excess of 170,000 rows or more. Does MySQL have to look at all of that before starting work on the outer query, or does the limiting where statement of the outer query get migrated down?

    I know the end result will be the same, but my hunch is that having the where clause in the subquery will speed the parsing of the statement by limiting how many rows are actually considered for joining.

    I could be wrong. I guess when I have time I can time test it.

  8. #8
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,275
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by Michael Morris View Post
    I could be wrong. I guess when I have time I can time test it.
    or you could stick EXPLAIN in front of the two different query versions, and compare the execution plans

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  9. #9
    I solve practical problems. bronze trophy
    Michael Morris's Avatar
    Join Date
    Jan 2008
    Location
    Knoxville TN
    Posts
    2,034
    Mentioned
    65 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    or you could stick EXPLAIN in front of the two different query versions, and compare the execution plans

    How else do you get an accurate time test?


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •