SitePoint Sponsor

User Tag List

Results 1 to 7 of 7
  1. #1
    SitePoint Guru pinch's Avatar
    Join Date
    Mar 2005
    Posts
    688
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Filtering Select By a Column

    I have a table which records certifications. This table holds all certifications, old and new, but I only want to pull the most recent one for each 'department'.

    My table

    Certifications
    CertificationID
    DepartmentID
    CertifyingUser
    CertificationTimestamp

    So, I want to pull the most recent certifications for each department and ignore the old ones. If I have 3 records:

    1, 1, "Bob", 3/4/09
    2, 1, "Fred", 3/5/09
    3, 2, "Ed", 3/4/09

    I only want to pull 2 & 3 because 2 is only a re-certification of 1 with a later date.

    I researched the DISTINCT statement but I don't think it can help me.

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,347
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    this query fits the "row with biggest X for each Y" pattern, alternatively called The Rows Holding the Group-wise Maximum of a Certain Field (apologies if you're not using MySQL, but the techniques work equally well in other database systems)

    there are about eleven ways to do it, but here's one --
    Code:
    SELECT Certifications.CertificationID
         , Certifications.DepartmentID
         , Certifications.CertifyingUser
         , Certifications.CertificationTimestamp
      FROM ( SELECT DepartmentID
                  , MAX(CertificationTimestamp) AS latest
               FROM Certifications
             GROUP
                 BY DepartmentID ) AS subquery
    INNER
      JOIN Certifications
        ON Certifications.DepartmentID = subquery.DepartmentID
       AND Certifications.CertificationTimestamp = subquery.latest
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    SitePoint Guru pinch's Avatar
    Join Date
    Mar 2005
    Posts
    688
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Arrow

    Thanks for that, I think I have that concept down now. However, the problem I originally presented was a bit of a [major] simplification of what I'm actually trying to do. I tried to apply this concept to my current query and while I got the expected results, it wasn't exacty what I needed.

    As I've stated in other threads I've started, my schema is closely tied to a hierarchy table and this query is no different. I am using CTEs (as suggested by you (although grudgedly) I think) What is shown below is the query I was working with before my previous post and it gets me very close to where I need to go, however it displays all certifications (even the duplicates I'm trying to avoid) and I'm not sure how to meld the concept you provided with what I already have:

    Code SQL:
    ALTER PROCEDURE dbo.Personnel_GetPersonnelCertificationSummariesByStartingNodeID
      @StartingNodeID INT
    AS
     
    WITH Nodes (NodeID, ParentNodeID, CommandTierCode, SequencePath)
    AS
    (
    SELECT NodeID, ParentNodeID, CommandTierCode, SequencePath 
    	FROM CommandHierarchy 
    	WHERE NodeID = @StartingNodeID
     
    UNION ALL
     
    SELECT CH.NodeID, CH.ParentNodeID, CH.CommandTiercode, CH.SequencePath FROM CommandHierarchy CH JOIN Nodes R
     ON R.NodeID = CH.ParentNodeID
    )
     
    SELECT   
      Nodes.CommandTierCode,
      Nodes.NodeID,
     
      PersonnelCertifications.CertificationID,
      PersonnelCertifications.PersonnelID,
      PersonnelCertifications.MissingDataPoints,
      PersonnelCertifications.CertificationTimestamp,
     
    	Personnel.FirstName, 
    	Personnel.MiddleInitial, 
    	Personnel.LastName,
    	Personnel.SuffixCode,
    	Personnel.ServiceMember,
     
    FROM 
    	Nodes 
    LEFT OUTER JOIN 
    	PersonnelCertifications 
    ON 
    	Nodes.NodeID = PersonnelCertifications.CommandHierarchyNodeID
    LEFT OUTER JOIN
      Personnel
    ON
      PersonnelCertifications.PersonnelID = Personnel.PersonnelID
     
    ORDER BY Nodes.SequencePath
     
    RETURN

    What the above does is pull all certifications for every node in my hierarchy, even those nodes that have not yet been certified, and orders them by the 'SequencePath' column which is populated for each node.

    There are a couple of reasons why I'm tying this query to a resursive CTE:

    1. I want the query to pull certifications for all nodes in my hierarchy, even those that have not yet been certified.
    2. I want to order my resulting set by the 'SequencePath' column used in the CTE.

    When I tried to apply the concept you provided I immediately noticed that I was already using the 'FROM' to tie-in to the CTE. When I removed that tie-in and integrated your concept I indeed got the most recent certifications, but as I said before I need to pull a record for all nodes, even those that have not yet been certified. Also, the other columns that I needed from my Personnel table did not show up in my results, because of the sub-query I imagine.

    It's looking like this may not be possible with a single query?

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,347
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    oh, i recognize that query -- very tricky, getting me to answer the simple one because nobody answered your other thread



    okay where you have this --
    Code:
    FROM 
        Nodes 
    LEFT OUTER JOIN 
        PersonnelCertifications 
    ON 
        Nodes.NodeID = PersonnelCertifications.CommandHierarchyNodeID
    LEFT OUTER JOIN
      Personnel
    ON
      PersonnelCertifications.PersonnelID = Personnel.PersonnelID
    try this instead --
    Code:
      FROM Nodes 
    LEFT OUTER
      JOIN ( SELECT CommandHierarchyNodeID
                  , MAX(CertificationTimestamp) AS latest
               FROM PersonnelCertifications
             GROUP
                 BY CommandHierarchyNodeID ) AS subquery
        ON subquery.CommandHierarchyNodeID = Nodes.NodeID
    LEFT OUTER 
      JOIN PersonnelCertifications 
        ON PersonnelCertifications.CommandHierarchyNodeID = Nodes.NodeID
       AND PersonnelCertifications.CertificationTimestamp = subquery.latest
    LEFT OUTER 
      JOIN Personnel
        ON Personnel.PersonnelID = PersonnelCertifications.PersonnelID
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    SitePoint Guru pinch's Avatar
    Join Date
    Mar 2005
    Posts
    688
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    very tricky, getting me to answer the simple one because nobody answered your other thread
    I honestly wasn't trying to be tricky, just thought that if I had a simple solution I may be able to figure out the tougher problem. My other post is actually quite different (I think) than this question, but I may be able to use the concept of subqueries on that one as well (I hope).

    So if I'm following that solution correctly:

    1. The subquery is returning only the most recent certifications for each CommandHierarchyNodeID for those that exist in the Certifications table.

    2. The first JOIN between the CTE result and the subquery simply adds a few new columns (namely CommandHierarchyNodeID and CertificationTimestamp) to the set resulting from the CTE join (which contains all nodes).

    3. The second JOIN to the Certifications table is what allows the columns from that table to be queried.

    On the second JOIN to the Certifications table, I've been able to get the most recent record by using only one 'ON' instead of two:

    Code SQL:
    ON
      Certifications.CertificationTimestamp = subquery.latest

    instead of

    Code SQL:
    ON
      Certifications.CommandHierarchyNodeID = Nodes.NodeID AND
      Certifications.CertificationTimestamp = subquery.latest

    Is there a reason that I need the

    Code SQL:
      Certifications.CommandHierarchyNodeID = Nodes.NodeID

    part that I'm not thinking about?

    Thanks again.

  6. #6
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,347
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    yes, there is a reason

    without matching the latest timestamp for the appropriate node, you will end up matching any node with that timestamp
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  7. #7
    SitePoint Guru pinch's Avatar
    Join Date
    Mar 2005
    Posts
    688
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    yes, there is a reason

    without matching the latest timestamp for the appropriate node, you will end up matching any node with that timestamp
    Ah Ha!

    You the man.



Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •