SitePoint Sponsor

User Tag List

Results 1 to 15 of 15
  1. #1
    padawan silver trophybronze trophy markbrown4's Avatar
    Join Date
    Jul 2006
    Location
    Victoria, Australia
    Posts
    4,108
    Mentioned
    28 Post(s)
    Tagged
    2 Thread(s)

    MAX, GROUP BY, ORDER in Postgres.

    It sounds simple enough but I can't nut out a solution

    I have a rounds table (id, user_id, total_score)
    I'm wanting to get the top 10 scores but group by the users so that the top ten doesn't have any of the users more than once.

    This is giving the right scores:
    SELECT MAX(total_score) as max_score, user_id FROM "rounds" GROUP BY user_id ORDER BY max_score LIMIT 10

    I'm wanting to get the round id's though and including that in the group by means that the users aren't unique anymore.
    SELECT MAX(total_score) as max_score, user_id, id FROM "rounds" GROUP BY user_id, id ORDER BY max_score LIMIT 10

  2. #2
    From Italy with love silver trophybronze trophy
    guido2004's Avatar
    Join Date
    Sep 2004
    Posts
    9,495
    Mentioned
    161 Post(s)
    Tagged
    4 Thread(s)
    In MySQL you could eliminate the id from the GROUP BY, and it would give you the data you want (with only one of the round id's in case of a user with the same max score in more than 1 round):
    Code:
    SELECT 
        MAX(total_score) as max_score
      , user_id
      , id 
    FROM "rounds" 
    GROUP BY user_id 
    ORDER BY max_score 
    LIMIT 10
    I don't know if that works in postgres as well?

  3. #3
    padawan silver trophybronze trophy markbrown4's Avatar
    Join Date
    Jul 2006
    Location
    Victoria, Australia
    Posts
    4,108
    Mentioned
    28 Post(s)
    Tagged
    2 Thread(s)
    Hmm, no love.

    SELECT MAX(total_score) as max_score, user_id, id FROM "rounds" GROUP BY user_id ORDER BY max_score LIMIT 10;
    ERROR: column "rounds.id" must appear in the GROUP BY clause or be used in an aggregate function

    Edit Oh, I think this works.
    SELECT MAX(total_score) as max_score, user_id, MAX(id) as id FROM "rounds" GROUP BY user_id ORDER BY max_score LIMIT 10;

    Will test it out, thanks for tip off guido.

    Edit: No the MAX id hack doesn't work.

  4. #4
    padawan silver trophybronze trophy markbrown4's Avatar
    Join Date
    Jul 2006
    Location
    Victoria, Australia
    Posts
    4,108
    Mentioned
    28 Post(s)
    Tagged
    2 Thread(s)
    Actually, putting MAX in the query does work. I was just confused because I wasn't ordering by max_score DESC.
    Are there any downsides to this? Should another aggregate function be used instead?

    SOLVED
    Code sql:
    SELECT MAX(total_score) AS max_score, user_id, MAX(id) AS id FROM "rounds" GROUP BY user_id ORDER BY max_score DESC LIMIT 10;
    Thanks guido.

  5. #5
    SitePoint Guru
    Join Date
    Jan 2004
    Location
    Uppsala, sverige
    Posts
    696
    Mentioned
    2 Post(s)
    Tagged
    1 Thread(s)
    Code:
    select total_score,
           user_id, 
           id
      from "rounds" x
     where total_score =
          (select max(total_score)
             from "rounds"
            where "rounds".user_id = x.user_id)
     order 
        by total_score desc
     limit 10

  6. #6
    padawan silver trophybronze trophy markbrown4's Avatar
    Join Date
    Jul 2006
    Location
    Victoria, Australia
    Posts
    4,108
    Mentioned
    28 Post(s)
    Tagged
    2 Thread(s)
    Boom.

    I thought I had solved it but no, this is the correct answer.

    So, It's the MAX and the ORDER BY that complicates this and requires a nested select?

    Thanks for your help, I appreciate it.

  7. #7
    From Italy with love silver trophybronze trophy
    guido2004's Avatar
    Join Date
    Sep 2004
    Posts
    9,495
    Mentioned
    161 Post(s)
    Tagged
    4 Thread(s)
    If I'm not mistaken, that solution will still give you more than 1 row per user, if a user has multiple rows with the same max score.
    If you only want one row per user returned, and you don't care which one, then your solution is the way to go (returning always the latest round id of the rows with the max score for that user).

  8. #8
    padawan silver trophybronze trophy markbrown4's Avatar
    Join Date
    Jul 2006
    Location
    Victoria, Australia
    Posts
    4,108
    Mentioned
    28 Post(s)
    Tagged
    2 Thread(s)
    The previous one wasn't returning MAX(total_score) anymore, It seemed to be taking the MAX(id) in preference somehow..

    Because SQL does my head in and for efficiency I've actually chosen to bake the best_score out and save it against the user.
    Then it's a much more straightforward SELECT * FROM User ORDER BY max_score DESC LIMIT 10

    Are there good rules of thumb for baking out calculated fields like this in the db or is that frowned upon?

  9. #9
    From Italy with love silver trophybronze trophy
    guido2004's Avatar
    Join Date
    Sep 2004
    Posts
    9,495
    Mentioned
    161 Post(s)
    Tagged
    4 Thread(s)
    Quote Originally Posted by markbrown4 View Post
    The previous one wasn't returning MAX(total_score) anymore, It seemed to be taking the MAX(id) in preference somehow..
    Are you saying that this query didn't return the max total_score for each user_id?
    Code:
    SELECT MAX(total_score) AS max_score, user_id, MAX(id) AS id FROM "rounds" GROUP BY user_id ORDER BY max_score DESC LIMIT 10;
    I find that hard to believe. All aggregate functions should give a value relative to the columns you're grouping by, in this case user_id. Unless Postgres does things differently?

  10. #10
    padawan silver trophybronze trophy markbrown4's Avatar
    Join Date
    Jul 2006
    Location
    Victoria, Australia
    Posts
    4,108
    Mentioned
    28 Post(s)
    Tagged
    2 Thread(s)
    It was returning the correct scores for the user_id's but the id's weren't related anymore.
    Code:
    SELECT MAX(total_score) AS max_score, user_id, MAX(id) AS id FROM "rounds" GROUP BY user_id ORDER BY max_score DESC LIMIT 10;
     max_score | user_id | id 
    -----------+---------+----
        131.16 |       1 | 55
            55 |      44 | 54
          49.2 |      30 | 28
         42.33 |      33 | 49
          32.7 |       7 | 10
            32 |      34 | 35
        19.135 |      40 | 44
          18.5 |      38 | 48
          17.3 |      41 | 46
         13.04 |       3 |  4
    (10 rows)
    Code:
    SELECT total_score from rounds WHERE id IN (55,54,28,49,10,35,44,48,46,4) ORDER BY total_score DESC;
     total_score 
    -------------
          19.135
               0
               0
               0
               0
               0
               0
               0
               0
               0

  11. #11
    padawan silver trophybronze trophy markbrown4's Avatar
    Join Date
    Jul 2006
    Location
    Victoria, Australia
    Posts
    4,108
    Mentioned
    28 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by guido2004 View Post
    If I'm not mistaken, that solution will still give you more than 1 row per user, if a user has multiple rows with the same max score.
    If you only want one row per user returned, and you don't care which one, then your solution is the way to go (returning always the latest round id of the rows with the max score for that user).
    DISTINCT ON might help with the duplicates.
    Code sql:
    SELECT DISTINCT ON (total_score)
      total_score,
      user_id, 
      id
    FROM "rounds" x
    WHERE total_score =
         (SELECT MAX(total_score)
            FROM "rounds"
           WHERE "rounds".user_id = x.user_id)
    ORDER 
       BY total_score DESC
    LIMIT 10

  12. #12
    From Italy with love silver trophybronze trophy
    guido2004's Avatar
    Join Date
    Sep 2004
    Posts
    9,495
    Mentioned
    161 Post(s)
    Tagged
    4 Thread(s)
    Quote Originally Posted by markbrown4 View Post
    It was returning the correct scores for the user_id's but the id's weren't related anymore
    Yes of course. Stupid mistake from my part.

  13. #13
    Non-Member
    Join Date
    Feb 2012
    Posts
    892
    Mentioned
    10 Post(s)
    Tagged
    0 Thread(s)
    One quick possible solution, from the top of my head:

    Use a bigger LIMIT for the query extracting both user_id and id, a 100 would cover the worst case scenario, where every top player has top scores, make it a subquery, then group that by user in parent query to have only one player appearing for the top 10 positions.

  14. #14
    padawan silver trophybronze trophy markbrown4's Avatar
    Join Date
    Jul 2006
    Location
    Victoria, Australia
    Posts
    4,108
    Mentioned
    28 Post(s)
    Tagged
    2 Thread(s)
    Thanks, but swampBoogie's answer is correct.
    In any case, I've changed it now so that when a score is saved if it's the highest score it will be saved directly against the user now.
    [sql]SELECT * FROM User ORDER BY max_score DESC LIMIT 10[/sql]
    I think baking out calculations like this make sense and keep things faster / simpler in sql land.

  15. #15
    Non-Member
    Join Date
    Feb 2012
    Posts
    892
    Mentioned
    10 Post(s)
    Tagged
    0 Thread(s)
    Glad you sorted it out.

    For the sake of the argument, the advantages of what I'm proposing are these:

    - for the inner query, that gets the highest scores, you don't have to use the MAX() function, you just use ORDER BY total_score DESC LIMIT 100
    - for the outer query, that gets the distinct top 10, you now have a much smaller set, 100 records, to deal with

    It needs some testing but I believe it should be faster.


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •