SitePoint Sponsor

User Tag List

Results 1 to 15 of 15

Thread: MAX, GROUP BY, ORDER in Postgres.

  1. #1
    padawan silver trophy
    SitePoint Award Recipient markbrown4's Avatar
    Join Date
    Jul 2006
    Location
    Victoria, Australia
    Posts
    4,015
    Mentioned
    25 Post(s)
    Tagged
    0 Thread(s)

    MAX, GROUP BY, ORDER in Postgres.

    It sounds simple enough but I can't nut out a solution

    I have a rounds table (id, user_id, total_score)
    I'm wanting to get the top 10 scores but group by the users so that the top ten doesn't have any of the users more than once.

    This is giving the right scores:
    SELECT MAX(total_score) as max_score, user_id FROM "rounds" GROUP BY user_id ORDER BY max_score LIMIT 10

    I'm wanting to get the round id's though and including that in the group by means that the users aren't unique anymore.
    SELECT MAX(total_score) as max_score, user_id, id FROM "rounds" GROUP BY user_id, id ORDER BY max_score LIMIT 10

  2. #2
    From Italy with love bronze trophy
    guido2004's Avatar
    Join Date
    Sep 2004
    Posts
    8,605
    Mentioned
    76 Post(s)
    Tagged
    4 Thread(s)
    In MySQL you could eliminate the id from the GROUP BY, and it would give you the data you want (with only one of the round id's in case of a user with the same max score in more than 1 round):
    Code:
    SELECT 
        MAX(total_score) as max_score
      , user_id
      , id 
    FROM "rounds" 
    GROUP BY user_id 
    ORDER BY max_score 
    LIMIT 10
    I don't know if that works in postgres as well?

  3. #3
    padawan silver trophy
    SitePoint Award Recipient markbrown4's Avatar
    Join Date
    Jul 2006
    Location
    Victoria, Australia
    Posts
    4,015
    Mentioned
    25 Post(s)
    Tagged
    0 Thread(s)
    Hmm, no love.

    SELECT MAX(total_score) as max_score, user_id, id FROM "rounds" GROUP BY user_id ORDER BY max_score LIMIT 10;
    ERROR: column "rounds.id" must appear in the GROUP BY clause or be used in an aggregate function

    Edit Oh, I think this works.
    SELECT MAX(total_score) as max_score, user_id, MAX(id) as id FROM "rounds" GROUP BY user_id ORDER BY max_score LIMIT 10;

    Will test it out, thanks for tip off guido.

    Edit: No the MAX id hack doesn't work.

  4. #4
    padawan silver trophy
    SitePoint Award Recipient markbrown4's Avatar
    Join Date
    Jul 2006
    Location
    Victoria, Australia
    Posts
    4,015
    Mentioned
    25 Post(s)
    Tagged
    0 Thread(s)
    Actually, putting MAX in the query does work. I was just confused because I wasn't ordering by max_score DESC.
    Are there any downsides to this? Should another aggregate function be used instead?

    SOLVED
    Code sql:
    SELECT MAX(total_score) AS max_score, user_id, MAX(id) AS id FROM "rounds" GROUP BY user_id ORDER BY max_score DESC LIMIT 10;
    Thanks guido.

  5. #5
    SitePoint Guru
    Join Date
    Jan 2004
    Location
    Uppsala, sverige
    Posts
    690
    Mentioned
    2 Post(s)
    Tagged
    1 Thread(s)
    Code:
    select total_score,
           user_id, 
           id
      from "rounds" x
     where total_score =
          (select max(total_score)
             from "rounds"
            where "rounds".user_id = x.user_id)
     order 
        by total_score desc
     limit 10

  6. #6
    padawan silver trophy
    SitePoint Award Recipient markbrown4's Avatar
    Join Date
    Jul 2006
    Location
    Victoria, Australia
    Posts
    4,015
    Mentioned
    25 Post(s)
    Tagged
    0 Thread(s)
    Boom.

    I thought I had solved it but no, this is the correct answer.

    So, It's the MAX and the ORDER BY that complicates this and requires a nested select?

    Thanks for your help, I appreciate it.

  7. #7
    From Italy with love bronze trophy
    guido2004's Avatar
    Join Date
    Sep 2004
    Posts
    8,605
    Mentioned
    76 Post(s)
    Tagged
    4 Thread(s)
    If I'm not mistaken, that solution will still give you more than 1 row per user, if a user has multiple rows with the same max score.
    If you only want one row per user returned, and you don't care which one, then your solution is the way to go (returning always the latest round id of the rows with the max score for that user).

  8. #8
    padawan silver trophy
    SitePoint Award Recipient markbrown4's Avatar
    Join Date
    Jul 2006
    Location
    Victoria, Australia
    Posts
    4,015
    Mentioned
    25 Post(s)
    Tagged
    0 Thread(s)
    The previous one wasn't returning MAX(total_score) anymore, It seemed to be taking the MAX(id) in preference somehow..

    Because SQL does my head in and for efficiency I've actually chosen to bake the best_score out and save it against the user.
    Then it's a much more straightforward SELECT * FROM User ORDER BY max_score DESC LIMIT 10

    Are there good rules of thumb for baking out calculated fields like this in the db or is that frowned upon?

  9. #9
    From Italy with love bronze trophy
    guido2004's Avatar
    Join Date
    Sep 2004
    Posts
    8,605
    Mentioned
    76 Post(s)
    Tagged
    4 Thread(s)
    Quote Originally Posted by markbrown4 View Post
    The previous one wasn't returning MAX(total_score) anymore, It seemed to be taking the MAX(id) in preference somehow..
    Are you saying that this query didn't return the max total_score for each user_id?
    Code:
    SELECT MAX(total_score) AS max_score, user_id, MAX(id) AS id FROM "rounds" GROUP BY user_id ORDER BY max_score DESC LIMIT 10;
    I find that hard to believe. All aggregate functions should give a value relative to the columns you're grouping by, in this case user_id. Unless Postgres does things differently?

  10. #10
    padawan silver trophy
    SitePoint Award Recipient markbrown4's Avatar
    Join Date
    Jul 2006
    Location
    Victoria, Australia
    Posts
    4,015
    Mentioned
    25 Post(s)
    Tagged
    0 Thread(s)
    It was returning the correct scores for the user_id's but the id's weren't related anymore.
    Code:
    SELECT MAX(total_score) AS max_score, user_id, MAX(id) AS id FROM "rounds" GROUP BY user_id ORDER BY max_score DESC LIMIT 10;
     max_score | user_id | id 
    -----------+---------+----
        131.16 |       1 | 55
            55 |      44 | 54
          49.2 |      30 | 28
         42.33 |      33 | 49
          32.7 |       7 | 10
            32 |      34 | 35
        19.135 |      40 | 44
          18.5 |      38 | 48
          17.3 |      41 | 46
         13.04 |       3 |  4
    (10 rows)
    Code:
    SELECT total_score from rounds WHERE id IN (55,54,28,49,10,35,44,48,46,4) ORDER BY total_score DESC;
     total_score 
    -------------
          19.135
               0
               0
               0
               0
               0
               0
               0
               0
               0

  11. #11
    padawan silver trophy
    SitePoint Award Recipient markbrown4's Avatar
    Join Date
    Jul 2006
    Location
    Victoria, Australia
    Posts
    4,015
    Mentioned
    25 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by guido2004 View Post
    If I'm not mistaken, that solution will still give you more than 1 row per user, if a user has multiple rows with the same max score.
    If you only want one row per user returned, and you don't care which one, then your solution is the way to go (returning always the latest round id of the rows with the max score for that user).
    DISTINCT ON might help with the duplicates.
    Code sql:
    SELECT DISTINCT ON (total_score)
      total_score,
      user_id, 
      id
    FROM "rounds" x
    WHERE total_score =
         (SELECT MAX(total_score)
            FROM "rounds"
           WHERE "rounds".user_id = x.user_id)
    ORDER 
       BY total_score DESC
    LIMIT 10

  12. #12
    From Italy with love bronze trophy
    guido2004's Avatar
    Join Date
    Sep 2004
    Posts
    8,605
    Mentioned
    76 Post(s)
    Tagged
    4 Thread(s)
    Quote Originally Posted by markbrown4 View Post
    It was returning the correct scores for the user_id's but the id's weren't related anymore
    Yes of course. Stupid mistake from my part.

  13. #13
    Dumitru "Mitică" UNGUREANU itmitică's Avatar
    Join Date
    Feb 2012
    Location
    Fălticeni
    Posts
    656
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    One quick possible solution, from the top of my head:

    Use a bigger LIMIT for the query extracting both user_id and id, a 100 would cover the worst case scenario, where every top player has top scores, make it a subquery, then group that by user in parent query to have only one player appearing for the top 10 positions.
    "I am the wisest man alive, for I know one thing, and that is that I know nothing."

  14. #14
    padawan silver trophy
    SitePoint Award Recipient markbrown4's Avatar
    Join Date
    Jul 2006
    Location
    Victoria, Australia
    Posts
    4,015
    Mentioned
    25 Post(s)
    Tagged
    0 Thread(s)
    Thanks, but swampBoogie's answer is correct.
    In any case, I've changed it now so that when a score is saved if it's the highest score it will be saved directly against the user now.
    [sql]SELECT * FROM User ORDER BY max_score DESC LIMIT 10[/sql]
    I think baking out calculations like this make sense and keep things faster / simpler in sql land.

  15. #15
    Dumitru "Mitică" UNGUREANU itmitică's Avatar
    Join Date
    Feb 2012
    Location
    Fălticeni
    Posts
    656
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Glad you sorted it out.

    For the sake of the argument, the advantages of what I'm proposing are these:

    - for the inner query, that gets the highest scores, you don't have to use the MAX() function, you just use ORDER BY total_score DESC LIMIT 100
    - for the outer query, that gets the distinct top 10, you now have a much smaller set, 100 records, to deal with

    It needs some testing but I believe it should be faster.
    "I am the wisest man alive, for I know one thing, and that is that I know nothing."

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •