SitePoint Sponsor

User Tag List

Results 1 to 7 of 7
  1. #1
    SitePoint Member
    Join Date
    Mar 2002
    Posts
    1
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Arrow SQL query: using MAX and GROUP BY with other fields

    Given these entries in my table:

    id date value
    4 2002-1-1 zero
    4 2002-2-26 first
    4 2002-2-28 second

    I am trying to write a SQL query to get the value at the maximum date. So I first tried this query and got:

    select MAX(date),value from MY_TBL group by id

    2002-2-28 first

    When it does the aggregate function of group by and then chooses a maximum date, it disreguards other fields. It then selects the first entry to show
    the value of.

    So it is even worse than that, because what I would really like to do is, get the latest value on a given day.

    So here is an example:

    id date value
    6 2002-2-2 zero
    6 2002-2-16 first
    6 2002-2-18 second
    6 2002-3-2 third

    select MAX(date),value from MY_TBL where date < '2002-3-2' group by id

    This result doesn't return me anything! Which would lead me to the same conclusion above. mysql groups everything first, decides the maximum value
    (in this case '2002-3-2' and then applies the where clause and since '2002-3-2' is not less than the date i was looking for it doesn't report anything!

    Is their a way to change the order of operations so that is first evaluates the where clause and then does the group/max?

    A way using a single query to get the value at a maximum date?

    Thanks for the help.

    Sincerely,
    Andrew Mirsky
    ajmirsky@hotmail.com

  2. #2
    Database Jedi MattR's Avatar
    Join Date
    Jan 2001
    Location
    buried in the database shell (Washington, DC)
    Posts
    1,107
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    According to the SQL standard you cannot include columns in your SELECT which are not in the group by. It shouldn't (and doesn't in this case) work.

    Try grouping by DATE instead.

    Adding 'value' to the SELECT clause will cause problems -- what if you had the following data:
    2002-2-2 zero
    2002-2-2 first

    What value would it select? Zero? First? Try this query:
    Code:
    SELECT MAX( date ),
           value -- non standard and probably won't work
      FROM MY_TBL
     WHERE date < 'etc'
     GROUP BY date

  3. #3
    Making a better wheel silver trophy DR_LaRRY_PEpPeR's Avatar
    Join Date
    Jul 2001
    Location
    Missouri
    Posts
    3,428
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Originally posted by MattR
    According to the SQL standard you cannot include columns in your SELECT which are not in the group by.
    just curious, should that not say "cannot include columns in your GROUP BY which are not in the SELECT?" that seems to make more sense. otherwise you would have to GROUP BY all columns, no?
    - Matt ** Ignore old signature for now... **
    Dr.BB - Highly optimized to be 2-3x faster than the "Big 3."
    "Do not enclose numeric values in quotes -- that is very non-standard and will only work on MySQL." - MattR

  4. #4
    Database Jedi MattR's Avatar
    Join Date
    Jan 2001
    Location
    buried in the database shell (Washington, DC)
    Posts
    1,107
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Nope, that is what the standard states. e.g.
    Code:
    SELECT col1, col2
      FROM t2
     GROUP BY col1
    Non-standard SQL!

    Because again in the example consider if you have:
    Code:
    SELECT SUM( col1 ), col2
      FROM table1
     GROUP BY col1
    Logically try and think about what that will do...

    If your data looks like this:
    Code:
    table1( col1, col2 )
    --------------------
    1, 'happy'
    2, 'joy'
    3, 'joebob'
    3, 'hippy'
    2, 'shut'
    1, 'up'
    1, 'hippy'
    In relational algebra you must always have distinct tuples (rows) in your result set. No exceptions. All tables must have, in some fashion, a unique way of identifying them. Remember that a result-set is just another mathematical set.

    So, when you ask the RDBMS to GROUP BY col1 it is going to whine "Well, you have this col2 thing hanging around". So you can try it in most RDBMS's (and it should break). However, MySQL 'relaxed' the standards a bit to make it more useful (Sybase did as well).

    In the above example the GROUP BY and SUM wants to return a single row, e.g.
    Code:
    SELECT sum( col1 ), col1
      FROM table1
     GROUP BY col1
    Is logical:
    Code:
    result set
    sum, col1
    ----------
    3, 1
    4, 2
    6, 3
    However, since you're not grouping by COL2 in the example, you make the engine go "Wait, you're asking me to duplicate sum counts, right? Uhm, ok. I want to return a single tuple for sum, but you're telling me that I can't. So, I can do one of two things. Return a single tuple for sum and randomly pick a col2 to display, or for each col2 which is in the sum, repeat the sum." In MySQL and Sybase, the latter is chosen:

    Code:
    result set
    sum, col2
    ---------
    3, 'happy'
    3, 'up'
    3, 'hippy'
    4, 'joy'
    4, 'shut'
    6, 'joebob'
    6, 'hippy'
    Explaining why relational calculus says this is VERBOTEN is outside the scope of this post; check out Codd's papers or get a relational calculus/algebra book.
    Last edited by MattR; Mar 21, 2002 at 21:49.

  5. #5
    Database Jedi MattR's Avatar
    Join Date
    Jan 2001
    Location
    buried in the database shell (Washington, DC)
    Posts
    1,107
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    The standards also say that GROUP BY can only be used with one or more aggregate functions (but not all).

    For example:
    Code:
    SELECT sum( col1 )
      FROM table1
     GROUP BY col1
    Could return duplicate values (never mind the results would be fairly meaningless ), so you need to include 'col1' in the SELECT.

    Conversely, you cannot have a GROUP BY without aggregates. Again this comes to relational calculus restrictions but MySQL/Sybase say 'Well, we'll group them pretty for you' even though NEIN say Codd and Date.

    Sybase has a handy feature that you can enable when you run queries to let you know when you're deviating from the SQL standard:
    Code:
    1> SET FIPSFLAGGER ON
    2> GO
    1> SELECT SUM( col1 ), col2
    2>   FROM table1
    3>  GROUP BY col1
    4> GO
    Line number 1 contains Non-ANSI text. The error is caused due to the use of
    column name(s) in a select list with aggregates but column name(s) not in group
    by list.
    
    1> SELECT username
    2>   FROM user
    3>  GROUP BY username
    4> GO
    FIPS WARNING: GROUP BY clause specified when all the items in select list are
    aggregate functions or none of the items in select list are aggregate
    functions.
    Last edited by MattR; Mar 21, 2002 at 21:54.

  6. #6
    Making a better wheel silver trophy DR_LaRRY_PEpPeR's Avatar
    Join Date
    Jul 2001
    Location
    Missouri
    Posts
    3,428
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)


    that was WAY too much information for me. thanks though. very informative, as usual. i understand that what i thought is wrong now.

  7. #7
    Database Jedi MattR's Avatar
    Join Date
    Jan 2001
    Location
    buried in the database shell (Washington, DC)
    Posts
    1,107
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    LOL, glad I could help. Funny thing about standards -- vendors don't follow 'em too well, do they?


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •