SitePoint Sponsor

User Tag List

Results 1 to 12 of 12
  1. #1
    SitePoint Addict
    Join Date
    Jun 2005
    Posts
    262
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Using COUNT and GROUP BY with multiple tables, including derived tables?

    Here's the table structure:
    Code:
    CREATE TABLE songs
    (
        song_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
        name    VARCHAR(50)  NOT NULL,
        PRIMARY KEY (id)
    );
    
    CREATE TABLE ratings
    (
        song_id INT     UNSIGNED NOT NULL DEFAULT '0',
        user_id INT     UNSIGNED NOT NULL DEFAULT '0',
        rating  TINYINT UNSIGNED NOT NULL DEFAULT '1',
        PRIMARY KEY (song_id, user_id),
        INDEX       (user_id)
    );
    
    CREATE TABLE bookmarks
    (
        song_id INT UNSIGNED NOT NULL DEFAULT '0',
        user_id INT UNSIGNED NOT NULL DEFAULT '0',
        PRIMARY KEY (song_id, user_id),
        INDEX       (user_id)
    );
    Here's the sql code:
    Code:
    SELECT
        songs.name,
        average_rating,
        total_ratings,
        total_bookmarks
    FROM
        songs
        INNER
          JOIN
          (
            SELECT
                song_id,
                AVG(rating) AS average_rating,
                COUNT(*) AS total_ratings
            FROM
                ratings
            GROUP
                song_id
          ) AS ratings_summary
            ON ratings_summary.song_id = songs.song_id
        LEFT
          JOIN
          (
            SELECT
                song_id,
                COUNT(*) AS total_bookmarks
            FROM
                bookmarks
            GROUP BY
                song_id
          ) AS bookmarks_summary
            ON bookmarks_summary.song_id = songs.id
    ORDER BY
        total_ratings DESC;
    The above query works, but it takes 5 seconds. If I remove the ORDER BY it takes one-tenth of a second. If it remove either one of the sub-queries (derived tables) it also takes one-tenth of a second.

    This is really killing me. I've done searches about COUNT, GROUP BY, but there aren't many examples of getting proper counts when joining multiple one-to-many tables.

    Using COUNT with GROUP BY is fine if you're only joining two tables, but once you add a second one-to-many table to the mix, it throws all the aggregate functions out of whack because it causes the extras rows to multiple against each other.

    Instead of using derived tables I could use traditional joins directly against the tables, but then I'd have to resort to using sloppy COUNT(DISTINCT song_id) which works around it, but it's just as slow as the 5 second monster above. I'm really at my wits end here.

    The situation gets even worse because I'll actually need to add more one-to-many joins which'll make it even slower.

  2. #2
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2008
    Posts
    5,757
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I'm probably not going to be able to help you, but I have a feeling someone else is going to ask you for the output of EXPLAIN.

  3. #3
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,328
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by champ View Post
    If it remove either one of the sub-queries (derived tables) it also takes one-tenth of a second.
    that's weird

    how long does this take --
    Code:
    SELECT songs.song_id
         , songs.name
         , average_rating
         , total_ratings
      FROM songs
    INNER
      JOIN ( SELECT song_id
                  , AVG(rating) AS average_rating
                  , COUNT(*) AS total_ratings
               FROM ratings
             GROUP
                 BY song_id ) AS ratings_summary
        ON ratings_summary.song_id = songs.song_id
    UNION ALL
    SELECT song_id
         , NULL
         , NULL
         , COUNT(*) AS total_bookmarks
      FROM bookmarks
    GROUP 
        BY song_id
    note there is no ORDER BY yet, and the rows still need to be condensed

    presumably the two SELECTs are quick, and i'm wondering if the UNION adds any significant overhead...
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  4. #4
    SitePoint Addict
    Join Date
    Jun 2005
    Posts
    262
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    that's weird

    how long does this take

    presumably the two SELECTs are quick, and i'm wondering if the UNION adds any significant overhead...
    Very fast. One-tenth of a second.

  5. #5
    SitePoint Addict
    Join Date
    Jun 2005
    Posts
    262
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    presumably the two SELECTs are quick, and i'm wondering if the UNION adds any significant overhead...
    rudy, I'm not sure what you're getting at here. How does this incorporate the "total_bookmarks" field that collects a different set of stats than "total_ratings"?

  6. #6
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,328
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    please try this both with and without ORDER BY
    Code:
    SELECT song_id
         , MAX(name) AS name
         , MAX(average_rating) AS average_rating
         , MAX(total_ratings) AS total_ratings
         , MAX(total_bookmarks) AS total_bookmarks
      FROM (
           SELECT songs.song_id
                , songs.name
                , average_rating
                , total_ratings
                , NULL AS total_bookmarks
             FROM songs
           INNER
             JOIN ( SELECT song_id
                         , AVG(rating) AS average_rating
                         , COUNT(*) AS total_ratings
                      FROM ratings
                    GROUP
                        BY song_id ) AS ratings_summary
               ON ratings_summary.song_id = songs.song_id
           UNION ALL
           SELECT song_id
                , NULL
                , NULL
                , NULL
                , COUNT(*) AS total_bookmarks
             FROM bookmarks
           GROUP 
               BY song_id
           ) AS d
    GROUP
        BY song_id
    ORDER
        BY total_ratings DESC
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  7. #7
    SitePoint Wizard Ren's Avatar
    Join Date
    Aug 2003
    Location
    UK
    Posts
    1,060
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    First look, I don't see why there is a need for 2 sub queries.

    Code:
    SELECT s.name, AVG(r.rating), COUNT(r.user_id), COALESCE(b.popularity, 0) AS popularity 	
    	FROM songs AS s
    		INNER JOIN ratings AS r ON s.song_id = r.song_id
    		LEFT JOIN (SELECT song_id, COUNT(user_id) AS popularity FROM bookmarks GROUP BY 1) AS b ON b.song_id = s.song_id
    	GROUP BY 4, 1, s.song_id
    	ORDER BY 4 DESC, 1
    Maybe?
    Last edited by Ren; Oct 31, 2009 at 20:50. Reason: Added COALESCE

  8. #8
    SitePoint Addict
    Join Date
    Jun 2005
    Posts
    262
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    please try this both with and without ORDER BY
    Thanks rudy. They both take a little over one-tenth of a second to execute. Now I just need to study it to understand how I can join other one-to-many and many-to-many tables to it.

    Quote Originally Posted by Ren View Post
    First look, I don't see why there is a need for 2 sub queries.

    Code:
    SELECT s.name, AVG(r.rating), COUNT(r.user_id), COALESCE(b.popularity, 0) AS popularity 	
    	FROM songs AS s
    		INNER JOIN ratings AS r ON s.song_id = r.song_id
    		LEFT JOIN (SELECT song_id, COUNT(user_id) AS popularity FROM bookmarks GROUP BY 1) AS b ON b.song_id = s.song_id
    	GROUP BY 4, 1, s.song_id
    	ORDER BY 4 DESC, 1
    Maybe?
    Ren, unfortunately that query takes 15 seconds to execute.

  9. #9
    SitePoint Wizard Ren's Avatar
    Join Date
    Aug 2003
    Location
    UK
    Posts
    1,060
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by champ View Post

    Ren, unfortunately that query takes 15 seconds to execute.
    Hmm, not good.

    Just out of curiosity how many rows has the songs table?

    I'm imaging it to be fairly sizeable, or something else is drastically wrong.

  10. #10
    SitePoint Addict
    Join Date
    Jun 2005
    Posts
    262
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    please try this both with and without ORDER BY
    After playing around with this query there seems to be some problems. All the data in the second union is always included in the result set. I don't see how to properly place conditions on the query. For instance, let's say I wanted to only include songs that have more than 3 total ratings. Every logical place I try to place the condition results in several columns containing NULL values including the total_bookmarks column which definitely shouldn't be NULL. I'd like to get this to work as it's fast.

    Quote Originally Posted by Ren View Post
    Hmm, not good.

    Just out of curiosity how many rows has the songs table?

    I'm imaging it to be fairly sizeable, or something else is drastically wrong.
    About 10,000.

  11. #11
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,328
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by champ View Post
    All the data in the second union is always included in the result set.
    well, yeah, that's how UNION queries work

    but the outer query should be collapsing the result set of the UNION

    the "more than 3 ratings" would be a HAVING condition inside the ratings_summary subquery
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  12. #12
    SitePoint Addict
    Join Date
    Jun 2005
    Posts
    262
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    but the outer query should be collapsing the result set of the UNION

    the "more than 3 ratings" would be a HAVING condition inside the ratings_summary subquery
    Thanks rudy. You're right. I was placing the condition in the wrong area. I was able to eliminate the NULL rows by adding the following to the outer query:
    Code:
    ...
    GROUP
        BY song_id
    HAVING
        total_ratings IS NOT NULL
    ORDER
        BY total_ratings DESC
    I don't know if the HAVING clause is the best or most appropriate way to deal with this, but it works.


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •