SitePoint Sponsor

User Tag List

Results 1 to 23 of 23
  1. #1
    SitePoint Evangelist Alchemist's Avatar
    Join Date
    Aug 2006
    Posts
    412
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    limiting records per channel

    back with a new question on a query...

    I have several channels and need to get no more than 10 records per channel up to a maximum of 40 records from all channels.

    Code:
    SELECT Video_ID FROM media_video WHERE Channel_ID IN ('1','2') ORDER BY Video_PostedOn DESC LIMIT 0,40
    now, if Channel 1 has >= 40 records, the query will not return what I'm expecting as no records will be retrieved for Channel 2. how do I get around that? thanks in advance

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,015
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    retrieve 40 rows and count rows per channel as you loop over them
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    SitePoint Evangelist Alchemist's Avatar
    Join Date
    Aug 2006
    Posts
    412
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I'm not sure I'm following you

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,015
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    you said "up to a maximum of 40 records from all channels"

    and i said okay, go get all 40, and do the rest in your application layer

    make more sense now?

    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    SitePoint Evangelist Alchemist's Avatar
    Join Date
    Aug 2006
    Posts
    412
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    no, as that would not get me what I want.

    ok, this is for a subscription module. the user can subscribe to as many channels as he wishes. let's say he subscribes to 2 channels. Since I only want to retrieve a maximum of 40 records, only a maximum of 20 records per channel (the most recent) is what I want. Otherwise, if Channel 1 is the one with the 40 most recent records, he would not be able to see any records from Channel 2. Let's say he subscribes to 4 channels. In that case, the maximum number of records per channel would be 10 (40/4)...you get the idea?

  6. #6
    SitePoint Evangelist Alchemist's Avatar
    Join Date
    Aug 2006
    Posts
    412
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    so any idea on how I should approach this? one way might be to retrieve the maximum number of records per channel one by one and then combine and sort them accordingly in my app layer....

  7. #7
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,015
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    aha, i get it now

    wow, that's complicated

    a stored procedure would do it

    pass in the list of channel ids, and have it count them, divide 40 by the count, and run a "top N" query using WHERE channel_id IN ( list )

    the toughest part of that is optimizing the "top N" query
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  8. #8
    SitePoint Evangelist Alchemist's Avatar
    Join Date
    Aug 2006
    Posts
    412
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I have found a post on google answers answering a similar question. what does this query do?

    Code:
    SET @c:=0;
    SET @a:=0;
    SELECT p.*, IF(@c=p.company, @a:=@a+1, @a:=1) AS rownum, (@c:=p.company) 
    FROM person p
    GROUP BY company, rownum
    HAVING rownum <= 3

  9. #9
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,015
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by Alchemist View Post
    what does this query do?
    it uses a very wonky GROUP BY clause

    any time you use GROUP BY together with the dreaded, evil "select star" you are taking part in a crap shoot

    try adapting it to your scenario

    it just might work

    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  10. #10
    SitePoint Evangelist Alchemist's Avatar
    Join Date
    Aug 2006
    Posts
    412
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    ok first of all, I don't really understand what this means:

    Code:
    SET @c:=0;
    SET @a:=0;
    SELECT p.*, IF(@c=p.company, @a:=@a+1, @a:=1) AS rownum, (@c:=p.company)
    I prefer to avoid using a stored procedure as I don't want to mix business logic with database stuff. The only other way around this would be:

    1. divide 40 by number of subscribed channels
    2. issue one query for each channel limiting the results by the number obtained in #1
    3. combine all result sets and sort them by date DESC in the business logic layer


    this, however, would seem quite inefficient in those cases where the user subscribed to several channels as several queries would have to be run.

    any thoughts?
    Last edited by Alchemist; Apr 12, 2008 at 06:54.

  11. #11
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,015
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    i still say the easiest approach is just go get 40 per channel

    if he has 4 channels, you are getting 160

    if he has 14 channels, you are getting ... a larger number, but still a small result set, completely manageable as a memory array

    loop over the returned results, figure out how many channels there are, and then don't show any more than the first 40/N per channel

    simple
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  12. #12
    SitePoint Evangelist Alchemist's Avatar
    Join Date
    Aug 2006
    Posts
    412
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    ok. let's say he subscribes to 10 channels

    40*10 = 400

    Code:
    SELECT Video_ID FROM media_video ORDER BY Video_PostedOn DESC LIMIT 0,400
    now let's say the first 300 records belong to channel 2, while the remaining 100 records belong to channel 3 and 5. he subscribed to channels 1 through 10. he won't see any records from channels 1, 4, 6, 7, 8, 9, 10 whereas I would like to show the TOP 4 records for each channel (40/10). am I missing something?

    if you have a youtube account, subscribe to a bunch of channels and go to your subscriptions area you will see how it's setup

  13. #13
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,015
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by Alchemist View Post
    am I missing something?
    yes

    that query gets the top 400 overall

    i'm suggesting that you write the query to get the top 40 per channel

    then, afterwards, you can figure out how many to show per channel based on how many channels actually showed up

    you don't need to know in advance that he has 10 channels
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  14. #14
    SitePoint Evangelist Alchemist's Avatar
    Join Date
    Aug 2006
    Posts
    412
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I know but then I'm totally lost. I researched a bit and the TOP N query is an SQL query which is traslated to a LIMIT N in mysql. I already said I don't want to use a stored procedure.....

  15. #15
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,015
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    actually, that would be correct only if you used LIMIT in a correlated subquery, as you would also have to do using TOP
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  16. #16
    SitePoint Evangelist Alchemist's Avatar
    Join Date
    Aug 2006
    Posts
    412
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    so how am I supposed to use this TOP N syntax...I can't find anything on it

  17. #17
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,015
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    last three videos per channel (pseudocode):
    Code:
    select channel
         , video
         , lastdate
      from channels as t
     where ( select count(*)
               from channels
              where channel = t.channel
               and lastdate > t.lastdate ) < 3
    for each video, count the number of videos in the same channel with a later date, and if the count is less than three (i.e. it's 2 or 1 or 0) then this is the 3rd or 2nd or 1st latest video in that channel
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  18. #18
    SitePoint Evangelist Alchemist's Avatar
    Join Date
    Aug 2006
    Posts
    412
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    this is the same query I was using for another part of the project and it was very slow. any other way to skin the cat?

    also, something I just ran into which might be trivial for someone more experienced than me.

    Code:
    SELECT Video_Title FROM media_video WHERE Video_ID IN ('1', '2', '3')
    if no ORDER BY clause is specified, how can I get the records in the exact order of the ids in the IN clause? I have a similar query with a list of ids in the IN clause and no ORDER BY, but they are not returned in the same order.

    thanks!!

  19. #19
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,015
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by Alchemist View Post
    this is the same query I was using for another part of the project and it was very slow. any other way to skin the cat?
    yes, i know it is, i helped you with it about a month ago

    we went down a long, circuitous side route, where you were going to store the ids of the last three videos in each channel


    if no ORDER BY clause is specified, how can I get the records in the exact order of the ids in the IN clause?
    you can't, you must use an ORDER BY

    Code:
    SELECT ...
      FROM ...
     WHERE Video_ID IN ( 1, 2, 3 )
    ORDER
        BY FIELD(Video_ID, 1, 2, 3 )
    note: please don't quote numeric constants used with a numeric column
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  20. #20
    SitePoint Evangelist Alchemist's Avatar
    Join Date
    Aug 2006
    Posts
    412
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    now we are talking...

    ok. the order by issue if fixed now. I quoted Video_ID because in reality they are varchar. I would not quote INT. That was just an example.

    with regard to the slow query, yes I save the most recent 3 videos for each channel as I need it for a different part of the application. however, here we don't know how many videos per subscribed channel we will have to get.

  21. #21
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,015
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    right, so you need a mechanism to get the last 40 per channel for every channel (40 is the maximum you would need)

    and you need it indexed properly
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  22. #22
    SitePoint Evangelist Alchemist's Avatar
    Join Date
    Aug 2006
    Posts
    412
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    well, I guess storing the last 40 video ids for each channel somewhere is out of the question. the layout of the subscription page consists of a left column where I list all the subscribed channels, and a right column where the latest videos for each channel are listed. I could redesign the whole thing where clicking on a channel link would make a ajax call and retrieve only the latest x videos for that channels instead of aggregating everything. but how does youtube do it? and so fast?

  23. #23
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,015
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    retrieving the latest 40 videos per channel is a query that you should optimize for, then

    as for youtube, i have no idea

    but i'll venture that if you gave me their hardware, and their DBAs, and especially their funding, i could probably figure it out for you...

    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •