SitePoint Sponsor

User Tag List

Results 1 to 8 of 8
  1. #1
    Certified Ethical Hacker silver trophybronze trophy dklynn's Avatar
    Join Date
    Feb 2002
    Location
    Auckland
    Posts
    14,692
    Mentioned
    20 Post(s)
    Tagged
    3 Thread(s)

    JOIN or UNION problem?

    After reading r937's new book, Simply SQL, it served as a badly needed kick-in-the-butt to have a look at several of my SQL queries. Unfortunately, I'm stumped on one (for a home-grown CMS):

    Three tables:
    group w/id, name and order

    author w/id and name

    articles w/id, authorid, groupid, title, text and creationdate
    Problem: To create a SQL statement which will select ONLY the most current two articles from each group ordered by group.order and articles.creationdate.

    Question: What is the way to use JOINs or UNIONs - or embedded SELECT statements - (OR PHP) to effectively run through the list of groups (similar to foreach()) to limit each group's contribution to the query result to two?

    Status: I've implemented a HORRIBLE mysqli_query() which must (by the sheer numbers of records returned) gather and sort ALL the articles at which point PHP eliminates all but the first two (most recent) articles from each group. This is only tolerable because I've cached the result set (currently 2 x 30) but I know this is NOT the way to do it. As an alternative, I could get the list of groups (in group.order) and use PHP to iterate through multiple queries but this seems almost as ridiculous for the sheer number of individual queries.

    Experts: What did I miss in r937's presentation of JOINs or UNIONs (or embedded SELECT statements)?

    Regards,

    DK
    David K. Lynn - Data Koncepts is a long-time WebHostingBuzz (US/UK)
    Client and (unpaid) WHB Ambassador
    mod_rewrite Tutorial Article (setup, config, test & write
    mod_rewrite regex w/sample code) and Code Generator

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,347
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    there are several ways to find the "top two in each group"

    the first way uses a correlated subquery --
    Code:
    SELECT g.name AS group_name
         , g.order
         , ar.title
         , ar.creationdate
         , au.name AS author 
         , ar.text
      FROM articles AS ar
    INNER
      JOIN group AS g
        ON g.id = ar.groupid
    INNER
      JOIN author AS au
        ON au.id = ar.authorid
     WHERE ar.id IN 
           ( SELECT id
               FROM articles
              WHERE groupid = ar.groupid
             ORDER
                 BY creationdate DESC LIMIT 2 )
    the subquery is correlated because it uses the value ar.groupid inside the subquery, and this value is correlated to the ar row in the outer query

    the problem with this solution is that not all versions of mysql allow LIMIT in a subquery

    a second solution also uses a correlated subquery but uses COUNT(*) with a "theta join" (which sounds like i'm showing off my vocabulary, but simply means that the join condition is not based on equality) --
    Code:
    SELECT g.name AS group_name
         , g.order
         , ar.title
         , ar.creationdate
         , au.name AS author 
         , ar.text
      FROM articles AS ar
    INNER
      JOIN group AS g
        ON g.id = ar.groupid
    INNER
      JOIN author AS au
        ON au.id = ar.authorid
     WHERE ( SELECT COUNT(*) 
               FROM articles  
              WHERE groupid = ar.groupid 
                AND creationdate > ar.creationdate ) < 2
    a third solution uses a derived table instead of a correlated subquery (a derived table is another name for a subquery used in the FROM clause), where the subquery consists of a theta join along with GROUP BY and HAVING for the count --
    Code:
    SELECT g.name AS group_name
         , g.order
         , ar.title
         , ar.creationdate
         , au.name AS author 
         , ar.text
      FROM articles AS ar
    INNER
      JOIN ( SELECT t1.id
               FROM articles AS t1
             LEFT OUTER
               JOIN articles AS t2
                 ON t1.groupid = t2.groupid
                AND t1.creationdate < t2.creationdate
             GROUP
                 BY t1.id
             HAVING COUNT(t2.id) < 2
           ) AS top2
        ON top2.id = ar.id
    INNER
      JOIN group AS g
        ON g.id = ar.groupid
    INNER
      JOIN author AS au
        ON au.id = ar.authorid
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    Certified Ethical Hacker silver trophybronze trophy dklynn's Avatar
    Join Date
    Feb 2002
    Location
    Auckland
    Posts
    14,692
    Mentioned
    20 Post(s)
    Tagged
    3 Thread(s)
    r937,

    Thanks Rudy! That's a lot to digest in one sitting so I'll go have a play with both my test server (MySQL 5) and production server (MySQL 4) to see which has the problem with the LIMIT inside the subquery.
    I was really thrilled to see the SQL programmer tricks and especially the UNION with null fields (table cells) in the GROUP BY example at the end of Chapter 8. An excellent treatment to make it "Simply SQL"!
    Regards,

    DK
    David K. Lynn - Data Koncepts is a long-time WebHostingBuzz (US/UK)
    Client and (unpaid) WHB Ambassador
    mod_rewrite Tutorial Article (setup, config, test & write
    mod_rewrite regex w/sample code) and Code Generator

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,347
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Off Topic:

    thanks for the kind words, david

    did you spot the typo in the SELECT clause near the bottom of p.179? i haven't submitted my list of errata yet, but this is one of them

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    Certified Ethical Hacker silver trophybronze trophy dklynn's Avatar
    Join Date
    Feb 2002
    Location
    Auckland
    Posts
    14,692
    Mentioned
    20 Post(s)
    Tagged
    3 Thread(s)
    Rudy,

    Closing the parenthetical on CAST? That's an obvious typo!

    I was impressed at the way that you took the "techno-speak" out of the SQL and used "web-speak" (rows and columns rather than records and fields) as well as not going overboard about normalization choosing instead just to discuss relational tables.



    Regards,

    DK
    David K. Lynn - Data Koncepts is a long-time WebHostingBuzz (US/UK)
    Client and (unpaid) WHB Ambassador
    mod_rewrite Tutorial Article (setup, config, test & write
    mod_rewrite regex w/sample code) and Code Generator

  6. #6
    SitePoint Wizard guelphdad's Avatar
    Join Date
    Oct 2003
    Location
    St. Catharines, ON Canada
    Posts
    1,708
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by dklynn View Post
    not going overboard about normalization
    I understand what you are saying, but normalization is such an important part of database applications and it is either misunderstood, overlooked or ignored by most new users.

    they then post problems to this forum (or others) and when it is pointed out that they've recklessly abandoned normalization and this is the problem they are having with the task at hand, a good majority reply with something along the lines of: "well that's just how the tables are and I don't have time to go back and fix it now". Instead of course they end up with messy coding, needlessly slow queries etc., and spend countless hours that could have been saved if they'd normalized their data in the first place.

    So just don't forget about it.

    I agree the book is good, especially for those who know a database application will help them advance their website, but don't have time to devote all their spare time on understanding all the ins/outs of RDBMSs.

    Good read Rudy. Thanks.

  7. #7
    Certified Ethical Hacker silver trophybronze trophy dklynn's Avatar
    Join Date
    Feb 2002
    Location
    Auckland
    Posts
    14,692
    Mentioned
    20 Post(s)
    Tagged
    3 Thread(s)
    guelphdad,

    Sorry to have given you the impression that I wasn't concerned about normalization. I am FULLY aware that data MUST be normalized (separated into new tables and maintained ONCE to prevent duplication and update problems). I'm afraid to say that I have this mindset and glossed over Rudy's 'why' explanation before diving into relational tables.

    My main comment about "Simply SQL" is that it made a technical subject simple enough for newbies (with just a little HTML experience) to learn about SQL statements.

    Regards,

    DK
    David K. Lynn - Data Koncepts is a long-time WebHostingBuzz (US/UK)
    Client and (unpaid) WHB Ambassador
    mod_rewrite Tutorial Article (setup, config, test & write
    mod_rewrite regex w/sample code) and Code Generator

  8. #8
    SitePoint Guru
    Join Date
    Sep 2008
    Posts
    977
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    At the risk of another accusation that I am 'teacher's pet' I am going to say that since being referred to sitepoint some months ago ~ you know who you were ~, rudy has explained everything in non-techno-speak. Indeed, his explanations have been so clear and his patience so significant, that I have got a good handle on mysql and it is the easier thing I have learned in coding. (perl, xhtml and css being the others). he used to ask questions just to make sure I had learned the meaning behind the solution and not just to take the answer and run with it blindly. A great combination of teching methods; I think. (see what I did there ).

    bazz


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •