SitePoint Sponsor

User Tag List

Results 1 to 7 of 7
  1. #1
    SitePoint Enthusiast BogusRed's Avatar
    Join Date
    Dec 2008
    Location
    San Jose, CA, USA
    Posts
    51
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Best way to do joins

    I have a table called BR_writing that contains writing submissions.
    I have a table called BR_categories with a list of categories such as Fantasy, Science Fiction, Anime, etc.
    I then have another table called BR_writing_to_cat of just ids to link the two together. IE: writing_id, category_id

    Normally, if I wanted to, say, select all entries in the Science Fiction category, it would be something like:

    Code:
    SELECT W.writing_id, title, summary 
    FROM BR_writing W, BR_writing_to_cat C
    WHERE W.writing_id = C.writing_id AND category_id = '".$scifi_id."' 
    LIMIT 0, 10

    But what if I want to select all writing entries in Anime AND Fantasy? Below is the query I have now, but it seems like there might be a better or more optimal one out there.

    Code:
    SELECT writing_id, title, summary
    FROM BR_writing_to_cat A, BR_writing W
    
    INNER JOIN BR_writing_to_cat B 
    ON A.writing_id = B.writing_id 
    AND B.category_id = ".$id1." 
    
    WHERE A.writing_id = W.id AND A.category_id = ".$id2."

    If I wanted to do three at once, i'd do:
    Code:
    SELECT writing_id, title, summary
    FROM BR_writing_to_cat A, BR_writing W
    
    INNER JOIN BR_writing_to_cat B 
    ON A.writing_id = B.writing_id 
    AND B.category_id = ".$id1." 
    
    INNER JOIN BR_writing_to_cat C 
    ON B.writing_id = C.writing_id 
    AND C.cat_id = ".$id2." 
    
    WHERE A.writing_id = W.id AND A.category_id = ".$id3."
    I'm wondering if there's a faster query than this, and one that I could scale to maybe 5 category selections at once. Thanks for your help!

  2. #2
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2006
    Location
    Augusta, Georgia, United States
    Posts
    4,194
    Mentioned
    17 Post(s)
    Tagged
    4 Thread(s)
    indeed there is a more efficient and scalable way. Using group by will make sure that only a single unique writing exists within the result set regardless of the number of categories it appears in. The same query can be used regardless of the number of categories. For optimization purposes you will also want to make sure there is an index on w2c.category_id, w2c.writing_id.

    (untested)
    Code SQL:
    SELECT
          w.writing_id
         ,w.title
         ,w.summary
      FROM
         BR_writing w
     INNER
      JOIN
         BR_writing_to_cat w2c
        ON
         w.writing_id = w2c.writing_id
     WHERE
         w2c.category_id IN (1,2,3,4)
     GROUP
        BY
         w.writing_id
    The only code I hate more than my own is everyone else's.

  3. #3
    SitePoint Enthusiast BogusRed's Avatar
    Join Date
    Dec 2008
    Location
    San Jose, CA, USA
    Posts
    51
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hi Oddz. Thanks so much for your help! I gave this a try and the result set ended up giving me entries that are in one of the entered categories. What I want to select are writing entries that are in all of the entered categories. On my site, writing entries can be classified into multiple categories. So I'd like a way to filter entries that are in BOTH sci fi and fantasy, for example. Is this possible?


    Here's the exact query I used:

    Code:
    SELECT
         w.id
        ,w.title
        ,w.summary
     FROM
        BR_writing w
     INNER
     JOIN
        BR_gallery_contents w2c
       ON
        w.id = w2c.medium_id
     WHERE
        w2c.gallery_id IN (367, 19)
     GROUP
       BY
        w.id
    ORDER BY updated DESC

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,341
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Code:
    SELECT w.id
         , w.title
         , w.summary
      FROM ( SELECT medium_id
               FROM BR_gallery_contents
              WHERE gallery_id IN (367, 19)
             GROUP
                 BY medium_id 
             HAVING COUNT(*) = 2 ) AS these
    INNER
      JOIN BR_writing AS w
        ON w.id = these.medium_id
    the HAVING count should be equal to however many categories you want the medium to belong to
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2006
    Location
    Augusta, Georgia, United States
    Posts
    4,194
    Mentioned
    17 Post(s)
    Tagged
    4 Thread(s)
    r937 out of mere curiosity why this way:

    Code SQL:
    SELECT w.id
         , w.title
         , w.summary
      FROM ( SELECT medium_id
               FROM BR_gallery_contents
              WHERE gallery_id IN (367, 19)
             GROUP
                 BY medium_id 
             HAVING COUNT(*) = 2 ) AS these
    INNER
      JOIN BR_writing AS w
        ON w.id = these.medium_id

    instead of this way:

    Code SQL:
    SELECT
         w.id
        ,w.title
        ,w.summary
     FROM
        BR_writing w
     INNER
     JOIN
        BR_gallery_contents w2c
       ON
        w.id = w2c.medium_id
     WHERE
        w2c.gallery_id IN (367, 19)
     GROUP
       BY
        w.id
    HAVING
         COUNT(*) = 2
    The only code I hate more than my own is everyone else's.

  6. #6
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,341
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    think about it this way -- in my subquery, the GROUP BY is performed on only one table, and it's a small table to boot (only 2 columns)

    only the appropriate ids, surviving the HAVING, are then used to join to rows from the w table

    in your query, you perform all the joins first, then do a GROUP BY on a much wider row, and throw a lot more stuff away in the HAVING clause

    it's a case of applying the filtering restriction first, then doing the join, versus doing the join, then applying the filtering restriction
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  7. #7
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2006
    Location
    Augusta, Georgia, United States
    Posts
    4,194
    Mentioned
    17 Post(s)
    Tagged
    4 Thread(s)
    I had a feeling the reason related to optimization considering I recall transio posting a similar solution to a problem a few weeks back. thanks.
    The only code I hate more than my own is everyone else's.


Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •