SitePoint Sponsor

User Tag List

Results 1 to 8 of 8

Thread: complex query.

  1. #1
    SitePoint Guru mwolfe's Avatar
    Join Date
    Mar 2005
    Posts
    912
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    complex query.

    Alright, this is something i've wondered how to do, but its actually someone elses problem.

    He has 2 tables, one for examnames, and one for examusers
    the examnames table holds the name of the exam, and its primary key -id.
    the examusers table holds a username, an examtaken, and an its primary key -id.

    so if you wanted to get all exams a user has not taken, is this possible to do with a query/subquery instead of writing all of the exams they have taken to an array, then getting all the exams that are possible, and basically pulling out the ones that donot intersect.. I thought there may be a way to do this in a single query.. is it possible..

    for reference, table structure is like this
    exams:
    examname exam_id
    bank1 1
    bank2 2

    examusers examtaken
    user1 bank1
    user2 bank1
    user1 bank2

    this isnt my structure, i would have put the id for the exam and not the name, but this should work about as well given there arent a lot of exams..

    Any ideas?

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,347
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by mwolfe
    ... is this possible to do with a query/subquery instead of writing all of the exams they have taken to an array, then getting all the exams that are possible, and basically pulling out the ones that donot intersect.. I thought there may be a way to do this in a single query.. is it possible..
    yes, this is possible, and your phrase "pulling out the ones that do not intersect" is an excellent description of a CROSS JOIN of users and banks with a LEFT OUTER JOIN and an IS NULL condition to find the ones that are missing

    the nice part is, that's exactly what you ask the database to return to you, rather than you doing it with code
    Code:
    select users.userid
         , banks.examname
      from (
           select distinct userid
             from examusers
           ) as users  
    cross
      join banks
    left outer
      join examusers
        on users.userid = examusers.userid
       and banks.examname = examusers.examname
     where examusers.examname is null
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    SitePoint Guru mwolfe's Avatar
    Join Date
    Mar 2005
    Posts
    912
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    hey thanks a lot.. i'm a bit confused about the banks part.. banks1 and banks2 and all that were names of the exams.. i'm not sure what they mean, thats how they were named though.

    here are the actual tables

    mysql> describe exams;
    +----------+-------------+------+-----+---------+----------------+
    | Field | Type | Null | Key | Default | Extra |
    +----------+-------------+------+-----+---------+----------------+
    | id | int(11) | NO | PRI | NULL | auto_increment |
    | examname | varchar(32) | NO | MUL | | |
    | date | date | NO | | | |
    +----------+-------------+------+-----+---------+----------------+
    3 rows in set (0.00 sec)

    mysql> describe examusername;
    +-----------+-------------+------+-----+---------+----------------+
    | Field | Type | Null | Key | Default | Extra |
    +-----------+-------------+------+-----+---------+----------------+
    | id | int(11) | NO | PRI | NULL | auto_increment |
    | username | varchar(32) | NO | | | |
    | examgiven | varchar(32) | NO | | | |
    | score | int(4) | NO | | | |
    +-----------+-------------+------+-----+---------+----------------+
    4 rows in set (0.03 sec)

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,347
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    you know, every time somebody "sanitizes" their table and column names in order to pose what they think is a simple question, inevitably the solution that is given to them doesn't make sense with the made up names, and then they have to post the actual structure and ask for a second solution, which requires twice the amount of work than if they had just posted the original structure in the first place

    <sigh />

    it's exactly the same query, just the name have changed

    except now i'm not as sure it will work for you, eh
    Code:
    select users.username
         , exams.examname
      from (
           select distinct username
             from examusername
           ) as users  
    cross
      join exams
    left outer
      join examusername
        on users.username = examusername.username
       and exams.examname = examusername.examgiven
     where examusername.examgiven is null
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    SitePoint Guru mwolfe's Avatar
    Join Date
    Mar 2005
    Posts
    912
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thank you r937 for your help. I am sorry about the confusion.

    I never stated that banks was the name of a table though.. i was giving examples of entries in a table (that happened to be prefixed with the term banks for whatever reason). This was not my table structure to begin with, and i did not even have a real sample table (only what i had read from the person who posted the original problem). When i posted i was mostly asking for the approach to take. I guess in this circumstance (as with many) its a lot easier to explain how its done by simply doing it..
    Also, since i am new to sql, typically when i write queries they are looking for specific information with basic conditions to find them, thus the only columns that are important to the query are the ones in which the data i'm looking for are in, and thats why i skipped "score" and whatever else..

    I can usually follow the syntax just fine of a query whether or not i have the exact names down, this however was a bit different because i was confused what you were meaning banks since you were referring to the same column with different names.. (self join i believe) so it doubled the confusion.

    In all reality, this is only going to be done on a single user at a time, the one visiting the page, so that he sees only the exams he hasnt taken.. I'm not sure if your query handles that, it doesnt seem like it would since we are never checking against a given username..

    THanks again though for your help.. I'll see if i can get it figured out.

  6. #6
    SitePoint Guru mwolfe's Avatar
    Join Date
    Mar 2005
    Posts
    912
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    actually, sorry, don't bother explaining the last part of my previous post to me.. i know what to do from here, thanks for you help, your query worked perfect.
    Oh and sorry for causing you to write the query twice, hopefully now i can figure out what it means.

  7. #7
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,347
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    no prob

    the rewrite was no big deal, cut & paste, eh

    let me know if it doesn't work ...
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  8. #8
    SitePoint Guru mwolfe's Avatar
    Join Date
    Mar 2005
    Posts
    912
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    it works great now.. I was curious though as to how much, if any, difference this would make over my original version. Turns out its actually slower.. It probably has some benefits though as it doesnt use as much resources since we donot need to save a create arrays out of the values.. and get the difference of them.

    I wrote a script to insert 40,000 values into examusername, consisting of 1000 distinct usernames and 1000 distinct exams. Once i had the script working correctly, i truncated the tables and then ran it several times.. With now about 113,000 + entries (i created a unique index on (examusername, examgiven) so that i wouldnt have the same combo more than once, and thats why i do not have a multiple of 40,000 rows), i get results typically in the range 0.01 seconds to .1 using the basic method i thought up, versus .1-.4 seconds using the query you gave me. Both of these seem quite amazing to me that they take such a small amount of time on my 1.5ghz athlon. I'm sure a lot of it has to do with the basic naming i chose..

    user_$num where num is between 1-1000, and for exams
    exam_$num as well.


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •