SitePoint Sponsor

User Tag List

Results 1 to 10 of 10
  1. #1
    SitePoint Enthusiast
    Join Date
    Jul 1999
    Location
    Jacksonville, FL
    Posts
    28
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    SQL - return values if not in 2nd table

    It's Sunday and my brain is fried -- seems like this should be simple.

    I have two tables.

    tblA contains two columns tblA.id, tblA.value
    tblB contains two columns tablB.id, tblB.value

    .value is a text field

    I want to return values from tblA.value where tblA.value is not equal to tblB.value.

    So, I tried (in MS Access):
    Code:
    SELECT tblA.value 
    FROM tblA, tblB
    WHERE lcase(tblA.value) <> lcase(tblB.value)
    I'm trying this as I type:
    Code:
    SELECT tblA.value
    FROM tblA
    WHERE lcase(tblA.value) NOT IN
     (select lcase(tblB.value) from tblB)
    The first query did not return what I expected (more rows than are in tblA). The second query is taking forever to run. FWIW: tblA has about 10,000 records and tblB has about 70,000.

    Any help would be greatly appreciated!
    Last edited by shawn; May 11, 2003 at 14:58.

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,251
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    your first query matched every row of tblA with every non-matching by value row in tblB

    that would give you approx O(n**2) rows

    it sounds like you want to try to match on id and return tblA rows which don't have a match in tblB based on id --

    select tblA.value
    from tblA
    left outer
    join tblB
    on tblA.id = tblB.id
    where tblB.id is null

    or perhaps you want to match on id but find the matches with different values --

    select tblA.value
    from tblA
    inner
    join tblB
    on tblA.id = tblB.id
    where tblA.value <> tblB.value


    rudy

  3. #3
    SitePoint Enthusiast
    Join Date
    Jul 1999
    Location
    Jacksonville, FL
    Posts
    28
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    rudy.

    thanks -- your explanation for the first query makes sense.

    let me try to clarify.

    The "key" for each table is a text field "value". If tblA.value = tblB.value then I don't want it returned in the query results. I want to exclude those matches.

    So, if I have:
    tblA
    value
    ---------
    text_1
    text_2
    text_3

    and

    tblB
    value
    --------------
    text_1
    text_4
    text_5

    I want the result of the query to be:
    -------
    text_2
    text_3

    I tried the inner join as you posted and it returned all values from tblA.

    shawn

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,251
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    then i guess you want a slightly different left outer join

    select tblA.value
    from tblA
    left outer
    join tblB
    on tblA.value = tblB.value
    where tblB.value is null
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    SitePoint Enthusiast
    Join Date
    Jul 1999
    Location
    Jacksonville, FL
    Posts
    28
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    rudy,

    I think I get it. The idea is that w/ a left outer join, I'd be returning the non-matching values from table A.

    I tried it quick and it seems like this type of query will take some time to run, even on my relatively small tables. Is this b/c it's text comparison?

    Do you think it would be better (read FASTER) to do this programmatically by comparing each value?

    Thanks for forcing me to revisit joins -- someday I'll have this drilled into my head!

  6. #6
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,251
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    sorry, i don't really understand your application, but i'd bet that the database left outer join is going to be way faster than you doing the exact same thing in code, unless you can pull both tables into memory

    remember, row no. 5,678 of table A has to be compared to all 70,000 rows of table B, then row no. 5,679...

    you're doing aalmost a billion comparisons, don't expect it to run all that fast

    did you declare indexes on the value columns?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  7. #7
    SitePoint Enthusiast
    Join Date
    Jul 1999
    Location
    Jacksonville, FL
    Posts
    28
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    rudy,

    it's just too much for the db -- did it both in ms access and in mysql tonight. takes too long. i'm thinking binary comparison -- i've seen what i'm talking about done incredibly quickly programmatically, but the app was written in assembly.

    the app i'm working on is for my newletter -- the db for the newsletter is separate from the software i use to send it -- now lyris would be nice, but since it's segmented, that's way to expensive so, i'm dealing with the unsubs and bounces. the software i have does purge list b from list a, but it's doing a programmatic comparison and as you correctly stated, it takes a loooong time... although it's taking list b and counting down.

    again, thanks for the help!

  8. #8
    Just Blow It bronze trophy
    DaveMaxwell's Avatar
    Join Date
    Nov 1999
    Location
    Mechanicsburg, PA
    Posts
    7,264
    Mentioned
    115 Post(s)
    Tagged
    1 Thread(s)
    Do you not have indexes on those key values? Unless you are getting into several THOUSAND records (ie 100K+), you should never be able to do it faster in the code than in SQL, especially in MySQL. I would say you've got another problem there.
    Dave Maxwell - Manage Your Site Team Leader
    My favorite YouTube Video! | Star Wars, Dr Suess Style
    Learn how to be ready for The Forums' Move to Discourse

  9. #9
    SitePoint Enthusiast
    Join Date
    Jul 1999
    Location
    Jacksonville, FL
    Posts
    28
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Dave,

    I do have indexes on the values and I'm not (although my original attempt was) using any functions in the WHERE clause ('cuz when you do, indexes are ignored).

    Any help with the correct sql would be appreciated

    To clarify, here are my exact tables:

    C1 has roughly 10,000 records
    REMOVe has roughly 70,000 records

    C1
    -------
    ID
    EMAIL
    FNAME
    LNAME

    REMOVE
    --------
    ID
    EMAIL

    Here's the query I ran (but ultimately consumed all the memory on my server):
    Code:
    SELECT C1.EMAIL, C1.FNAME, C1.LNAME
    FROM C1
    LEFT OUTER JOIN REMOVE ON C1.EMAIL = REMOVE.EMAIL
    Thanks again!

  10. #10
    Just Blow It bronze trophy
    DaveMaxwell's Avatar
    Join Date
    Nov 1999
    Location
    Mechanicsburg, PA
    Posts
    7,264
    Mentioned
    115 Post(s)
    Tagged
    1 Thread(s)
    Does c1.id = remove.id or are they mutually exclusive? If they are equal, use that because numeric comparisons will run much faster than text indexes.

    You could also try rebuilding your indexes on the email field and see if that helps some.
    Dave Maxwell - Manage Your Site Team Leader
    My favorite YouTube Video! | Star Wars, Dr Suess Style
    Learn how to be ready for The Forums' Move to Discourse


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •