SitePoint Sponsor

User Tag List

Results 1 to 8 of 8
  1. #1
    Foozle Reducer ServerStorm's Avatar
    Join Date
    Feb 2005
    Location
    Burlington, Canada
    Posts
    2,699
    Mentioned
    89 Post(s)
    Tagged
    6 Thread(s)

    return duplicates

    Hi,

    I have searched the form for duplicates and while there were many questions asked I am not sure what is the best way to do this.

    If I am searching users by a registrant_id, first_name, last_name, prov_state, and postal_code where any three of these fields that match in multiple records are considered duplicates but the registrant_id is unique for each member then do you know the best way to return all of these duplicates?

    Many thanks,
    Steve
    ictus==""

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,327
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    there are four ways of choosing three of four

    so you will need four different queries, which, luckily, you can UNION

    here's one of them --
    Code:
    SELECT t.*
      FROM ( SELECT last_name
                  , prov_state
                  , postal_code 
               FROM daTable
             GROUP
                 BY last_name
                  , prov_state
                  , postal_code 
             HAVING COUNT(*) > 1 ) AS dupe1
    INNER
      JOIN daTable AS t
        ON t.last_name   = dupe1.last_name                  
           t.prov_state  = dupe1.prov_state       
           t.postal_code = dupe1.postal_code
    having said that, i wonder if you truly are interested in the results of this query -- it will return all people named Smith in Oakville
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    Foozle Reducer ServerStorm's Avatar
    Join Date
    Feb 2005
    Location
    Burlington, Canada
    Posts
    2,699
    Mentioned
    89 Post(s)
    Tagged
    6 Thread(s)
    Thanks Rudy,

    As I am not deleting these, I guess I could also just do a normal SELECT too ?
    Code:
    SELECT
    	l.lead_id as 'Registrant Number'
    	, l.first_name as 'First Name'
    	, l.last_name as 'Last Name'
        	, pst.name as 'State/Province'
        	, p.postal_code as 'Zip/Postal Code'
        FROM
        	leads as l
        LEFT OUTER 
        	JOIN leads2addresses as l2a
        		ON l2a.lead_id = l.lead_id
        LEFT OUTER 
        	JOIN addresses as a
        		ON a.address_id = l2a.address_id
        LEFT OUTER 
        	JOIN postal_codes as p
        		ON p.postal_code = a.post_code 
        LEFT OUTER 
        	JOIN provs_states as pst
        		ON pst.abbreviation = p.state_prov_abbr
        		AND p.postal_code = a.post_code
       WHERE
    	l.first_name Like '%Bill%'
       AND
           l.last_name Like '%Smi%'
       AND
          p.postal_code Like '%L7L%'
    Returns:
    Registrant Number First Name Last Name State/Province Zip/Postal Code
    2 Bill Smith Ontario L7L6M5
    4 Bill Smith Ontario L7L6M5

    Steve
    ictus==""

  4. #4
    Foozle Reducer ServerStorm's Avatar
    Join Date
    Feb 2005
    Location
    Burlington, Canada
    Posts
    2,699
    Mentioned
    89 Post(s)
    Tagged
    6 Thread(s)
    Quote Originally Posted by r937 View Post
    there are four ways of choosing three of four

    so you will need four different queries, which, luckily, you can UNION

    here's one of them --
    Code:
    SELECT t.*
      FROM ( SELECT last_name
                  , prov_state
                  , postal_code 
               FROM daTable
             GROUP
                 BY last_name
                  , prov_state
                  , postal_code 
             HAVING COUNT(*) > 1 ) AS dupe1
    INNER
      JOIN daTable AS t
        ON t.last_name   = dupe1.last_name                  
           t.prov_state  = dupe1.prov_state       
           t.postal_code = dupe1.postal_code
    having said that, i wonder if you truly are interested in the results of this query -- it will return all people named Smith in Oakville
    The results of this query will allow the user to click to view the full profile of each lead which has much more info, so your query would be fine.

    Regards,
    Steve
    ictus==""

  5. #5
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,327
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by ServerStorm View Post
    As I am not deleting these, I guess I could also just do a normal SELECT too ?
    okay, where did the idea of deleting come from?

    and what does this query have to do with finding duplicates based on three of four columns?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  6. #6
    Foozle Reducer ServerStorm's Avatar
    Join Date
    Feb 2005
    Location
    Burlington, Canada
    Posts
    2,699
    Mentioned
    89 Post(s)
    Tagged
    6 Thread(s)
    Quote Originally Posted by ServerStorm View Post
    Thanks Rudy,

    As I am not deleting these, I guess I could also just do a normal SELECT too ?
    Just was saying that your supplied method will work as it does the correct job of selecting (and not deleting) the duplicates... sorry for the lack of clarity.
    and what does this query have to do with finding duplicates based on three of four columns?
    My client wants to view all records where any three columns that match out of Registrant Number, First Name, Last Name, State/Province, or Zip/Postal Code will be shown as a possible duplicate.

    So if in a HTML form they fill in three of the possible fields for matching and duplicates are returned then they can manually choose to delete them.
    ictus==""

  7. #7
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,327
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    may i ask why you have to join the address table to the postal code table? what columns do you have in the postal codes table?

    also, same question for the states/provinces table -- why the join here?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  8. #8
    Foozle Reducer ServerStorm's Avatar
    Join Date
    Feb 2005
    Location
    Burlington, Canada
    Posts
    2,699
    Mentioned
    89 Post(s)
    Tagged
    6 Thread(s)
    Quote Originally Posted by r937 View Post
    may i ask why you have to join the address table to the postal code table? what columns do you have in the postal codes table?

    also, same question for the states/provinces table -- why the join here?
    Addresses:
    Code:
    Table addresses
    ===============
    address_id, intended_use, street, apt_unit_number, street_direction, post_code, region
    ---------------
    address_id       int(11) PK
    intended_use     varchar(150)
    street           varchar(150)
    apt_unit_number  varchar(10)
    street_direction varchar(2)
    post_code        varchar(20)
    region           varchar(150)
    Postal Codes:
    Code:
    Table postal_codes==================
    postal_code, city_name, state_prov_abbr, country_iso_code
    ------------------
    postal_code      varchar(20) PK
    city_name        varchar(64)
    state_prov_abbr  char(2)
    country_iso_code char(2)
    Province and States:
    Code:
    Table provs_states
    ==================
    id, country_id, name, abbreviation
    ------------------
    id               smallint(5) unsigned PK
    country_id       char(2)
    name             varchar(100)
    abbreviation     varchar(3)
    Countries:
    Code:
    Table countries===============
    country_iso_code, country_name
    ---------------
    country_iso_code char(2) PK
    country_name     varchar(150)
    This is setup this way as a convenience for the client, to find a persons full address they only need to supply the person's data and their postal code; the joins provide all the other address type data. Yes it is not as efficient as using it in one table, but their application is 'heavy' on the requirement to search for people's addresses so this extra complication (in code) and slower efficiency is warranted.
    ictus==""


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •