SitePoint Sponsor

User Tag List

Results 1 to 12 of 12

Thread: help with SQL

  1. #1
    SitePoint Zealot
    Join Date
    Oct 2009
    Location
    Oklahoma, USA
    Posts
    103
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    help with SQL

    I need some help with writing an SQL query.

    I have 2 tables with some duplicate data that I would like to associate with each other.
    Table1 - ID-Group1
    ID - primary key
    FirstName
    LastName
    Address
    City
    State
    Zip


    Table2 - IDGroup2
    ID - primary key
    FirstName
    LastName
    Address
    City
    State
    Zip
    Table1ID



    There are a large number of duplicate Firstname/lastName/add/city/state/zip in Table1 and Table2, but they have different ID's.

    Is there a way to write a query to look for matches between the tables and then update Table1ID on the appropriate Table2 record?

    thanks

  2. #2
    SitePoint Zealot
    Join Date
    Oct 2009
    Posts
    134
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I'm very new to all this but I think you need a JOIN query.

  3. #3
    From space with love silver trophy
    SpacePhoenix's Avatar
    Join Date
    May 2007
    Location
    Poole, UK
    Posts
    5,014
    Mentioned
    103 Post(s)
    Tagged
    0 Thread(s)
    How many groups can a person be a member of?

    Does any person have more than one (different) addresses to their name?
    Community Team Advisor
    Forum Guidelines: Posting FAQ Signatures FAQ Self Promotion FAQ
    Help the Mods: What's Fluff? Report Fluff/Spam to a Moderator

  4. #4
    SitePoint Zealot
    Join Date
    Oct 2009
    Location
    Oklahoma, USA
    Posts
    103
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    It has been years of bad information management, but ultimately I have to sort through it.

    There are lots of what-if scenarios here that will cause problems - misspelled/alternate spelling of name/address, duplicate records in Table1 and/or Table2, same person with two different addresses.

    I am debating setting up a seperate table just to manage the duplicates where I could associate one ID with multiple other IDs, but I am not sure the best way to do that either.

  5. #5
    From space with love silver trophy
    SpacePhoenix's Avatar
    Join Date
    May 2007
    Location
    Poole, UK
    Posts
    5,014
    Mentioned
    103 Post(s)
    Tagged
    0 Thread(s)
    Possibly one table for addresses, maybe using a lookup table between the address table and the table storing a person's name. Does any address have more than one person against it?

    Is a person allowed to be a member of more than one group?
    Community Team Advisor
    Forum Guidelines: Posting FAQ Signatures FAQ Self Promotion FAQ
    Help the Mods: What's Fluff? Report Fluff/Spam to a Moderator

  6. #6
    SitePoint Zealot
    Join Date
    Oct 2009
    Location
    Oklahoma, USA
    Posts
    103
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    yes, there are husband/wife and/or multiple residents of the same address in our data.

    Group1 and Group2 has thousands of duplicates. That is what I am trying to indentify & clean up

  7. #7
    From space with love silver trophy
    SpacePhoenix's Avatar
    Join Date
    May 2007
    Location
    Poole, UK
    Posts
    5,014
    Mentioned
    103 Post(s)
    Tagged
    0 Thread(s)
    Can I suggest:

    Group - GroupMembership - Member - MemberAdress - Address

    Group: Records details about each group
    Member: Records details about members
    GroupMembership: Two foreign key fields, one with group id and one with member id
    Address: Records a single address
    MemberAddress: Two foreign key fields, one with member id and one with address id
    Community Team Advisor
    Forum Guidelines: Posting FAQ Signatures FAQ Self Promotion FAQ
    Help the Mods: What's Fluff? Report Fluff/Spam to a Moderator

  8. #8
    SitePoint Zealot
    Join Date
    Oct 2009
    Location
    Oklahoma, USA
    Posts
    103
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    phoenix, I think I follow what you are saying. I assume you are saying to merge the two tables into these five tables. and I understand the two foreign key fields.

    But I dont know how to convert the data in order to programmatically ID the duplicates

  9. #9
    From space with love silver trophy
    SpacePhoenix's Avatar
    Join Date
    May 2007
    Location
    Poole, UK
    Posts
    5,014
    Mentioned
    103 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by dkchapuis View Post
    It has been years of bad information management, but ultimately I have to sort through it.

    There are lots of what-if scenarios here that will cause problems - misspelled/alternate spelling of name/address, duplicate records in Table1 and/or Table2, same person with two different addresses.

    I am debating setting up a seperate table just to manage the duplicates where I could associate one ID with multiple other IDs, but I am not sure the best way to do that either.
    In post #4 you say that there are lots of misspelled/alternate spelling of name/address in both Table One and Table Two I suggest that you first identify records with spelling mistakes and correct them, then where there are alternate spellings, choose one.
    Community Team Advisor
    Forum Guidelines: Posting FAQ Signatures FAQ Self Promotion FAQ
    Help the Mods: What's Fluff? Report Fluff/Spam to a Moderator

  10. #10
    SitePoint Zealot
    Join Date
    Oct 2009
    Location
    Oklahoma, USA
    Posts
    103
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by SpacePhoenix View Post
    In post #4 you say that there are lots of misspelled/alternate spelling of name/address in both Table One and Table Two I suggest that you first identify records with spelling mistakes and correct them, then where there are alternate spellings, choose one.
    That needs to be done, but that is about a month-long data entry job.

    In the mean time, any help on how I could write some SQL to tag the ones I can programmatically would be great.

  11. #11
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2006
    Location
    Augusta, Georgia, United States
    Posts
    4,147
    Mentioned
    16 Post(s)
    Tagged
    3 Thread(s)
    Is the combination of first name and last name what you will be using to identify duplicate user entities? The obvious problem with this approach is that two people with the same first and last name will be combined into a single person. Not ideal but is this something you're willing to except considering there isn't really any other way to differentiate separate users with the same first and last name? You could maybe lag and isolate users with the same first name and last that appear more then x times in the table. For example, if John Smith appears 10 times its most likely the case there are separate users with the same name?

  12. #12
    SitePoint Zealot
    Join Date
    Oct 2009
    Location
    Oklahoma, USA
    Posts
    103
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    every field would have to match - fistname, lastname, address, city, state, zip

    I


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •