SitePoint Sponsor

User Tag List

Results 1 to 11 of 11
  1. #1
    SitePoint Addict
    Join Date
    Mar 2011
    Location
    Manchester, UK
    Posts
    227
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Delete Duplicates From All Tables?

    Hi Guys,

    I have 4 tables within the same database. All of them have a field called 'email' in them. How do i go about doing a Query which will Delete the row of data if the same 'email' is found in another Table leaving just the 1 record? I dont know if this makes it any easier for you but each row of data has an ID number (note it wont be the same ID number in the other tables for the same 'email' row, its just an Auto Increment i added before importing the data).

    Another thing i need to make you aware of is that theres nearly 10 million records between the 4 tables. Is there anything i should do to speed the process up?

    If anyone could help me that would be great.

    Thanks,
    Mike
    You're Help Does Not Go Unnoticed, I have So Far Donated 25 GBP
    To Cancer Research UK As A Thank You To All The SitePoint
    Members That Have Helped Me In The PHP Forum Thank You!

  2. #2
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    Join Date
    Jun 2002
    Location
    Ohio
    Posts
    5,223
    Mentioned
    153 Post(s)
    Tagged
    0 Thread(s)
    Can you provide more background? What are these 4 tables and what is their relationship to each other? Do you want to maintain the email address in one of the 4 tables (so really we'd just be removing it from the other 3) and if so, which one should maintain the unique email address?

    Can you provide a generic example, such as:

    table_name
    1 myemail@mydomain.com **KEEP**
    2 myotheremail@mydomain.com

    table_name2
    4 myemail@mydomain.com

    table_name3
    16 myemail@mydomain.com

    etc.

  3. #3
    SitePoint Addict
    Join Date
    Mar 2011
    Location
    Manchester, UK
    Posts
    227
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by cpradio View Post
    Can you provide more background? What are these 4 tables and what is their relationship to each other? Do you want to maintain the email address in one of the 4 tables (so really we'd just be removing it from the other 3) and if so, which one should maintain the unique email address?

    Can you provide a generic example, such as:

    table_name
    1 myemail@mydomain.com **KEEP**
    2 myotheremail@mydomain.com

    table_name2
    4 myemail@mydomain.com

    table_name3
    16 myemail@mydomain.com

    etc.
    Below are the table structures:

    Code:
    Table: BB11UKCoreg
    
    1 email text
    2 firstname text
    3 lastname text
    4 ip text 
    5 optindate text
    6 source text 
    7 ID int(11) AUTO_INCREMENT
    
    Table: Cinamuse
    
    1 email text
    2 ip text
    3 url text 
    4 timestamp text
    5 ID int(11) AUTO_INCREMENT
    
    Table: ElitemateJS
    
    1 firstname text 
    2 lastname text 
    3 address1 text
    4 town text 
    5 email text
    6 source text 
    7 ip text
    8 ID int(11) AUTO_INCREMENT
    
    Table: JSRevG
    
    1 firstname text 
    2 lastname text 
    3 address1 text
    4 address2 text 
    5 address3 text
    6 address4 text 
    7 address5 text 
    8 postcode text
    9 email text
    10 ip text 
    11 appdate text 
    12 createdate text
    13 sourceurl text 
    14 ID int(11) AUTO_INCREMENT
    What i need to do is this:

    Mysql reads the data in DB1
    Mysql then reads DB2 to see if any 'email' is the same as whats in DB1
    If mysql see a match the row is deleted from DB2
    Mysql then reads DB3 to see if any 'email' is the same as whats in DB1
    If mysql see a match the row is deleted from DB3
    Mysql then reads DB4 to see if any 'email' is the same as whats in DB1
    If mysql see a match the row is deleted from DB4


    Mysql reads the data in DB2
    Mysql then reads DB1 to see if any 'email' is the same as whats in DB2
    If mysql see a match the row is deleted from DB1
    Mysql then reads DB3 to see if any 'email' is the same as whats in DB2
    If mysql see a match the row is deleted from DB3
    Mysql then reads DB4 to see if any 'email' is the same as whats in DB2
    If mysql see a match the row is deleted from DB4

    and so on.... does that make sence?

    EDIT: there is no relation between the tables.
    You're Help Does Not Go Unnoticed, I have So Far Donated 25 GBP
    To Cancer Research UK As A Thank You To All The SitePoint
    Members That Have Helped Me In The PHP Forum Thank You!

  4. #4
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    Join Date
    Jun 2002
    Location
    Ohio
    Posts
    5,223
    Mentioned
    153 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by AdWarm View Post
    What i need to do is this:

    Mysql reads the data in DB1
    Mysql then reads DB2 to see if any 'email' is the same as whats in DB1
    If mysql see a match the row is deleted from DB2
    Mysql then reads DB3 to see if any 'email' is the same as whats in DB1
    If mysql see a match the row is deleted from DB3
    Mysql then reads DB4 to see if any 'email' is the same as whats in DB1
    If mysql see a match the row is deleted from DB4


    Mysql reads the data in DB2
    Mysql then reads DB1 to see if any 'email' is the same as whats in DB2
    If mysql see a match the row is deleted from DB1
    Mysql then reads DB3 to see if any 'email' is the same as whats in DB2
    If mysql see a match the row is deleted from DB3
    Mysql then reads DB4 to see if any 'email' is the same as whats in DB2
    If mysql see a match the row is deleted from DB4

    and so on.... does that make sence?

    EDIT: there is no relation between the tables.
    Wouldn't DB2 only need to compare itself to DB3 and DB4, as its' comparison to DB1 is already done? Then DB3 would only need to compare itself to DB4. So you'd have:

    DB1 compared to DB2
    DB1 compared to DB3
    DB1 compared to DB4
    DB2 compared to DB3
    DB2 compared to DB4
    DB3 compared to DB4

  5. #5
    SitePoint Addict
    Join Date
    Mar 2011
    Location
    Manchester, UK
    Posts
    227
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by cpradio View Post
    Wouldn't DB2 only need to compare itself to DB3 and DB4, as its' comparison to DB1 is already done? Then DB3 would only need to compare itself to DB4. So you'd have:

    DB1 compared to DB2
    DB1 compared to DB3
    DB1 compared to DB4
    DB2 compared to DB3
    DB2 compared to DB4
    DB3 compared to DB4
    Hi,

    Ah right, yeah i guess so, sorry my brain isnt 100% active today... Would you be able to help me out please. Take a look below, would this work?

    Code:
    DELETE FROM DB2 WHERE email IN (SELECT email FROM DB1);
    DELETE FROM DB3 WHERE email IN (SELECT email FROM DB1);
    DELETE FROM DB4 WHERE email IN (SELECT email FROM DB1);
    
    DELETE FROM DB1 WHERE email IN (SELECT email FROM DB2);
    DELETE FROM DB3 WHERE email IN (SELECT email FROM DB2);
    DELETE FROM DB4 WHERE email IN (SELECT email FROM DB2);
    
    DELETE FROM DB1 WHERE email IN (SELECT email FROM DB3);
    DELETE FROM DB2 WHERE email IN (SELECT email FROM DB3);
    DELETE FROM DB4 WHERE email IN (SELECT email FROM DB3);
    
    DELETE FROM DB1 WHERE email IN (SELECT email FROM DB4);
    DELETE FROM DB2 WHERE email IN (SELECT email FROM DB4);
    DELETE FROM DB3 WHERE email IN (SELECT email FROM DB4);
    Also should i do anything to each table since theres millions of records to make it process the query quicker?

    Thanks
    You're Help Does Not Go Unnoticed, I have So Far Donated 25 GBP
    To Cancer Research UK As A Thank You To All The SitePoint
    Members That Have Helped Me In The PHP Forum Thank You!

  6. #6
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    Join Date
    Jun 2002
    Location
    Ohio
    Posts
    5,223
    Mentioned
    153 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by AdWarm View Post
    Hi,

    Ah right, yeah i guess so, sorry my brain isnt 100% active today... Would you be able to help me out please. Take a look below, would this work?

    Code:
    DELETE FROM DB2 WHERE email IN (SELECT email FROM DB1);
    DELETE FROM DB3 WHERE email IN (SELECT email FROM DB1);
    DELETE FROM DB4 WHERE email IN (SELECT email FROM DB1);
    
    DELETE FROM DB1 WHERE email IN (SELECT email FROM DB2);
    DELETE FROM DB3 WHERE email IN (SELECT email FROM DB2);
    DELETE FROM DB4 WHERE email IN (SELECT email FROM DB2);
    
    DELETE FROM DB1 WHERE email IN (SELECT email FROM DB3);
    DELETE FROM DB2 WHERE email IN (SELECT email FROM DB3);
    DELETE FROM DB4 WHERE email IN (SELECT email FROM DB3);
    
    DELETE FROM DB1 WHERE email IN (SELECT email FROM DB4);
    DELETE FROM DB2 WHERE email IN (SELECT email FROM DB4);
    DELETE FROM DB3 WHERE email IN (SELECT email FROM DB4);
    Also should i do anything to each table since theres millions of records to make it process the query quicker?

    Thanks
    Yes, that looks right to me although I've striked the redundant checks (check the quoted code).

    As for making it run quicker, index the email column across all 4 tables (let the indexes get built, then run it). It will definitely take a while, but there isn't much you can do about that.

  7. #7
    SitePoint Addict
    Join Date
    Mar 2011
    Location
    Manchester, UK
    Posts
    227
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by cpradio View Post
    Yes, that looks right to me although I've striked the redundant checks (check the quoted code).

    As for making it run quicker, index the email column across all 4 tables (let the indexes get built, then run it). It will definitely take a while, but there isn't much you can do about that.
    I need some help, iv indexed all the email fields in all 4 of the tables and im running one of the queries to delete any dups that are found in the other table and so far its ran for over 24hrs. Is there any other way of making it faster?
    You're Help Does Not Go Unnoticed, I have So Far Donated 25 GBP
    To Cancer Research UK As A Thank You To All The SitePoint
    Members That Have Helped Me In The PHP Forum Thank You!

  8. #8
    From space with love silver trophy
    SpacePhoenix's Avatar
    Join Date
    May 2007
    Location
    Poole, UK
    Posts
    5,068
    Mentioned
    103 Post(s)
    Tagged
    0 Thread(s)
    How many records are in each table?

    Are you running the queries via something like phpMyAdmin?

    Have you thought about having PHP interact with the database, remove all duplicates from table 1 then from table one get all the email address and for each one run separate DELETE queries against each table in turn to remove the duplicates.
    Community Team Advisor
    Forum Guidelines: Posting FAQ Signatures FAQ Self Promotion FAQ
    Help the Mods: What's Fluff? Report Fluff/Spam to a Moderator

  9. #9
    SitePoint Addict
    Join Date
    Mar 2011
    Location
    Manchester, UK
    Posts
    227
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by SpacePhoenix View Post
    How many records are in each table?

    Are you running the queries via something like phpMyAdmin?

    Have you thought about having PHP interact with the database, remove all duplicates from table 1 then from table one get all the email address and for each one run separate DELETE queries against each table in turn to remove the duplicates.
    Hi,

    Yea im running the query via PHPMYADMIN. I've just had to restart MySQL as it was taking too long.

    DB1 = 5,929,150 records
    DB2 = 786,755 records
    DB3 = 387,964 records
    DB4 = 2,721,587 records

    If i did it via PHP wouldnt that taking a long time to do and then timeout?

    Whats the best way of me doing this, i really need some help

    Thank you...
    You're Help Does Not Go Unnoticed, I have So Far Donated 25 GBP
    To Cancer Research UK As A Thank You To All The SitePoint
    Members That Have Helped Me In The PHP Forum Thank You!

  10. #10
    From space with love silver trophy
    SpacePhoenix's Avatar
    Join Date
    May 2007
    Location
    Poole, UK
    Posts
    5,068
    Mentioned
    103 Post(s)
    Tagged
    0 Thread(s)
    It'd probably just a similar time done via php but might have a lot less of a chance of timing out. Looking at the tables, what's the relationship between them?
    Community Team Advisor
    Forum Guidelines: Posting FAQ Signatures FAQ Self Promotion FAQ
    Help the Mods: What's Fluff? Report Fluff/Spam to a Moderator

  11. #11
    SitePoint Addict
    Join Date
    Mar 2011
    Location
    Manchester, UK
    Posts
    227
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by SpacePhoenix View Post
    It'd probably just a similar time done via php but might have a lot less of a chance of timing out. Looking at the tables, what's the relationship between them?
    Hi,

    Theres no relation to them all.

    Thanks
    You're Help Does Not Go Unnoticed, I have So Far Donated 25 GBP
    To Cancer Research UK As A Thank You To All The SitePoint
    Members That Have Helped Me In The PHP Forum Thank You!


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •