I have 4 tables within the same database. All of them have a field called ‘email’ in them. How do i go about doing a Query which will Delete the row of data if the same ‘email’ is found in another Table leaving just the 1 record? I dont know if this makes it any easier for you but each row of data has an ID number (note it wont be the same ID number in the other tables for the same ‘email’ row, its just an Auto Increment i added before importing the data).
Another thing i need to make you aware of is that theres nearly 10 million records between the 4 tables. Is there anything i should do to speed the process up?
Can you provide more background? What are these 4 tables and what is their relationship to each other? Do you want to maintain the email address in one of the 4 tables (so really we’d just be removing it from the other 3) and if so, which one should maintain the unique email address?
Table: BB11UKCoreg
1 email text
2 firstname text
3 lastname text
4 ip text
5 optindate text
6 source text
7 ID int(11) AUTO_INCREMENT
Table: Cinamuse
1 email text
2 ip text
3 url text
4 timestamp text
5 ID int(11) AUTO_INCREMENT
Table: ElitemateJS
1 firstname text
2 lastname text
3 address1 text
4 town text
5 email text
6 source text
7 ip text
8 ID int(11) AUTO_INCREMENT
Table: JSRevG
1 firstname text
2 lastname text
3 address1 text
4 address2 text
5 address3 text
6 address4 text
7 address5 text
8 postcode text
9 email text
10 ip text
11 appdate text
12 createdate text
13 sourceurl text
14 ID int(11) AUTO_INCREMENT
What i need to do is this:
Mysql reads the data in DB1
Mysql then reads DB2 to see if any ‘email’ is the same as whats in DB1
If mysql see a match the row is deleted from DB2
Mysql then reads DB3 to see if any ‘email’ is the same as whats in DB1
If mysql see a match the row is deleted from DB3
Mysql then reads DB4 to see if any ‘email’ is the same as whats in DB1
If mysql see a match the row is deleted from DB4
Mysql reads the data in DB2
Mysql then reads DB1 to see if any ‘email’ is the same as whats in DB2
If mysql see a match the row is deleted from DB1
Mysql then reads DB3 to see if any ‘email’ is the same as whats in DB2
If mysql see a match the row is deleted from DB3
Mysql then reads DB4 to see if any ‘email’ is the same as whats in DB2
If mysql see a match the row is deleted from DB4
Wouldn’t DB2 only need to compare itself to DB3 and DB4, as its’ comparison to DB1 is already done? Then DB3 would only need to compare itself to DB4. So you’d have:
DB1 compared to DB2
DB1 compared to DB3
DB1 compared to DB4
DB2 compared to DB3
DB2 compared to DB4
DB3 compared to DB4
Ah right, yeah i guess so, sorry my brain isnt 100% active today… Would you be able to help me out please. Take a look below, would this work?
DELETE FROM DB2 WHERE email IN (SELECT email FROM DB1);
DELETE FROM DB3 WHERE email IN (SELECT email FROM DB1);
DELETE FROM DB4 WHERE email IN (SELECT email FROM DB1);
DELETE FROM DB1 WHERE email IN (SELECT email FROM DB2);
DELETE FROM DB3 WHERE email IN (SELECT email FROM DB2);
DELETE FROM DB4 WHERE email IN (SELECT email FROM DB2);
DELETE FROM DB1 WHERE email IN (SELECT email FROM DB3);
DELETE FROM DB2 WHERE email IN (SELECT email FROM DB3);
DELETE FROM DB4 WHERE email IN (SELECT email FROM DB3);
DELETE FROM DB1 WHERE email IN (SELECT email FROM DB4);
DELETE FROM DB2 WHERE email IN (SELECT email FROM DB4);
DELETE FROM DB3 WHERE email IN (SELECT email FROM DB4);
Also should i do anything to each table since theres millions of records to make it process the query quicker?
Yes, that looks right to me although I’ve striked the redundant checks (check the quoted code).
As for making it run quicker, index the email column across all 4 tables (let the indexes get built, then run it). It will definitely take a while, but there isn’t much you can do about that.
I need some help, iv indexed all the email fields in all 4 of the tables and im running one of the queries to delete any dups that are found in the other table and so far its ran for over 24hrs. Is there any other way of making it faster?
Are you running the queries via something like phpMyAdmin?
Have you thought about having PHP interact with the database, remove all duplicates from table 1 then from table one get all the email address and for each one run separate DELETE queries against each table in turn to remove the duplicates.
It’d probably just a similar time done via php but might have a lot less of a chance of timing out. Looking at the tables, what’s the relationship between them?