Eliminating Duplicate Records

[FONT=“Georgia”]Hello.

I have a huge database with several duplicate entries.

What is the simplist query I can write to select and delete the duplicates, leaving behind only one single record?

I’m guessing I’ll need to use a GROUP BY somewhere in there, but I’m honestly baffled about the rest.

Making things trickier is that there are actually two fields I’ll need to look at in order to determine if a record is truly a duplicate or not.

Any suggestions?

[/FONT]

[FONT=“Georgia”]Just to elabourate; Each record does have a unique ID. But the rest of the data is what’s dupicate in some cases.

Example below;[/FONT]

_______________________________
| id | title       | country  |
===============================
|  1 | abrasives   | trinidad |
[COLOR="Red"]|  2 | abrasives   | trinidad |
|  3 | abrasives   | trinidad |[/COLOR]
|  4 | abrasives   | tobago   |
|  5 | accountants | trinidad |
|  6 | accountants | tobago   |
[COLOR="Red"]|  7 | accountants | tobago   |
|  8 | accountants | tobago   |
|  9 | accountants | tobago   |[/COLOR]
| 10 | actuaries   | trinidad |
[COLOR="Red"]| 11 | actuaries   | trinidad |[/COLOR]
| 12 | actuaries   | tobago   |
| 13 | agriculture | trinidad |
| 14 | agriculture | tobago   |
| etc...

[FONT=“Georgia”]Highlighted in red are what I’m trying to get rid of. Redundant data.

[/FONT]

three steps:

CREATE TABLE my_temp
SELECT MIN(id) AS id, title, country FROM my_table
GROUP BY title, country

verify the results, then

TRUNCATE TABLE my_table
INSERT INTO my_table SELECT * FROM my_temp

then, to prevent the dupes from ever happening again,

ALTER TABLE my_table
ADD UNIQUE (title,country)

I do this:


RENAME TABLE foo TO tmp;
CREATE TABLE foo LIKE tmp SELECT DISTINCT null,  title, country FROM tmp;
DROP TABLE tmp;

Just be careful, this will not keep the same IDs, but will be faster than the GROUP solution.

If you want to keep the IDs, I would do this:


RENAME TABLE foo TO tmp;
CREATE TABLE foo LIKE tmp;
ALTER TABLE foo ADD UNIQUE(title,country);
INSERT IGNORE INTO foo SELECT * FROM tmp ORDER BY id ASC;
DROP TABLE tmp;

But it’s slower.

[FONT=“Georgia”]Thanks Vali, but actually I was busy working on Rudy’s suggestion while you were typing yours.

I did a combination of Rudy’s query with PHPMyAdmin to create and trancate the tables.

I’m trying to figure out the last part though;[/FONT]

ALTER TABLE my_table
ADD UNIQUE (title,country)

[FONT=“Georgia”]What does that query do?

I googled it and it said something about adding a column to the table?

[/FONT]

It creates a unique constraint on the columns title and country, meaning that each combination of these two fields can only exist in the table once :slight_smile:

[FONT=“Georgia”]ah, Okay.

So how would that fit into my existing INSERT query?[/FONT]

INSERT INTO my_table VALUES ('','$title','$country','$other-values');

[FONT=“Georgia”]Or is it a query I’d run seperately, just after running INSERT?

[/FONT]

IN phpMyAdmin, go tio the mysl query window/box.

type in rudys alter table statement, making sure the name name of the table matches that of your db table. et voila :slight_smile: The unique constraint will have been added and you won’t get any duplicates any more, where those two columns are concerned.

bazz

I got an error message;

#1071 - Specified key was too long; max key length is 1000 bytes

_

could you do a SHOW CREATE TABLE please

#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ‘SHOW CREATE TABLE’ at line 4

_

You’ll need to follow those three words with the name of the table :slight_smile:

[FONT=“Georgia”]Ore.

Okay it’s;[/FONT]

CREATE TABLE `yellow_categories` (
 `categories_id` int(5) NOT NULL DEFAULT '0',
 `categories_title` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `categories_unique` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `categories_country` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `categories_letter` varchar(1) COLLATE utf8_unicode_ci NOT NULL,
 `categories_banner_id` int(5) NOT NULL,
 `categories_status` int(1) NOT NULL,
 `categories_log` text COLLATE utf8_unicode_ci NOT NULL,
 `categories_01` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `categories_02` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `categories_03` varchar(255) COLLATE utf8_unicode_ci NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

something doesn’t add up

the ALTER to add a UNIQUE index on (title,country) gave you the error message " max key length is 1000 bytes "

however, your SHOW CREATE TABLE shows them as VARCHAR(255) which should only add up to 510

what gives, dude?

[FONT=“Georgia”]hahaha, Man if I knew that, I wouldn’t be here.

[/FONT]

[FONT=“Georgia”]I tried approaching the problem from another angle; At data entry rather than at database settings.

I truncated the whole damn table, then tried changing the input code in my PHP form to INSERT IGNORE.

That failed miserably; Instead of importing the thousands of record in my tab-delimited text file, it imported only one, with no data, and with an id (‘categories_id’) of 0.

And actually, I noticed when I scrapped that, removed the IGNORE, and re-imported the data, ALL of the id’s were 0!

It’s actually not a problem, because I’m referencing the records by ‘categories_unique’, but it’s puzzling.

I don’t know if both of those errors are related.

[/FONT]

if you don’t need an auto_increment column, then for pity’s sake don’t declare one

[FONT=“Georgia”]Or else what?

It was being used at first, but the plan changed during building. The ‘categories_id’ column’s only there now because it would be more hassle to remove than keep it in.

[/FONT]

which tables are related to this yellow categories table? (and why is it yellow?)

[FONT=“Georgia”]I’m still getting that “#1071 - Specified key was too long; max key length is 1000 bytes” error.

I made a file to clean redundant data after input, so I’ll just manually run that after data uploads (or redirect the input form to it).

In my next big database project I’ll spend sometime figuring out the ADD UNIQUE tools before beginning anything. This project isn’t paid well enough to fret over it now.

Thanks for all the help Rudy and everyone!

[/FONT]