Im not exactly sure how, but I've ended up with duplicates.
In the current table, you'll see that I used chapt_id as the PK. This allowed for duplicate chapters in some movies.
You can see here, if I export from the chapters table
CREATE TABLE IF NOT EXISTS `chapters` (
`chapt_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`mov_id` bigint(20) unsigned NOT NULL DEFAULT '0',
`chapter_num` varchar(3) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
`chapter_title` varchar(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
PRIMARY KEY (`chapt_id`),
KEY `mov_id` (`mov_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci ;
I'm trying to take the data from the current "chapters" table and INSERT it into the new "movies_chapters" table. Obviously, looking at the PKs below you can see that I'm running into duplicate errors.
INSERT INTO `chapters` VALUES
(1996, 65, '1', 'Main Titles'),
(1997, 65, '2', 'The Prisoners Arrive'),
(1998, 65, '3', 'Escape Attempt 1'),
(1999, 65, '4', 'The Blind Spot'),
(2000, 65, '5', 'To the Cooler'),
(2001, 65, '6', 'Bartlett''s Arrival'),
(2002, 65, '7', 'The Objective'),
(2003, 65, '8', 'The Escape Committee'),
(2004, 65, '9', 'Forger & Scrounger'),
(2005, 65, '10', 'Stealing Steel'),
(322864, 65, '1', 'Main Titles'),
(322865, 65, '2', 'The Prisoners Arrive'),
(322866, 65, '3', 'Escape Attempt 1'),
(322867, 65, '4', 'The Blind Spot'),
(322868, 65, '5', 'To the Cooler'),
(322869, 65, '6', 'Bartlett''s Arrival'),
(322870, 65, '7', 'The Objective'),
(322871, 65, '8', 'The Escape Committee'),
(322872, 65, '9', 'Forger & Scrounger'),
(322873, 65, '10', 'Stealing Steel');
I'll be using the mov_id and the chapter_num as the new PK, so I'm not worried about keeping the chapt_id anymore. What's the best way to handle this? Delete the duplicates from the chapters table? If so, how?
CREATE TABLE IF NOT EXISTS `movies_chapters` (
`mc_mov_id` int(10) unsigned NOT NULL DEFAULT '0',
`mc_lang` char(2) CHARACTER SET latin1 NOT NULL,
`mc_chapter_num` varchar(3) CHARACTER SET latin1 NOT NULL,
`mc_chapter_title` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`mc_mov_id`,`mc_chapter_num`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Do any other tables in the database use the PK from the chapters table is their FK?
see if you can spot the difference in these two solutions --
, mc_chapter_title )
, mc_chapter_title )
The syntax is different, but I'm going to say those do the same thing. Does INSERT INGORE use less system resources?
no, INSERT IGNORE attempts to insert all rows, ignoring dupes, whereas the INSERT which has SELECT DISTINCT inserts only the uniques
system resources is of minor concern here, since you are presumably doing this only once
@Rudy, for the benefit of anyone who comes across this thread what performance difference, if any is there between the two solutions?
i would guess that the one with DISTINCT might be slower, because it has to sort the entire table on all columns first, whereas the IGNORE only has to do an index lookup on each row
who do you suppose has some sample data in the requisite two table layouts and might be available to test it for us?
I tried doing both and received this error performing the DUPLICATE query: #2006 - MySQL server has gone away. The chapters table has over 600,000 rows.
Rudy, I understand that the actions are different, but isn't the outcome still the same? Which method do you recommend?
i would not agonize over which method to use
if one of them failed on me, i'd use the other one in a flash
Excellent, so they DO produce the same result.
I'm working on this on my local Mac, not a web server. It's a Mac Pro 8 Core with 6 GB RAM. My query results are
Using INSERT IGNORE
Inserted rows: 579653 (Query took 47.4497 sec)
#2006 - MySQL server has gone away.
I tried this three times and got the same result. Use INSERT IGNORE. :)
Not necessarily. You might have multiple records with the same mov_id and chapter_num, but have different titles. Distinct sees those as distinct rows. An error would be thrown when you try to insert them because your new table only has a unique constraint on mov_id and chapter_num.
Originally Posted by tgavin
This probably isn't the case in your dataset though.