SitePoint Sponsor

User Tag List

Results 1 to 12 of 12

Thread: Duplicate Issue

  1. #1
    SitePoint Wizard tgavin's Avatar
    Join Date
    Feb 2003
    Location
    FL
    Posts
    1,051
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Duplicate Issue

    Im not exactly sure how, but I've ended up with duplicates.

    In the current table, you'll see that I used chapt_id as the PK. This allowed for duplicate chapters in some movies.

    Code:
    CREATE TABLE IF NOT EXISTS `chapters` (
      `chapt_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
      `mov_id` bigint(20) unsigned NOT NULL DEFAULT '0',
      `chapter_num` varchar(3) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
      `chapter_title` varchar(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
      PRIMARY KEY (`chapt_id`),
      KEY `mov_id` (`mov_id`)
    ) ENGINE=MyISAM  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci ;
    You can see here, if I export from the chapters table

    Code:
    INSERT INTO `chapters` VALUES
    (1996, 65, '1', 'Main Titles'),
    (1997, 65, '2', 'The Prisoners Arrive'),
    (1998, 65, '3', 'Escape Attempt 1'),
    (1999, 65, '4', 'The Blind Spot'),
    (2000, 65, '5', 'To the Cooler'),
    (2001, 65, '6', 'Bartlett''s Arrival'),
    (2002, 65, '7', 'The Objective'),
    (2003, 65, '8', 'The Escape Committee'),
    (2004, 65, '9', 'Forger & Scrounger'),
    (2005, 65, '10', 'Stealing Steel'),
    (322864, 65, '1', 'Main Titles'),
    (322865, 65, '2', 'The Prisoners Arrive'),
    (322866, 65, '3', 'Escape Attempt 1'),
    (322867, 65, '4', 'The Blind Spot'),
    (322868, 65, '5', 'To the Cooler'),
    (322869, 65, '6', 'Bartlett''s Arrival'),
    (322870, 65, '7', 'The Objective'),
    (322871, 65, '8', 'The Escape Committee'),
    (322872, 65, '9', 'Forger & Scrounger'),
    (322873, 65, '10', 'Stealing Steel');
    I'm trying to take the data from the current "chapters" table and INSERT it into the new "movies_chapters" table. Obviously, looking at the PKs below you can see that I'm running into duplicate errors.

    Code:
    CREATE TABLE IF NOT EXISTS `movies_chapters` (
      `mc_mov_id` int(10) unsigned NOT NULL DEFAULT '0',
      `mc_lang` char(2) CHARACTER SET latin1 NOT NULL,
      `mc_chapter_num` varchar(3) CHARACTER SET latin1 NOT NULL,
      `mc_chapter_title` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
      PRIMARY KEY (`mc_mov_id`,`mc_chapter_num`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
    I'll be using the mov_id and the chapter_num as the new PK, so I'm not worried about keeping the chapt_id anymore. What's the best way to handle this? Delete the duplicates from the chapters table? If so, how?

    Thanks!

  2. #2
    From space with love silver trophy
    SpacePhoenix's Avatar
    Join Date
    May 2007
    Location
    Poole, UK
    Posts
    4,998
    Mentioned
    100 Post(s)
    Tagged
    0 Thread(s)
    Do any other tables in the database use the PK from the chapters table is their FK?
    Community Team Advisor
    Forum Guidelines: Posting FAQ Signatures FAQ Self Promotion FAQ
    Help the Mods: What's Fluff? Report Fluff/Spam to a Moderator

  3. #3
    SitePoint Wizard tgavin's Avatar
    Join Date
    Feb 2003
    Location
    FL
    Posts
    1,051
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Nope.

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,216
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    see if you can spot the difference in these two solutions --
    Code:
    INSERT 
      INTO movies_chapters
         ( mc_mov_id
         , mc_lang
         , mc_chapter_num
         , mc_chapter_title )
    SELECT DISTINCT
           mov_id
         , 'en'
         , chapter_num
         , chapter_title  
      FROM chapters
    Code:
    INSERT IGNORE
      INTO movies_chapters
         ( mc_mov_id
         , mc_lang
         , mc_chapter_num
         , mc_chapter_title )
    SELECT mov_id
         , 'en'
         , chapter_num
         , chapter_title  
      FROM chapters
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    SitePoint Wizard tgavin's Avatar
    Join Date
    Feb 2003
    Location
    FL
    Posts
    1,051
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    The syntax is different, but I'm going to say those do the same thing. Does INSERT INGORE use less system resources?

  6. #6
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,216
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    no, INSERT IGNORE attempts to insert all rows, ignoring dupes, whereas the INSERT which has SELECT DISTINCT inserts only the uniques

    system resources is of minor concern here, since you are presumably doing this only once
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  7. #7
    From space with love silver trophy
    SpacePhoenix's Avatar
    Join Date
    May 2007
    Location
    Poole, UK
    Posts
    4,998
    Mentioned
    100 Post(s)
    Tagged
    0 Thread(s)
    @Rudy, for the benefit of anyone who comes across this thread what performance difference, if any is there between the two solutions?
    Community Team Advisor
    Forum Guidelines: Posting FAQ Signatures FAQ Self Promotion FAQ
    Help the Mods: What's Fluff? Report Fluff/Spam to a Moderator

  8. #8
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,216
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    i would guess that the one with DISTINCT might be slower, because it has to sort the entire table on all columns first, whereas the IGNORE only has to do an index lookup on each row

    who do you suppose has some sample data in the requisite two table layouts and might be available to test it for us?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  9. #9
    SitePoint Wizard tgavin's Avatar
    Join Date
    Feb 2003
    Location
    FL
    Posts
    1,051
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I tried doing both and received this error performing the DUPLICATE query: #2006 - MySQL server has gone away. The chapters table has over 600,000 rows.

    Rudy, I understand that the actions are different, but isn't the outcome still the same? Which method do you recommend?

  10. #10
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,216
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    i would not agonize over which method to use

    if one of them failed on me, i'd use the other one in a flash
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  11. #11
    SitePoint Wizard tgavin's Avatar
    Join Date
    Feb 2003
    Location
    FL
    Posts
    1,051
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Excellent, so they DO produce the same result.

    I'm working on this on my local Mac, not a web server. It's a Mac Pro 8 Core with 6 GB RAM. My query results are

    Using INSERT IGNORE
    Inserted rows: 579653 (Query took 47.4497 sec)

    Using DUPLICATE
    #2006 - MySQL server has gone away.

    I tried this three times and got the same result. Use INSERT IGNORE.

    Thanks Rudy!

  12. #12
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2008
    Posts
    5,757
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by tgavin View Post
    Excellent, so they DO produce the same result.
    Not necessarily. You might have multiple records with the same mov_id and chapter_num, but have different titles. Distinct sees those as distinct rows. An error would be thrown when you try to insert them because your new table only has a unique constraint on mov_id and chapter_num.

    This probably isn't the case in your dataset though.


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •