SitePoint Sponsor

User Tag List

Results 1 to 24 of 24
  1. #1
    SitePoint Evangelist
    Join Date
    Jun 2010
    Posts
    455
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)

    removing duplicate rows with MYSQL

    I'd like to delete duplicate rows in a MYSQL table without creating a new table.

    Generally, how is that done?

  2. #2
    SitePoint Zealot GOPalmer's Avatar
    Join Date
    Jan 2009
    Location
    Wiltshire, UK
    Posts
    125
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    How are they duplicated? Can you provide the schema? You won't need to create a new table; DELETE statements should suffice.

    George

  3. #3
    SitePoint Evangelist
    Join Date
    Jun 2010
    Posts
    455
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    How do you use DELETE to leave one of the duplicate rows, but DELETE the rest?

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,341
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    generally, start by defining which columns constitute a "duplicate"

    i put "duplicate" in quotes because in almost every case where someone wants duplicates removed, they also have an auto_increment on the table -- which, by definition, means that there are actually no duplicate rows, just rows with the same values in certain columns or combinations of columns

    once you've decided which columns determine your dupes, look at the values in the other columns, and ask yourself if you care which rows should provide the values for these other columns

    to illustrate this scenario, consider the following three rows in a users table --
    Code:
    userid email           username age
      42   fred@gmail.com  fred     -
       9   fred@gmail.com  -        -
      37   fred@gmail.com  todd     25
    the "duplicates" column is the email, and your objective is to eliminate the dupes and leave only one row with fred@gmail.com

    but look at the other columns -- if you kept row 9, you'd be throwing away some valuable information

    have i given you enough information for your rather general question?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    SitePoint Evangelist
    Join Date
    Jun 2010
    Posts
    455
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    These would be pure dupes except for the auto increment.

    GOPalmer suggests using DELETE, but it's not apparent to me how to use DELETE to remove all the dupes except one.

  6. #6
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,341
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    DELETE is tricky for the following reason...
    Quote Originally Posted by da manual
    Currently, you cannot delete from a table and select from the same table in a subquery.
    what this means is you have to first extract the data you want to keep, save this in another table, and then you can either delete from your main table and insert from this saved table, or, if your extract was comprehensive enough, just drop and rename the tables
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  7. #7
    SitePoint Evangelist
    Join Date
    Jun 2010
    Posts
    455
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    so, there's nothing in the "manual" that let's me order a primary key and remove the second and subsequent records of a duplicated key by default and reatain all rows with unique keys and only the first row with duplicate keys?

  8. #8
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,341
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    yes, there is a technique you can use, if you are sure that the duplicate rows have nothing in them that you care about

    using the ALTER IGNORE TABLE syntax (note: the IGNORE is important), declare a UNIQUE INDEX on the column(s) that determine the dupes

    and vwalah

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  9. #9
    SitePoint Evangelist
    Join Date
    Jun 2010
    Posts
    455
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    as in:

    PHP Code:
    mysql_query("ALTER IGNORE TABLE stk") or die(mysql_error()); 
    ????

    EDIT:

    PHP Code:
    mysql_query("ALTER IGNORE TABLE stk INDEX (pic)") or die(mysql_error()); 
    Last edited by nichemtktg; Jun 23, 2011 at 14:22. Reason: addition

  10. #10
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,341
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    no

    first of all, you wouldn't set up a php program to do this, as you are only going to do it once, and it's a lot easier just typing the sql into a front end app like the mysql query browser or heidisql or phpmyadmin

    but secondly, your ALTER statement doesn't actually alter anything -- you have to ADD UNIQUE INDEX
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  11. #11
    SitePoint Evangelist
    Join Date
    Jun 2010
    Posts
    455
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    I've always used php and haven't manually entered scripts with phpmyadmin before (until now). I just found the edit button to enter scripts.

    I just entered: ALTER TABLE `stk2` ADD INDEX ( `pic` )

    What do I enter now to delete the dupes?

    Obviously, ALTER IGNORE TABLE stk INDEX (pic), failed.

  12. #12
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,341
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by nichemtktg View Post
    What do I enter now to delete the dupes?
    may i suggest that you read my previous posts again, because i mentioned it a couple of times already

    you have to declare a special type of index, that rhymes with you neek

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  13. #13
    SitePoint Evangelist
    Join Date
    Jun 2010
    Posts
    455
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    OK.

    just did: ALTER TABLE `stk2` ADD UNIQUE (`pic`)

    then: ALTER IGNORE TABLE 'stk2' INDEX ('pic')

    which worked, but no deletions.

    I'm dancin, where do my feet need to go?

  14. #14
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,341
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by nichemtktg View Post
    just did: ALTER TABLE `stk2` ADD UNIQUE (`pic`)
    since this statement did not include the IGNORE keyword, and assuming that your table in fact did have duplicate `pic` values, the only possible result is that this statement failed, and you just didn't notice it
    Quote Originally Posted by nichemtktg View Post
    then: ALTER IGNORE TABLE 'stk2' INDEX ('pic')
    this just adds another non-unique index (whether or not you removed the one you created earlier) and so it would've completed, but since it didn't specify UNIQUE, you still have the dupes

    but if you were to combine the above two statements ...

    (i'm trying real hard not to just hand you the answer, although i've pretty much done just that three times now...)

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  15. #15
    SitePoint Evangelist
    Join Date
    Jun 2010
    Posts
    455
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    DROP UNIQUE INDEX (pic) ON stk2
    --> #1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'UNIQUE INDEX (pic) ON stk2' at line 1

    ALTER TABLE stk2 ADD UNIQUE INDEX (pic)
    --> #1062 - Duplicate entry 'LAMS-RM1507.JPG' for key 'pic'

    ALTER IGNORE TABLE stk2 UNIQUE INDEX (pic) --> Didn't work for sure

    I'll bet I'm close, but I don't see it.

  16. #16
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,341
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by nichemtktg View Post
    ALTER IGNORE TABLE stk2 UNIQUE INDEX (pic) --> Didn't work for sure
    i'm sorry, i'm not familiar with the "Didn't work for sure" error message

    have you looked up the syntax in the manual?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  17. #17
    SitePoint Evangelist
    Join Date
    Jun 2010
    Posts
    455
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Yes I have.

    SQL CREATE INDEX Statement

    says: CREATE UNIQUE INDEX index_name ON table_name (column_name)

    is the syntax.

    Except, the error says there's a duplicate entry for key pic suggesting that's the problem, but you know that, right?

    And, you're saying it should still work, right?

  18. #18
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,341
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    okay, i've decided i will give you the answer

    but first...

    could you please do a SHOW CREATE TABLE for your table

    i want to make sure i clean it up properly before adding another index
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  19. #19
    SitePoint Evangelist
    Join Date
    Jun 2010
    Posts
    455
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    CREATE TABLE `stk2` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    `cli` varchar(10) NOT NULL,
    `mfr` text NOT NULL,
    `stknum` text NOT NULL,
    `pic` varchar(200) NOT NULL,
    `width` varchar(5) NOT NULL,
    `margin` text NOT NULL,
    `msrp` decimal(11,2) NOT NULL,
    `name` text NOT NULL,
    `description` text NOT NULL,
    `headline` text NOT NULL,
    `ben1` text NOT NULL,
    `ben2` text NOT NULL,
    `ben3` text NOT NULL,
    PRIMARY KEY (`id`),
    KEY `pic_2` (`pic`),
    KEY `pic_3` (`pic`),
    KEY `pic_4` (`pic`)
    ) ENGINE=MyISAM AUTO_INCREMENT=231 DEFAULT CHARSET=latin1

  20. #20
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,341
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by nichemtktg View Post
    KEY `pic_2` (`pic`),
    KEY `pic_3` (`pic`),
    KEY `pic_4` (`pic`)
    these are redundant and unnecessary

    run this --
    Code:
    ALTER TABLE stk2 DROP INDEX pic_2;
    ALTER TABLE stk2 DROP INDEX pic_3;
    ALTER TABLE stk2 DROP INDEX pic_4;
    you might want to do another SHOW CREATE TABLE for yourself, to ensure those indexes are gone

    then do this --
    Code:
    ALTER IGNORE TABLE stk2 ADD UNIQUE (pic)
    and then check to see if there are any dupes
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  21. #21
    SitePoint Evangelist
    Join Date
    Jun 2010
    Posts
    455
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    As you said viola!

    I re-read all the threads and all I could think of was that I was playing a really bad game of battleship. I was calling shots everywhere except where they counted.

    r937, thanks for sticking with me.

    FYI, I got a Deja Vu towards the end of this topic so I sure this was an intended experience for me .

    Thanks Again,

    Niche

  22. #22
    SitePoint Evangelist
    Join Date
    Jun 2010
    Posts
    455
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    P.S.

    ALTER IGNORE TABLE stk2 ADD UNIQUE (pic)

    created

    UNIQUE KEY `pic` (`pic`)

    is there a script similar to ALTER IGNORE TABLE stk2 ADD UNIQUE (pic) that
    removes dupes using an existing UNIQUE KEY?

    I tried ALTER IGNORE TABLE stk2 UNIQUE (pic) unsuccessfully thinking it might have been able to use the exisiting UNIQUE KEY 'pic'

    or must the target column not be an existing unique key for ALTER IGNORE TABLE to work?

  23. #23
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,341
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by nichemtktg View Post
    P.S.

    ALTER IGNORE TABLE stk2 ADD UNIQUE (pic)

    created

    UNIQUE KEY `pic` (`pic`)
    and now there are no more dupes, right?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  24. #24
    SitePoint Evangelist
    Join Date
    Jun 2010
    Posts
    455
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    You're absolutely right! Thank-you very much.

    r937, thanks for sticking with me.

    FYI, I got a Deja Vu towards the end of this topic so I sure this was an intended experience for me .

    Thanks Again,

    Niche


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •