SitePoint Sponsor

User Tag List

Results 1 to 6 of 6
  1. #1
    SitePoint Zealot
    Join Date
    Jan 2006
    Posts
    169
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    How to delete duplicate rows?

    I've got a table that may contain rows with duplicates value in one of the columns. If duplicates exist in that column, I want to delete the duplicates, leaving only one. Is there a one statement sql query for doing that? Note that the timestamp and id will be different between duplicates. Only the order_number will be a duplicate.

    table:
    email_queue

    fields:
    id (key)
    order_number (may be duplicate)
    timestamp

  2. #2
    SitePoint Guru risoknop's Avatar
    Join Date
    Feb 2008
    Location
    end($world)
    Posts
    834
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    A single query won't be enough here. You will have to write a procedure for this:

    Code:
    DECLARE
        
        CURSOR c1 IS
            SELECT order_number, COUNT(order_number) count 
            FROM table WHERE count > 1 GROUP BY order_number;
        ordnu VARCHAR(255);
        cnt INT;
        
    BEGIN
        
        OPEN c1;
        LOOP
            FETCH c1 INTO ordnu, cnt;
            EXIT WHEN c1%NOTFOUND;
            DELETE FROM table WHERE order_number = ordnu;
            COMMIT;
        END LOOP;
        CLOSE c1;
        
    END;
    /
    EDIT: This is Oracle syntax. But it should be easy to rewrite it into MySQL.

  3. #3
    SitePoint Guru risoknop's Avatar
    Join Date
    Feb 2008
    Location
    end($world)
    Posts
    834
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Or just use this SELECT query and store the returned result set:

    Code:
            SELECT order_number, COUNT(order_number) count 
            FROM table WHERE count > 1 GROUP BY order_number;
    Then run the DELETE query in a for loop in some server side language (or construct one big DELETE query with IN clause).

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,016
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    what's up with all the looping?

    you can do the whole shebang in one query --
    Code:
    delete email_queue
      from email_queue
    inner
      join ( select order_number 
                  , MAX(id) AS last_id
               from email_queue
             group
                 By order_number
             having count(*) > 1 ) as dupes
        ON dupes.order_number = email_queue.order_number
       and dupes.last_id > email_queue.id
    this deletes all rows where the id value isn't the last id value in each order_number

    you could also do it by the timestamp
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    SitePoint Zealot
    Join Date
    Jan 2006
    Posts
    169
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks, Rudy! BTW, I just got your book in the mail. It's awesome and has already done wonders for SQL even though I'm only halfway through it.

  6. #6
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,016
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    thanks for the kind words
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •