SitePoint Sponsor

User Tag List

Results 1 to 10 of 10
  1. #1
    SitePoint Guru
    Join Date
    Nov 2004
    Location
    England
    Posts
    698
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)

    Question Find and Delete Orphaned Entries

    Hi guys,

    I have two tables that I'm working on here; one has job information and the other has customer information. There is a common cust_id between the two. I've found that there are 95,326 customer records and 94,526 job records, so we obviously have a problem (all current data is one customer record per job and vice versa: don't ask, it wasn't this way before but changes were made that resulted in this situation)

    A quick check has found a couple of duplicates, but that's not entirely a problem because when that job is opened, the system checks for duplicates as it goes and will remove them. An archive operation also does the same thing, so the duplicates are a fairly minor problem for the time being.

    What is a problem however is that I found a very old customer record which is duplicated many times. When I tried to open the job it was no longer in the live table, so no duplicates can be removed, so it is essentially orphaned from the parent job.

    What I want to do now is to go through the customer table and check that the associated job exists, and if not to delete the current customer record, but I don't know how to do that in MySQL itself. I could write a PHP script that would do the job but obviously it would be much slower than getting MySQL to do it on its own. How would you guys recommend that I go about doing this?

    Thanks

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,247
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by Antnee View Post
    What I want to do now is to go through the customer table and check that the associated job exists, and if not to delete the current customer record...
    DELETE FROM customers WHERE id NOT IN ( SELECT cust_id FROM jobs )

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    SitePoint Guru
    Join Date
    Nov 2004
    Location
    England
    Posts
    698
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Thanks. Running the query on a test machine and it's still running after +200secs... will see how long it takes and see whether I'm going to need to run this overnight or whether I'll be able to do it over a break

  4. #4
    SitePoint Guru
    Join Date
    Nov 2004
    Location
    England
    Posts
    698
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    OK, I aborted that query after over ten minutes had passed. Both IDs are indexed but that query doesn't actually compare the two anyway does it so I assume it doesn't use the indexes?

  5. #5
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,247
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    hi, sorry you had problems, it does indeed sound like an indexing issue

    try the DELETE using a LEFT OUTER JOIN (exact syntax is in the docs)

    the join can be faster than the IN subquery for large tables
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  6. #6
    SitePoint Guru
    Join Date
    Nov 2004
    Location
    England
    Posts
    698
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    OK, never used joins for deleting, looking at the syntax now. Should I be ignoring valid joins and deleting the rest maybe?

  7. #7
    SitePoint Guru
    Join Date
    Nov 2004
    Location
    England
    Posts
    698
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Just out of curiosity, I tried a select query based on your earlier suggestion, r937, limited to 100 results and it took 22.5 seconds

    When I EXPLAIN the query, I get the following:
    Code:
    +----+--------------------+----------------+-------+---------------+---------+---------+------+-------+--------------------------+
    | id | select_type        | table          | type  | possible_keys | key     | key_len | ref  | rows  | Extra                    |
    +----+--------------------+----------------+-------+---------------+---------+---------+------+-------+--------------------------+
    |  1 | PRIMARY            | customer       | ALL   | NULL          | NULL    | NULL    | NULL | 95421 | Using where              |
    |  2 | DEPENDENT SUBQUERY | jobs           | index | NULL          | indexes | 86      | NULL | 94613 | Using where; Using index |
    +----+--------------------+----------------+-------+---------------+---------+---------+------+-------+--------------------------+
    Both tables have the cust_id as an index

  8. #8
    SitePoint Guru
    Join Date
    Nov 2004
    Location
    England
    Posts
    698
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Just tried this SELECT query:

    Code:
    SELECT *
    FROM customer
    LEFT OUTER JOIN jobs
    ON customer.cust_id = jobs.cust_id
    WHERE jobs.job IS NULL
    LIMIT 0,100
    That took 130 seconds, so I'm guessing that isn't the way to do it either? Not for the number of rows I need to delete anyway. I don't understand why the query takes so long either, especially as the cust_id is indexed in both tables

  9. #9
    reads the ********* Crier silver trophybronze trophy longneck's Avatar
    Join Date
    Feb 2004
    Location
    Tampa, FL (US)
    Posts
    9,854
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    do you have an index on cust_id in both tables?
    Check out our new Industry News forum!
    Keep up-to-date with the latest SP news in the Community Crier

    I edit the SitePoint Podcast

  10. #10
    SitePoint Guru
    Join Date
    Nov 2004
    Location
    England
    Posts
    698
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Yeah. I'm using HeidiSQL and it's showing both as indexes (has a little "i" icon next to each in the table view). I don't know what else it could be. we're using MyISAM as well, could that have anything to do with it?


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •