Sorting Duplicates

Hi all, I have created an order system for my work which stores invoices, customers, etc. Often employees don’t have a quick search to see if the customer exists so just add as a new customer - leaving me with duplicates in the database. I manually go through these and weed out any duplicates using the following query:

SELECT c_firstname, c_lastname, COUNT( * ) AS NumOccurrences
FROM customers
GROUP BY c_firstname, c_lastname
HAVING (
NumOccurrences
) >1
ORDER BY COUNT( * ) DESC

Problem is, I’m left with about ten customers who appear to be duplicates but their addresses are different and therefore different customers with the same name. How can I somehow filter these out without comparing the addresses? Any help would be appreciated :slight_smile:

Don’t have? Or don’t do?
If they don’t have the possibility to do a quick search, can’t you give it to them?

so just add as a new customer - leaving me with duplicates in the database. I manually go through these and weed out any duplicates using the following query:

SELECT c_firstname, c_lastname, COUNT( * ) AS NumOccurrences
FROM customers
GROUP BY c_firstname, c_lastname
HAVING (
NumOccurrences
) >1
ORDER BY COUNT( * ) DESC

Problem is, I’m left with about ten customers who appear to be duplicates but their addresses are different and therefore different customers with the same name.

If there can’t be duplicate firstname+lastname in the table, add a unique index on those two columns. Then in your application, all you have to do is check for a duplicate key error.

If there can be duplicates (and I guess this is the case with those ten customers), then you’d have to find another key that identifies them uniquely. For example firstname+lastname+address.

How can I somehow filter these out without comparing the addresses? Any help would be appreciated :slight_smile:

What exactly do you want to do? Exclude these ten from the duplicates without comparing the addresses? I don’t think that’s possible. How would you know the difference between a ‘real’ duplicate and a ‘fake’ one, without using the one piece of data that tells you if it’s real or fake?