SitePoint Sponsor

User Tag List

Results 1 to 3 of 3
  1. #1
    SitePoint Evangelist
    Join Date
    Nov 2009
    Posts
    471
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    deleting duplicates with irregular patterns

    I have a company database with name, address and city. I have been able to get rid of duplicates with matches at the beginning of the name with the same address, the same name and address, and the same name with the same beginning of the address.

    But I now have some situations like

    DiMio and Di Mio
    P S Gourmet Coffee and Ps Gourmet Coffee

    Both have the same addresses and city but there are too many cases of names being different with the same address that I can't just use an address match. Some part of the name needs to match also.

    I have tried several things, including

    SELECT id, name, address, city, count( * ) AS n
    FROM t1
    GROUP BY name like SUBSTRING(name FROM 5 FOR 4) , address, left( city, 4 )
    HAVING n >1

    But that didn't really work well at all.

    Is there a way to do this in MySQL or do I need to do it with a php routine of some sort?

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,341
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    try this --
    Code:
    SELECT id
         , name
         , address
         , city
         , COUNT(*) AS n
      FROM t1
    GROUP 
        BY REPLACE(name,' ','')
         , address
         , left(city,4)
    HAVING n > 1
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    SitePoint Evangelist
    Join Date
    Nov 2009
    Posts
    471
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks, that works for those names pretty well. And also gives me one more tool in the toolbox to go after the others.

    Like if one starts with The and the second one doesn't. And I realize at some point it becomes an individual case and decision. So I'll head down that road. But the replace really helps. Thanks for the input.


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •