SitePoint Sponsor

User Tag List

Results 1 to 17 of 17
  1. #1
    SitePoint Guru
    Join Date
    Aug 2004
    Location
    Taunton, UK
    Posts
    787
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Is this code to delete duplicates OK?

    I am using a table I was given of all the towns in the UK and I have realised it contains duplicates

    e.g

    Code:
    971 	Birmingham 	West Midlands
    978 	Birmingham 	West Midlands
    993 	Birmingham 	West Midlands
    1006 	Birmingham 	West Midlands
    1047 	Birmingham 	West Midlands
    My table structure is as follows:

    Code:
    CREATE TABLE `towns` (
      `town_id` int(11) NOT NULL,
      `town` varchar(150) collate utf8_unicode_ci NOT NULL,
      `county` varchar(150) collate utf8_unicode_ci NOT NULL,
      PRIMARY KEY  (`town_id`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci COMMENT='Towns';
    I have done a search for some code to remove duplicates and found the following code but wanted to get a second opinion before using it as the site is live and I don't want to break it

    Code:
    CREATE TABLE new_table as
    SELECT * FROM old_table WHERE 1 GROUP BY [column to remove duplicates by];
    
    DROP TABLE old_table;
    
    RENAME TABLE new_table TO old_table;
    If this is correct, then can you advise me of what to put in the square bracket?
    If there is a better solution please let me know.

    Many thanks

    Paul
    Mediakitchen Limited
    App Development | Website Design & Development | Flash Game Development
    Somerset, UK
    http://www.mediakitchen.co.uk

  2. #2
    From space with love silver trophy
    SpacePhoenix's Avatar
    Join Date
    May 2007
    Location
    Poole, UK
    Posts
    5,072
    Mentioned
    103 Post(s)
    Tagged
    0 Thread(s)
    If the table includes post codes it might have the town repeated for each post code area which is the same town
    Community Team Advisor
    Forum Guidelines: Posting FAQ Signatures FAQ Self Promotion FAQ
    Help the Mods: What's Fluff? Report Fluff/Spam to a Moderator

  3. #3
    SitePoint Guru
    Join Date
    Aug 2004
    Location
    Taunton, UK
    Posts
    787
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks SpacePhonenix - the table doesn 't contain postcodes - just town and county
    Mediakitchen Limited
    App Development | Website Design & Development | Flash Game Development
    Somerset, UK
    http://www.mediakitchen.co.uk

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,338
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    please run this --
    Code:
    SELECT town
      FROM towns
    GROUP
        BY town
    HAVING COUNT(*) <> COUNT(DISTINCT county)
    any results?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    SitePoint Guru
    Join Date
    Aug 2004
    Location
    Taunton, UK
    Posts
    787
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks r937

    Here is the result I got

    Code:
    Showing rows 0 - 29 (410 total, Query took 0.1852 sec)
    Mediakitchen Limited
    App Development | Website Design & Development | Flash Game Development
    Somerset, UK
    http://www.mediakitchen.co.uk

  6. #6
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,338
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    so what that means is that there are some town names which exist in more than one county

    which makes sense, right? two different counties could have two different towns with the same name, right?

    the "remove duplicates" code which you posted would obliterate this information
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  7. #7
    SitePoint Guru
    Join Date
    Aug 2004
    Location
    Taunton, UK
    Posts
    787
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Eep I am glad I asked.

    I don't suppose you would be so kind as to suggest some code that would do what I need to do?
    Mediakitchen Limited
    App Development | Website Design & Development | Flash Game Development
    Somerset, UK
    http://www.mediakitchen.co.uk

  8. #8
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,338
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    sure

    first, which database system is this? it looks like MySQL but i'm just double-checking because we have a MySQL forum but you chose to post in the Databases forum instead

    second, does this query return anything --
    Code:
    SELECT MIN(town_id)
         , town
         , county
      FROM towns
    GROUP
        BY town
         , county
    HAVING COUNT(*) > 1
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  9. #9
    SitePoint Guru
    Join Date
    Aug 2004
    Location
    Taunton, UK
    Posts
    787
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks Rudy - I appreciate it.

    Yes it is MYSQL - sorry, in my haste to post I didn't notice there were separate forums but hopefully you won't abandon me for my mistake.

    Your query returns

    Code:
    Showing rows 0 - 29 (458 total, Query took 1.0632 sec)
    Many thanks

    Paul
    Mediakitchen Limited
    App Development | Website Design & Development | Flash Game Development
    Somerset, UK
    http://www.mediakitchen.co.uk

  10. #10
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,338
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    just to be safe, back up your table data first, then try this --
    Code:
    DELETE towns
      FROM ( SELECT MIN(town_id) AS min_id
                  , town        
                  , county      
               FROM towns       
             GROUP              
                 BY town        
                  , county      
             HAVING COUNT(*) > 1 ) AS dupes
    INNER
      JOIN towns
        ON towns.town   = dupes.town
       AND towns.county = dupes.county
     WHERE towns.town_id > dupes.min_id
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  11. #11
    SitePoint Guru
    Join Date
    Aug 2004
    Location
    Taunton, UK
    Posts
    787
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks Rudy, back up made but get the following error:

    Code:
    #1104 - The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET SQL_MAX_JOIN_SIZE=# if the SELECT is okay
    Mediakitchen Limited
    App Development | Website Design & Development | Flash Game Development
    Somerset, UK
    http://www.mediakitchen.co.uk

  12. #12
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,338
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    wha???

    i don't know why you would get that, nor how to fix it, but i can think of a couple of ways to get around it...
    Code:
    CREATE TABLE towns_to_delete
    SELECT towns.town_id
      FROM ( SELECT MIN(town_id) AS min_id
                  , town        
                  , county      
               FROM towns       
             GROUP              
                 BY town        
                  , county      
             HAVING COUNT(*) > 1 ) AS dupes
    INNER
      JOIN towns
        ON towns.town   = dupes.town
       AND towns.county = dupes.county
     WHERE towns.town_id > dupes.min_id
    if this runs, you should inspect a few ids from the towns_to_delete table, and convince yourself that those ids do need to be deleted, then run this --
    Code:
    DELETE FROM towns WHERE town_id IN
    ( SELECT town_id FROM towns_to_delete )
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  13. #13
    SitePoint Guru
    Join Date
    Aug 2004
    Location
    Taunton, UK
    Posts
    787
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks Rudy

    Unfortunately I get the following similar message

    Code:
    #1104 - The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET SQL_MAX_JOIN_SIZE=# if the SELECT is okay
    Mediakitchen Limited
    App Development | Website Design & Development | Flash Game Development
    Somerset, UK
    http://www.mediakitchen.co.uk

  14. #14
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,338
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    you must be doing something else wrong, because that subquery ran fine back in post #8

    can i see the exact last query you tried?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  15. #15
    SitePoint Guru
    Join Date
    Aug 2004
    Location
    Taunton, UK
    Posts
    787
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Yeah sure - here is the exact query

    Code:
    CREATE TABLE towns_to_delete SELECT towns.town_id
    FROM (
    
    SELECT MIN( town_id ) AS min_id, town, county
    FROM towns
    GROUP
    BY town, county
    HAVING COUNT( * ) >1
    ) AS dupes
    INNER
    JOIN towns ON towns.town = dupes.town
    AND towns.county = dupes.county
    WHERE towns.town_id > dupes.min_id

    I tried the following from post 8 again and it still works. I guess you know it is not exactly the same query

    Code:
    SELECT MIN( town_id ) , town, county
    FROM towns
    GROUP
    BY town, county
    HAVING COUNT( * ) >1
    LIMIT 0 , 30
    Mediakitchen Limited
    App Development | Website Design & Development | Flash Game Development
    Somerset, UK
    http://www.mediakitchen.co.uk

  16. #16
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,338
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    ALTER TABLE towns ADD INDEX ( town,county )

    this should speed things up, although i don't think it really addresses your MAX_JOIN_SIZE problem...
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  17. #17
    SitePoint Guru
    Join Date
    Aug 2004
    Location
    Taunton, UK
    Posts
    787
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hey Rudy

    After running the ALTER TABLE query, I then tried your previous CREATE TABLE towns_to_delete and it now worked.

    The ids in the towns_to_delete where indeed the culprits so I ran the other query to delete them from the towns table and it is all fine now.

    So I would just like to say thank you so much for all your time on this. It is really appreciated as I am trying to run before I can walk with this site.
    Mediakitchen Limited
    App Development | Website Design & Development | Flash Game Development
    Somerset, UK
    http://www.mediakitchen.co.uk


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •