SitePoint Sponsor

User Tag List

Results 1 to 13 of 13
  1. #1
    SitePoint Enthusiast
    Join Date
    Aug 2008
    Posts
    96
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    How to Delete All Duplicated Records

    I have a Wordpress site where I've added functionality to show the location of users who've posted to the site.

    I'm using Google's geomap API to generate lat & lng coordinates. But I must've coded something wrong, because I'm getting 2 or 3 sets of coordinates for every location, and a ton of duplicates for each.

    I first need to clean the duplicates out the database, but I'm unsure how to go about this. What complicates things additionally is how this particular Wordpress table (wp_postmeta) is built.

    The table columns are meta_id, post_id, meta_key and meta_value.

    meta_id is the pk, post_id is the fk back to the actual post, meta_key holds my values that define my meta_values data like "latitude" and "longitude", and meta_value holds the actual data like 33.6839473 .

    Combining the post_id and meta_key values will provide a unique identifier. Any records with duplicate values should be removed.

    If anyone could provide me some ideas or related examples it would be greatly appreciated. Thanks!

  2. #2
    SitePoint Member binhosting's Avatar
    Join Date
    Nov 2011
    Location
    Spring, Texas
    Posts
    11
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Using PHPMyAdmin, this query will provide you with a list of duplicates.

    select meta_id, post_id, meta_key, meta_value, count(*) from wp_postmeta
    group by meta_id, post_id, meta_key, meta_value having count(*) > 1;

  3. #3
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,267
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    dear binhosting, nice try -- and welcome to sitepoint

    since meta_id is the pk of that table, the COUNT(*) is never going to be anything other than 1, so that query will return bupkis

    the list of duplicates is provided by this query --
    Code:
    SELECT t.meta_id
         , t.post_id
         , t.meta_key 
         , t.meta_value
      FROM ( SELECT post_id  
                  , meta_key
               FROM wp_postmeta
             GROUP
                 BY post_id  
                  , meta_key
             HAVING COUNT(*) > 1 ) AS dupes
    INNER
      JOIN wp_postmeta AS t
        ON t.post_id  = dupes.post_id     
       AND t.meta_key = dupes.meta_key
    compare the subquery called "dupes" with this statement by the original poster...
    Combining the post_id and meta_key values will provide a unique identifier. Any records with duplicate values should be removed.
    however, i don't think it should be "any" rows with duplicate values that should be removed... rather, "all but one"

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  4. #4
    SitePoint Enthusiast
    Join Date
    Aug 2008
    Posts
    96
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Yep, that's a lot of duplicates.

    Thank you both for your answers. r937, you've came through for me again. That's a very interesting query. Creating a temp table to call a join on is something I haven't seen before.

    I can generate now a list of meta id's for my dupes. I've tried altering the SQL to generate only the id's then wrap it in its own subquery and delete everything within the list it generates.

    Code:
    DELETE wp_postmeta FROM wp_postmeta where meta_id in(
      SELECT t.meta_id     
      FROM ( SELECT post_id  
                  , meta_key
               FROM wp_postmeta
             GROUP
                 BY post_id  
                  , meta_key
             HAVING COUNT(*) > 1 ) AS dupes
    INNER
      JOIN wp_postmeta AS t
        ON t.post_id  = dupes.post_id     
       AND t.meta_key = dupes.meta_key
    )
    I get an alert saying I can't specify the target table for update in a where clause. I've found this is because MySQL won't update the same table that's been used in a the SELECT part of the query.

    Any suggestions on how to work around this?

  5. #5
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,267
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    before we go any further, could you please provide guidance regarding whether "all" rows that have duplicates should be deleted, or "all but one"
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  6. #6
    SitePoint Enthusiast
    Join Date
    Aug 2008
    Posts
    96
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Sorry, the query should delete all but 1 unique entry for each post_id / meta_key pair.

  7. #7
    SitePoint Wizard guelphdad's Avatar
    Join Date
    Oct 2003
    Location
    St. Catharines, ON Canada
    Posts
    1,708
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Here is a simple way if you don't care which pair is kept:

    Code:
    ALTER IGNORE TABLE yourtablenamegoeshere ADD UNIQUE(post_id,meta_key)

  8. #8
    SitePoint Enthusiast
    Join Date
    Aug 2008
    Posts
    96
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks guelphdad. I'm learning a lot in this thread. Your suggestion is tempting. Only drawback is I have multiples of correct and incorrect coordinates for each post. Some posts would be left with correct coordinates, while others would have the correct data deleted while retaining the bad data. In either scenario I'd be left with only a single pair of coordinates, which would make it easier to administrate. I'm looking over my DB right now to see if this would be a good option.

  9. #9
    SitePoint Wizard guelphdad's Avatar
    Join Date
    Oct 2003
    Location
    St. Catharines, ON Canada
    Posts
    1,708
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    how do you mean correct and incorrect co-ordinates? and no that wouldn't look after whether the co-ordinates are correct or not, but you didn't specify that elsewhere in the thread.

    while I go back and re-read your original post it sounds like you are using ENTITY ATTRIBUTE VALUE table structure which isn't a good thing, perhaps you can give us some sample rows from your table(s) so we can assist you further, particularly to steer you AWAY from EAV if you are indeed using it since it makes many queries so much more cumbersome than properly normalized data strucutures.

  10. #10
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,267
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    guelphdad, you're not going to convince anybody that EAV is bad design when it's built right in to wordpress

    a big outfit like that, they surely must have known what they were doing, right?

    anyhow, back to the dupes ...

    the combination of post_id and meta_key will have dupes, but the meta_value values will differ, so in order to eliminate the "incorrect" dupes, it is necessary to inspect all the dupes individually and make a human decision

    therefore, the list of duplicates query in post #3 is probably as far as you can go, and deleting the incorrect dupes will have to be done by specifying the meta_id values to be deleted
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  11. #11
    SitePoint Wizard guelphdad's Avatar
    Join Date
    Oct 2003
    Location
    St. Catharines, ON Canada
    Posts
    1,708
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    <off topic> Well the Maple Leafs must know what they're doing too since they have all this money .... yet the continue to employ Ron Wilson ..... </hockey rant>

  12. #12
    SitePoint Enthusiast
    Join Date
    Aug 2008
    Posts
    96
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I mentioned in my original thread I had an error in my code that was generating 2-3 sets of coordinates and duplicates of each for every location. That's what I meant about incorrect data. After considering both options I think it's best to delete all the coordinates data, recode the google api and start over.

    Thank you for all the comments in this thread. I've learned a lot. The table join on a subquery example was great, and the ALTER IGNORE TABLE command is also new to me.

    I'd also been wondering what you would call the table structure WP is using for this data. I know now it's referred to as an Entity Attribute Value structure. It'll be easier to research problems now that I know its name.

    I agree with your dislikes to the EAV structure. It makes it hard to call all the related data you need, and I'd imagine this table will quickly get bloated. I think they've designed it this way to make it more flexible for WP so it can be used for different types of sites, while disregarding to the ease of development. I'm ok with the extra work, but I'd imagine this would also hurt site performance in the long run as the database gets populated and this EAV table records multiple rows for every single record in the posts table.

    I know it's off topic from the original question, but even with efficient SQL, would a heavily populated EAV table perform slower than a normalized table?

  13. #13
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,267
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by maestro360 View Post
    ... but even with efficient SQL, would a heavily populated EAV table perform slower than a normalized table?
    not necessarily

    just likely

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •