MySQL DELETE query

Boreas · September 12, 2014, 12:09pm

Hi all,

Would appreciate some help if anyone could advise:

I have a mySQL table that records ‘log’ entries for a set of products.

I’m trying to create a query that will retain up to 20 of the most recent entries per unique product.

I have tried a few approaches but none so far seem to be the optimum.

The table is updated all the time, so there are a lot of records in there at the moment. There are approx 10,000 unique products, and lets say there are currently 100 entries each. Daily I wish to clean this up and only keep the most recent 20 records for each unique product. There is a datetime field that can be used to assess a record’s age.

If anyone can assist it would be appreciated.

Thank you!

datamodeler · September 13, 2014, 3:30am

structure of table would be very helpful

r937 · September 13, 2014, 7:04pm

actually knowing only the PK and the datetime column, all the others can be foo, bar, etc.

Boreas · September 15, 2014, 8:22am

Hi, thank you for the replies, please see below:

+------------------+---------------+------+-----+---------+----------------+
| Field            | Type          | Null | Key | Default | Extra          |
+------------------+---------------+------+-----+---------+----------------+
| id               | int(11)       | NO   | PRI | NULL    | auto_increment |
| productsku       | varchar(15)   | NO   | MUL | NULL    |                |
| time             | datetime      | NO   |     | NULL    |                |
| entry            | varchar(255)  | NO   |     | NULL    |                |
+------------------+---------------+------+-----+---------+----------------+

Thanks

donboe · September 16, 2014, 1:32pm

what about:

SELECT DISTINCT(id)
           , productsku
           , entry
   FROM tablename
  ORDER
     BY time DESC LIMIT 20

At least if I understand you right?

Boreas · September 16, 2014, 2:13pm

Thanks donboe, yes that would select what I want to keep, but I need to delete the rows I don’t want to keep. So for clarification i’d like to delete everything except the result of the SELECT you gave - hope that makes sense?

r937 · September 16, 2014, 3:41pm

donboe, DISTINCT is ~not~ a function, you cannot apply it to a single column like that, it applies to all columns in the SELECT list

r937 · September 16, 2014, 3:44pm

[quote=“Boreas, post:1, topic:97427, full:true”]I’m trying to create a query that will retain up to 20 of the most recent entries per unique product.[/quote]let’s be clear… do you want a SELECT query to return the 20 most recent entries per product, or do you want a DELETE statement that will delete all the other ones?

deleting old/actual/historic data is seldom a good idea…

Boreas · September 16, 2014, 3:47pm

Thanks for the reply. I’d like a DELETE to delete all but the most recent 20 entries for each unique product. Don’t need to return them. If not a good idea to delete what would be the alternative? The table gets an immense amount of entries per day, and the last 20 are the only relevant ones.

As i’ve been unable to figure this out the table currently has over 2.5 million rows. Thank you for helping

r937 · September 16, 2014, 5:29pm

are you really sure you want to delete all the older ones?

that you’ll never need to go back and check something happened in the past on one of the products?

in any case, what you’ll end up doing is writing a SELECT query to pull the most recent 20 for each product

you can use this query to create a new table, then you would remove all the rows from the original table, then copy back the ones you’ve saved

it’s not going to be pretty

how often do you think you’ll want to do this?

Lemon_Juice · September 16, 2014, 6:51pm

This should work:

DELETE FROM tablename
    WHERE time < (SELECT MIN(time) FROM
        (SELECT time FROM tablename ORDER BY time DESC LIMIT 20) dt);

However, if you have millions of rows to delete I’m not sure how efficient this would be since mysql is known for poorly optimized subqueries in the WHERE clause. If you find this performs poorly then splitting the process into separate queries might help:

SET @pasttime=(SELECT MIN(time) FROM
    (SELECT time FROM tablename ORDER BY time DESC LIMIT 20) dt);
DELETE FROM tablename WHERE time < @pasttime;

In any case, it would be good to have the time column indexed.

Edit: sorry, I missed the requirement for 20 unique products so my solution is incomplete…

Boreas · September 16, 2014, 8:21pm

Thanks for the replies

Yes its something that is changing and moving so often that only last few events are relevant.

The first time is to ‘clean up’ as the system has been adding records for so long with this issue unresolved hence why there are so many records in there at the moment. I was thinking to run a cron daily to keep it prim, however if the solution you are suggesting is the only one then it sounds like I need to do that first to clean up, then change the funcs that actually do the INSERT of records to run a clean up on just the product it is writing about at the time?

Thanks for the effort Lemon_Juice and the advice on the datetime col index

r937 · September 16, 2014, 9:17pm

latest 20 rows per product –

SELECT id , product_sku , `time` , entry FROM daTable AS t WHERE ( SELECT COUNT(*) FROM daTable WHERE product_sku = t.product_sku AND `time` > t.`time` ) < 20
create a second table, exact same layout, and stick INSERT INTO in front of this SELECT to save the ones to keep

then delete from your main table, and copy these back

reaason for doing it in two steps: mysql cannot DELETE from the same table used in a subselect

Boreas · September 16, 2014, 9:54pm

Thank you r937 for all your help. Will post again with time it took to do this first pass over, for those that may be interested.

Lemon_Juice · September 18, 2014, 12:12am

Are you sure about that? Look at the first query in my post above - it is not doing exactly what the OP wants but it uses a subselect that uses the same table as the DELETE statement. I’m not sure about performance but it works.

vion9929 · September 18, 2014, 9:56am

You have:

a log table
updated all the time
with a lot of records

You want to:

clean up
keep the most recent records

It will take long to delete when a lot of records, especially if updated all the time, so you should not do this on a daily basis.

You could partition the table based on time series, you could create a view for the latest 20 or so most recent entries and you should clean up the log only when tablespace grows too large.

Boreas · September 21, 2014, 8:46pm

Thanks for replies

Yes that sums it up vion9929 , thanks for input. Interesting comment about partitioning, I hadn’t considered that. Will do an initial clean up (I’m going to do this tonight hopefully) and then decide on how best to proceed from there

Boreas · September 21, 2014, 9:06pm

With INSERT INTO table2 in front now been running over 15 mins, all crons are currently paused so there will be no updates to original table whilst this is running. Nothing in “table2” yet. There are about 3M records and about 10,000 unique product_sku’s in “table1”

SpacePhoenix · September 22, 2014, 2:35am

Fire up an app like phpMyAdmin and have a look at table2 to confirm if anything has been written to that table

Boreas · September 22, 2014, 10:07am

With phpMyAdmin open it was interesting to see that on the Structure view the cardinality was increasing (very slowly). A COUNT(*) on table2 returned zero. I imagine the data was still in memory and not yet written to table2. After the query running for 2 hours, there was only 10,000 rows “in” table2. Given the fact there are 10,000 products, and I was after the most recent 20 log entries, I expected to get 200,000 rows when complete.

As this was taking so long and making little progress I killed the query and switched crons back on.

After careful consideration (and knowing I have a full back up should it be required) I truncated table1 of it’s 3M rows.

So the moral here for me at least is don’t get into that situation again. I will now either implement a partition as suggested by vion or have a “clean up before write to log” to keep it from going over 20 entries per product at the time of writing entries. Not sure how thats going to sit performance wise but I don’t seem to have many options,

Thanks all for the input

Topic		Replies	Views
Query advice: deleting everything except for N most recent records Databases	1	302	August 29, 2010
Deleting rows Databases	7	682	October 8, 2014
Limiting rows in MySQL table to last 100 entries from form submission PHP	9	1043	November 26, 2016
Find Last Posted Message associated with each category Databases	2	294	February 12, 2010
Delete query issue Databases	8	1489	October 3, 2015

MySQL DELETE query

Related topics