Remove duplicate rows

Hey there,

I have a code to remove the duplicates in one of my table but I cant seem to be able to ORDER the way mysql select the values.

Here is my initial table:

mysql_query("CREATE TABLE new_table as SELECT * FROM test GROUP BY url") 
mysql_query("DROP TABLE test") 
mysql_query("RENAME TABLE new_table TO test") 

I want to remove the duplicated URL and make sure that it keeps only the value with the higher PR. I tried using the ORDER BY attribute but it doesnt seem to work. The first 3 values with a PR of 0 are always selected and placed in the new table…

The good result should be that the 3 values (a,b,c) with a PR of 6 are found in the final table.

I tried this and it doesnt work:

mysql_query("CREATE TABLE new_table as SELECT * FROM test GROUP BY url ORDER BY pr DESC") 

I am looking for a fast way to achieve this because my table have over 200k entries.

Thanks!

I tried guido’s code and it works, try doing it like:

  1. Create a new table for the “clean” data to go
  2. Use Guido’s query to select the required rows
  3. Read the results from step 2 into an array
  4. Insert the “clean” data into the new table by looping through the array 1 row at a time
  5. Once all clean urls have been inserted into the new table, rename the old table (don’t delete it until your sure everything has gone ok)
  6. Rename the new table to match what the old table was called

I am admittedly NOT an expert in MySQL, but I’ve come up with this:

mysql_query("CREATE TABLE order_temp as SELECT * FROM test ORDER BY pr DESC; DROP TABLE IF EXISTS temp_table; CREATE TABLE temp_table as SELECT * FROM order_temp GROUP BY url; DROP TABLE IF EXISTS test; RENAME TABLE temp_table TO test; DROP TABLE IF EXISTS order_temp;")

If anyone has a better way I suggest you use it, but this should work for now.

@Guido, I tried your code and it doesnt seem to work.

	mysql_query("CREATE TABLE new_table as SELECT url, MAX(pr) AS pr FROM test GROUP BY url");
	mysql_query("DROP TABLE test") ;
	mysql_query("RENAME TABLE new_table TO test");

Thanks to both of you guys!

Here is what I am using at the moment:

mysql_query("CREATE TABLE new_table as SELECT t1.* FROM old_table t1 JOIN (SELECT url,MAX(pr) max_pr FROM old_table GROUP BY url) t2 ON t2.url = t1.url AND t2.max_pr = t1.pr") or die(mysql_error());

This query works well on SMALL table but if I try it with a much bigger table (200k entries), the script freeze and stop working…


SELECT 
    url
  , MAX(pr) AS pr
FROM test 
GROUP BY url