I have a code to remove the duplicates in one of my table but I cant seem to be able to ORDER the way mysql select the values.
Here is my initial table:
mysql_query("CREATE TABLE new_table as SELECT * FROM test GROUP BY url")
mysql_query("DROP TABLE test")
mysql_query("RENAME TABLE new_table TO test")
I want to remove the duplicated URL and make sure that it keeps only the value with the higher PR. I tried using the ORDER BY attribute but it doesnt seem to work. The first 3 values with a PR of 0 are always selected and placed in the new table…
The good result should be that the 3 values (a,b,c) with a PR of 6 are found in the final table.
I tried this and it doesnt work:
mysql_query("CREATE TABLE new_table as SELECT * FROM test GROUP BY url ORDER BY pr DESC")
I am looking for a fast way to achieve this because my table have over 200k entries.
I am admittedly NOT an expert in MySQL, but I’ve come up with this:
mysql_query("CREATE TABLE order_temp as SELECT * FROM test ORDER BY pr DESC; DROP TABLE IF EXISTS temp_table; CREATE TABLE temp_table as SELECT * FROM order_temp GROUP BY url; DROP TABLE IF EXISTS test; RENAME TABLE temp_table TO test; DROP TABLE IF EXISTS order_temp;")
If anyone has a better way I suggest you use it, but this should work for now.
@Guido, I tried your code and it doesnt seem to work.
mysql_query("CREATE TABLE new_table as SELECT url, MAX(pr) AS pr FROM test GROUP BY url");
mysql_query("DROP TABLE test") ;
mysql_query("RENAME TABLE new_table TO test");
mysql_query("CREATE TABLE new_table as SELECT t1.* FROM old_table t1 JOIN (SELECT url,MAX(pr) max_pr FROM old_table GROUP BY url) t2 ON t2.url = t1.url AND t2.max_pr = t1.pr") or die(mysql_error());
This query works well on SMALL table but if I try it with a much bigger table (200k entries), the script freeze and stop working…