Removing duplicate using PHP/MY SQL?

Hey guys, I have database of job descriptions. Some people submit the same job twice and I need to delete the duplicates. I’ve tried to modify the below code to suit my need but I’m getting an error

also not sure why $link is there for, I’ve tried it with and without. where I run the script here is an example output

Warning: mysql_query(): supplied argument is not a valid MySQL-Link resource in /home/netboomy/public_html/ACCOUNTS-ASSISTANT-JOBS.CO.UK/rdup.php on line 18
deleted Accounts Assistant

The actual job does not get deleted though… any help is welcome :slight_smile:


<?php
include("config.php");

    $query="SELECT * FROM jobs";

    $result=mysql_query($query);
    echo mysql_error();


        while($row = mysql_fetch_array($result, MYSQL_BOTH))

        {
                $query1="SELECT * FROM jobs WHERE desc = '".$row[2]."'";

                $result1=mysql_query($query1);
                $count = mysql_num_rows($result1) - 1;

                mysql_query("DELETE FROM jobs WHERE desc='".$row[2]."' LIMIT $count",$link);
                //mysql_query("DELETE FROM jobs WHERE desc='".$row[2]."' LIMIT $count");
                echo "deleted $row[1] <br>";

}
        echo mysql_error();

?>

I reread your OP, and something I missed the first time (and blue didn’t): the error message you’re getting seems to indicate that you didn’t connect succesfully to the database.
Change the code in the include to display the mysql errors when they occur to have more info about the problem

Hey guys sorry for the late reponse. I’ll be adding duplicate detection on insertion as soon as i do this bit. The include takes care of the database connection and is not an issue.

Its late here, so will try and tackle this again in the morning. Thanks for the help will re-read in the morning and try and out into place what you guys haven mentioned.

SQL result

Host: localhost
Database: newe_jobs
Generation Time: Sep 04, 2010 at 05:19 PM
Generated by: phpMyAdmin 3.2.4 / MySQL 5.1.48
SQL query: DESCRIBE jobs;
Rows: 7
Field Type Null Key Default Extra
id int(11) NO PRI NULL auto_increment
title varchar(100) NO NULL
desc varchar(1500) NO NULL
location varchar(40) NO NULL
type varchar(40) NO NULL
sal varchar(30) NO NULL
date varchar(10) NO NULL

Probably help a lot if the OP would show us the result of sending this sql statement to the database.


DESCRIBE jobs

DESCRIBE

The OP probably takes care of that in the include, or maybe he didn’t post the entire code.

Second problem is that the query issued for deleting is incredibly bad.
You don’t limit your delete to 0 records or 1 record, you delete everything that has the same description as current row EXCEPT the current row.

You’re right that the whole procedure of deleting duplicate job descriptions is structured badly. But the fact that the OP deletes all rows except one isn’t necessarilly a problem, probably the one that remains is the one he needs. If not, then he does have a problem :slight_smile:

can you try


mysql_query("DELETE FROM jobs WHERE desc='".$row[2]."' LIMIT $count") or die(mysql_error());

and remove


echo mysql_error();

at the bottom

First problem you have is that you haven’t connected to MySQL and you’re issuing queries.
Second problem is that the query issued for deleting is incredibly bad.
You don’t limit your delete to 0 records or 1 record, you delete everything that has the same description as current row EXCEPT the current row.
In terms of SQL:

DELETE FROM jobs WHERE desc='some desc' AND id NOT IN(1);

Third problem is that your code will be slow as your database grows, and you have to manually start the script every now and then.

The other problem was partially solved by guido2004 - if you don’t want duplicate entries within database, you use unique key.
You can do the following then: allow the user to insert and then check whether mysql query succeeded. If mysql query failed, you check the error code.
Based on errror code, you determine whether the unique key stopped the insert. If yes, you inform the user about their error.
I won’t type the code as it’s trivial and can easily be done within a few minutes, but the question you should ask yourself is - is your application built good enough?
Is the job description duplicates really something you want to remove?

If you don’t want duplicate job descriptions, why not put a unique key on that field? That way you can avoid inserting them, instead of having to delete them afterwards

if you don’t want duplicate job postings in your system you should double check the original programming that adds the postings for example add some Error Handlings scripts, Double check if there is a duplicate posting before its finalized into the database, most postings should be moderated and approved before adding to the database or making live… If you happen to get a duplicate through the first few phases then atleast it’ll at the very minimum you can just manually delete them or set up a flag in the admin for duplicate.