Display fields depending on whether one field is a duplicate of another

As many others do, I’ve Googled a lot & failed to find a passable answer. So I’m throwing myself at the mercy of the sitepoint.com gods, once again.

Theoretically:

I have a table with 3 fields: id (primary/unique/auto increment), FirstName, LastName. I want to display, on a standard .php page, all records in the FirstName & LastName columns ONLY when the last name is unique. Let me show you an example:


| id | FirstName | LastName |
| 1 | Joe | Smith |
| 2 | Bob | Jones |
| 3 | Sam | Ross |
| 4 | Gene | Smith |
| 5 | Matt | Jones |

Crudely designed table aside, I think you get the idea.

Now: If a duplicate is found ANYWHERE in LastName, the first & last name for the first record with the unique LastName would appear on screen. All records with the duplicate LastName entries would be ignored. All other records that aren’t affected ie have no duplicates would display as normal. In the above table, the names Joe Smith, Bob Jones, and Sam Ross would print (echo) to the screen. Gene Smith would be ignored since Smith had been used earlier, by Joe. The same, Matt Jones would be ignored since there is a Bob Jones. Sam Ross would print since Ross is only used one time, thus he fits the criteria.

With this piece of PHP code, the ultimate goal would be to make a list of people, with each surname being used only once.

Anyone know how? I found a lotta things that’d work with one non-id field but nothing that would work as I need.


select FirstName,
       LastName
  from t
  join (select min(id) as id
          from t
         group
            by LastName) dt
    on t.id = dt.id

I’m trying to follow this but I’m a tad off, as I’m not extremely good at PHP or MySQL. How should I implement this? I tried the ollowing; it doesn’t work:


$query="select FirstName,
       LastName
  from t
  join (select min(id) as id
          from t
         group
            by LastName) dt
    on t.id = dt.id";
$result=mysql_query($query);

$num=mysql_numrows($result);
?>


<?php
$i=0;
while ($i < $num) {

$first=mysql_result($result,$i,"FirstName");
$last=mysql_result($result,$i,"LastName");


?>


<?php echo $first; ?> <?php echo $last; ?>



<?php
$i++;
}
?>

sorry, we are not familiar with the “it doesn’t work” error message

can you be a bit more specific?

Ah yeah. I should have been more specific. When running the above, I get this error:

Warning: mysql_numrows(): supplied argument is not a valid MySQL result resource in /home/username/public_html/display.php on line 18

The code in the page is simply the code I posted above, along with the connection.

okay, that’s a generic php error message which is telling you that your query didn’t work

to find out why your query didn’t work, run it outside of php, i.e. directly in mysql

I ran it through phpMyadmin. I found the error but I’m not certain of how to correct it. I changed a few things (I should have earlier but I wasn’t thinking lol). Here’s what I have:


select FirstName,
       LastName
  from dupes
  join (select min(id) as id
          from dupes
         group
            by LastName) dt
    on t.id = dt.id

The error I receive:

Unknown column ‘t.id’ in ‘on clause’

dupes is the name of the table I’m testing with. It has over 1,000 names.

The error is simple enough for a guy like me to figure out. A solution is a little different: What are the values of t.id and dt.id? As well, would the full script I posted above display the results on a standard .php page?

select FirstName,
       LastName
  from dupes [B][COLOR="#FF0000"]t[/COLOR][/B]
  join (select min(id) as id
          from dupes
         group
            by LastName) dt
    on t.id = dt.id

Hi r937 I’m trying to use the examples provided here to query my DB for duplicates. Basically, the DB has a table called ‘Website’ and inside this table are columns for ‘WebsiteTitle’ and ‘WebsiteName’. I want to query ‘WebsiteTitle’ so I can remove duplicates. I’ve changed the above code to this but it didn’t work


SELECT WebsiteTitle,
       WebsiteName
FROM Website
JOIN (select min(id) as id
FROM Website
GROUP
BY WebsiteTitle) dt
    on t.id = dt.id

could you explain this a bit more please, what you’d like to achieve

you want only one occurrence of each value of WebsiteTitle to remain in the table?

Yes, I only want one occurance of each value

does it matter which one?

I would like to keep the older one and remove the newly added duplicates.

if you don’t have other database features defined such as foreign keys or triggers (which are cumbersome to remove and re-apply), then the easiest method is to create a second table with the rows you want, drop the original, and rename the second

CREATE TABLE WebsiteTemp
( WebsiteTitle VARCHAR(9) NOT NULL -- change as required
, WebsiteName VARCHAR(37) NOT NULL -- change as required
, id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT )
SELECT t.WebsiteTitle
     , t.WebsiteName
     , t.id
  FROM Website AS t
INNER
  JOIN ( SELECT WebsiteTitle
              , MIN(id) AS min_id
           FROM Website
         GROUP
             BY WebsiteTitle ) AS x
    ON x.min_id = t.id;

DROP TABLE Website;

RENAME TABLE WebsiteTemp To Website;

I got this error message when I ran the query

that’s weird… doesn’t your table have an id column?

It does ID = WebsiteID

so it’s actually called websiteid and not id?

well, that’s the cause of your error, then :slight_smile:

also, presumably you don’t really want 9 and 37 as the column sizes for the other two columns, right?

finally, are there any other columns in the table? because the CREATE TABLE statement for the websitetemp table has to define them all

There are actually several columns within the ‘Website’ table. They are

WebsiteID
WebsiteTitle
WebsiteName
WebsiteURL
Description
AfURL
SEOTitle
SEOKeyword
IsActive
IsFeatured
SearchKeywords
DateAdded
Views

so you know how to change the query, right?