Display rows with duplicate fields

Hi there,

I am creating a database of quiz questions. There is an issue that duplicates may appear, since a question may be phrased in more than one way. For example:

Q: Which country was previously known as Rhodesia?
A: Zimbabwe

is a duplicate of

Q: Which African country was once called Rhodesia?
A: Zimbabwe

However, these two questions are obviously not duplicates:
Q: Which country was previously known as Rhodesia?
A: Zimbabwe

Q: Which country, after became a test cricket nation in 1992, had its test cricket status suspended in 2006?
A: Zimbabwe

So what I need to do is select all duplicate answers plus their questions, so that I can determine if any are actual duplicates.

Most online tutorials suggest something like this:


SELECT question, answer FROM questions GROUP BY answer HAVING COUNT(answer) > 1;

But that just gives me the first instance of the duplicate, whereas I need to display BOTH instances.

Thanks in advance,
Mike

unfortunately, using a query to find dupes will find only those that are identical in every single character position

your first example would fail

I’m just looking for dupes in the answer field tho.

If I can display both rows, then I can discern if the question is a duplicate. If I can only see the first row in the group, then I can’t.

Using the query in my previous post, I would get the following result:
Q: Which country was previously known as Rhodesia?
A: Zimbabwe

even though, there are 3 rows that contain Zimbabwe in the answer field. Obviously I need to delete one of the entries regarding Rhodesia, but the one about test cricket can stay.

So I want to return all entries that contain two or more identical answer fields. I’m just not sure what that query would be.

darn, i’m sorry, that’s so obvious, i don’t know why i didn’t understand it right off…

SELECT t.question
     , t.answer 
  FROM ( SELECT answer
           FROM questions 
         GROUP  
             BY answer 
         HAVING COUNT(*) > 1 ) AS d
INNER
  JOIN questions AS t
    ON t.answer = d.answer
ORDER
    BY t.answer

:slight_smile:

Cool, thanks r937. That’s exactly what I’m after.

:slight_smile:

It’s all a little over my head tho. Any chance you could comment the query, or perhaps give some more descriptive aliases?

Really appreciate your help,
Mike

the subquery, which has a table alias of d, should look familiar – it’s the same as the query you were given elsewhere, and it simply retrieves all the answers that are dupes

this subquery, when used in the FROM clause like that, is called a derived table, because it’s actually a table – all queries produce tables – and as a table, you can join it to other tables

so we join it back to the original table, using the answer column as the join column

now, the answer values in the derived table are unique, but each of them will join to two or more rows in the original table, right?

so that’s how you return entire duplicate rows

:slight_smile:

Ok, I see, so d stands for derived table, and t stands for original table.

So you join the derived table (the table of grouped rows) back onto the original table with an inner join on the answers field. Because its an inner join (rather than a left join for example), only the joined results show, which are the dupes.

Nice.

Thanks for taking the time to help me out.

:slight_smile: