Select all where two columns are distinct

nmeri17 · September 18, 2017, 7:19am

I’m having trouble figuring out what query would return all rows only when two columns preexist. I can’t set them both to UNIQUE because I’m expecting the operands to appear in other variants but not as a pair. Something like this

id | ppg | ppg1 | href | date
_______________________
2 |John |Doe |sitepoint |17-09-17
________________________________
8 |John |Jack |stackoverflow |17-09-17
________________________________
30 |John |Doe |sitepoint |17-09-17

I want to select rows with id 2 and 8 only. I tried

SELECT t1.* FROM container AS t1 INNER JOIN container AS t2 ON t1.href = t2.href AND t1.ppg > t2.ppg AND t1.ppg1 > t2.ppg WHERE t2.id IS NULL AND t1.date = '17-09-17'

But this just fetches all the rows as is. I was hoping it would select all entries on both tables with matching hrefs and merge them into a new table but preserving those with greater IDs.

I was able to achieve the desired effect on PHP but for the purpose of separation of concerns, I thought it’d be better to let mysql do its job.

droopsnoot · September 18, 2017, 11:19am

I’m struggling to understand what rule you use to decide that you want the rows with id 2 and 8. You say you want rows 2 and 8, but then earlier you said you wanted to return all rows. And I’m a little unsure why this is in the query

AND t1.ppg > t2.ppg AND t1.ppg1 > t2.ppg

Is the second one supposed to be ppg2? And are there two tables here, or just the one? You mention both tables, but then seem to use the same one in the query.

I can’t quite decide whether I’ve stumbled on a question so complex I can’t understand it, or whether there are typos here. I’m learning db stuff, so it might just be me.

r937 · September 18, 2017, 4:03pm

please rephrase this

the only interpretation i can come up with for a column that “preexists” is a column that has already been declared for the table

i’m struggling to figure out whether you might mean nulls

also, you ~can~ set a ~pair~ of columns to be UNIQUE

nmeri17 · September 18, 2017, 4:14pm

By id 2 and 8, I was implying I want all rows ignoring or with the exception of the duplicate, which in this context is the row with id 30. For the purpose of clarity, I specifically mentioned the rows I want. If I had a million rows, I want a query that’d return all rows on a specific date but without duplicates. The data inserted unfortunately cannot be regulated by a unique key except maybe if I were to use their href column.

I’m using this AND t1.ppg > t2.ppg AND t1.ppg1 > t2.ppg based on my limited knowledge of SQL beyond everyday queries. Like I explained earlier, the intended behavior of those commands is:

select everything in the table and store in two variables t1 and t2

merge rows in both 'tables' based on wherever they have the same href (although column "id" would've been more ideal for this, it didn't work)

out of both tables, migrate rows where their primary key is greater than their counterpart in the 2nd table, to the new table

this should cause stranded holes in the new table like int[] s = new int[5]; s = {76,12}; would result in holes from index 2-4, thus I said “WHERE t2.id IS NULL” in order to track those that failed to make the jump.

But I’m open to any other way the task can be accomplished.

nmeri17 · September 18, 2017, 4:25pm

When I say two columns preexist, I depicted that in my example as the data in id 30 and 2 whereby the values in columns row id 30 column ppg and ppg1 preexist. That pair is not distinct. That pair is what I’m interested in its distinctiveness. Individually, they can both appear alongside other “partners” i.e. ppg = Doe and ppg1 = John or ppg = John and ppg1 = anything else besides Doe since that pair already exists. Capish?

r937 · September 18, 2017, 4:34pm

no

if you want to avoid duplicates of { ppg , ppg1 } then the best way to do that is by preventing them from entering the table

this is accomplished by

ALTER TABLE container  ADD UNIQUE ( ppg , ppg1 )

until you do that, you will always have to try to filter out the duplicates

here’s how to run a query that gets only unique stuff –

SELECT MAX(id) AS some_id , ppg , ppg1 , MIN(href) AS some_href , MAX(date) AS some_date FROM container GROUP BY ppg , ppg1
i’ve used the column aliases some_* to highlight that the values of these columns might not all be from the same original row

verstehst?

nmeri17 · September 18, 2017, 5:44pm

*sigh* Well, no offence but I think you might have to take a walk on this one. If you didn’t understand my last explanation then I’m not even intelligent enough to conceive a more lucid explanation of what the problem is.

I can’t also use your solution because the number of columns is large. I need to select * but only when the condition is met. Think of it like SELECT DISTINCT * in a way.

I expressly said in my OP that I can’t add UNIQUE to those columns because they themselves shouldn’t be unique or distinct. What I need to be unique is a pair. MYSQL does not have IF conditions but if it did, my filter/exclusion function would run ONLY when the values in the pair of columns ppg and ppg1 on a row are already a pair in another row on the same table. I illustrated it with the table in my OP. I said since the values in id 30 columns (PPG and PPG1) are already the same values in id 2 columns (PPG and PPG1), forget about column 30. If you don’t understand me past that then thanks for trying to help and have a nice evening.

r937 · September 18, 2017, 5:53pm

dude… please read my last reply again

What I need to be unique is a pair – this is exactly what i gave you

please, do go right ahead and test that, because it won’t do what you think

and as for telling me to take a hike? *plonk*

nmeri17 · September 18, 2017, 6:31pm

Are you for real man? Of course I know that won’t work that’s why I didn’t try it and I’m posting here instead. I’m giving instances to help you understand what I’m trying to explain. smh.

As for your solution, I tried it and returns 509 rows. Now, all entries for that day are 496 in number and were mistakenly entered twice so instead of removing everything from the index of the first batch of duplicate rows, I want a permanent query that would allow me fetch unique rows in case it recurs in the future. So I guess your query should return 248 results (496/2), if it does work as intended and if the duplication happened just once.

By the way

Please refrain from posting to talk.bizarre until such time as you
cease to be an asshole and become at least one of: bizarre, creative,
or entertaining. You are welcome to dump your rotting ordure in rec.humor
or some similar group where your fellow mental defectives congregate.

Did I not ask you not to take offence?

DaveMaxwell · September 18, 2017, 6:37pm

I have tried to follow the discussion with @r937, and I’ve got to admit that I’m lost as well.

I’m guessing (in laymen’s terms) that there are more fields on the container table than you’ve shown thus far, and that you’re looking to pull all of those fields, but ONLY for the first occurrence of a person/url combination for a specific date. IF I’ve guessed right, then a sub-query which uses a GROUP BY would be an appropriate approach

SELECT id
     , ppg1
     , ppg2
     , href
     , 'date'
  FROM container
 WHERE id IN (SELECT MIN(id)
                   , ppg1
                   , ppg2
                   , href
                FROM container
               WHERE date = '17-09-17'
               GROUP BY ppg1
                      , ppg2
                      , href) sq

Though to be honest, a more generic sub-query would probably serve you better in that you can use it to search by date, href, even names…

SELECT id
     , ppg1
     , ppg2
     , href
     , 'date'
  FROM container
 WHERE id IN (SELECT MIN(id)
                   , ppg1
                   , ppg2
                   , href
                   , 'date'
                FROM container
               GROUP BY ppg1
                      , ppg2
                      , href
                      , 'date') sq

nmeri17 · September 19, 2017, 10:44am

Hey. So sorry about the delay in response. My phone fell yesterday evening and its screen shattered to pieces.

I tried your solution but it didn’t work. Idk if the ‘sq’ at the end of your queries is a typo because it results in an error over here. When I then omit the sq and try to run the query, I get another error

#1241 - Operand should contain 1 column(s)

Thank you for trying to be of help.

#EDIT

Yes. There are more than 80 columns per row. And yes, your understanding is correct. The data should only be regarded as duplicate when the pair or combination of those columns occur elsewhere.

DaveMaxwell · September 19, 2017, 12:49pm

Ugh. Sorry about that. Rookie mistake.

This didn’t work because I was an idiot (the sub-query would return many values which throws the IN off). Try this format instead…

SELECT c.id
     , c.ppg1
     , c.ppg2
     , c.href
     , c.'date'
     , c.field1
     , c.field2
     , c.fieldx
  FROM container C
 JOIN (SELECT MIN(id) AS MinID
            , ppg1
            , ppg2
            , href
            , 'date'
         FROM container
        GROUP BY ppg1
               , ppg2
               , href
               , 'date') sq ON sq.MinID = c.id

nmeri17 · September 19, 2017, 8:36pm

Corrected this

to c.date and it still returned 509 results.

You know what guys? I really appreciate the effort. If the nut refuses to crack, let’s let it be. At least I have a solution to tackle it in PHP. If it’s either impossible or ambiguous to achieve in MYSQL, there’s no biggie. Thank you all once again.

#EDIT
Is there a way to delete all the duplicate rows so I can set the href column to unique? I’ve just realized that some rows inserted as many as 4 times and since the id column is auto increment, they slid in undetected. How can I now spot all occurrences of duplicate entries and subsequently delete them?

system · December 20, 2017, 3:41am

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
SQL Select in two tables MySQL Databases	9	1259	September 24, 2014
How do I get a distinct rows based on the value of a given field Databases	11	576	August 26, 2010
Select Distinct only on certain columns? Databases	5	41764	September 24, 2014
Select rows where column equals multiple values Databases	2	1829	July 15, 2010
Select rows from same table which have different values in a field Databases	5	2956	October 8, 2014

Select all where two columns are distinct

Related topics