Select all where two columns are distinct

I’m having trouble figuring out what query would return all rows only when two columns preexist. I can’t set them both to UNIQUE because I’m expecting the operands to appear in other variants but not as a pair. Something like this

id | ppg | ppg1 | href | date
_______________________
2 |John |Doe |sitepoint |17-09-17
________________________________
8 |John |Jack |stackoverflow |17-09-17
________________________________
30 |John |Doe |sitepoint |17-09-17

I want to select rows with id 2 and 8 only. I tried

SELECT t1.* FROM container AS t1 INNER JOIN container AS t2 ON t1.href = t2.href AND t1.ppg > t2.ppg AND t1.ppg1 > t2.ppg WHERE t2.id IS NULL AND t1.date = '17-09-17'

But this just fetches all the rows as is. I was hoping it would select all entries on both tables with matching hrefs and merge them into a new table but preserving those with greater IDs.

I was able to achieve the desired effect on PHP but for the purpose of separation of concerns, I thought it’d be better to let mysql do its job.

I’m struggling to understand what rule you use to decide that you want the rows with id 2 and 8. You say you want rows 2 and 8, but then earlier you said you wanted to return all rows. And I’m a little unsure why this is in the query

AND t1.ppg > t2.ppg AND t1.ppg1 > t2.ppg 

Is the second one supposed to be ppg2? And are there two tables here, or just the one? You mention both tables, but then seem to use the same one in the query.

I can’t quite decide whether I’ve stumbled on a question so complex I can’t understand it, or whether there are typos here. I’m learning db stuff, so it might just be me.

please rephrase this

the only interpretation i can come up with for a column that “preexists” is a column that has already been declared for the table

i’m struggling to figure out whether you might mean nulls

also, you ~can~ set a ~pair~ of columns to be UNIQUE

By id 2 and 8, I was implying I want all rows ignoring or with the exception of the duplicate, which in this context is the row with id 30. For the purpose of clarity, I specifically mentioned the rows I want. If I had a million rows, I want a query that’d return all rows on a specific date but without duplicates. The data inserted unfortunately cannot be regulated by a unique key except maybe if I were to use their href column.

I’m using this AND t1.ppg > t2.ppg AND t1.ppg1 > t2.ppg based on my limited knowledge of SQL beyond everyday queries. Like I explained earlier, the intended behavior of those commands is:

  • select everything in the table and store in two variables t1 and t2
    
  • merge rows in both 'tables' based on wherever they have the same href (although column "id" would've been more ideal for this, it didn't work)
    
  • out of both tables, migrate rows where their primary key is greater than their counterpart in the 2nd table, to the new table
    
  • this should cause stranded holes in the new table like int[] s = new int[5]; s = {76,12}; would result in holes from index 2-4, thus I said “WHERE t2.id IS NULL” in order to track those that failed to make the jump.

But I’m open to any other way the task can be accomplished.

When I say two columns preexist, I depicted that in my example as the data in id 30 and 2 whereby the values in columns row id 30 column ppg and ppg1 preexist. That pair is not distinct. That pair is what I’m interested in its distinctiveness. Individually, they can both appear alongside other “partners” i.e. ppg = Doe and ppg1 = John or ppg = John and ppg1 = anything else besides Doe since that pair already exists. Capish?

no

if you want to avoid duplicates of { ppg , ppg1 } then the best way to do that is by preventing them from entering the table

this is accomplished by

ALTER TABLE container  ADD UNIQUE ( ppg , ppg1 )

until you do that, you will always have to try to filter out the duplicates

here’s how to run a query that gets only unique stuff –

SELECT MAX(id) AS some_id , ppg , ppg1 , MIN(href) AS some_href , MAX(date) AS some_date FROM container GROUP BY ppg , ppg1
i’ve used the column aliases some_* to highlight that the values of these columns might not all be from the same original row

verstehst?

*sigh* Well, no offence but I think you might have to take a walk on this one. If you didn’t understand my last explanation then I’m not even intelligent enough to conceive a more lucid explanation of what the problem is.

I can’t also use your solution because the number of columns is large. I need to select * but only when the condition is met. Think of it like SELECT DISTINCT * in a way.

I expressly said in my OP that I can’t add UNIQUE to those columns because they themselves shouldn’t be unique or distinct. What I need to be unique is a pair. MYSQL does not have IF conditions but if it did, my filter/exclusion function would run ONLY when the values in the pair of columns ppg and ppg1 on a row are already a pair in another row on the same table. I illustrated it with the table in my OP. I said since the values in id 30 columns (PPG and PPG1) are already the same values in id 2 columns (PPG and PPG1), forget about column 30. If you don’t understand me past that then thanks for trying to help and have a nice evening.

dude… please read my last reply again

What I need to be unique is a pair – this is exactly what i gave you

please, do go right ahead and test that, because it won’t do what you think

and as for telling me to take a hike? *plonk*

1 Like

Are you for real man? Of course I know that won’t work that’s why I didn’t try it and I’m posting here instead. I’m giving instances to help you understand what I’m trying to explain. smh.

As for your solution, I tried it and returns 509 rows. Now, all entries for that day are 496 in number and were mistakenly entered twice so instead of removing everything from the index of the first batch of duplicate rows, I want a permanent query that would allow me fetch unique rows in case it recurs in the future. So I guess your query should return 248 results (496/2), if it does work as intended and if the duplication happened just once.

By the way

Please refrain from posting to talk.bizarre until such time as you
cease to be an asshole and become at least one of: bizarre, creative,
or entertaining. You are welcome to dump your rotting ordure in rec.humor
or some similar group where your fellow mental defectives congregate.

:joy: :joy::joy::joy::joy::joy::joy::joy::joy::joy::joy::joy::joy::joy:

Did I not ask you not to take offence?

I have tried to follow the discussion with @r937, and I’ve got to admit that I’m lost as well.

I’m guessing (in laymen’s terms) that there are more fields on the container table than you’ve shown thus far, and that you’re looking to pull all of those fields, but ONLY for the first occurrence of a person/url combination for a specific date. IF I’ve guessed right, then a sub-query which uses a GROUP BY would be an appropriate approach

SELECT id
     , ppg1
     , ppg2
     , href
     , 'date'
  FROM container
 WHERE id IN (SELECT MIN(id)
                   , ppg1
                   , ppg2
                   , href
                FROM container
               WHERE date = '17-09-17'
               GROUP BY ppg1
                      , ppg2
                      , href) sq

Though to be honest, a more generic sub-query would probably serve you better in that you can use it to search by date, href, even names…

SELECT id
     , ppg1
     , ppg2
     , href
     , 'date'
  FROM container
 WHERE id IN (SELECT MIN(id)
                   , ppg1
                   , ppg2
                   , href
                   , 'date'
                FROM container
               GROUP BY ppg1
                      , ppg2
                      , href
                      , 'date') sq

Hey. So sorry about the delay in response. My phone fell yesterday evening and its screen shattered to pieces.

I tried your solution but it didn’t work. Idk if the ‘sq’ at the end of your queries is a typo because it results in an error over here. When I then omit the sq and try to run the query, I get another error

#1241 - Operand should contain 1 column(s)

Thank you for trying to be of help.

#EDIT

Yes. There are more than 80 columns per row. And yes, your understanding is correct. The data should only be regarded as duplicate when the pair or combination of those columns occur elsewhere.

Ugh. Sorry about that. Rookie mistake.

This didn’t work because I was an idiot (the sub-query would return many values which throws the IN off). Try this format instead…

SELECT c.id
     , c.ppg1
     , c.ppg2
     , c.href
     , c.'date'
     , c.field1
     , c.field2
     , c.fieldx
  FROM container C
 JOIN (SELECT MIN(id) AS MinID
            , ppg1
            , ppg2
            , href
            , 'date'
         FROM container
        GROUP BY ppg1
               , ppg2
               , href
               , 'date') sq ON sq.MinID = c.id
1 Like

Corrected this

to c.date and it still returned 509 results.

You know what guys? I really appreciate the effort. If the nut refuses to crack, let’s let it be. At least I have a solution to tackle it in PHP. If it’s either impossible or ambiguous to achieve in MYSQL, there’s no biggie. Thank you all once again.

#EDIT
Is there a way to delete all the duplicate rows so I can set the href column to unique? I’ve just realized that some rows inserted as many as 4 times and since the id column is auto increment, they slid in undetected. How can I now spot all occurrences of duplicate entries and subsequently delete them?

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.