I’m having trouble figuring out what query would return all rows only when two columns preexist. I can’t set them both to UNIQUE because I’m expecting the operands to appear in other variants but not as a pair. Something like this
I want to select rows with id 2 and 8 only. I tried
SELECT t1.* FROM container AS t1 INNER JOIN container AS t2 ON t1.href = t2.href AND t1.ppg > t2.ppg AND t1.ppg1 > t2.ppg WHERE t2.id IS NULL AND t1.date = '17-09-17'
But this just fetches all the rows as is. I was hoping it would select all entries on both tables with matching hrefs and merge them into a new table but preserving those with greater IDs.
I was able to achieve the desired effect on PHP but for the purpose of separation of concerns, I thought it’d be better to let mysql do its job.
I’m struggling to understand what rule you use to decide that you want the rows with id 2 and 8. You say you want rows 2 and 8, but then earlier you said you wanted to return all rows. And I’m a little unsure why this is in the query
AND t1.ppg > t2.ppg AND t1.ppg1 > t2.ppg
Is the second one supposed to be ppg2? And are there two tables here, or just the one? You mention both tables, but then seem to use the same one in the query.
I can’t quite decide whether I’ve stumbled on a question so complex I can’t understand it, or whether there are typos here. I’m learning db stuff, so it might just be me.
By id 2 and 8, I was implying I want all rows ignoring or with the exception of the duplicate, which in this context is the row with id 30. For the purpose of clarity, I specifically mentioned the rows I want. If I had a million rows, I want a query that’d return all rows on a specific date but without duplicates. The data inserted unfortunately cannot be regulated by a unique key except maybe if I were to use their href column.
I’m using this AND t1.ppg > t2.ppg AND t1.ppg1 > t2.ppg based on my limited knowledge of SQL beyond everyday queries. Like I explained earlier, the intended behavior of those commands is:
select everything in the table and store in two variables t1 and t2
merge rows in both 'tables' based on wherever they have the same href (although column "id" would've been more ideal for this, it didn't work)
out of both tables, migrate rows where their primary key is greater than their counterpart in the 2nd table, to the new table
this should cause stranded holes in the new table like int[] s = new int[5]; s = {76,12}; would result in holes from index 2-4, thus I said “WHERE t2.id IS NULL” in order to track those that failed to make the jump.
But I’m open to any other way the task can be accomplished.
When I say two columns preexist, I depicted that in my example as the data in id 30 and 2 whereby the values in columns row id 30 column ppg and ppg1 preexist. That pair is not distinct. That pair is what I’m interested in its distinctiveness. Individually, they can both appear alongside other “partners” i.e. ppg = Doe and ppg1 = John or ppg = John and ppg1 = anything else besides Doe since that pair already exists. Capish?
if you want to avoid duplicates of { ppg , ppg1 } then the best way to do that is by preventing them from entering the table
this is accomplished by
ALTER TABLE container ADD UNIQUE ( ppg , ppg1 )
until you do that, you will always have to try to filter out the duplicates
here’s how to run a query that gets only unique stuff –
SELECT MAX(id) AS some_id
, ppg
, ppg1
, MIN(href) AS some_href
, MAX(date) AS some_date
FROM container
GROUP
BY ppg
, ppg1
i’ve used the column aliases some_* to highlight that the values of these columns might not all be from the same original row
*sigh* Well, no offence but I think you might have to take a walk on this one. If you didn’t understand my last explanation then I’m not even intelligent enough to conceive a more lucid explanation of what the problem is.
I can’t also use your solution because the number of columns is large. I need to select * but only when the condition is met. Think of it like SELECT DISTINCT * in a way.
I expressly said in my OP that I can’t add UNIQUE to those columns because they themselves shouldn’t be unique or distinct. What I need to be unique is a pair. MYSQL does not have IF conditions but if it did, my filter/exclusion function would run ONLY when the values in the pair of columns ppg and ppg1 on a row are already a pair in another row on the same table. I illustrated it with the table in my OP. I said since the values in id 30 columns (PPG and PPG1) are already the same values in id 2 columns (PPG and PPG1), forget about column 30. If you don’t understand me past that then thanks for trying to help and have a nice evening.
Are you for real man? Of course I know that won’t work that’s why I didn’t try it and I’m posting here instead. I’m giving instances to help you understand what I’m trying to explain. smh.
As for your solution, I tried it and returns 509 rows. Now, all entries for that day are 496 in number and were mistakenly entered twice so instead of removing everything from the index of the first batch of duplicate rows, I want a permanent query that would allow me fetch unique rows in case it recurs in the future. So I guess your query should return 248 results (496/2), if it does work as intended and if the duplication happened just once.
By the way
Please refrain from posting to talk.bizarre until such time as you
cease to be an asshole and become at least one of: bizarre, creative,
or entertaining. You are welcome to dump your rotting ordure in rec.humor
or some similar group where your fellow mental defectives congregate.
I have tried to follow the discussion with @r937, and I’ve got to admit that I’m lost as well.
I’m guessing (in laymen’s terms) that there are more fields on the container table than you’ve shown thus far, and that you’re looking to pull all of those fields, but ONLY for the first occurrence of a person/url combination for a specific date. IF I’ve guessed right, then a sub-query which uses a GROUP BY would be an appropriate approach
SELECT id
, ppg1
, ppg2
, href
, 'date'
FROM container
WHERE id IN (SELECT MIN(id)
, ppg1
, ppg2
, href
FROM container
WHERE date = '17-09-17'
GROUP BY ppg1
, ppg2
, href) sq
Though to be honest, a more generic sub-query would probably serve you better in that you can use it to search by date, href, even names…
SELECT id
, ppg1
, ppg2
, href
, 'date'
FROM container
WHERE id IN (SELECT MIN(id)
, ppg1
, ppg2
, href
, 'date'
FROM container
GROUP BY ppg1
, ppg2
, href
, 'date') sq
Hey. So sorry about the delay in response. My phone fell yesterday evening and its screen shattered to pieces.
I tried your solution but it didn’t work. Idk if the ‘sq’ at the end of your queries is a typo because it results in an error over here. When I then omit the sq and try to run the query, I get another error
#1241 - Operand should contain 1 column(s)
Thank you for trying to be of help.
#EDIT
Yes. There are more than 80 columns per row. And yes, your understanding is correct. The data should only be regarded as duplicate when the pair or combination of those columns occur elsewhere.
You know what guys? I really appreciate the effort. If the nut refuses to crack, let’s let it be. At least I have a solution to tackle it in PHP. If it’s either impossible or ambiguous to achieve in MYSQL, there’s no biggie. Thank you all once again.
#EDIT
Is there a way to delete all the duplicate rows so I can set the href column to unique? I’ve just realized that some rows inserted as many as 4 times and since the id column is auto increment, they slid in undetected. How can I now spot all occurrences of duplicate entries and subsequently delete them?