SQL COUNT - some correct values, some wrong values

I am using COUNT(ScoreID) as scoreCount to display the number of Scores a property has in a vote.

Is there any reason at all why the results would sometimes display the correct value, but other display an incorrect value?

I have noticed that for some, if I add a new record to the scores table, the output on the page using the COUNT(ScoreID) as scoreCount is sometimes correct, and at others wrong - usually by multiples of the correct figure.

So for example, I can have two results:

Lodge A, 30 votes (incorrect - it should be 10) Lodge B, 30 votes (correct)

If I add a new vote to Lodge A, and a new vote to Lodge B in the table, that output changes to:

Lodge A, 33 votes (incorrect - it should be 11) Lodge B, 31 votes (correct)

Any suggestions where to look?

look at your query

“usually by multiples of the correct figure” means you probably have two one-to-many relationships joined to the same table

Thanks - that’s definitely it. Looking at it a bit more I realized it was happening when new records were getting added to the nominations table.

Its all to do with this earlier post:

The query I currently have looks like:

SELECT ScoreID, COUNT(ScoreID) as scoreCount, AVG(Score) as AverageScore, lodges.LodgeID, lodges.Lodge, Country, lodges.CountryID, categoriesFull.CategoryID, nominations.NominationID, SW_finalist, SW_rank, 2015_Awards_Participant FROM lodges 
INNER JOIN countries ON lodges.CountryID = countries.CountryID 
INNER JOIN scores ON lodges.LodgeID = scores.LodgeID 
INNER JOIN nominations ON lodges.LodgeID=nominations.LodgeID 
INNER JOIN categoriesFull ON nominations.CategoriesFullID = categoriesFull.CategoryID AND scores.categoryID = categoriesFull.CategoryID
WHERE lodges.CountryID = 3 AND scores.CategoryID=7 GROUP BY Lodge

Which is working in that it is returning the correct values, but is multiplying the scoreCount value, seemingly by the number of records in the nominations table.

Can you see what might fix it?

not without a complete understanding of the cardinalities of the relationships

perhaps you could identify the PKs and FKs in every table?

They should be indicated in brackets in the list of tables in the other post?

Tables are:

lodges

LodgeID (PK)
Lodge
etc

scores

ScoreID (PK)
Score
CategoryID (FK)
LodgeID (FK)
(also used to calculate ScoreCount and AverageScore)

categoriesFull

CategoryID (PK)
Category

countries

CountryID (PK)
Country

nominations

NominationID (PK)
LodgeID (FK)
CategoriesFullID (FK) (–to categoriesFull.CategoryID)
SW_finalist
SW_rank

So if I have the following:

lodges
LodgeID, Lodge
798, Borana Lodge

scores
ScoreID, LodgeID, CategoryID, Score
1001, 798, 7, 3
1002, 798, 7, 6
1003, 798, 7, 9

categoriesFull
CategoryID
7

countries
CountryID
3

nominations
NominationID, LodgeID, CategoryID, SW_finalist, SW_rank
1234, 798, 7, Yes, 2

I’d like to be able to output

LodgeID, Lodge, CategoryID, CountryID, ScoreCount, AverageScore, NominationID, SW_finalist, SW_rank
798, Borana Lodge, 7, 3, 3, 6, 1234, Yes, 2

so a single lodge presumably belongs to only one country

and a single lodge can have a score in more than one category

and a single lodge can be nominated in more than one category

even though you’re joining only to categories which are common to both scores and nominations, i think that’s your problem right there – you’re still getting cross-multiples

put up some sample data for counts that you know are wrong, and i’ll show you

p.s. your GROUP BY is inadequate – it should really contain every non-aggregate column in your SELECT clause

Thanks - I really appreciate the help with this one.

The background is this…

Any given lodge will only be in one country.

Any given lodge can receive multiple votes in any given category.

But any given lodge can only receive one nomination in any given category, in any given country (for any given year).

So for example a Lodge might be

LodgeID, Lodge, CountryID
798, Borana Lodge, 1

With countries:

CountryID, Country
1, Kenya

With scores

ScoreID, LodgeID, CategoryID, Score
1, 798, 1, 3
2, 798, 1, 6
3, 798, 1, 9
4, 798, 2, 8
5, 798, 2, 8
6, 798, 2, 8
7, 798, 3, 8
8, 798, 3, 9
9, 798, 3, 10
etc

With categories

CategoryID, Category
1, Best Value
2, Best Cuisine
3, Best Walking

With nominations:

NominationID, CategoryID, SW_finalist, SW_rank
1, 3, Yes, 1
2, 6, Yes, 5
3, 9, Yes, 10

So Borana Lodge is in Kenya, has had three votes each in Best Value, Best Cuisine and Best Walking, and has had one record added to the nomination for each of those categories.

And I’d like to output from that:

LodgeID, Lodge, CategoryID, CountryID, ScoreCount, AverageScore, NominationID, SW_finalist, SW_rank
798, Borana Lodge, 1, 1, 3, 6, 1, Yes, 1
798, Borana Lodge, 2, 1, 3, 8, 2, Yes, 5
798, Borana Lodge, 3, 1, 3, 9, 3, Yes, 10

It arguably is slightly convoluted with scores and nominations, but the idea is that scores records all the individual votes, which can obviously be many for any given category.

But nominations just needs one record per lodge for any given category in any given country. And then additional fields in the nominations table can be used to record whether the lodge is a finalist, runner up or winner in that category for that country. SW_finalist and SW_rank fields above are for judges to flag a loge as a potential finalist and give it rank of 1-10 if they think it makes the top 10.

Hope that makes sense - it does feel like its pretty much there apart form this multiple counting issue.

Thanks again.

This SQL returns two results where I know one is correct, and one is wrong:

SELECT ScoreID, COUNT(ScoreID) as scoreCount, AVG(Score) as AverageScore, lodges.LodgeID, lodges.Lodge, Country, lodges.CountryID, categoriesFull.CategoryID, nominations.NominationID, SW_finalist, SW_rank, 2015_Awards_Participant FROM lodges INNER JOIN countries ON lodges.CountryID = countries.CountryID INNER JOIN scores ON lodges.LodgeID = scores.LodgeID INNER JOIN nominations ON lodges.LodgeID=nominations.LodgeID INNER JOIN categoriesFull ON nominations.CategoriesFullID = categoriesFull.CategoryID AND scores.categoryID = categoriesFull.CategoryID WHERE lodges.CountryID = 3 AND scores.CategoryID=7 AND (scores.LodgeID = 395 OR scores.LodgeID = 1296) GROUP BY Lodge

Gives:

3436, 30, 9.1, 395, Kicheche Laikipia Camp, Kenya, 3, 7, 2026 etc
7103, 30, 9.3557, 1296, Sweetwaters Serena Camp, Kenya, 3, 7, 3518 etc

Kicheche is incorrect - it has only had ten votes in that category.
Sweetwater is correct - it has had 30 votes.

I can see now that if a lodge has received votes in more than one category, the scoreCount in the above query is being multiplied by the number of categories the lodge has received votes in.

Which explains why some looked OK - if they had only received votes in one category it was showing scoreCount x 1 which is correct.

But if it has received votes in four categories it was showing scoreCount x 4.

Having said all of that, I don’t know how to fix it in that query.

I got quite close by adding NominationID to the GROUP BY clause. Seemed to make sense from what you said earlier, and also as it looked like the Nominations were what was causing the scoreCount to multiply.

I have a test sample in phpMyAdmin which has the score count correct, and shows the correct values for SW_finalist and SW_rank…

…but two records appear to be missing from the list out of 50.

OK - I think this is working now - the two missing ones didn’t have a record in the nominations table corresponding to their respective records in the scores table. The winner looks like:

SELECT ScoreID, COUNT(ScoreID) as scoreCount, AVG(Score) as AverageScore, lodges.LodgeID, lodges.Lodge, Country, lodges.CountryID, categoriesFull.CategoryID, nominations.NominationID, SW_finalist, SW_rank, 2015_Awards_Participant, Year
FROM lodges
INNER JOIN countries ON lodges.CountryID = countries.CountryID
INNER JOIN scores ON lodges.LodgeID = scores.LodgeID
INNER JOIN nominations ON lodges.LodgeID=nominations.LodgeID
INNER JOIN categoriesFull ON nominations.CategoriesFullID = categoriesFull.CategoryID AND scores.categoryID = categoriesFull.CategoryID WHERE lodges.CountryID = 3 AND scores.CategoryID=7 AND Year = 2015 AND Nominee = ‘Y’
GROUP BY Lodge, NominationID

if you’re happy, then that’s all that matters

if this were my project, i’d have a hard look at your FKs to make sure dupes are not possible

i’d also completely rewrite your SELECT and GROUP BY clauses – for example, there are multipe scores per lodge, so which score do you expect show for a lodge? (first column in SELECT clause)

I’m happy that its working, but if there is anything that should be changed to make it more robust / correct then I’d certainly want to do that. This is where this stuff becomes hard to learn sometimes - something can look correct, and seem to be working, but actually not be right, or at least not as robust as it could be.

With the score, I don’t actually want to show any single score, I just want to show the number of scores (i.e. the number of people who voted in for that lodge in that category), and the average score. So my understanding is that without the GROUP BY it would return a record for all each score. But GROUP BY just shows one, which is all I need as it will have the number of votes and average score.

okay, definitely prune your SELECT clause then