I’m trying to retrieve 2 Averages from records stored in a table of “Business Reviews”. The businesses are reviewed in several categories such as “service,” “cleanliness,” etc… The scores are based on a scale between 1 - 5. The reviews are then compiled based on category into a table with columns structured as:
business_id, question_category, R1, R2, R3, R4, R5
R1-R5 of course being how many people gave the business a score of 1, or 2, or 3…
I first need to total and average the business’s scores per category. I then need to see what percentage of that category score was made up by people rating them at the highest possible level of “5”
My query works, but I’m sure there’s a cleaner more efficient way. Perhaps a derived table or a self-join would be of use.
Here’s my code:
select business_id, category_id, avg(r1 + r2 + r3 + r4 + r5) as total_avg,
(sum(r5) / (sum(r1) + sum(r2) + sum(r3) + sum(r4) + sum(r5)) * 100) as top_score_avg
group by business_id, category_id
Sorry for the long winded, convoluted question.
you can replace the 5 sums in the denominator of top_score_avg with one –
100.0 * SUM(r5) / SUM(r1 + r2 + r3 + r4 + r5) AS top_score_avg
note that i used 100.0 instead of 100
i suspect your calculation using 100 will yield a top_score_average of zero
Thanks. I’ll take another look at the arithmetic, and eliminating the multiple calls to SUM is helpful as well.
Is there a way to avoid having to twice call aggregate functions on R1-R5 and three times on R5?
I tried creating a temp table to aggregate a total sum that I could use elsewhere in the query, but MySQL wasn’t allowing it. Even if I could get it to work, I’d need to call GROUP BY on both the sub query and the parent query which would impose its own tax on performance.
i don’t think this will buy you that many extra milliseconds, but here ya go, only minimal aggregate calls …
, ( sum_r1 + sum_r2 + sum_r3 + sum_r4 + sum_r5 ) /
rows AS total_avg
, 100.0 * sum_r5 /
( sum_r1 + sum_r2 + sum_r3 + sum_r4 + sum_r5 ) AS top_score_avg
FROM ( SELECT business_id
, SUM(r1) AS sum_r1
, SUM(r2) AS sum_r2
, SUM(r3) AS sum_r3
, SUM(r4) AS sum_r4
, SUM(r5) AS sum_r5
, COUNT(*) AS rows
, category_id ) AS temp