Sorting: Calculate Value On-the-Fly vs. Storing Value

My database has the following (simplified) relationship…


comment -||-----0<- review

In the COMMENT table, is information about - you guessed it - the Commenter and the Comment.

In the REVIEW table, is information about how the Reviewer rated the Comment (e.g. Helpful (Y/N), Rating, etc.)

Currently, I have created some PHP Functions which calculate aggregate figures (“% Found Helpful”, “Avg Rating”, etc.) and display them. Unfortunately, these won’t help me with my next challenge…

I would like to expand Sorting to allow users to sort by things like “Most Helpful Comment” and “Avg Rating (Highest-to-Lowest)”.

As far as I can see, there are two approaches I could take…

Option #1: Add a Subquery to my existing Comment query which would calculate the Aggregate Values on-the-fly.

Problem is, this seems “expensive”.

Option #2: Every time a user reviews a Comment, I could calculate the new Aggregate Values and write them to my Comment table.

But this also seems “expensive” and it suffers from the fact that I would be storing Calculated Values in my Comments table which seems like a really poor choice.

There could be other ways to do this as well, but this is a problem which I have never encountered before.

Help and suggestions welcome!! :blush:

Sincerely,

Debbie

It depends on the ratio of reads to writes.

Normalising a database means that the calculations get done when you read from the database. The other benefit to this is consistency.

When doing the calculations at this point becomes too expensive is the point at which you undo some of the normalising and start storing the derivative data. This switches the cost from the read to the write (and also opens up the possibility of the data becoming inconsistent).

The closer to holding static data that the database is, the more likely that undoing normalisations will improve efficiency with regard to derivative data by reducing the number of times it gets calculated.

Really, these calculations aren’t so complex that performance will be a big concern. Do it on the fly ( option #1 ).

Even better (probably), just “preload” this by fetching those calculations on page load, and store those values. Then use JS to handle the sorting whenever the user requests it.

no, it does not

you keep using that word… i do not think it means what you think it means

So, Rudy, what would you suggest to my OP?

Is it a big deal to add on a few calculated columns (e.g. “Pct Agreed”, “Pct Helpful”, “Avg Rating”) to my Comments query so that I have values that can be used to sort Comments?

Or should I calculate those values when someone “reviews” a Comment, and include them in an UPDATE to my Comments table?

Or maybe some other approach?

Sincerely,

Debbie

on the fly, as needed

with proper indexing, COUNT() and AVG() queries should perform extremely well

plus, you avoid the complexity of trying to maintain counts (that potentially could get out of whack)

which is exactly how it would have to work with a normalised database.

oh, please

normalization and storing counts are not mutually exclusive

as long as i can provide the value of a unique key, and you can unambiguously return the attributes associated with that key, then the table is normalized

i am not going to debate normalization with you, we’ve done that before (btw, i won)

perhaps it would be better if you stopped using that word and simply discussed the merits of storing-versus-on-the-fly totals, as there is some fertile ground for discussion there

Okay, thanks for the feedback, Rudy.

So, if I need to calculate a value that is not simple like COUNT() or AVG(), what is the best way to do that?

Off Topic:

When all of this first started, I was just displaying these “metric” values next to each Comment, so I created PHP functions to handles things. Below are some snippets…


function getCommentReviewCount(parameters){
		// Build query.
		$q1 = 'SELECT COUNT(id)
						FROM article_comment_review
						WHERE article_id = ?
						AND commenter_id = ?
						AND comment_created_on = ?';
}


function getCommentAgreeCount(parameters){
		// Build query.
		$q1 = 'SELECT COUNT(id)
						FROM article_comment_review
						WHERE article_id = ?
						AND commenter_id = ?
						AND comment_created_on = ?
						AND agree = 1';
}


$commentReviewCount = getCommentReviewCount();

$commentAgreeCount = getCommentAgreeCount();

$commentPctAgree = ($commentAgreeCount / $commentReviewCount) *100;


If I want to instead do the calculations entirely in MySQL, how do I do that?

  • Do I have to write some gigantic query?

  • Can I create a Custom-Function in SQL, and then refer to it in my Comment query?

  • Or can I maybe use my PHP Function, and then refer to that in my Comment query?

Sincerely,

Debbie

with some kind of query, like those ones you just posted