SitePoint Sponsor |
|
User Tag List
Results 1 to 25 of 47
-
May 14, 2007, 22:38 #1
- Join Date
- Jan 2005
- Posts
- 204
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
best way to get total number of rows
i am building something similar to what a forum is like.
so on the list of all the threads, i am going to state the total number of replies on each threads displayed on the list.
if i were to do a query of mysql_num_rows() or count() on the replies table for every single thread to get the total number of replies, is this the right way to do things? i just find that one day when many of the threads has like a few thousand replies then will my query timeout or have some error or will cause the page to load very slowly?
-
May 14, 2007, 23:26 #2
- Join Date
- Aug 2000
- Location
- Philadephia, PA
- Posts
- 20,578
- Mentioned
- 1 Post(s)
- Tagged
- 0 Thread(s)
Maintain a reply count in the threads table. A little denormalization for the sake of huge performance gains. You don't want to be counting up thousands of items on every request for a potentially high usage app like a forum.
Try Improvely, your online marketing dashboard.
→ Conversion tracking, click fraud detection, A/B testing and more
-
May 15, 2007, 00:37 #3
- Join Date
- Jan 2005
- Posts
- 204
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
if i were to add a field called totalreplies into the threads table instead of creating another table, would it be the same as creating a new table just to maintain the reply count?
-
May 15, 2007, 00:50 #4
- Join Date
- Jun 2004
- Location
- Copenhagen, Denmark
- Posts
- 6,157
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
You can issue a query to count the number of rows.
Assuming that your table looks like this:
Code:create table replies ( reply_id serial, thread_id integer, reply_text text )
Code:select count(reply_id) from replies group by thread_id;
-
May 15, 2007, 01:19 #5
- Join Date
- Jan 2005
- Posts
- 204
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
but if one thread has like more than a thousand replies, will it slow down or even timeout by doing a count on all the rows?
on the thread index, it has a lot of threads. imagine counting the rows for all the threads, will it cause problems?
-
May 15, 2007, 01:29 #6
- Join Date
- Apr 2004
- Location
- germany
- Posts
- 4,324
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
mysql doesn't "count" rows in a literal sense, it just pulls the number out of the index file. This is very fast even if you have millions of records.
-
May 15, 2007, 01:48 #7
- Join Date
- Jan 2005
- Posts
- 204
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
oh...then what about mysql_num_rows() in php? does it do the same thing as the count function in the sql statement?
-
May 15, 2007, 02:09 #8
- Join Date
- Jun 2004
- Location
- Copenhagen, Denmark
- Posts
- 6,157
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
-
May 15, 2007, 03:14 #9
- Join Date
- May 2007
- Location
- The Netherlands
- Posts
- 282
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
Even for simple queries like "SELECT COUNT(*) FROM aTable GROUP BY aColumn" can't be retrieved from the index, the only possible case where you can say 'for sure' (I'll remove the quotes if you can find a reference in the MySql documentation saying it does) that it retrieves the number from the index is with a 'SELECT COUNT(*) FROM aTable' which isn't the most common query anyway. A more optimal solution would be to cache the data somewhere in your database because a wrong COUNT can be a huge hog on performance.
-
May 15, 2007, 03:28 #10
- Join Date
- Jun 2004
- Location
- Copenhagen, Denmark
- Posts
- 6,157
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
MySql does a lot of caching, so if you issue a count query many times, or perhaps follow it by a select with the same criteria, it may not be such a big problem. Before optimising on a database, you really need to profile the application first.
That said, count() may not be efficient under all circumstances, and then you could apply something like Dan Grossman suggested. You just shouldn't do it, until you have actually identified a performance bottleneck.
-
May 15, 2007, 03:34 #11
- Join Date
- Apr 2004
- Location
- germany
- Posts
- 4,324
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
-
May 15, 2007, 06:35 #12
- Join Date
- May 2007
- Location
- The Netherlands
- Posts
- 282
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
I think you're talking about the query cache, which only results the execution path of the query, not the resultset.
Sorry it should have been a WHERE clause, MyISAM and InnoDB are notoriously slow in that case. (InnoDB is anyway for a COUNT query).
-
May 15, 2007, 08:18 #13
- Join Date
- Jan 2005
- Posts
- 204
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
what are some of the methods that some big apps are using? say for this vBulletin forum, how are they counting our total posts the total posts in each thread?
-
May 15, 2007, 09:47 #14
- Join Date
- Aug 2000
- Location
- Philadephia, PA
- Posts
- 20,578
- Mentioned
- 1 Post(s)
- Tagged
- 0 Thread(s)
You're confusing indexes with table statistics stored by MyISAM tables. Selecting a COUNT(*) from an entire table takes constant time on a MyISAM table because this is stored in the table statistics and neither the table nor an index needs to be examined. A COUNT that is completely covered by an index won't take constant time but will still be much faster than if it had to scan the table instead.
Try Improvely, your online marketing dashboard.
→ Conversion tracking, click fraud detection, A/B testing and more
-
May 15, 2007, 09:50 #15
- Join Date
- Aug 2000
- Location
- Philadephia, PA
- Posts
- 20,578
- Mentioned
- 1 Post(s)
- Tagged
- 0 Thread(s)
Try Improvely, your online marketing dashboard.
→ Conversion tracking, click fraud detection, A/B testing and more
-
May 15, 2007, 09:54 #16
- Join Date
- Sep 2004
- Location
- NY, USA
- Posts
- 712
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
-
May 16, 2007, 23:31 #17
- Join Date
- Jan 2005
- Posts
- 204
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
thanks guys!
by the way, where is the index and index file you guys were talking about? i am not sure what is it.
-
May 16, 2007, 23:38 #18
- Join Date
- Aug 2000
- Location
- Philadephia, PA
- Posts
- 20,578
- Mentioned
- 1 Post(s)
- Tagged
- 0 Thread(s)
You *can't* design a database for a real world application without knowing what indexes are, when to use them, and how to use them. Time to study up
http://dev.mysql.com/doc/refman/5.0/...l-indexes.htmlTry Improvely, your online marketing dashboard.
→ Conversion tracking, click fraud detection, A/B testing and more
-
May 19, 2007, 00:05 #19
- Join Date
- May 2007
- Location
- San Francisco, California
- Posts
- 18
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
my own .02:
Making up some numbers for the sake of an example..
If I had/was expecting on average like 50-75 replies to each post, I'd store the reply count in the thread record as a column. If I had like 10,000 threads with maybe 2-3 replies, I'd just do a select count() sql query as mentioned above.
I'd be interested in what is faster though...
I'm sure its something that is *technically* rdbms-specific, but from past experience, SELECT count(*) FROM table is pretty quick, even on tables with over 100k rows. My personal experience is MSSQL, but have had similar results with postgres (don't really use MySQL[anymore])
-
May 19, 2007, 00:22 #20
- Join Date
- Aug 2000
- Location
- Philadephia, PA
- Posts
- 20,578
- Mentioned
- 1 Post(s)
- Tagged
- 0 Thread(s)
There's no question about what's going to be faster. You're already retrieving the rows from the threads table to display the list of threads. If you stored the reply count in that table, retrieving an additional column in the same query adds nothing. Not maintaining this count means *at the very least* scanning an index on the posts table; you're retrieving a count grouped by thread_id, not just an unbounded count(*).
Try Improvely, your online marketing dashboard.
→ Conversion tracking, click fraud detection, A/B testing and more
-
May 19, 2007, 02:24 #21
- Join Date
- Apr 2004
- Location
- germany
- Posts
- 4,324
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
-
May 19, 2007, 02:31 #22
- Join Date
- Aug 2000
- Location
- Philadephia, PA
- Posts
- 20,578
- Mentioned
- 1 Post(s)
- Tagged
- 0 Thread(s)
Try Improvely, your online marketing dashboard.
→ Conversion tracking, click fraud detection, A/B testing and more
-
May 19, 2007, 02:45 #23
- Join Date
- Apr 2004
- Location
- germany
- Posts
- 4,324
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
Yeah, sure... normalized data is for sissies
-
May 19, 2007, 03:33 #24
- Join Date
- Jul 2002
- Location
- Toronto, Canada
- Posts
- 39,347
- Mentioned
- 63 Post(s)
- Tagged
- 3 Thread(s)
normalization does not mean eliminating "redundant" data like a count that is calculated elsewhere and then stored in a parent table
normalization deals with the dependence of the attribute on the primary key, so the redundant count is actually not denormalized
normalization is not for sissies, it is for everyone
-
May 19, 2007, 10:05 #25
- Join Date
- May 2007
- Location
- San Francisco, California
- Posts
- 18
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
Bookmarks