Dynamic posts load the user’s data, such as sigs, with a join. Static posts include it in the post row itself, so no joins are necessary.
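To make the two layouts concrete, here is a minimal sketch in Python/SQLite; the table and column names are made up for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# "Dynamic" layout: the sig lives only on the user row and is joined in at read time.
cur.execute("CREATE TABLE users (userid INTEGER PRIMARY KEY, name TEXT, sig TEXT)")
cur.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, userid INTEGER, body TEXT)")

# "Static" layout: the sig is copied into every post row at posting time,
# so a single-table read returns everything.
cur.execute("CREATE TABLE posts_static "
            "(id INTEGER PRIMARY KEY, userid INTEGER, body TEXT, sig TEXT)")
```

The trade is read cost (one extra join) against write cost and redundancy (the same sig stored once per post).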
In general, is it more important to avoid additional joins (for example, by caching data), or to keep the tables smaller and therefore faster to search?
The priority is performance and the ability to handle arbitrarily large volume and frequent access, as expected of a huge wiki or social network.
in general it is more important to design the data in a manner appropriate to its inherent relationships, ensuring proper normalization, so all unnecessary joins are avoided not by caching but by eliminating needless redundancies
size of the database matters little if the tables are properly optimized with indexes that are appropriate to the queries being run
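A small sketch of what “indexes appropriate to the queries being run” means in practice, using SQLite for illustration (the schema is hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, userid INTEGER, body TEXT)")
cur.executemany("INSERT INTO posts (userid, body) VALUES (?, ?)",
                [(i % 100, "some post body") for i in range(1000)])

# Without this index, WHERE userid = ? scans every row.
# With it, the lookup is a B-tree search, largely independent of table size.
cur.execute("CREATE INDEX idx_posts_userid ON posts (userid)")

plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM posts WHERE userid = 42").fetchone()
print(plan[3])  # the plan names idx_posts_userid rather than a full table scan
```

This is why raw row count matters less than whether each frequent query has an index it can use.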
The inherent relationship is what I am trying to decide. Sigs can be considered both user and post data for example.
I can go with either dynamic or static and would prefer whichever is more efficient, but I don’t know whether efficiency means smaller rows (static sigs could more than double the size of the database, though as TEXT they are stored independently, so maybe it doesn’t matter?) or fewer joins.
I don’t understand how normalization avoids joins (beyond 3NF). If you want to show the number of posts by a user, isn’t it fewer joins to cache the post count on each row instead of computing it?
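The usual form of that cache is a denormalized counter kept on the user row rather than in every post. A minimal sketch (schema invented for illustration) showing the trade-off: reads skip the COUNT, but every write must also bump the counter, ideally in one transaction so the two can’t drift apart:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE users (userid INTEGER PRIMARY KEY, postcount INTEGER DEFAULT 0)")
cur.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, userid INTEGER)")
cur.execute("INSERT INTO users (userid) VALUES (1)")

def add_post(userid):
    # Insert the post and bump the cached counter in a single transaction,
    # otherwise the counter can drift from the real count.
    with con:
        cur.execute("INSERT INTO posts (userid) VALUES (?)", (userid,))
        cur.execute("UPDATE users SET postcount = postcount + 1 WHERE userid = ?",
                    (userid,))

for _ in range(3):
    add_post(1)

cached = cur.execute("SELECT postcount FROM users WHERE userid = 1").fetchone()[0]
counted = cur.execute("SELECT COUNT(*) FROM posts WHERE userid = 1").fetchone()[0]
assert cached == counted == 3
```

This is caching, not normalization; normalization only removes the joins that exist because the same fact was stored in more than one place.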
SELECT [fields including sigs] FROM posts WHERE id = x (id primary key)
SELECT [fields, userinfo] FROM posts, users WHERE posts.id = x AND posts.userid = users.userid (both ids primary keys)
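The two queries above, made runnable against a toy SQLite schema (names assumed). Note the behavioral difference as well as the join: the static version returns the sig frozen at posting time, the dynamic version the user’s current sig:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.row_factory = sqlite3.Row
cur = con.cursor()
cur.execute("CREATE TABLE users (userid INTEGER PRIMARY KEY, sig TEXT)")
cur.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, userid INTEGER, body TEXT, sig TEXT)")
cur.execute("INSERT INTO users VALUES (7, 'current sig')")
cur.execute("INSERT INTO posts VALUES (1, 7, 'hello', 'sig as of posting time')")

# Static: one primary-key lookup, the sig comes from the post row itself.
static = cur.execute("SELECT body, sig FROM posts WHERE id = 1").fetchone()

# Dynamic: primary-key lookup on posts plus a join to users for the sig.
dynamic = cur.execute(
    "SELECT posts.body, users.sig FROM posts "
    "JOIN users ON posts.userid = users.userid WHERE posts.id = 1").fetchone()

print(static["sig"])   # 'sig as of posting time'
print(dynamic["sig"])  # 'current sig'
```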
oh, and as for sigs being both user and post data, this forum (vbulletin) regards it as user data… i can look up an old post of mine here from years ago, and it has today’s sig
For example, vBulletin does the latter. I think (if I remember correctly), phpBB does the former.
My personal approach (which I’ve been considering since I’m developing my own forum software, because vB makes me rage =p) is to take the middle road.
I’m going to store them separately, but then maintain a cache in which they are joined. You wind up with a few pros and cons. The pro of this approach is that you (almost) always get accurate data AND you get quick results. The con is that you have to either a) rebuild the portion of the cache whose data changed (like when a user changes their signature), or b) rebuild the cache on a regular basis.
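A minimal sketch of that middle road, with a hypothetical `post_cache` table: the source tables stay normalized, the cache holds the pre-joined rows a page actually renders, and a sig change triggers option (a)’s targeted rebuild:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE users (userid INTEGER PRIMARY KEY, sig TEXT)")
cur.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, userid INTEGER, body TEXT)")
# The cache holds the already-joined rows the page renders.
cur.execute("CREATE TABLE post_cache (id INTEGER PRIMARY KEY, body TEXT, sig TEXT)")

cur.execute("INSERT INTO users VALUES (1, 'old sig')")
cur.execute("INSERT INTO posts VALUES (1, 1, 'hello')")
cur.execute("INSERT INTO post_cache "
            "SELECT posts.id, posts.body, users.sig "
            "FROM posts JOIN users ON posts.userid = users.userid")

def change_sig(userid, new_sig):
    # Option (a): rebuild only the slice of the cache this change touches.
    cur.execute("UPDATE users SET sig = ? WHERE userid = ?", (new_sig, userid))
    cur.execute("UPDATE post_cache SET sig = ? "
                "WHERE id IN (SELECT id FROM posts WHERE userid = ?)",
                (new_sig, userid))

change_sig(1, "new sig")
refreshed = cur.execute("SELECT sig FROM post_cache WHERE id = 1").fetchone()[0]
print(refreshed)  # new sig
```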
Is text data any more performance intensive than ints, due to its independent nature?
For example, if you query a db of millions of rows that are maybe 10kb each but carry about 50kb of text, is it just as fast as a 10kb row with no text, as long as the text is not searched? What if the text is returned but not searched?
my advice is: don’t worry about performance at this point
Well, I am currently in a refactoring mode. The program involves lots of such data and I’m more of a programmer than a database type so I want to learn how to make a database faster. I know some of the techniques (nested sets) but I am not a database expert.
According to MySQL’s site: "Each BLOB or TEXT value is represented internally by a separately allocated object. This is in contrast to all other data types, for which storage is allocated once per column when the table is opened." In other words, if a TEXT only counts about the size of an INT for row-size purposes, would querying it be as fast as querying a row with just an INT? However, after searching again I learned that this behavior is specific to a particular MySQL storage engine and may not be relevant to what I am using.
I asked because of the possibility that row size or database size may be relevant to performance, but I do not know if it is.
Refactoring is just a buzzword for recoding. Essentially the board already works, but I am trying to make it more efficient and OOP.
Somewhat related: I wonder how I can efficiently collapse a set of sorted rows into a single composite row of the most relevant values.
For example, I want a single set of options from a table:
options table (rows shown in sort order):

option1 | option2 | option3
--------+---------+--------
   A    |    0    |    0
   X    |    B    |    0
   X    |    X    |    C

I would return the first non-zero (or non-null) value found in each of the fields:
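Assuming `0` marks an unset option as in the rows above, one way to sketch the collapse in application code is a fold over the rows in priority order:

```python
# Rows in priority order, as in the options table above; "0" marks "not set".
rows = [
    ("A", "0", "0"),
    ("X", "B", "0"),
    ("X", "X", "C"),
]

def composite(rows, empty="0"):
    """Take, for each column, the first value that is neither null nor `empty`."""
    result = [None] * len(rows[0])
    for row in rows:
        for i, value in enumerate(row):
            if result[i] is None and value is not None and value != empty:
                result[i] = value
    return result

print(composite(rows))  # ['A', 'B', 'C']
```

The same result can be had in SQL with one subquery per column that selects the first non-zero value in sort order, but the per-column fanout grows with the number of option columns, so doing it in application code after one SELECT is often simpler.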