Dynamic posts load the user’s data, such as sigs, with a join. Static posts include it in the post row itself, so no joins are necessary.
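To make the two layouts concrete, here is a minimal sketch in Python/SQLite; the table and column names are made up for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# "Dynamic" layout: the sig lives only on the user row and is joined in at read time.
cur.execute("CREATE TABLE users (userid INTEGER PRIMARY KEY, name TEXT, sig TEXT)")
cur.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, userid INTEGER, body TEXT)")

# "Static" layout: the sig is copied into every post row at posting time,
# so a single-table read returns everything.
cur.execute("CREATE TABLE posts_static "
            "(id INTEGER PRIMARY KEY, userid INTEGER, body TEXT, sig TEXT)")
```

The trade is read cost (one extra join) against write cost and redundancy (the same sig stored once per post).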
In general, is it more important to avoid additional joins (for example, by caching data), or to keep the tables smaller and therefore faster to search?
The priority is performance and the ability to handle arbitrarily large volume and frequent access, as expected of a huge wiki or social network.
in general it is more important to design the data in a manner appropriate to its inherent relationships, ensuring proper normalization, so all unnecessary joins are avoided not by caching but by eliminating needless redundancies
size of the database matters little if the tables are properly optimized with indexes that are appropriate to the queries being run
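A small sketch of what “indexes appropriate to the queries being run” means in practice, using SQLite for illustration (the schema is hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, userid INTEGER, body TEXT)")
cur.executemany("INSERT INTO posts (userid, body) VALUES (?, ?)",
                [(i % 100, "some post body") for i in range(1000)])

# Without this index, WHERE userid = ? scans every row.
# With it, the lookup is a B-tree search, largely independent of table size.
cur.execute("CREATE INDEX idx_posts_userid ON posts (userid)")

plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM posts WHERE userid = 42").fetchone()
print(plan[3])  # the plan names idx_posts_userid rather than a full table scan
```

This is why raw row count matters less than whether each frequent query has an index it can use.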
The inherent relationship is what I am trying to decide. Sigs can be considered both user and post data for example.
I can go with either dynamic or static and would prefer whichever is more efficient, but I don’t know whether efficiency means smaller rows (static sigs could more than double the size of the database, though as TEXT they are stored independently, so maybe it doesn’t matter?) or fewer joins.
I don’t understand how normalization avoids joins (beyond 3NF). If you want to show the number of posts by a user, isn’t it fewer joins to cache the post count on each row instead of computing it?
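The usual form of that cache is a denormalized counter kept on the user row rather than in every post. A minimal sketch (schema invented for illustration) showing the trade-off: reads skip the COUNT, but every write must also bump the counter, ideally in one transaction so the two can’t drift apart:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE users (userid INTEGER PRIMARY KEY, postcount INTEGER DEFAULT 0)")
cur.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, userid INTEGER)")
cur.execute("INSERT INTO users (userid) VALUES (1)")

def add_post(userid):
    # Insert the post and bump the cached counter in a single transaction,
    # otherwise the counter can drift from the real count.
    with con:
        cur.execute("INSERT INTO posts (userid) VALUES (?)", (userid,))
        cur.execute("UPDATE users SET postcount = postcount + 1 WHERE userid = ?",
                    (userid,))

for _ in range(3):
    add_post(1)

cached = cur.execute("SELECT postcount FROM users WHERE userid = 1").fetchone()[0]
counted = cur.execute("SELECT COUNT(*) FROM posts WHERE userid = 1").fetchone()[0]
assert cached == counted == 3
```

This is caching, not normalization; normalization only removes the joins that exist because the same fact was stored in more than one place.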
SELECT [fields including sigs] FROM posts WHERE id = x (id primary key)
SELECT [fields, userinfo] FROM posts, users WHERE posts.id = x AND posts.userid = users.userid (both ids primary keys)
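The two queries above, made runnable against a toy SQLite schema (names assumed). Note the behavioral difference as well as the join: the static version returns the sig frozen at posting time, the dynamic version the user’s current sig:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.row_factory = sqlite3.Row
cur = con.cursor()
cur.execute("CREATE TABLE users (userid INTEGER PRIMARY KEY, sig TEXT)")
cur.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, userid INTEGER, body TEXT, sig TEXT)")
cur.execute("INSERT INTO users VALUES (7, 'current sig')")
cur.execute("INSERT INTO posts VALUES (1, 7, 'hello', 'sig as of posting time')")

# Static: one primary-key lookup, the sig comes from the post row itself.
static = cur.execute("SELECT body, sig FROM posts WHERE id = 1").fetchone()

# Dynamic: primary-key lookup on posts plus a join to users for the sig.
dynamic = cur.execute(
    "SELECT posts.body, users.sig FROM posts "
    "JOIN users ON posts.userid = users.userid WHERE posts.id = 1").fetchone()

print(static["sig"])   # 'sig as of posting time'
print(dynamic["sig"])  # 'current sig'
```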
oh, and as for sigs being both user and post data, this forum (vbulletin) regards it as user data… i can look up an old post of mine here from years ago, and it has today’s sig
For example, vBulletin does the latter. I think (if I remember correctly), phpBB does the former.
My personal approach (which I’ve been considering since I’m developing my own forum software, because vB makes me rage =p) is to take the middle road.
I’m going to store them separately, but then maintain a cache in which they are joined. You wind up with a few pros and cons. The pro of this approach is that you (almost) always get accurate data AND you get quick results. The con is that you have to either a) rebuild the portion of the cache whose data changed (like when a user changes their signature), or b) rebuild the cache on a regular basis.
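A minimal sketch of that middle road, with a hypothetical `post_cache` table: the source tables stay normalized, the cache holds the pre-joined rows a page actually renders, and a sig change triggers option (a)’s targeted rebuild:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE users (userid INTEGER PRIMARY KEY, sig TEXT)")
cur.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, userid INTEGER, body TEXT)")
# The cache holds the already-joined rows the page renders.
cur.execute("CREATE TABLE post_cache (id INTEGER PRIMARY KEY, body TEXT, sig TEXT)")

cur.execute("INSERT INTO users VALUES (1, 'old sig')")
cur.execute("INSERT INTO posts VALUES (1, 1, 'hello')")
cur.execute("INSERT INTO post_cache "
            "SELECT posts.id, posts.body, users.sig "
            "FROM posts JOIN users ON posts.userid = users.userid")

def change_sig(userid, new_sig):
    # Option (a): rebuild only the slice of the cache this change touches.
    cur.execute("UPDATE users SET sig = ? WHERE userid = ?", (new_sig, userid))
    cur.execute("UPDATE post_cache SET sig = ? "
                "WHERE id IN (SELECT id FROM posts WHERE userid = ?)",
                (new_sig, userid))

change_sig(1, "new sig")
refreshed = cur.execute("SELECT sig FROM post_cache WHERE id = 1").fetchone()[0]
print(refreshed)  # new sig
```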
Is text data any more performance intensive than ints, due to its independent nature?
For example, if you query a db of millions of rows that are maybe 10kb each but carry about 50kb of text, is it just as fast as a 10kb row with no text, as long as the text is not searched? What if the text is returned but not searched?
my advice is: don’t worry about performance at this point
Well, I am currently in a refactoring mode. The program involves lots of such data and I’m more of a programmer than a database type so I want to learn how to make a database faster. I know some of the techniques (nested sets) but I am not a database expert.
According to MySQL’s site: "Each BLOB or TEXT value is represented internally by a separately allocated object. This is in contrast to all other data types, for which storage is allocated once per column when the table is opened." In other words, if a TEXT only counts about the size of an INT for row-size purposes, would querying it be as fast as querying a row with just an INT? However, after searching again I learned that this behavior is specific to a particular MySQL storage engine and may not be relevant to what I am using.
I asked because of the possibility that row size or database size may be relevant to performance, but I do not know if it is.
Refactoring is just a buzzword for recoding. Essentially the board already works, but I am trying to make it more efficient and OOP.
Somewhat related: I wonder how I can efficiently collapse a set of sorted rows into a single composite row of the most relevant values.
For example, I want a single set of options from a table:
options table (rows shown in sort order):

option1 | option2 | option3
--------+---------+--------
   A    |    0    |    0
   X    |    B    |    0
   X    |    X    |    C

I would return the first non-zero (or non-null) value found in each of the fields:
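Assuming `0` marks an unset option as in the rows above, one way to sketch the collapse in application code is a fold over the rows in priority order:

```python
# Rows in priority order, as in the options table above; "0" marks "not set".
rows = [
    ("A", "0", "0"),
    ("X", "B", "0"),
    ("X", "X", "C"),
]

def composite(rows, empty="0"):
    """Take, for each column, the first value that is neither null nor `empty`."""
    result = [None] * len(rows[0])
    for row in rows:
        for i, value in enumerate(row):
            if result[i] is None and value is not None and value != empty:
                result[i] = value
    return result

print(composite(rows))  # ['A', 'B', 'C']
```

The same result can be had in SQL with one subquery per column that selects the first non-zero value in sort order, but the per-column fanout grows with the number of option columns, so doing it in application code after one SELECT is often simpler.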