SitePoint Sponsor

User Tag List

Results 1 to 10 of 10
  1. #1
    SitePoint Member
    Join Date
    Feb 2011
    Posts
    7
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Dynamic Vs. Static

    Dynamic posts load the user's data such as sigs with a join.

    Static posts include it as data with the post row itself so no joins are necessary.

    In general is it more important to avoid additional joins by, for example, caching data, or to make the databases smaller and therefore faster to search?

    Priority is performance and ability to handle a database of arbitrarily huge volume and frequent access as expected of a huge wiki or social network.

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,336
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by FreddyG View Post
    In general is it more important to avoid additional joins by, for example, caching data, or to make the databases smaller and therefore faster to search?.
    neither

    in general it is more important to design the data in a manner appropriate to its inherent relationships, ensuring proper normalization, so all unnecessary joins are avoided not by caching but by eliminating needless redundancies

    size of the database matters little if the tables are properly optimized with indexes that are appropriate to the queries being run

    did that answer your question?

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    SitePoint Member
    Join Date
    Feb 2011
    Posts
    7
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    The inherent relationship is what I am trying to decide. Sigs can be considered both user and post data for example.

    I can go with either dynamic or static and would prefer whichever is efficient, but do not know if efficiency means smaller rows (static sigs could more than double the database, but they will be TEXT and thus stored independently so maybe it doesn't matter?) or avoiding joins?

    Quote Originally Posted by r937 View Post
    in general it is more important to design the data in a manner appropriate to its inherent relationships, ensuring proper normalization, so all unnecessary joins are avoided not by caching but by eliminating needless redundancies

    size of the database matters little if the tables are properly optimized with indexes that are appropriate to the queries being run
    I don't understand how normalization avoids joins (beyond 3NF). If you want to show the number of posts by a user isn't it less joining to cache the number of posts in each post?

    SELECT [fields including sigs] FROM posts WHERE id = x (id primary key)

    SELECT [fields, userinfo] FROM posts,users WHERE id = x AND userid=userid (both ids primary keys)

    That's the basic query so indexes are there.

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,336
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    okay, i think i see what you're talking about

    my advice is: don't worry about performance at this point

    (do a search for "premature optimization is...")

    play it straight, use joins, don't cache things like the count of a user's posts in the user's data

    there are many ways of optimizing retrieval, don't right away assume that you need to out-think the database engine

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,336
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    oh, and as for sigs being both user and post data, this forum (vbulletin) regards it as user data... i can look up an old post of mine here from years ago, and it has today's sig
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  6. #6
    SitePoint Wizard
    Join Date
    Dec 2003
    Location
    USA
    Posts
    2,582
    Mentioned
    29 Post(s)
    Tagged
    0 Thread(s)
    Well, it depends.

    For example, vBulletin does the latter. I think (if I remember correctly), phpBB does the former.

    My personal approach (which I've been considering since I'm developing my own forum software because vB makes me rage =p), I think I'm going to take the middle row.

    I'm going to store them separately, but then maintain a cache in which they are joined. You wind up with a few pros and cons. Pros of this approach is you (almost) always get accurate data AND you get quick results. The cons are that you have to either a) rebuild the a portion of the cache that had data changed (like when a user changes their signature or b) rebuild the cache on a regular basis.

  7. #7
    SitePoint Wizard
    Join Date
    Dec 2003
    Location
    USA
    Posts
    2,582
    Mentioned
    29 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    oh, and as for sigs being both user and post data, this forum (vbulletin) regards it as user data... i can look up an old post of mine here from years ago, and it has today's sig
    Really?

    It must be a setting somewhere in that mess vB calls their AdminCP. On a forum I manage it stores the signature data as part of the post.

  8. #8
    SitePoint Member
    Join Date
    Feb 2011
    Posts
    7
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Is text data any more performance intensive than ints, due to its independent nature?

    For example if you query a db of millions of rows that are maybe 10kb but have about 50kb of text is it just as fast as a 10kb row with no text, as long as the text is not searched? What if it is returned but but not searched?

    my advice is: don't worry about performance at this point
    Well, I am currently in a refactoring mode. The program involves lots of such data and I'm more of a programmer than a database type so I want to learn how to make a database faster. I know some of the techniques (nested sets) but I am not a database expert.

  9. #9
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,336
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by FreddyG View Post
    Is text data any more performance intensive than ints, due to its independent nature?
    independent nature?

    look, you can't design based on generalities like this

    to give you an idea of how a generality sounds, suppose i said "do not go outside without sunscreen because you could develop melanomas"

    how does this compare to "do not use text data because it is not as efficient as numeric data"

    also, i am sorry, i'm an old guy who never learned modern programming techniques, so i'm afraid i must admit that i have no idea what "refactoring" is

    did you look up "premature optimization is..."?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  10. #10
    SitePoint Member
    Join Date
    Feb 2011
    Posts
    7
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    independent nature?
    According to MYSQL's site: Each BLOB or TEXT value is represented internally by a separately allocated object. This is in contrast to all other data types, for which storage is allocated once per column when the table is opened. In other words, if a TEXT is only about the size of an INT for row size purposes, would it be as fast as querying a row with just an INT? However after searching again I learned that the implementation is specific to a certain type of MYSQL database and may not be relevant to what I am using.

    I asked because of the possibility that row size or database size may be relevant to performance, but I do not know if it is.

    Quote Originally Posted by r937 View Post
    also, i am sorry, i'm an old guy who never learned modern programming techniques, so i'm afraid i must admit that i have no idea what "refactoring" is

    did you look up "premature optimization is..."?
    Refactoring is just a buzz word for recoding. Essentially the board is already working but I am trying to make it more efficient and OOP.

    Somewhat related, I wonder how I may efficiently return a set of sorted rows as a composite row of the most relevant values.

    For example, I want a single set of options from a table:

    options table
    option1 option2 option3

    These are the rows sorted:

    Code:
    A | 0 | 0
    X | B | 0
    X | X | C
    I would return the first non zero or non null that is found of each of the fields:

    option1: A option2: B option3: C


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •