SitePoint Sponsor

User Tag List

Results 1 to 14 of 14
  1. #1
    SitePoint Enthusiast
    Join Date
    Dec 2009
    Posts
    39
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Question JOIN performance impact?

    Hi,

    My first post here; the forum came to my attention through the book Simply SQL by Rudy Limeback.

    I have a question about the performance impact of JOIN. Let say you have two tables:

    Code:
    CREATE TABLE users
    (
      userid INT UNSIGNED NOT NULL PRIMARY KEY,
      name VARCHAR(24),
      ...
    )
    
    CREATE TABLE transactions
    (
      transactionid INT UNSIGNED NOT NULL PRIMARY KEY,
      userid INT UNSIGNED,
      ...
    )
    So fairly standard.

    Now most transactions queries will select the name corresponding to the userid, hence require a JOIN.

    The alternative would be to set up the transactions table with the name of the user as an additional column. This would mean duplicate information, but might mean that SELECT statements are faster as that they don't require a JOIN.

    Is the latter statement true? So can there be a situation where you should define the extra column (eg if the users table is large, and there are many transactions queries)?

  2. #2
    SitePοint Troll disgracian's Avatar
    Join Date
    Aug 2006
    Location
    Samsara
    Posts
    451
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    If you are indexing your fields properly, then the performance hit is negligible. If that type of join is a common operation, I would define a view for it.

    Unless you are talking about a hideously large amount of rows (many, many millions in the transactions table and probably 10-100,000 or more users), you're unlikely to hit any kind of problem. Even then on a decent server a single join like that would only take a few seconds at worst.

    As a general rule, when performance is impaired by joins that would be when you have to start denormalising your data. But the amount of data required for that to even happen would justify the use of some sort of data warehouse solution where you run a batch job at intervals (weekly, nightly, whatever) to dump all the data into heavily indexed, denormalised tables and query them.

    Cheers,
    D.

  3. #3
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    8,897
    Mentioned
    138 Post(s)
    Tagged
    2 Thread(s)
    I agree completely with disgracian, although keep in mind that views don't really help for DBMS's without Materialized Views (like MySQL).
    Rémon - Hosting Advisor

    Minimal Bookmarks Tree
    My Google Chrome extension: browsing bookmarks made easy

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,017
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    what do materialized views have to do with performance? they would slow it down, yes?

    rblon, have you considered not using a numeric userid?
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    SitePoint Enthusiast
    Join Date
    Dec 2009
    Posts
    39
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thx for all your replies. It surprises me that performance hit is negligible. Obviously, I am not that aware of database architecture, but I imagine creating a intermediary table using a JOIN could easily be more expensive than the SELECT statement. Hence, if one extra column could avoid a JOIN statement for all transaction queries that could be worth considering.

    Creating a View would primarily simplify the SQL code, right? And only if there is caching it would impact performance (positive or negative depending on the application)?

    I haven't considered using non-numeric userids. One reason is that names don't have to be unique, and the other that I imagine numeric keys being more efficient for comparisons.

  6. #6
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,017
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by rblon View Post
    ... I imagine creating a intermediary table using a JOIN could easily be more expensive than the SELECT statement.
    in general, don't try to outthink the database optimizer

    joins don't necessarily require an intermediate table

    Quote Originally Posted by rblon View Post
    Hence, if one extra column could avoid a JOIN statement for all transaction queries that could be worth considering.
    do a search for "premature optimization"



    Quote Originally Posted by rblon View Post
    I haven't considered using non-numeric userids. One reason is that names don't have to be unique, and the other that I imagine numeric keys being more efficient for comparisons.
    the difference in comparing numeric keys is negligible

    you seriously allow multiple users to have the same username?

    how do they log in, and how do you tell them apart?
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  7. #7
    SitePoint Enthusiast
    Join Date
    Dec 2009
    Posts
    39
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    thx you're all pretty clear I shouldn't try these kind of "optimizations".

    I like to use email address + password for logging in. So email is another column in the users table. And, the email address has to be unique, the name doesn't have to be.

  8. #8
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,017
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    okay, logging in with unique email address makes sense

    but back to your original question...

    if you use a join to pull out the username for a transaction, and that username isn't unique, then what good is displaying it? i mean, why bother? you don't know which user it is!!!
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  9. #9
    SitePoint Enthusiast
    Join Date
    Dec 2009
    Posts
    39
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    The term username tends to be confusing, that's why I didn't use it.

    When displaying a transaction I want to show the name of the user. Yes there might be several users with that name, which theoretically could be confusing, but in practice is usually not an issue.

    And, the alternative to "reject" someones name, because it is already in use, is often not appreciated (imagine you cannot open a bank account under your own name, because your uncle with the same name has already an account).

  10. #10
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,017
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    yes, i understand the customer appreciation issue

    i just thought--to bring this back round to the original topic--that joining to get the user name was rather pointless

    if you're showing a customer one of his own transactions, then what's the point? the customer should know who he is

    and if you're showing someone else the transaction, then what's the point? you don't really know which user it is, when several can have the same name

    maybe your question about joins should have used a different example?

    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  11. #11
    SitePoint Enthusiast
    Join Date
    Dec 2009
    Posts
    39
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    i just thought--to bring this back round to the original topic--that joining to get the user name was rather pointless
    I really don't see that. Obviously it is a simplified example, but it appears to me it is a common situation: each transaction is performed by a unique user, while (almost) every time you lookup transaction details you also want to know his non-unique name.

    Anyway, the orginial topic is if there are situations where you would add a column to a table (with duplicate information) to avoid performance impact of a JOIN with every SELECT. It seems the consensus is that you shouldn't go that route.

  12. #12
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,017
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    yes, that's right -- as a general rule, don't compromise your table design, optimize your queries instead (so that the joins are efficient) by ensuring that the appropriate indexes exist
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  13. #13
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    8,897
    Mentioned
    138 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by r937 View Post
    what do materialized views have to do with performance? they would slow it down, yes?
    A materialized view is faster than a non-materialized view, since a materialized view is really a table (so no joins needed) while a non-materialized view just performs the query it was made for (joins needed).
    Rémon - Hosting Advisor

    Minimal Bookmarks Tree
    My Google Chrome extension: browsing bookmarks made easy

  14. #14
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    8,897
    Mentioned
    138 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by rblon View Post
    Anyway, the orginial topic is if there are situations where you would add a column to a table (with duplicate information) to avoid performance impact of a JOIN with every SELECT. It seems the consensus is that you shouldn't go that route.
    Doing that the table also wouldn't be in 1NF, which is generally a bad idea.
    Rémon - Hosting Advisor

    Minimal Bookmarks Tree
    My Google Chrome extension: browsing bookmarks made easy


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •