SitePoint Sponsor

User Tag List

Page 2 of 2 FirstFirst 12
Results 26 to 42 of 42
  1. #26
    SitePoint Addict D3V4's Avatar
    Join Date
    May 2010
    Posts
    370
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Timme View Post
    The normalisation rules do have some leeway for interpretation.
    What do you mean? o_O If you want to normalize a table you must follow very precise rules.

  2. #27
    Programming Since 1978 silver trophybronze trophy felgall's Avatar
    Join Date
    Sep 2005
    Location
    Sydney, NSW, Australia
    Posts
    16,595
    Mentioned
    24 Post(s)
    Tagged
    1 Thread(s)
    <snip/> how do you stuff the real people into your database in order that the reap user is there rather than the username field that represents them? <snip/>

    The username is the field in the table that represents the user and ALL of the other fields except the id represent facts about the user that the username represents.

    The id is either a surrogate for the username (in which case it does not belong in the normalised logical database design and is added as part of the denormalisation process of converting the logical design into a physical implementation OR if the id does belong in the logical design then it implies that username is not unique in which case two tables are required since the rest of the data relates to username and the one username can be attached to multiple ids and then two tables are required.

    <snip/>
    Last edited by Mittineague; Oct 20, 2010 at 11:38. Reason: off-topic removed
    Stephen J Chapman

    javascriptexample.net, Book Reviews, follow me on Twitter
    HTML Help, CSS Help, JavaScript Help, PHP/mySQL Help, blog
    <input name="html5" type="text" required pattern="^$">

  3. #28
    SitePoint Member
    Join Date
    Oct 2010
    Posts
    3
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by felgall View Post
    To get a table into third normal form you must "Remove non-key columns representing facts about other non-key columns into a separate table"
    Unfortunately Third Normal Form is often misquoted in this way and that can lead to a lot of confusion. The more precise description of 3NF is that a relation R satisfies 3NF if and only if, for every functional dependency A->B satisfied by R, ONE of the following three conditions is true. Either:

    (1) B is a subset of A (i.e. A->B is "trivial")
    (2) A is a superkey
    (3) B is a subset of a candidate key.

    This could be written very loosely and informally as "every non-key attribute is dependent on every candidate key". The critical and often overlooked words are every candidate key. Like all normal forms, 3NF is concerned with candidate keys and is entirely agnostic about primary keys. It's very unfortunate that there are so many examples around that only show tables with one key and that go on to explain 3NF in terms of dependency on a single key without of making it clear that all the keys are equally important.

    Also notice condition (3) in the above definition. Transitive (non-key) dependencies do not violate 3NF if the dependent attriubute(s) are part of a key. This is arguably a "mistake" in 3NF that is rectified by Boyce-Codd Normal Form (BCNF). BCNF is equivalent to the above definition with the last condition removed. Many informal descriptions of 3NF miss this last condition and therefore more accurately they describe BCNF instead of 3NF! This is a mistake, but a "good" one because BCNF is "stronger" than 3NF.

    For an accurate understanding of 3NF and other normal forms I can recommend David Maier's "Theory of Relational Databases", Chris Date's "Relational Database Dictionary" or "Foundations of Databases" (AKA "The Alice Book") by Abiteboul et al.

    Why does any of this matter?

    If you think about it, to define 3NF based on primary keys alone wouldn't even work. If only primary keys mattered then a bijective relationship (A->B, B->A) would be impossible to represent accurately while respecting 3NF because any relation containing both A and B would always violate 3NF. Bijection is a common data modelling scenario and certainly something that needs to be accommodated in any useful database design methodology. If a single primary key was all that mattered then 3NF would become of doubtful value when dealing with relations with more than one candidate key because either you would be forced to ignore some dependencies or you would have to accept that 3NF was automatically violated.

    Another point to notice is that if normalization was only concerned with primary keys then 3NF really would be the ultimate normal form for those PK dependencies. That's because non-key dependencies and join dependencies of the kind addressed by BCNF and 5NF would not be possible (because they assume there could be multiple keys). We know that isn't the case and we know that dependencies on all keys are important. That is why 3NF is defined in terms of candidate keys and not primary keys.

    What does any of this mean for the specific example in this thread? I'm always hesitant to comment on normalization problems unless the intended set of keys and dependencies is clearly stated. They have not been, so I'll make some assumptions. If ID and UserName are both candidate keys and if the dependencies we want to enforce include ID->UserName, UserName->Password (...etc) and if Password is not a determinant then a table containing only (ID, UserName,Password) is in 3NF (in fact it's in BCNF and 5NF as well ).

    That's a lot of assumptions but given those conditions then it seems that the table I just described would be "fully normalized". Other interpretations might be possible - we only have column names to go on after all. It would help if anyone wishing to make points about normalization examples would state what keys and dependencies they are intending to enforce. Otherwise effective discussion on such things is virtually impossible.

    Hope this helps.

    David

  4. #29
    SitePoint Enthusiast
    Join Date
    Jun 2008
    Location
    Mansfield, UK
    Posts
    46
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    The way I would approach this is as follows. There are two circumstances where properties are pushed off into a new table
    1. Where there is a 1:many relationship between the object and the property
    2. Where most objects will not have a subset of properties

    So, in this question, the fundamental question is if it is ever possible that a user could have more than one role. If these are mutually exclusive (eg heirachical) then there is no need for a separate table.

    Irrespective, there is nothing to stop there being a separate table where the role is enumerated (eg 3 = 'administrator')

    The argument about primary key of username or ID - apart from the humour of tit for tat *****ing - I would always go for a key since yes, this means an extra column on this table but it also means that every other table that the user touches does not need a string column for the foreign key. It also means that I can offer the option to change the username without having to visit every other table to change the key.

  5. #30
    SitePoint Wizard
    Join Date
    Dec 2003
    Location
    USA
    Posts
    2,582
    Mentioned
    29 Post(s)
    Tagged
    0 Thread(s)
    Wow dportas... that has to be the greatest first post ever. Welcome to the forums.

  6. #31
    SitePoint Member
    Join Date
    Oct 2010
    Posts
    1
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by W2ttsy View Post
    I'd be more inclined to have a numerical ID in the event changing a username is an option (now or in the future). It removes the PK dependency on a changeable item then.
    In my experience (which is admittedly not as extensive as the gurus above) I never use a field that I will display to a user as a key field. I have been caught out enough times believing that a field would always be unique then later on finding that I it isn't.

    My rule of thumb is therefore that I don't use an actual data field as a key. That generally translates into every table having a auto-increment Id field as the key to the table.

  7. #32
    SitePoint Wizard
    Join Date
    Dec 2003
    Location
    USA
    Posts
    2,582
    Mentioned
    29 Post(s)
    Tagged
    0 Thread(s)
    Another benefit to using a numerical key is that it takes less memory to store. If you are using a string as a key, you are probably using many times more data then you need. Each character takes about a byte, so if you have strings that are longer than 2 or 3 characters it's going to be much more memory than an integer type.

    It could add up to a lot. If you're using an unsigned int (4 bytes), you can have 4294967296 rows. If you use UTF-8, you could technically store as many rows for the same memory with four characters... but that's highly impractical. More likely, you'd allow between 4-24 characters, 14 being the average. That means you're using about 10 bytes for every single key use.... that could REALLLLLYYYY add up if you have many rows.

  8. #33
    SitePoint Addict D3V4's Avatar
    Join Date
    May 2010
    Posts
    370
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by samanime View Post
    Another benefit to using a numerical key is that it takes less memory to store. If you are using a string as a key, you are probably using many times more data then you need. Each character takes about a byte, so if you have strings that are longer than 2 or 3 characters it's going to be much more memory than an integer type.

    It could add up to a lot. If you're using an unsigned int (4 bytes), you can have 4294967296 rows. If you use UTF-8, you could technically store as many rows for the same memory with four characters... but that's highly impractical. More likely, you'd allow between 4-24 characters, 14 being the average. That means you're using about 10 bytes for every single key use.... that could REALLLLLYYYY add up if you have many rows.
    Wait a second, but if I use the username field as the key instead of using an integer id, I'm not using any extra space because the username must be there anyway. Or is there something that I'm missing here?

  9. #34
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,014
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by D3V4 View Post
    Or is there something that I'm missing here?
    nope
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  10. #35
    SitePoint Wizard
    Join Date
    Dec 2003
    Location
    USA
    Posts
    2,582
    Mentioned
    29 Post(s)
    Tagged
    0 Thread(s)
    If you use the username as the key, in any other tables which have a relationship to those fields you would have to put the username in them as well.

    Example:
    users table
    -----------------------
    - username
    - display_name
    - password

    user_posts
    --------------------
    - posted_by <- username used in this field
    - post
    - date

    If username is your id, then you'd have to use it for every one of your posts... unless r937 knows some magic trick I don't...

  11. #36
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,014
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by samanime View Post
    If username is your id, then you'd have to use it for every one of your posts... unless r937 knows some magic trick I don't...
    i don't understand the point you're making

    if username (string) is the primary key, then yes, it would get used whenever a row in a related table is inserted

    if userid (integer) is the primary key, then yes, it would get used whenever a row in a related table is inserted

    the difference is, if userid (integer) is the primary key, it gets an index, but you ~also~ have to have a unique index on username

    if username (string) is the primary key, then you have only that one index, not two
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  12. #37
    SitePoint Wizard
    Join Date
    Dec 2003
    Location
    USA
    Posts
    2,582
    Mentioned
    29 Post(s)
    Tagged
    0 Thread(s)
    I'm saying that username takes up more memory to store, since the average length of a username would be longer than 4 bytes, thus it'd take up more memory.

    In most smaller cases, it's not that big of a deal. However, if you're dealing with a database that has millions, or more, entries, you could be looking at a lot more storage space needed...

    That's all I'm saying. =p

  13. #38
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,014
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by samanime View Post
    I'm saying that username takes up more memory to store
    so you're suggesting using an integer ~instead~ of a username?

    whoa, that's a bit harsh, innit?
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  14. #39
    SitePoint Addict D3V4's Avatar
    Join Date
    May 2010
    Posts
    370
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    nope
    Ok, I'm happy about that ;D

  15. #40
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    8,891
    Mentioned
    138 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by r937 View Post
    so you're suggesting using an integer ~instead~ of a username?

    whoa, that's a bit harsh, innit?
    I think he's suggesting that storing the username for the FKs takes a lot more disk space than storing INTs for FKs.
    Which is true of course.
    Rémon - Hosting Advisor

    Minimal Bookmarks Tree
    My Google Chrome extension: browsing bookmarks made easy

  16. #41
    SitePoint Addict D3V4's Avatar
    Join Date
    May 2010
    Posts
    370
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by ScallioXTX View Post
    I think he's suggesting that storing the username for the FKs takes a lot more disk space than storing INTs for FKs.
    Which is true of course.
    That's true, but since the username has to be in the database too, it's pointless to add another field (even if it's just an integer) as the primary key, because we can use directly the username for this purpose.

  17. #42
    SitePoint Zealot
    Join Date
    Apr 2009
    Location
    Wenatchee, WA
    Posts
    101
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by D3V4 View Post
    That's true, but since the username has to be in the database too, it's pointless to add another field (even if it's just an integer) as the primary key, because we can use directly the username for this purpose.
    I thought that was already deemed a questionable idea... unless you a) don't want to allow your users to change their username (what if they get married/divorced?) or b) don't mind them changing their primary key with potential cascading effects throughout your tables. Is it really *that* big of a rub to just use a surrogate primary key id number *and* a username?


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •