<snip/> how do you stuff the real people into your database in order that the reap user is there rather than the username field that represents them? <snip/>
The username is the field in the table that represents the user and ALL of the other fields except the id represent facts about the user that the username represents.
The id is either a surrogate for the username (in which case it does not belong in the normalised logical database design and is added as part of the denormalisation process of converting the logical design into a physical implementation OR if the id does belong in the logical design then it implies that username is not unique in which case two tables are required since the rest of the data relates to username and the one username can be attached to multiple ids and then two tables are required.
(1) B is a subset of A (i.e. A->B is "trivial")
(2) A is a superkey
(3) B is a subset of a candidate key.
This could be written very loosely and informally as "every non-key attribute is dependent on every candidate key". The critical and often overlooked words are every candidate key. Like all normal forms, 3NF is concerned with candidate keys and is entirely agnostic about primary keys. It's very unfortunate that there are so many examples around that only show tables with one key and that go on to explain 3NF in terms of dependency on a single key without of making it clear that all the keys are equally important.
Also notice condition (3) in the above definition. Transitive (non-key) dependencies do not violate 3NF if the dependent attriubute(s) are part of a key. This is arguably a "mistake" in 3NF that is rectified by Boyce-Codd Normal Form (BCNF). BCNF is equivalent to the above definition with the last condition removed. Many informal descriptions of 3NF miss this last condition and therefore more accurately they describe BCNF instead of 3NF! This is a mistake, but a "good" one because BCNF is "stronger" than 3NF.
For an accurate understanding of 3NF and other normal forms I can recommend David Maier's "Theory of Relational Databases", Chris Date's "Relational Database Dictionary" or "Foundations of Databases" (AKA "The Alice Book") by Abiteboul et al.
Why does any of this matter?
If you think about it, to define 3NF based on primary keys alone wouldn't even work. If only primary keys mattered then a bijective relationship (A->B, B->A) would be impossible to represent accurately while respecting 3NF because any relation containing both A and B would always violate 3NF. Bijection is a common data modelling scenario and certainly something that needs to be accommodated in any useful database design methodology. If a single primary key was all that mattered then 3NF would become of doubtful value when dealing with relations with more than one candidate key because either you would be forced to ignore some dependencies or you would have to accept that 3NF was automatically violated.
Another point to notice is that if normalization was only concerned with primary keys then 3NF really would be the ultimate normal form for those PK dependencies. That's because non-key dependencies and join dependencies of the kind addressed by BCNF and 5NF would not be possible (because they assume there could be multiple keys). We know that isn't the case and we know that dependencies on all keys are important. That is why 3NF is defined in terms of candidate keys and not primary keys.
What does any of this mean for the specific example in this thread? I'm always hesitant to comment on normalization problems unless the intended set of keys and dependencies is clearly stated. They have not been, so I'll make some assumptions. If ID and UserName are both candidate keys and if the dependencies we want to enforce include ID->UserName, UserName->Password (...etc) and if Password is not a determinant then a table containing only (ID, UserName,Password) is in 3NF (in fact it's in BCNF and 5NF as well ).
That's a lot of assumptions but given those conditions then it seems that the table I just described would be "fully normalized". Other interpretations might be possible - we only have column names to go on after all. It would help if anyone wishing to make points about normalization examples would state what keys and dependencies they are intending to enforce. Otherwise effective discussion on such things is virtually impossible.
Hope this helps.
The way I would approach this is as follows. There are two circumstances where properties are pushed off into a new table
1. Where there is a 1:many relationship between the object and the property
2. Where most objects will not have a subset of properties
So, in this question, the fundamental question is if it is ever possible that a user could have more than one role. If these are mutually exclusive (eg heirachical) then there is no need for a separate table.
Irrespective, there is nothing to stop there being a separate table where the role is enumerated (eg 3 = 'administrator')
The argument about primary key of username or ID - apart from the humour of tit for tat *****ing - I would always go for a key since yes, this means an extra column on this table but it also means that every other table that the user touches does not need a string column for the foreign key. It also means that I can offer the option to change the username without having to visit every other table to change the key.
Wow dportas... that has to be the greatest first post ever. Welcome to the forums.
My rule of thumb is therefore that I don't use an actual data field as a key. That generally translates into every table having a auto-increment Id field as the key to the table.
Another benefit to using a numerical key is that it takes less memory to store. If you are using a string as a key, you are probably using many times more data then you need. Each character takes about a byte, so if you have strings that are longer than 2 or 3 characters it's going to be much more memory than an integer type.
It could add up to a lot. If you're using an unsigned int (4 bytes), you can have 4294967296 rows. If you use UTF-8, you could technically store as many rows for the same memory with four characters... but that's highly impractical. More likely, you'd allow between 4-24 characters, 14 being the average. That means you're using about 10 bytes for every single key use.... that could REALLLLLYYYY add up if you have many rows.
If you use the username as the key, in any other tables which have a relationship to those fields you would have to put the username in them as well.
- posted_by <- username used in this field
If username is your id, then you'd have to use it for every one of your posts... unless r937 knows some magic trick I don't...
if username (string) is the primary key, then yes, it would get used whenever a row in a related table is inserted
if userid (integer) is the primary key, then yes, it would get used whenever a row in a related table is inserted
the difference is, if userid (integer) is the primary key, it gets an index, but you ~also~ have to have a unique index on username
if username (string) is the primary key, then you have only that one index, not two
I'm saying that username takes up more memory to store, since the average length of a username would be longer than 4 bytes, thus it'd take up more memory.
In most smaller cases, it's not that big of a deal. However, if you're dealing with a database that has millions, or more, entries, you could be looking at a lot more storage space needed...
That's all I'm saying. =p
Rémon - Hosting Advisor
Minimal Bookmarks Tree
My Google Chrome extension: browsing bookmarks made easy