I have a ton of tables on products that have the number of procedures and units stored. What is the best database schema to organize this information in a database?
product 1
year 2007 2008 2009 2010
procedures 100 150 160 200
units 22 30 36 40
product 2
year 2007 2008 2009 2010
procedures 120 250 260 220
units 12 10 16 10
Would this be a good database schema? I’m going to have several thousand data points and I want to be able to add some of them together, group them into categories, apply calculations, etc.
the argument about natural versus surrogate keys aside, it is worth pointing out that the in the original post in this thread, the id columns in both the table_procedures and table_units tables are superfluous and best left out
I can’t even begin to comprehend how the auto generated ids for threads and posts are any different from that of products. Product names and manufacturers can change just as the title of a post could. Surrogate keys automatically become natural when dealing with “post” and “thread” entities? I much prefer surrogate keys for the exact reason you specified – easily changing/updating real world reference data such as the name of a thread, product, or person. Everything about that entity can be changed without affecting the internal system used to link other entities to it. In addition you have a single field to identify it by that will NEVER change. That means bookmarks or references to it will also be the same regardless if the name changes. In the case when your using something like a name as the primary key if that name changes you end up with a broken bookmark or redirect hackery when talking about the affects database decisions have on application programming. Natural keys sound good from a theoretical stand point but in regards to practicality I’ve always found surrogates easier to work with, maintain and understand data. It seems like that is case for most other people I’ve worked with as well because I rarely come across a schema without a surrogate key even though a unique key may exist on a single or set of fields in the table.
Since thread names can change that eliminates your suggestion as a primary key in that instance since primary keys not only have to uniquely identify the rest of the content now but also into the foreseeable future as well.
Of course where there is absolutely no possible natural fields available to use for the primary key then you need to come up with something to serve that purpose. That created field then becomes the natural key for that record since iits use will extend beyond the database since otherwise there is no way of referencing a particular record. A surrogate key is limited to use withiin the database itself as there are already natural ways of referencing the data from outside the database.
Post and thread numbers for a forum are natural keys as that is how they are referred to by the people using the database and not just within the database itself.
if you had the natural keys i suggested (forum names, thread names, etc.), then yes, you would definitely have to have a new primary key for the thread that got moved to a different forum
as for the redirect that’s left in place in the original forum, you can see that the easiest thing would be just to leave the row with the old pk in place, but somehow flag it as a redirect instead of an actual thread, and of course in the target forum you’d do a new insert with the new pk value
this all becomes so much easier with surrogate keys, n’est-ce pas?
Rudy, say a thread which is in forum x gets moved to forum y, would you leave the compound key as it is or would you change for forum name part and what would you use to record in the threads table that the thread has been moved (eg like in vBulletin where in the forum where a thread is moved from it shows still but shows as moved)?
Searching on a key field is usually more efficient than searching on a field that doesn’t have an index as when there is no index on the field then all the records have to be processed in order to find the ones wanted whereas when there is an index (and all keys automatically have an index while other fields only have one if you explicitly create one) allows the lookup to find the matching records without needing to read every single record.
Note that there are instances where reading through all the records can be faster than using an index where having the field indexed can actually slow things down but searching for a year when you expect there to be lots of different years in the data with each year only making up a small percentage is not one of them.
It is if the parts are actually different from one year to the next as then they are different parts that ought to have different part numbers.
If the parts are the same from one year to the next then the year field doesn’t belong in the parts table as the part isn’t year specific.
I think different and not different cover all possibilities there with neither case requiring the year to be a part of the key of the parts table.
Also while I have seen contrived theoretical database designs where a surrogate key would be useful, I have never come across such a situation where one would be useful in any of the real world databases I have worked with over the last thirty or so years. So from a practical viewpoint few database newbies will ever come across a real life situation where surrogate keys are appropriate. So perhaps I should have said that for the 999,999 of the first million newbies reading this they should never need to consider surrogate keys but that there is a remote chance that the statement may not apply to that millionth newbie if they stay in the industry for long enough. I also note that you didn’t provide a counter example of where one would be appropriate.
once again, nice backpedal, stephen, you immediately went from “ALWAYS” to “just about always”
exaggeration (whether slight or egregious) is not the way to influence newbies, not if it leaves them with concepts that are black and white and blinkered instead of practical
also, your suggestion that a compound key which contains a year column should be redesigned so that the year is incorporated into the part number, well, that’s not right either
There is a flaw in the table design if you need that many natural fields in the key. A reorganisation of the tables could get rid of the duplication of data between those tables and most or all of the surrogate keys and leave you with no more than two fields needed in your compound key.
The primary key for TABLE PRODUCTS_PART_NUMBERS will be manufacturer_part_number and whatever field is the real primary key for the PRODUCTS table. If the year really makes a difference then manufacturer_prod_number really needs to include an indication of the year in order to be a useful part number (such as being a different part number for each different year) and then that by itself is a sufficient key for that table and year isn’t needed in either key. manufacturers_id isn’t needed at all since the manufacturer_prod_number is a natural key that serves the same purpose only better since it is actually used for more than just making the records unique.
If you properly normalise your database design at the start you will soon eliminate most of the possible situations where more than two fields are needed in a key and you certainly will in any simple application that only involves a dozen or so tables (unless the application is something extremely unusual to start with). There isn’t anything specific in the normalisation rules that gets rid of the situations where three or more fields are required but I have rarely come across a situation where applying the normalisation rules didn’t end up getting rid of those situations anyway.
once again, you are pushing the credibility envelope with such black and white statements
if i can provide one counter-example, then your “ALWAYS” claim goes down the toilet
hyperbole again
again, this is very misleading to newbies, sometime surrogate keys in fact can speed up processing
there are plenty of instances where the use of a surrogate key is warranted, and stating your opinion in such exaggerated black and white terms makes you look, well… i won’t say it
So why didn’t you post a counter-example so as to prove your point rather than your usual comments which prove nothing.
Sure I exaggerate the situation slightly in some of my posts but the exceptions to what I say are usually so few and far between that with for newbies that are doing the exact opposite it needs to be exaggerated somewhat since what I am suggesting does just about always apply to the sort of situations that they are dealing with and they are unlikely to see any of the exceptions that you keep talking about until they reach an intermediate to advanced level. I am just trying to keep things simple for those who don’t have over 30 years experience of using databases.
It is ALWAYS better to use a natural key rather than adding unnecessary junk to your tables. You can tell the person who knows nothing about proper database design because all their keys are nonsense ids.
Where there is a real field or fields that uniquely identify the record you get a HUGE efficiency increase by making that the primary key.
In this particular case the year is the best candidate for primary key regardless of what other fields are in the table since it will make retrieving ranges of years possible as well as individual years. Once you set the right key then that id field is just so much garbage and can be removed.
The only time you need to invent a key is where there is no combination of fields that is unique or where the only fields that are unique are also volatile and will therefore have different values being inserted into them fairly frequently. Where you do find that inventing a key would be necessary you will often find that there is another real field that you have omitted from the table that contains the values you should be using as the key - for example if you have a parts table you will often find that the part manufacturers already have part numbers to uniquely identify the parts.
So it all comes down to whether you want to do it properly and build a database that works efficiently using the most appropriate keys or clutter your tables with junk so as to slow everything down.
Its really up to you but I actually prefer to use surrogate key rather than a natural one. The reason being ease of reference between multiple table relationships. Purists may look down on it but generally it makes application programming and URL references to entities more concrete. I would keep id as the pk and make a_id and year a unique key. Either way it s debate and it really isn’t going to make or break anything. Most systems I’ve worked on use surrogates rather natural keys to identify rows in the database. Its only in certain cases regarding m:n relationships whether the composite between two foreign key references are used in place of a surrogate.
Here is one question though – isn’t the new “version” of product x actually an entirely new product? It seems to me if I were to release foo product 2008 the foo product 2009 would be entirely separate in theory. In that case every new year warrants an entirely new product to be created?