Database Schema for products over multiple years

okay, here’s one for you

let’s suppose you want a forum database, which will have forums, threads, and posts

each forum can have multiple threads, and each thread can have multiple posts

okay, starting with the forum, the most obvious natural key would be the forum name, and since we don’t want two forums with exactly the same name, that makes an excellent primary key, albeit somewhat long – let’s say VARCHAR(255) to cover the longest name

now we move to the threads – each thread belongs to one and only one forum, but here the thread name is not sufficiently unique to be the primary key, since it is quite conceivable for two different forums to have threads with the same name (e.g. the same “rules for this forum” name on a sticky thread in each forum)

so therefore the primary key of the threads table has to be a compound key consisting of the forum name and the thread name

now we move to the posts table, and i’m sure even you can see what’s coming next – further additional column(s) to go along with the forum name and thread name, additional columns which would make each post unique

now all i want to do is bring up the scenario of the user interface which allows someone to bookmark a particular post

what is the url for the post? well, obvioulsy, it’s not going to look anything like vBulletin’s urls, where you will see http://example.com/forums/showpost.php?p=4630669 or http://example.com/forums/showthread.php?t=685470

in fact, the only way to reference a particular row in any of the forums, threads, or posts tables, is with the rather unfortunately looooooooooooong urls consisting in some cases of multiple VARCHAR(255) values concatenated together

when we end up with unweildy primary keys coming out of our normalized, natural key database design like this, surrogate keys are a blessing

and in fact this argument holds for countless examples in real life, where surrogate keys are used

once again, i am going to call you out on something you said –

Also while I have seen contrived theoretical database designs where a surrogate key would be useful, I have never come across such a situation where one would be useful in any of the real world databases I have worked with over the last thirty or so years. (emphasis added)
either you have once again larded your comment with egregious hyperbole, or you have truly never worked with any real database of any real import

Ok thanks for everyone’s input, especially Rudy and Stephen, your input is always excellent. I am revisiting this again now and I found a sample of data that represents what I am trying to store.

Here is a sample of four tables which represent a small segment of data I will be storing in the database. There will be about 10 countries, with about 100 tables of data for each country, each table will contain different row headings, etc.

I am looking for the most flexible database schema to store in MySQL. I plan on displaying it on the web using PHP in table form and also use jQuery to create a chart from this data.

Looking at the sample data and taking in the comments is this correct:

=======================
table_product
=======================
p_id	name
-----------------------
1	Product #1
2	Product #2

=======================
table_countries
=======================
c_id	name
-----------------------
1	India
2	Russia

=======================
table_country_association
=======================
c_id	p_id
-----------------------
1	1
1	2

===================================
table_procedures
===================================
p_id	year	data
-----------------------------------
1	2007	100
1	2008	150
1	2009	160
1	2010	200
2	2007	120
2	2008	250
2	2009	260
2	2010	220


===================================
table_units
===================================
id	p_id	year	data
-----------------------------------
1	2007	22
1	2008	30
1	2009	36
1	2010	40
2	2007	12
2	2008	10
2	2009	16
2	2010	10

There will be about 10 countries, with about 100 tables of data for each country
right there, that’s probably a bad idea – tables should be organized and consolidated by content, and not needlessly placed into vertical silos (to use the buzzwords)

also, that image you posted, that data needs a lot of work before you can store it in tables – the columns do not hold homogenous information!

example: row 3 has large counts, row 4 is percentages, row 5 is a count of something different…

even though you can do this in excel, with nice automatic formulae that do calculations and so on, you can’t do that in a database

no, what i meant was, a table for cars, and a table for radishes, and a table for emotions

you do not want to have a table for domestic cars, and a separate table for used cars, and a separate table for sports cars… which would be vertical partitioning

see the difference?

(quietly asking these questions in the hope you don’t throw your keyboard at me)

So the data should be stored by content as in, a table for percentages, a table for decimals, and a table for whole numbers?

What would be the problem with organizing by procedure/country and storing the numbers “as is” for display purposes?

Ok yes, I was a little confused and thought you were suggesting that.

I have tables organized by country, procedure and then the actual data points. Do you see any issue with this, or can you suggest a rough schema you would use?

depending on your information retrieval requirements, the SQL could get very hairy

that would require that i am familiar with the kind of data you are actually storing

Ok well taking a look at the sample data. It has four tables and represents one country, say Canada :). There will be six other countries with the same structure as the example just different numbers.

i guess i didn’t make my point clearly enough

you can’t put population, surgeries, and percentages in the same column

you’d need a table for population (by year), and a table for surgeries (by type, by year), and you wouldn’t store the percentages

Don’t you mean “shouldnt” put population, surgeries and % in the same column? Couldn’t you store them as VARCHAR ‘as is’?

With your suggested separation of population, surgeries and % – what would the tables look like?

You bring up a good point. I may only need to store the 2009 columns and the CAGR (percentage) columns as the rows are built on that percentage and foretasted backwards and forwards in time. This would apply to every row that has a CAGR % in the last column. We take the 2009 figure from each row and apply the CAGR % to it.

i think you’re missing the point about why you might want to use a database

why can’t you just use excel sheets for this?

-we want to present it online
-by storing the data and re-presenting it online, we can build cool looking charts to visualize the data
-by presenting the figures online, we can allow people to edit some of the core data with their own figures, and recalculate the figures using our own private equations and display the data for them

This thread was started because I’ve never tried to store tabular data. I’m sure I could just store each table “as is” seen in excel. This is probably the easiest. Also, I’m sure I can break down each cell and store it individually into its own table. Its this part I am not familiar with and am still wondering how you would store tabular data in excel or what the best way would be 31 posts later ;).

I suppose I am looking for someone to say:

Hey ripcurlksm, the most dynamic way to store tabular data is like this: 

"country" table with name and country_id
"procedure" table with name and procedure_id
"cell-data" table with data, year, country_id and procedure_id

----------------
country
----------------
1 Canada
2 USA

----------------
procedure
----------------
1 Liposuction
2 Breast Augmentation

----------------
cell-data
----------------
42	2007	1	1
71	2008	1	1
117	2010	1	1
192	2011	1	1