Go Back   SitePoint Forums > Forum Index > Program Your Site > Databases
Newsletter FAQ Members List Calendar Mark Forums Read

New to SitePoint Forums? Register here for free!

SitePoint Sponsor
 
Reply
 
Thread Tools Display Modes
Old Feb 23, 2004, 15:28   #1
MattR
Database Jedi
 
MattR's Avatar
 
Join Date: Jan 2001
Location: buried in the database shell (Washington, DC)
Posts: 1,108
Thumbs up Is SQL Relational?

Sorry, that was a poor hook to get people to read the thread. The reason why I thought I'd post this is that people have been asking me about the DBDebunk.com people and their thoughts and my thoughts on current products etc. This is an attempt to help put it down in words which people can understand.

(If you must know: because SQL allows you to create a table *without* a primary key (which then means that result sets can have duplicate rows) then it is not relational. End of story.)

I'm not saying that SQL is double-plus ungood, just pointing out that it is not relational (just as saying that 2+2 != 5, and the sky is not made of fish), and don't attribute deficiencies of SQL to deficiencies of the relational model.

You can begin to understand how Date and Pascal et al at DBDebunk.com feel if you consider the following scenario (this thought exercise presupposes that perfect is possible):
You spend a lot of time and effort developing The Perfect Car which is perfect in every way. Not only does it not require any non-renewable resources, but it drives to any destination perfectly and is perfectly safe. You work out all the mathematical details and proofs and can say: "I have proven that this car is perfect."

Since you do not have the time/expertise/money/etc. to build The Perfect Car you then license your Perfect Car Model to the big automakers. They then proceed to implement your Perfect Model in the form of a "Perfect Car" Implementation. Unfortunately for them, building The Perfect Car is very, very difficult, almost impossible. The automakers then proceed to make significant changes to your Perfect Model. They cut corners, make changes which violate certain precepts and assumptions in your Perfect Model, etc.

They then put The Less-Than-Perfect Car on the market but proceed to call it a Perfect Car. After the "Perfect Car" Implementations that people start to buy get lost, run out of gas, and even blow up and kill them, they start saying: "These Things are wrong with the Perfect Car!"

Enterprising people then decide to try and fix the "Perfect Car" Implementations by creating New Perfect Car Models. Some of these models include the implementations as a background. Some create Entirely New Models Without Significant Scientific Background. They provide, possibly, incremental improvement over the "Perfect Car" Implementations but generally include just as many, if not more so, opportunities for flaming, burning death as the current Implementations. Not only that, but they throw out Actual Working Parts of the "Perfect Car" Implementations!!

And all the while you are there, yelling from the sidelines: "But that is not a Perfect Car! I have shown you the path (Model) to building the Perfect Car! I have Proved it True! If you'd stop wasting your time on these other Stupid Designs and focus on the Perfect Model then we'd all be better off!"
Now that this long-winded description is over you can replace The Perfect Car with The Relational Model and "Perfect Car" Implementations with {Oracle, MySQL, etc.}. You can replace "New Perfect Car Models" (including "Without Significant Scientific Background") with {XML, OO-DBMS, 'Persistence Layers', etc.}.

No one is saying that you cannot use SQL products or XML, or that you cannot accomplish tasks in these tools, just that when used in the context of data management they are poorly solving what the Relational Model already solved.

Because IT practitioners are poorly educated and increasingly fad-driven they latch onto non-solutions (like XML, "Post-Relational", OO-DBMS, etc.) and put little or no pressure on DBMS vendors to get it right. Even worse, if someone does release a Truly Relational DBMS there are no guarantees that anyone will buy it due to the ignorance of the IT community.

Put simply: People don't know what they're missing, so they don't know to ask.
__________________
Matt - Sybase DBA / PHP fanatic
Sybase/MySQL/Oracle | I don't like MySQL
Download Sybase | DBForums.com - for all your RDBMS talk
MattR is offline   Reply With Quote
Old Feb 23, 2004, 15:36   #2
MattR
Database Jedi
 
MattR's Avatar
 
Join Date: Jan 2001
Location: buried in the database shell (Washington, DC)
Posts: 1,108
Dijkstra has a few things to say on the topic as well:

On Education, Specifically:


The ongoing process of becoming more and more an a-mathematical society is more an American specialty than anything else (It is also a tragic accident of history).

The idea of a formal design discipline is often rejected on account of vague cultural/philosophical condemnations such as "stifling creativity"; this is more pronounced in the Anglo-Saxon world where a romantic vision of "the humanities" in fact idealizes technical incompetence. Another aspect of that same trait is the cult of iterative design.

Industry suffers from the managerial dogma that for the sake of stability and continuity, the company should be independent of the competence of individual employees. Hence industry rejects any methodological proposal that can be viewed as making intellectual demands on its work force. Since in the US the influence of industry is more pervasive than elsewhere, the above dogma hurts American computing science most. The moral of this sad part of the story is that as long as the computing science is not allowed to save the computer industry, we had better see to it that the computer industry does not kill computing science.
And then on Computer Science in general:

"I hope very much that computing science at large will become more mature, as I am annoyed by two phenomena that both strike me as symptoms of immaturity.

The one is the widespread sensitivity to fads and fashions, and the wholesale adoption of buzzwords and even buzz notes. Write a paper promising salvation, make it a "structured" something or a "virtual" something, or "abstract", "distributed" or "higher-order" or "applicative" and you can almost be certain of having started a new cult.

The other one is the sensitivity to the market place, the unchallenged assumption that industrial products, just because they are there, become by their mere existence a topic worthy of scientific attention, no matter how grave the mistakes they embody. In the sixties the battle that was needed to prevent computing science from degenerating to "how to live with the 360" has been won, and "courses" -- usually "in depth"!-- about MVS or what have you are now confined to the not so respectable subculture of the commercial training circuit. But now we hear that the advent of the microprocessors is going to revolutionize computing science! I don't believe that, unless the chasing of dayflies is confused with doing research. A similar battle may be needed."
--Edsger W. Dijkstra, My Hopes Of Computing Science, 1979
__________________
Matt - Sybase DBA / PHP fanatic
Sybase/MySQL/Oracle | I don't like MySQL
Download Sybase | DBForums.com - for all your RDBMS talk
MattR is offline   Reply With Quote
Old Feb 23, 2004, 15:52   #3
mmj
Test cases complete. 0 fails.
 
mmj's Avatar
 
Join Date: Feb 2001
Location: Melbourne Australia
Posts: 6,568
Forgive my ignorance, but I have a question about the most frequently mentioned reason why current SQL implementations do not follow the relational model. That is that current implementations allow 'tables' without primary keys, thereby breaking the assumption that all entries can be referred to uniquely.

Could it not be said that this pariticular feature is an extension of the relational model allowing us to use the database in a different way, and that if we simply restricted ourselves to only creating tables with primary keys, this particular problem no longer is an issue?
__________________
[mmj] My momentous journey
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Bit Depth Blog · Twitter · Contact me
Spuds Jokes Bazaar VCS Inkscape Firefox phpBB
mmj is online now   Reply With Quote
Old Feb 23, 2004, 16:05   #4
MattR
Database Jedi
 
MattR's Avatar
 
Join Date: Jan 2001
Location: buried in the database shell (Washington, DC)
Posts: 1,108
I do not assume to be the authority on the relational model.

That said, to the best of my knowledge:
If SQL required primary keys (even if it is a surrogate) AND always applied distinct in SELECT statements then it would pass the uniqueness test.

Of course, there are other Relational tests it fails, so those would necessarily have to be tackled as well.
__________________
Matt - Sybase DBA / PHP fanatic
Sybase/MySQL/Oracle | I don't like MySQL
Download Sybase | DBForums.com - for all your RDBMS talk
MattR is offline   Reply With Quote
Old Feb 23, 2004, 18:10   #5
wwb_99
SitePoint Author
silver trophybronze trophy
 
wwb_99's Avatar
 
Join Date: May 2003
Location: Washington, DC
Posts: 9,421
Not trying to rain on the parade, but why does it matter? I would much rather see better branching logic in SQL than some kind of truly relational theory enforcement.

Kind of like how the 4th normal form and beyond are neat in a classroom but flippin useless in most real world applications . . .

WWB
wwb_99 is offline   Reply With Quote
Old Feb 23, 2004, 18:43   #6
MattR
Database Jedi
 
MattR's Avatar
 
Join Date: Jan 2001
Location: buried in the database shell (Washington, DC)
Posts: 1,108
Quote:
Originally Posted by wwb_99
Not trying to rain on the parade, but why does it matter? I would much rather see better branching logic in SQL than some kind of truly relational theory enforcement.
What is "branching logic"? Do you even know what relational theory enforcement means? Do you know what theory means?

Quote:
Originally Posted by wwb_99
Kind of like how the 4th normal form and beyond are neat in a classroom but flippin useless in most real world applications . . .

WWB
Do you even know what the normal forms are and the purposes of normalization?
__________________
Matt - Sybase DBA / PHP fanatic
Sybase/MySQL/Oracle | I don't like MySQL
Download Sybase | DBForums.com - for all your RDBMS talk
MattR is offline   Reply With Quote
Old Feb 23, 2004, 20:05   #7
r937
SQL Consultant
silver trophybronze trophy
SitePoint Award Recipient
 
r937's Avatar
 
Join Date: Jul 2002
Location: Toronto, Canada
Posts: 32,567
branching logic is if/else, do while, for each...

a dead giveaway of the wrong way to approach a relational language

as for myself, there are few things i cannot do in sql, but there a some things i would not do in sql, and branching is one of them
__________________
r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
"giving out my real stuffs"
r937 is offline   Reply With Quote
Old Feb 23, 2004, 21:44   #8
mmj
Test cases complete. 0 fails.
 
mmj's Avatar
 
Join Date: Feb 2001
Location: Melbourne Australia
Posts: 6,568
Quote:
Of course, there are other Relational tests it fails, so those would necessarily have to be tackled as well.
You make a vague reference to 'other Relational tests'. Well, would you care to mention them? Why haven't you given any specific examples?

Quote:
Originally Posted by MattR
Do you even know what relational theory enforcement means? Do you know what theory means?

Do you even know what the normal forms are and the purposes of normalization?
Why not just tell us, rather than being so vague and mysterious about it all?
__________________
[mmj] My momentous journey
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Bit Depth Blog · Twitter · Contact me
Spuds Jokes Bazaar VCS Inkscape Firefox phpBB
mmj is online now   Reply With Quote
Old Feb 24, 2004, 02:27   #9
DaveMaxwell
Just Blow It!
bronze trophy
 
DaveMaxwell's Avatar
 
Join Date: Nov 1999
Location: Mechanicsburg, PA
Posts: 5,145
Quote:
Originally Posted by MattR
Do you even know what the normal forms are and the purposes of normalization?
I do, and anything farther than third normal (ok, fourth in SOME small cases) is overkill and unecessary and most often causes a greater burden on the parsers.

"nth-normal (anything past 3rd) is nothing more than an intellectual exercise that has no real use in the business world" - a semi-quote(it has been 10+ years) from one of my professors in college, who knew what he was talking about.
__________________
Dave Maxwell - Design Team Leader
My new favorite YouTube Video!
RIP Dan Schulz
New Motto: Mounds of compliments can't replace one
insult. Choose your words carefully.


DaveMaxwell is offline   Reply With Quote
Old Feb 24, 2004, 05:56   #10
MattR
Database Jedi
 
MattR's Avatar
 
Join Date: Jan 2001
Location: buried in the database shell (Washington, DC)
Posts: 1,108
wwb_99:
The negative connotations most people associate with the word theory are words like: unpractical. speculation. conjecture. But, in the context of Relational Theory it means that it has a strong, proven, scientific (and mathematical) background. Who wouldn’t want their DBMS to be based on a strong, proven set of scientific theories? What else would you have? A DBMS based on hunches, or gut-feelings, or untested assumptions? You must have data that you don’t mind losing. For the rest of us, we’ll stick with what is proven (in the formal sense) to work.

mmj, it is not my job to teach you the relational model. Last time I checked the pay for teaching someone over a message board is way below my acceptable salary range. I can only try and light the flame of education – it’s your decision whether or not to remain in the darkness. Also, remember Date’s incoherence principle: it is difficult to respond coherently to that which is incoherent.

Aside: Plus, most people are confused about things like the normal forms and usually once people put down their thoughts onto paper (so to speak) they end up going “a hah!” and arriving at the correct conclusion themselves. It’s sort of like the “Rubber Duckie” form of software debugging – whenever you have a stumbling block in development pull out a little rubber duckie from your desk and then try and explain to it your problem given that it knows nothing about it. Oftentimes simply explaining the problem will lead to “A hah!” moments of clarity.
What other deficiencies do SQL-products contain?
  • No support for relational domains
  • Non updatable views (partially due to duplicate handling)
  • Logical integrity constraints (such as uniqueness) based on physical implementation details (e.g. indexes). This violates the physical independence rule.
  • No support for entity sub/supertype relationships
  • NULL support
If you want to know a good set of ‘rules of thumb’, search Google for Codd’s 12 Rules. Also, let me know what you think about rule 3.

Finally, would someone please tell me what normal form the following relation is in:

Employee( Number, Name, Department, HireDate, Salary )

I’ll give you a hint: it’s greater than 3NF. It’s even greater than 4NF. Yes, it’s a fully normalized table – that means it covers 1-5NF!! So, without even knowing it, I would bet that most, if not all, of the tables you create a fully normalized.
Aside: The normal forms are often called “formalizations of common sense”. This is pretty much true; no one is suggesting you design your tables in 0NF and then systematically apply the rules. However, there is great advantage to formalizing a process: we can automate it. There’s nothing stopping DBMS developers from creating a ‘Normalization Checker’ or some other system to automagically normalize your relations.
I don’t want to say it, but I would bet that your professor was incorrectly interpreting the 4 and 5NFs which is why he said that they are ‘academic’. Most people have an incorrect assumption that 4 and 5NF’s goals are to turn relations like the above into little tiny “binary” relations such as Employee_name( Eno, Name ), Employee_Dept( Eno, Dept ), etc. This is not the case. Remember: normalization is about removing redundancies and update anomalies. If your relation (table) contains no redundancies and update anomalies then guess what – you’re fully normalized!
__________________
Matt - Sybase DBA / PHP fanatic
Sybase/MySQL/Oracle | I don't like MySQL
Download Sybase | DBForums.com - for all your RDBMS talk
MattR is offline   Reply With Quote
Old Feb 24, 2004, 06:54   #11
r937
SQL Consultant
silver trophybronze trophy
SitePoint Award Recipient
 
r937's Avatar
 
Join Date: Jul 2002
Location: Toronto, Canada
Posts: 32,567
Quote:
Originally Posted by MattR
If you want to know a good set of ‘rules of thumb’, search Google for Codd’s 12 Rules. Also, let me know what you think about rule 3
i found several versions:
Quote:
Rule 3: Systematic Treatment of Null Values
A field should be allowed to remain empty. This involves the support of a null value which is distinct from an empty string or a number with a value of zero. Of course, this can't apply to primary keys. Also, most database implementations support the concept of a nun-null field constraint that prevents null values in a specific table column.
Quote:
Rule 3: Systematic Treatment of Null Values
Null values (distinct from empty character string or a string of blank characters and distinct from zero or any other number) are supported in the fully relational DBMS for representing missing information in a systematic way, independent of data type.
Quote:
3: Systematic Treatment of Null Values
The DBMS is required to support a representation of "missing information and inapplicable information" that is systematic, distinct from all regular values (for example, "distinct from zero or any other number," in the case of numeric values), and independent of data type. It is also implied that such representations must be manipulated by the DBMS in a systematic way.
and what i think of this rule? i think it's wonderful!!

i cannot imagine what i would do to represent "not applicable" in a column if i could not use NULL
__________________
r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
"giving out my real stuffs"
r937 is offline   Reply With Quote
Old Feb 24, 2004, 07:03   #12
MattR
Database Jedi
 
MattR's Avatar
 
Join Date: Jan 2001
Location: buried in the database shell (Washington, DC)
Posts: 1,108
Note the modifier systematic. Unfortunately SQL does not treat nulls in a systematic way – especially in reference to sorting and aggregates.

The folks at DBDebunk.com believe that Codd’s allowance of nulls (the initial RM did not include nulls, but were later added) was his biggest (and almost singular) mistake.

After struggling with many systems which include nulls I am inclined to agree with them.

You can accomplish the same thing without nulls as with them, and then you do not encounter all of the problems you do with null values (three valued logic, inconsistent operators, etc.).
__________________
Matt - Sybase DBA / PHP fanatic
Sybase/MySQL/Oracle | I don't like MySQL
Download Sybase | DBForums.com - for all your RDBMS talk

Last edited by MattR; Feb 24, 2004 at 07:43..
MattR is offline   Reply With Quote
Old Feb 24, 2004, 07:38   #13
r937
SQL Consultant
silver trophybronze trophy
SitePoint Award Recipient
 
r937's Avatar
 
Join Date: Jul 2002
Location: Toronto, Canada
Posts: 32,567
all the databases i've ever worked with do support nulls, so while it may be possible to accomplish the same results without using nulls, and while it may indeed be academically interesting, theoretically sound, and less problematic, it is of little practical use to me
__________________
r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
"giving out my real stuffs"
r937 is offline   Reply With Quote
Old Feb 25, 2004, 08:06   #14
MattR
Database Jedi
 
MattR's Avatar
 
Join Date: Jan 2001
Location: buried in the database shell (Washington, DC)
Posts: 1,108
What is interesting is that a “less problematic” solution is also less desirable.

But, it’s worth noting that DBMS products have all come up with their own NULL solution, few of which conform to any sort of standard (usually Oracle does but that’s often because the ANSI committee makes Oracle’s behavior the de facto standard). That is something DB designers need to keep in mind when deciding whether or not to use a NULLable column.

When you say that nulls represent ‘not applicable’ that knowledge is a fact you are not deriving from existing facts in the tables but somehow know (hard coding it in your application, etc.).

So when you really think about it the concept of NULL doesn’t make a whole lot of sense in the context of a DBMS. A database relation represents a series of assertions (facts) about entities. When you throw a null into there, you now have a meaningless assertion. As an example (in an employee table):
Bob’s salary is $125,000.
Joe’s salary is $80,000.
Matt’s salary is ?????? (null).

So in my case what does that null mean? That I scribbled “I don’t know” on the form? Or that I don’t actually earn a salary? Or that the person keying in the data forgot to type it in?

Further, nulls cause you to store meta-data and data in the same table. Whereas the existence of a salary is simply an attribute of an entity (employees have salaries), knowledge of a salary amount is an attribute of the salary amount – essentially data about the data, or metadata. Mixing the two into one table causes significant practical problems in terms of database consistency (i.e. constraints) and causes update anomalies.

One solution that sounds like a good idea is proposed by David McGoveran. His ‘Nulls as metadata’ approach basically requires the DBMS to invisibly create a separate table which would store the reason why the value would be null. For example – it could store ‘Matt’, ‘Salary’, {‘Not Provided’ or ‘Not Applicable’ or ‘Unknown’ or etc.}. When you query the salary information it will provide the appropriate value – either the dollar amount or the actual reason why the salary amount is null.
__________________
Matt - Sybase DBA / PHP fanatic
Sybase/MySQL/Oracle | I don't like MySQL
Download Sybase | DBForums.com - for all your RDBMS talk
MattR is offline   Reply With Quote
Old Feb 25, 2004, 10:32   #15
r937
SQL Consultant
silver trophybronze trophy
SitePoint Award Recipient
 
r937's Avatar
 
Join Date: Jul 2002
Location: Toronto, Canada
Posts: 32,567
please allow me to make some comments, not arguing, just offering a viewpoint:

Quote:
Originally Posted by MattR
One solution that sounds like a good idea is proposed by David McGoveran. His ‘Nulls as metadata’ approach basically requires the DBMS to invisibly create a separate table which would store the reason why the value would be null.
so instead of complaining about three-valued logic, some people will complain about n-valued logic, where you have to keep track of different meanings of null (an oxymoron to begin with), like those people who study the cardinalities of various different types of infinity (aside: are there more real numbers than integers?)

and meanwhile, in the real world, since no dbms to my knowledge does this (or else it does, and i am ignorant of it), i could just use a kludges to achieve this multi-meaning null capability, basically a FK to a "reason" table for every column that can go null...

... if i felt the application needed the type of business logic to distinguish reasons for why something is not there

Quote:
Originally Posted by MattR
Further, nulls cause you to store meta-data and data in the same table
no they don't

as above, i could choose to build this meta-data kludge, but never do

i am quite comfortable with the current standard meaning of NULL: absence of a value

okay, back up a sec

i don't actually know that this NULL is in the SQL standard, because i don't actually have a copy of the standards -- i do have several books about SQL and the SQL standard, but hey, they were written by ordinary people like you and me, and i've never forked over the couple hunnert semolians to purchase and actually study the real SQL standards documents, so i guess i'm basically winging it, assuming that the current standard does include NULL, and that all databases which implement NULL do so with the same understanding: absence of a value

and one thing that bears emphasizing at this point: we should be talking about relations, not tables


the problem i pose to anti-NULL people is this: give me a list of employees, and the salary of each employee if that employee has a salary

without NULL, that problem could only come back as two relations, one for employees, the other for employee salaries, no nulls, just missing rows

this meta-data problem of which you speak does not go away if you normalize the table into two, for employees and salaries, just so that you don't store any nulls

where's the documentation for the missing normalized salary rows?

i'm sorry, i just happen to find the LEFT OUTER join far, far too convenient to put up with two relations where clearly one is more convenient

so shoot me

somewhere here i've got bookmarked this fifty-page document out on some web site somewhere, a powerpoint or pdf document or something, where the author takes exactly that scenario, an unknown salary, and along with a couple of additional problems of integrity, proceeds to tear a perfectly manageable employee table into total shreds, ending up with about a dozen and a half individual tables, which, when properly combined with relation predicates and some smoke and mirrors, derived tables, scalar subqueries, and so forth, after umpteen joins and unions, eventually is capable of recreating the original abysmally unnormalized table

if i find it, i'll edit this post and put up the link
__________________
r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
"giving out my real stuffs"
r937 is offline   Reply With Quote
Old Mar 3, 2004, 10:22   #16
ConradM
SitePoint Member
 
Join Date: Mar 2004
Location: Juneau, Alaska
Posts: 6
Quote:
Originally Posted by r937
and one thing that bears emphasizing at this point: we should be talking about relations, not tables

the problem i pose to anti-NULL people is this: give me a list of employees, and the salary of each employee if that employee has a salary

without NULL, that problem could only come back as two relations, one for employees, the other for employee salaries, no nulls, just missing rows

this meta-data problem of which you speak does not go away if you normalize the table into two, for employees and salaries, just so that you don't store any nulls

where's the documentation for the missing normalized salary rows?

i'm sorry, i just happen to find the LEFT OUTER join far, far too convenient to put up with two relations where clearly one is more convenient
I agree, null is a kludge to avoid splintering tables.

On the employee table: where I work all clerk I step 1 employees earn the same wage. The wage is in a table linked to the employee table through job description and step tables.

Perfection is not a goal here, only a direction. I aim for robust. ;^)>

As a real-world example, I have a client tracking database with about 100 tables. If I totally normalized it, I am sure I would have over 1,000. That is probably a conservative estimate.

Conrad
ConradM is offline   Reply With Quote
Reply

Bookmarks

« Previous Thread | Next Thread »

Thread Tools
Display Modes

 
Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

 
Forum Jump


All times are GMT -7. The time now is 21:16.


Powered by vBulletin® Version 3.8.5
Copyright ©2000 - 2010, Jelsoft Enterprises Ltd.
Copyright 1998-2009, SitePoint Pty Ltd. All Rights Reserved