Concepts of Database Design and Management

It’s not difficult to find an online tutorial on creating a database. It’s not hard to find a tutorial on how to create queries. In fact, it’s not hard to find tutorials on syntax for SQL, the differences of data types, and even which database is best for what purposes. It is, however, difficult to find good tutorials on the theories of designing, developing, and maintaining a quality database.

Database design and management isn’t very difficult. People much wiser than we have designed some very orderly and sound rules to follow and developed these rules into what is called the Normalization Process.

Using this process, you can create brand new, fully functional, finely tuned databases or take current database tables, run them through these steps, and come out with well-oiled tables ready to fly. However you use these steps, they are the fundamentals of quality database design.

Key Takeaways

Database design and management are guided by established principles such as the Normalization Process to ensure efficient, functional databases that adhere to quality standards.
Functional dependence is a core concept in database design, indicating that the value of one field (A) determines the value of another field (B), which is essential for understanding data relationships and dependencies.
Primary keys are critical in database design as they uniquely identify each row in a table; understanding and correctly implementing primary keys and candidate keys is fundamental for database integrity.
The Normalization Process, including First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF), systematically eliminates redundancies and dependencies, enhancing database performance and reliability.
Advanced normalization forms like the Fourth Normal Form (4NF) address multivalued dependencies, ensuring that the database structure supports robust data integrity and efficient querying without anomalies.

Functional Dependence

Before we jump into the Normalization Process, I should take a step back and clear a few things up. First, this is not specific to any one type of database. These are rules that should be followed when using any database system, whether it is Oracle, MySQL, PostgreSQL, SQL Server, etc.

Let us first discuss Functional Dependence, which is crucial in understanding the Normalization Process. This is merely a big term for a relatively simple idea. To illustrate it, lets take a look at a small sample table.

Sales Rep Number	Pay Class	Rate
001	1	.05
002	1	.05
003	1	.05
004	2	.07
005	1	.05
006	3	.09

This relatively simple table is a good example of functional dependence, it can also be used to illustrate a point.

Definition: A column is functionally dependent on another column if a value ‘A’ determines a single value for ‘B’ at any one time.

Sound confusing? Let me explain. The field ‘Rate’ is functionally dependent on the field ‘Pay Class’. In other words, Pay Class determines Rate.

To determine functional dependency, you can think of it like this: Given a value for Field A, can you determine the single value for B? If B relies on A, then A is said to functionally determine B.

Taking the same table as above, lets add to it.

Name	Sales Rep Number	Pay Class	Rate
Ward	001	1	.05
Maxim	002	1	.05
Cane	003	1	.05
Beechum	004	2	.07
Collins	005	1	.05
Cannery	006	3	.09

Now, lets look at this table and find some more Functional Dependencies. We already know that Pay Class determines Rate. We can also say that Sales Rep Number determines Last Name. Only one Sales Rep Number for each Last Name. This fits the definition of a Functional Dependency.

But does Last Name functionally determine anything? At first glance, some people might say yes, however, this is not true. Currently, you can say that Ward will only give you one Sales Rep Number, however, what if we hired another person with the name Ward? Then you would have two values for your Sales Rep Number, and then Last Name would no longer functionally determine anything.

On Keys

Now that we know what functional dependence is, we can clarify keys. Now, if you are working in databases, you probably already know what Primary Keys are. But, can you define them?

Definition: Column A is the primary key for table T if:
Property 1. All columns in T are functionally dependent on A
Property 2. No sub collections of the columns in table T also have Property 1.

This makes perfect sense. If all your fields in a database are dependent on one and only one field, then that field is the key. Now, occasionally Property 2 is broken, and two fields are candidates for the Primary Key. These keys are then called candidate keys. From these candidate keys, one key is chosen and the others are called alternate keys.

For example, in the same table as before:

Name	Sales Rep Number	Pay Class	Rate
Ward	001	1	.05
Maxim	002	1	.05
Cane	003	1	.05
Beechum	004	2	.07
Collins	005	1	.05
Cannery	006	3	.09

Our primary key is the Sales Rep Number, as it fits the definition of a Primary Key. Everything in the table is dependent on the Sales Rep Number, and nothing else can claim the same thing. Now, let us take this one step further, and assume that we also have the Social Security number of the employee in the table as well.

Name	Sales Rep Number	Pay Class	Rate	Soc.Sec. no.
Ward	001	1	.05	133-45-6789
Maxim	002	1	.05	122-46-6889
Cane	003	1	.05	123-45-6999
Beechum	004	2	.07	113-75-6889
Collins	005	1	.05	121-44-6789
Cannery	006	3	.09	111-45-9339

Now, we have two Candidate Keys, Sales Rep Number and Social Security Number. So, what we have to decide is which field to use, as both will be unique. In the end, it would be best to have the Sales Rep Number as the Primary Key for various reasons.

First Normal Form

Now that we have clarified these concepts, lets move into the Normalization Process. The First Normal Form is defined as a table that does not contain repeating groups.

For example:

Order Number	Order Date	Part Number	Number Ordered
5245	3/02/99	35436	23
5246	3/02/99	32116	11
5247	4/02/99	32133	52
–	–	12311	10
5248	4/02/99	46563	1

Now, from that example, you should easily see the problem. First off, in this setup, Order Number is considered the Primary Key, but this is not entirely true. The true Primary Key is Order Number and Part Number. In every order, a part will only ever be ordered once (though the amount ordered can be greater than 1). However, multiple parts can be ordered in on Order Number. So, when we retool this table, we end up getting this:

Order Number	Order Date	Part Number	Number Ordered
5245	3/02/99	35436	23
5246	3/02/99	32116	11
5247	4/02/99	32133	52
5247	4/02/99	12311	10
5248	4/02/99	46563	1

This is much easier to understand, and is a much better design then the previous table. And this time, the Primary Key is both the Order Number and Part Number, and everything else in the table depends on that key.

This was the First Norm Form, or 1NF. Most people can do this right, however, I have seen some tables designed without this in mind, and for a while, it looks like things are all right, but when you start filling the table up with lots of information, it can become quite cumbersome. Definitely, not a shining point in someone’s career, but a most valuable lesson indeed.

Second Normal Form

First, lets just say this. The table is automatically 2NF if its Primary Key contains only one column. That was easy, wasn’t it? But then, if your Primary Key has more than one column, read on.

Lets jump right into the table here. Primary Keys are Order Number (Order #) and Part Number (Part #).

Order #	Order Date	Part #	Part Desc.	Number Ordered	Price
5245	3/02/99	35465	Gas Key	1	$10
5246	3/02/99	65466	Lawn Chair	4	$25
5246	3/02/99	65473	Picnic Table	1	$30
5247	3/02/99	44654	Lawn Darts	1	$45
5248	4/02/99	44665	Volleyball	3	$20

At first glance, everything is okay with this table. However, there are some problems that need to be resolved. Lets first map out the functional dependencies.

Order Number -> Order Date

Part Number -> Part Description

Order Number, Part Number -> Number Ordered, Price, Order Date, Part Description

As you can see, this is a big mess, and needs to be taken care of. In fact, there are other problems with this table as well.

First, a change to the Part Description in this table would also mean a change in other tables, like the Products table. This is obviously cumbersome to code, and cumbersome to run.

Second, you could very well have different descriptions for the same part. This is simply not acceptable in a business environment.

This is because Part Description is dependent on only part of the primary key, and that is Part Number. Part Description is therefore a nonkey attribute. A nonkey attribute is simply a column that is not a part of the primary key.

This relates specifically to the 2NF definition, which is a table that is in the 1NF and no nonkey attribute is dependent on only a portion of the primary key. This of course reaffirms the fact that if the table only contains one column in its primary key, it is in second normal form.

So, how do we solve this problem? Well, we already solved this problem above. Here are the three new tables for the one above, with the first part being the primary keys and the second part being the other fields in the table.

Order Number -> Order Date

Part Number -> Part Description

Order Number, Part Number -> Number Ordered, Price

Those are our new tables. Notice that I took out the Order Date and the Part Description, as there is no need to use data that is already stored in other tables. Also note that you only have one description for each part. Quality database design needs no redundant data.

Third Normal Form

So, you think your table is clear now. You think it is ready for the big leagues? No…not yet. The 3NF is ready to take on new challenges. Lets go right to our example table:

Customer #	Name	Address	Credit	Sales Rep #	Sales Rep Name
2343	Bob	2343 Wee St.	$1000	03	Dave
2344	Daniel	1230 Wow St.	$3000	04	Pam
2345	Kate	33 Soho	$5000	03	Dave
2346	Less	99 More Is	$1000	07	Garrett
2347	Dan	1 Less St.	$500	09	Peter
2348	Bruno	45 Candy Ln.	$2000	03	Dave

Now, you probably recognize a few problems, but be assured, this is 2NF. Every field depends on the Customer Number. For example, Customer Number 2345 will only have one name, address, credit, sales rep number, and sales rep name associated with it. However, this doesn’t mean the table is ready for the prime time.

First, we need to define a determinant as any column or collection of columns that determine another column. By this definition, and primary key, or any candidate key will be a determinant. Also, that would make Sales Rep Number a determinant, but it’s not the candidate key or a primary key.

So, how does this relate to the 3NF? Well, the definition of the 3NF is a table that complies with the 2NF (and of course, the 1NF) and if the only determinants it contains are candidate keys. This does, of course, include the primary key.

Now, you have just discovered the problem with the table: the determinant of Sales Rep Number. So how do you go about getting rid of it?

First, track down all the determinants that are NOT a candidate key. Then, remove all the fields that rely on this determinant key, but keep the determinant key in the table. Then, with all the fields you removed, put them into a table, with the primary key being the determinant key you left in the main table. So, that would mean our original table of:
Customer #, Name, Address, Credit, Sales Rep #, Sales Rep Name
Would be turned into these two tables:
Customer #, Name, Address, Credit, Sales Rep # Sales Rep #, Sales Rep Name
Quality.

Remember, these two new tables need to comply with 1NF, 2NF, and 3NF as well, so double check them, though if you are up to this point, you should be doing okay.

Fourth Normal Form

Finally, we are up to the big one. 4NF is the father of the forms, as it is the be all and end all. This takes care of any problem that may occur. Lets start this time by defining a very important term, multivalued dependence (MD). MD is when field B is multidependent on A if each value of A is associated with a specific list of values for B, and this collection is independent of any values of C.

Lets take a deep breath, and illustrate this with a simple table. Lets assume that Faculty represents the guidance counselor for any number of students represented by Student Number. Also, we can assume that the faculty member will be a part of any number of committees, and we want to store this as well. So, this is what we come up with.

Faculty	Student Number	Committee Code
1243	2343	ADV
Â	2345	PER
Â	Â	HSG
1553	3243	ADV
4003	3408	HSG
Â	4095	Â
Â	4403	Â

Now, passing this to 1NF would give you this result:

Faculty	Student Number	Committee Code
1243	2343	ADV
1243	2345	PER
1243	2345	HSG
1553	3243	ADV
4003	3408	HSG
4003	4095	HSG
4003	4403	HSG

But, you can clearly see the problem developing there as well. This is what MD is. If Faculty number 1243 was no longer counseling Student Number 2345, and we deleted the data, then the Committee information would be deleted as well. This is not quality.

A 4NF is 3NF compliant, and there are no multivalued dependencies.

You would handle this problem like in 2NF and 3NF, by splitting the table up into smaller tables with each containing the field that multidetermined them, and in this case, that would be the Faculty field. So, the new tables would be:
Faculty, Student Number Faculty, Committee Code

Conclusion
Truly, that was a lot of work, but it is well worth it. Making sure your tables are optimized is key to quality design. Taking care of the problem before it becomes a problem is vital to managing a database.

Before you sit down to design the database, gather all the information you want to include in the database. I mean everything. Go around to each department of the company (or just write it out yourself if this is just for you) and find out what everyone wants in the database. Once you have everything, bring it back, and create one huge table.

From there, break that table down to 1NF, then 2NF, and so on. Go back over each table, and make sure they all work together, and are all 4NF tables. If they aren’t, then it can be assured the tables will suffer problems in the future.

Quality is in the design. And for those people who know, this helps comply with Codd’s first 2 rules for a truly relational database system.

Happy coding.

Frequently Asked Questions (FAQs) on Database Design and Management

What are the key principles of database design?

The key principles of database design include understanding the business requirements, defining the purpose of the database, identifying the entities and relationships, and normalizing the database design. It’s crucial to ensure data integrity, security, and performance. The design should be flexible enough to accommodate future changes and scalable to handle increasing data volumes.

How does normalization contribute to effective database design?

Normalization is a process that helps eliminate data redundancy and improve data integrity. It involves organizing data into tables based on relationships, ensuring each table serves a single purpose. This makes the database more efficient, easier to manage, and reduces the chances of data anomalies.

What are the different types of database relationships?

There are three types of database relationships: one-to-one, one-to-many, and many-to-many. Understanding these relationships is crucial in designing a database as they define how data in different tables relate to each other, which in turn affects how data is accessed and managed.

What is the role of SQL in database management?

SQL (Structured Query Language) is a programming language used to manage and manipulate databases. It allows you to create, retrieve, update, and delete database records. It’s also used for managing database structures and controlling access to the database.

How can I ensure data security in my database design?

Data security can be ensured by implementing measures such as user authentication, access controls, data encryption, and regular backups. It’s also important to keep the database software and systems updated to protect against vulnerabilities.

What is the difference between a relational database and a non-relational database?

A relational database organizes data into tables and uses SQL for database management. It’s ideal for structured data and complex queries. On the other hand, a non-relational database, also known as NoSQL, can handle unstructured data and is more flexible and scalable, making it suitable for big data applications.

How does database indexing improve performance?

Database indexing creates a data structure that improves the speed of data retrieval operations. It works similarly to an index in a book, allowing the database management system to find the data without having to scan every row in a table.

What is the role of a database management system (DBMS)?

A DBMS is software that interacts with the user, applications, and the database itself to capture and analyze data. It provides a way to manage, control, and maintain databases, ensuring data consistency, integrity, and security.

What are the stages of database design?

The stages of database design include requirement analysis, conceptual design, logical design, physical design, and implementation. Each stage has specific tasks and outputs that contribute to the overall design of the database.

How do I choose the right DBMS for my database?

Choosing the right DBMS depends on several factors such as the size of the database, the nature of the data, the required performance, the budget, and the technical skills of the team. It’s important to evaluate different DBMS options and choose one that best fits your needs.