Normalization in SQL
Normalization in SQL
In this tutorial, you will learn about data normalization in SQL. Normalization is actually a database design method that arranges the tables in a database with reduced dependency and redundancy of data. Normalization splits up the bigger tables to smaller ones and integrated them through relationships. Normalization improves data integrity. If you fail to use normalization, you could end up facing anomalies namely insertion, update, and deletion. Insertion anomalies happen to suppose if we couldn’t insert data into the table without another attribute’s availability. Update anomalies are actually an inconsistency in the data which could lead to data redundancy and incomplete data update. Deletion anomalies happen if you lose some attributes because of deleting other attributes.
Simply, the organization of data in the DB is called data normalization. Normalization actually demands the organization of columns and tables present in a DB to make sure that their dependants were correctly administered by the DB integrity constraints. It provides more efficiency because it splits up a bigger table to smaller ones.
Purpose of Normalization
As we all know, SQL is a language that is used to communicate with the DB. Any communication of data in the database has to be initiated and that must be normalized. Otherwise, you will end up in anomalies. It will improve data distribution as well. Normalization can be achieved by using normal forms. The normal forms we are going to learn are:
- 1 NF (First Normal Form)
- 2 NF (Second Normal Form)
- 3 NF (Third Normal Form) and
- Boyce Codd NF
Let’s see one by one with examples.
1 NF (First Normal Form)
We investigate the atomicity problem in 1 NF. In this context, atomicity implies that the values present in the table should not be divided or split up further. Simply, one cell could not carry several values. It is considered as a violation in 1 NF if a table holds a multiple value attribute. For example, have a look at the table below:
|Student Admission No||Student Name||Mobile Number||Outstanding Fees|
Evidently, you can notice that the phone number column contains more than one value and thus, it is a violation in 1 NF. If we apply 1 NF, the table will automatically get normalized (arranged) like as follows:
|Student admission no||Student Name||Mobile Number||Outstanding Fees|
As per the above table, you could visualize every column with distinct values and thus we achieved atomicity using 1 NF.
2NF (Second Normal Form)
In the case of 2 NF, the basic need for satisfying 2 NF is that the table must be present in 1 NF and there should not be any partial dependency, which means the actual subset of the candidate key decides the attribute which is non-prime. Let’s look at an example to understand 2 NF better!
|Student admission no||Class Room number||Classroom Name|
normalized (arranged) as follows:
The above table contains a composite primary key namely Student admission number and Classroom number. Here, Classroom location is a non-key attribute evidently. This Classroom location will depend on the Classroom number, which is actually a part of the primary key. Thus, the above table is a violation of 2 NF. In order to change the above table to 2 NF, we have to divide the table into two portions as follows:
|Student admission no||Class Room number|
|Student admission no||Class Room number|
I hope, you could visualize that the partial dependency has been removed in the second table by applying 2 NF. So, the column Class Room Name entirely depends on the table’s primary key, i.e Class Room Number.
3NF (Third Normal Form)
In the case of 3 NF, it follows the same way that 2 NF functions. Here, the table must be present in 2 NF before working with 3 NF. Also, a transitive dependency is not allowed in 3 NF for non-prime attributes. This implies that the non-prime attributes which do not contain a candidate key will not depend on the rest of the non-prime attributes in a table. We can conclude transitive dependency is an indirect functional dependency, i.e A→C (which means A determines C) in which A→B and B→C (but the inverse is not valid i.e B→A is invalid) Let’s get a clear understanding of 3 NF with the following example:
|Employee ID||Employee Name||Department ID||Department||Location|
Looking at the above table, we can understand that the Employee ID determines Department ID and Department ID determines the department. Thus, Employee ID determines Department via Department ID. This proves that we accomplished transitive function dependency. But, the above structure violates 3 NF because it does not satisfy the rules of 3 NF. So, we have to divide the tables as below:
|Employee ID||Employee Name||Department ID||Location|
From the above tables, you could visualize that the entire non-key attributes become completely dependent on the primary key. As in the first table, Employee Name, Department ID and Location depends on Employee ID, whereas in the second table, the Department depends on Department ID.
Let’s move on to the final topic.
Boyce Codd NF (BCNF)
BCNF is also called as 3.5 NF because it is an upgrade of 3 NF. Two researchers Boyce and Codd developed this BCNF concept so as to address some particular anomalies that that doesn’t fall under the 3 NF category. Like other NF techniques, BCNF also has certain conditions to be satisfied. First, BCNF should satisfy 3 NF. In the case of BCNF, if each and every functional dependency, X → Y, then, X will act as the Super key of that specific table.
For example, have a look at the table below:
|Stud ID||Course of Study||Name of the Professor|
As per the above table, we can clarify the following:
- Any student can select multiple subjects of study
- You can have multiple teachers to teach one particular subject.
- For every subject, a teacher has to allocated to the student.
In the above table, except for the BCNF, all other NF techniques were satisfied. Let’s discuss the reason of it. Stud ID and Course of Study provides the primary key. This implies that the Course of Study column is actually a prime attribute. We could see yet another dependency here, i.e Name of the Professor→ Course of Study.
Here, Course of Study is actually a prime attribute whereas the Name of the Professor is a nonprime attribute, which is actually a violation of BCNF. Therefore, to achieve BCNF, we have to separate the table into two portions as Stud ID which is there already and another new column named Prof ID.
|Stud ID||Prof ID|
In the second table, Prof ID, Name of the Professor and Course of Study will be present.
|Prof ID||Name of the Professor||Course of Study|
With this, we achieved BCNF. We thus conclude this tutorial about Normalization in SQL. I hope you got a better understanding!