Normalization in SQL

In this tutorial, you will learn about data normalization in SQL. Normalization is actually a database design method that arranges the tables in a database with reduced dependency and redundancy of data. Normalization splits up the bigger tables to smaller ones and integrated them through relationships. Normalization improves data integrity. If you fail to use normalization, you could end up facing anomalies namely insertion, update, and deletion. Insertion anomalies happen to suppose if we couldn’t insert data into the table without another attribute’s availability. Update anomalies are actually an inconsistency in the data which could lead to data redundancy and incomplete data update. Deletion anomalies happen if you lose some attributes because of deleting other attributes.

Simply, the organization of data in the DB is called data normalization. Normalization actually demands the organization of columns and tables present in a DB to make sure that their dependants were correctly administered by the DB integrity constraints. It provides more efficiency because it splits up a bigger table to smaller ones.

Purpose of Normalization

As we all know, SQL is a language that is used to communicate with the DB. Any communication of data in the database has to be initiated and that must be normalized. Otherwise, you will end up in anomalies. It will improve data distribution as well. Normalization can be achieved by using normal forms. The normal forms we are going to learn are:

1 NF (First Normal Form)
2 NF (Second Normal Form)
3 NF (Third Normal Form) and
Boyce Codd NF

Let’s see one by one with examples.

1 NF (First Normal Form)

We investigate the atomicity problem in 1 NF. In this context, atomicity implies that the values present in the table should not be divided or split up further. Simply, one cell could not carry several values. It is considered as a violation in 1 NF if a table holds a multiple value attribute. For example, have a look at the table below:

Student Admission No	Student Name	Mobile Number	Outstanding Fees
1PRI001	Aravindan	-9678900476	25,000
		-9556678854
1PRI002	Darshan	-9887765341	1,000
1PRI003	Saravanan	-9443356698	33,000
1PRI004	Ramkumar	-6345678810	50,000
		-8667890476

Evidently, you can notice that the phone number column contains more than one value and thus, it is a violation in 1 NF. If we apply 1 NF, the table will automatically get normalized (arranged) like as follows:

Student admission no	Student Name	Mobile Number	Outstanding Fees
1PRI001	Aravindan	-9678900476	25,000
1PRI001	Aravindan	-9556678854	25,000
1PRI002	Darshan	-9887765341	1,000
1PRI003	Saravanan	-9443356698	33,000
1PRI004	Ramkumar	-6345678810	50,000
1PRI004	Ramkumar	-8667890476	50,000

As per the above table, you could visualize every column with distinct values and thus we achieved atomicity using 1 NF.

Click Here – Get SQL Training with Real-Time Projects

2NF (Second Normal Form)

In the case of 2 NF, the basic need for satisfying 2 NF is that the table must be present in 1 NF and there should not be any partial dependency, which means the actual subset of the candidate key decides the attribute which is non-prime. Let’s look at an example to understand 2 NF better!

Student admission no	Class Room number	Classroom Name
1PRI001	South-A1	Blackberries
1SEC001	South-A4	Avocado
2PRI001	South-A2	Jingle bells
2SEC001	South-A5	Craneberries

normalized (arranged) as follows:

The above table contains a composite primary key namely Student admission number and Classroom number. Here, Classroom location is a non-key attribute evidently. This Classroom location will depend on the Classroom number, which is actually a part of the primary key. Thus, the above table is a violation of 2 NF. In order to change the above table to 2 NF, we have to divide the table into two portions as follows:

Student admission no	Class Room number
1PRI001	South-A1
1SEC001	South-A4
2PRI001	South-A2
2SEC001	South-A5

Student admission no	Class Room number
1PRI001	South-A1
1SEC001	South-A4
2PRI001	South-A2
2SEC001	South-A5

I hope, you could visualize that the partial dependency has been removed in the second table by applying 2 NF. So, the column Class Room Name entirely depends on the table’s primary key, i.e Class Room Number.

3NF (Third Normal Form)

In the case of 3 NF, it follows the same way that 2 NF functions. Here, the table must be present in 2 NF before working with 3 NF. Also, a transitive dependency is not allowed in 3 NF for non-prime attributes. This implies that the non-prime attributes which do not contain a candidate key will not depend on the rest of the non-prime attributes in a table. We can conclude transitive dependency is an indirect functional dependency, i.e A→C (which means A determines C) in which A→B and B→C (but the inverse is not valid i.e B→A is invalid) Let’s get a clear understanding of 3 NF with the following example:

Employee ID	Employee Name	Department ID	Department	Location
1SW15TE01	Sarath	15TE01	Testing	Hyderabad
1SW15BE01	Ramesh	15BE01	SQL	Chennai
1SW15DE01	Raj	15DE01	Dotnet	Kochi
1SW15DE02	Kumar	15DE02	Java	Bengaluru

Looking at the above table, we can understand that the Employee ID determines Department ID and Department ID determines the department. Thus, Employee ID determines Department via Department ID. This proves that we accomplished transitive function dependency. But, the above structure violates 3 NF because it does not satisfy the rules of 3 NF. So, we have to divide the tables as below:

Employee ID	Employee Name	Department ID	Location
1SW15TE01	Sarath	15TE01	Hyderabad
1SW15BE01	Ramesh	15BE01	Chennai
1SW15DE01	Raj	15DE01	Kochi
1SW15DE02	Kumar	15DE02	Bengaluru

Department ID	Department
15TE01	Testing
15BE01	SQL
15DE01	Dotnet
15DE02	Java

From the above tables, you could visualize that the entire non-key attributes become completely dependent on the primary key. As in the first table, Employee Name, Department ID and Location depends on Employee ID, whereas in the second table, the Department depends on Department ID.

Let’s move on to the final topic.

Click Here – Get Prepared for SQL Interviews

Boyce Codd NF (BCNF)

BCNF is also called as 3.5 NF because it is an upgrade of 3 NF. Two researchers Boyce and Codd developed this BCNF concept so as to address some particular anomalies that that doesn’t fall under the 3 NF category. Like other NF techniques, BCNF also has certain conditions to be satisfied. First, BCNF should satisfy 3 NF. In the case of BCNF, if each and every functional dependency, X → Y, then, X will act as the Super key of that specific table.

For example, have a look at the table below:

Stud ID	Course of Study	Name of the Professor
1SD17SW01	Java	Magesh
1SD17SW02	Dotnet	Karthik
1SD17SW03	C++	Praba
1SD17SW04	Dotnet	Ramesh
1SD17SW05	SQL	Lokesh

As per the above table, we can clarify the following:

Any student can select multiple subjects of study
You can have multiple teachers to teach one particular subject.
For every subject, a teacher has to allocated to the student.

In the above table, except for the BCNF, all other NF techniques were satisfied. Let’s discuss the reason of it. Stud ID and Course of Study provides the primary key. This implies that the Course of Study column is actually a prime attribute. We could see yet another dependency here, i.e Name of the Professor→ Course of Study.

Here, Course of Study is actually a prime attribute whereas the Name of the Professor is a nonprime attribute, which is actually a violation of BCNF. Therefore, to achieve BCNF, we have to separate the table into two portions as Stud ID which is there already and another new column named Prof ID.

Stud ID	Prof ID
1SD17SW01	1PF17SW01
1SD17SW02	1PF17SW02
1SD17SW03	1PF17SW03
1SD17SW04	1PF17SW04
1SD17SW05	1PF17SW05

In the second table, Prof ID, Name of the Professor and Course of Study will be present.

Prof ID	Name of the Professor	Course of Study
1PF17SW01	Magesh	Java
1PF17SW02	Karthik	Dotnet
1PF17SW03	Praba	C++
1PF17SW04	Ramesh	Dotnet
1PF17SW05	Lokesh	SQL

With this, we achieved BCNF. We thus conclude this tutorial about Normalization in SQL. I hope you got a better understanding!

Click Here – Enroll Now!

General

Normalization in SQL