Normalization and Normal Forms in DBMS

If a dataset is maintained in the form of just a single table, it leads to Data Redundancy, which means a single value of data is stored multiple times. This leads to many issues like an increment in the database size, slower data retrieval, and data inconsistency.  Thus, to overcome this, Normalization in DBMS  is used in which a large table is reduced into smaller tables until each of the single tables contains one relation.

As a result of Normalization, the redundancy in a table is removed which improves the efficiency of the database system. Let’s understand the Need for Normalization in DBMS using a simple example. Consider a table having Student ID, name, Branch Name, and Branch Code.

This is the single table storing all the data related to a college student. Storing data in this way leads to the following problems:

  • If you want to “insert” a name and ID of a new student, you cannot do it until you don’t have his branch name and branch code. This is called Insertion Anomaly.
  • If you want to “delete” a student name, then the branch name and code will also be deleted. The deleted branch name and code cannot be recovered again. This is called Deletion Anomaly.
  • If you want to “update” the branch name of John from CS to Civil, the update for Brat will also happen. Thus, multiple updations of single data occur. This is called Updation Anomaly.

A simple solution to the above problems is to reduce the dataset into two tables using Normalization in DBMS in the following way.

As you can see, there is no need to repeatedly store the branch data. Also, new data can be inserted into the data separately without any anomaly. Therefore, Normalization in DBMS provides us with a way to remove the above anomalies using the concept of Normal Forms in DBMS. Before learning about the types of Normal Forms in DBMS, you should be aware of the concept of Functional Dependency in DBMS.

Types of Normal Forms

There are the following Normal Forms in DBMS:

First Normal Form (1NF)

The First Normal Form of Normalization in DBMS states that an attribute of a Table cannot hold Multiple Values. We can also say that an attribute must be single-valued. In other words, the composite attribute or multivalued attribute is not allowed.

Thus, A Table is said to be in 1NF if:

  • All columns contain atomic values.
  • All values of a column(attribute) belong to the same domain.
  • Column names are unique.

For example, if there are multiple branch names for a student, they must be converted to a single-valued attribute as shown below.

Second Normal Form (2NF)

A Table ( Relation) in the Database exists in the Second Normal Form if:

  • It exists in the 1NF.
  • It does not have Partial Dependency

Let’s understand the Second Normal Form of Normalization in DBMS with an example. Suppose there is a table ‘Product’ containing the Customer ID, Name, and Price of Products.

This table contains some redundancy as the price of the Fan is stored twice. This table is in 1NF as all values are atomic. Now, let’s see for 2NF.

Here,

Candidate Keys are {Customer_ID, Name}. The Prime Attributes ( attributes that are a part of the candidate key) will be Customer_ID and Name and the Non-Prime Attribute will be Price.  We see that Customer_ID->Price is a Partial Functional Dependency because Price (non-prime attribute) depends on Customer_ID(a subset of the candidate key).

Therefore, we conclude the table is not in 2NF. To convert the table to 2NF using Normalization in DBMS, we decompose it into two tables as given below:

Both the above tables are in 1NF. Also, the Functional Dependencies are Customer ID->Product Name and Product Name->Price in which there is no Partial Dependency. Also, the price of the Fan is not stored multiple times. So, Data Redundancy is removed.

Third Normal Form (3NF)

The Third Normal Form exists in a table if:

  • It exists in the 2NF.
  • There is no Transitive Dependency for non-prime attributes in the table.

Now, let’s first see Transitive Functional Dependency before going ahead with the Normalization in DBMS.

Transitive Functional Dependency

Transitive Functional Dependency occurs when a non-prime attribute is dependent on another non-prime attribute. For example, in a relation, if Student ID determines Student’s City then Student ID -> Student City is a Functional Dependency. And, if Student’s State is determined by Student’s City then Student City->Student State is also a Functional Dependency.

But, Student City is a Non-Prime Attribute that depends on Student ID (Prime Attribute). Therefore, Student State becomes indirectly dependent upon Student ID. We conclude that If Student ID -> Student City and Student City->Student State exists, then Student ID->Student State also exists.

Now, let’s see how to convert a table into the Third Normal Form (3NF) of Normalization in DBMS. Suppose there is a table containing the data of students as follows:

In this table, the Candidate Key is Student ID which is a prime attribute, and the rest of the attributes are Non-Prime. Thus, there is no partial dependency. So, it is in 2NF.

But, STU_ID->ZIPCODE and ZIPCODE->STU_STATE are the two functional dependencies that exist. Since Zipcode is a non-prime attribute, STU_STATE indirectly depends on STU_ID. Therefore, STU_ID->STU_STATE also exists which is a Transitive Functional Dependency.

To convert this into 3NF, we decompose the relation into two tables as below.

Now, both the tables are free from Transitive Functional Dependency and thus exist in 3NF of Normalization in DBMS.

Boyce-Codd Normal Form (BCNF)

A table or a relation exists in the BCNF if:

  • It exists in 3NF.
  • For every Functional Dependency A->B in the table, A is a Super Key.

In simple words, if there exists a Functional Dependency X->Y in the table such that Y is a Prime Attribute and X is a non-prime attribute, then the table is not in BCNF.

Let’s see how to identify whether a Table exists in BCNF or not using an example. Suppose there is a table named ‘Customer Service’ which contains the data about the Product, Customer, and Seller. We can observe here that despite Normalization up to 3NF, there is Data Redundancy in the table.

For the given table:

  • Candidate Keys = {Product ID, Customer Name}
  • Prime Attributes: Product ID and Customer Name
  • Non-Prime Attributes: Seller Name

Therefore, the following conclusions can be drawn from the table:

  1. Since all the values in the columns are atomic, the Table is in 1NF.
  2. The dependency Customer Name ->Seller Name is not possible because One Customer may purchase from more than one seller. Thus, there is no Partial Dependency i.e. seller name(non-prime attribute) does not depend on Customer Name(a subset of Candidate Key). Thus, the Table is in 2NF.
  3. Functional Dependencies present in the table are:

·         Product ID, Customer Name->Seller Name

·         Seller Name->Customer Name

For Transitive Dependency, a non-prime attribute should determine another non-prime attribute. But here, Seller Name is a non-prime attribute but Customer Name is a Prime Attribute. Therefore, there is no Transitive Dependency. Thus, Table exists in 3NF.

4.  For a table to exist in BCNF, in Every Functional Dependency, LHS should be a Super Key. But in Seller Name->Customer Name, Seller Name is not a Key. Therefore, the table does not exist in BCNF.

Now, let’s convert the table into BCNF Form of Normalization in DBMS. We break the table into two tables as follows:

Therefore, the above two tables satisfy the conditions of BCNF. This is how the Normal Forms in DBMS remove the data redundancy in a Relational Database.