Normalization in DBMS: Normal Forms, Dependencies & SQL Examples
Normalization in DBMS is the systematic process of organizing relational tables to reduce redundancy and prevent update, insert, and delete anomalies. It matters because a poorly designed payments table can store the same customer, bank, and transaction details many times, making corrections risky. After reading, you can normalize real schemas step by step.
Normalization in DBMS with examples sits at the center of relational database design, especially when schemas power banking, healthcare, SaaS, ed-tech, and e-commerce systems. It relies on keys, functional dependencies, multivalued dependencies, and lossless decomposition;
You will be able to identify the correct normal form, decompose tables safely up to BCNF, 4NF, and 5NF, write SQL DDL for normalized schemas, and answer GATE or interview questions on 1NF, 2NF, 3NF, and dependency preservation.
Who This Guide Is For
This guide is specifically designed for:
Core Concepts
Before applying types of normalization in DBMS, you need five foundations: attributes, keys, functional dependencies, anomalies, and decomposition. The normal forms in DBMS with examples progress from removing repeating groups in 1NF to handling join dependencies in 5NF and advanced domain-key constraints in DKNF.
1.Dependencies and Keys
A functional dependency says that one set of attributes determines another set. If AadhaarNumber determines LegalName and DateOfBirth in a citizen identity table, the dependency is AadhaarNumber → LegalName, DateOfBirth. In an industry setting, a PaymentGatewayTxnId can determine amount, timestamp, merchant account, and settlement status for a UPI transaction. Keys use dependencies to identify rows: a superkey uniquely identifies rows, a candidate key is a minimal superkey, a primary key is the chosen candidate key, and a foreign key points to a key in another table. Normalization starts by asking which facts depend on which identifiers, not by blindly splitting tables.
Code Example
2.Anomalies and Redundancy
Normalization exists because redundant tables create anomalies. An update anomaly occurs when the same fact appears in many rows and one copy is changed but another is missed. An insert anomaly occurs when you cannot add one fact without another unrelated fact. A delete anomaly occurs when deleting one row accidentally removes the only record of another fact. A familiar example is a grocery billing sheet that repeats supplier phone numbers for every item sold; changing a supplier number becomes error-prone. In healthcare, a patient visit table that repeats doctor specialization and clinic license details for every appointment can lose clinic information if the last appointment row is deleted.
Code Example
3.Unnormalized Form
Unnormalized Form, often called UNF or 0NF, is not a real normal form target; it is the raw state before normalization. It contains repeating groups, nested records, multi-valued cells, or columns such as Phone1, Phone2, Phone3. A familiar Indian example is a housing society spreadsheet where one flat row stores all resident names and vehicle numbers in comma-separated cells. An industry example is a logistics shipment extract where one row contains multiple scan events, hub names, and timestamps as a single JSON-like text field. UNF is common in Excel imports and API dumps, but relational design requires separating each independent fact into rows and tables.
Code Example
4.First Normal Form
First normal form in DBMS requires atomic values, no repeating groups, and identifiable rows. Atomic means each cell contains one value of the correct type, not a list. In an IRCTC booking scenario, passenger names should not be stored as one comma-separated column inside a ticket row; each passenger on a PNR should be a separate row linked to the booking. In a pathology lab system, one invoice should not store all test codes in a single text field; each ordered test needs its own row so prices, reports, and status can be tracked independently. 1NF improves filtering, validation, indexing, and joins, but it does not solve partial or transitive dependencies.
Code Example
5.Second Normal Form
Second Normal Form requires the table to be in 1NF and every non-prime attribute to depend on the whole candidate key, not just part of a composite key. This is relevant only when the key has multiple attributes. In an e-commerce order line table with key OrderId + ProductId, ProductName depends only on ProductId, while Quantity depends on the full line. That partial dependency violates 2NF. In SaaS billing, if a subscription add-on table is keyed by AccountId + AddOnCode, AddOnName and MonthlyRate may depend only on AddOnCode, so they belong in an add-on catalog. For deeper practice, compare this with Second Normal Form in DBMS.
Code Example
6.Third Normal Form
3NF in DBMS requires the relation to be in 2NF and removes transitive dependencies among non-key attributes. A transitive dependency appears when Key → NonKey1 and NonKey1 → NonKey2. A familiar example is CustomerId → Pincode and Pincode → City, State; storing city and state repeatedly in the customer table creates update issues when postal data changes. In an insurance platform, PolicyId may determine AgentCode, and AgentCode may determine AgentName and BranchCity; agent details should live in an agent master table. 3NF is widely used in production OLTP databases because it balances clean design with practical dependency preservation.
Code Example
7.Boyce-Codd Normal Form
BCNF is a stricter version of 3NF. For every non-trivial functional dependency X → Y, X must be a superkey. A relation can be in 3NF but not BCNF when a determinant is not a candidate key, especially with overlapping candidate keys. A familiar example is a coaching institute timetable where each Room at a time has one Mentor, and each Mentor at that time has one Room, but the stored relation also includes Topic. In an airline maintenance system, if a certified inspector determines a hangar for a shift, and aircraft checks are recorded in the same table, inspector-to-hangar facts may cause BCNF violations. BCNF removes stronger anomaly risks, but decomposition can sometimes lose dependency preservation.
Code Example
8.Fourth Normal Form
4NF handles multivalued dependencies, written as X ↠ Y, where one entity independently has multiple values of one attribute and multiple values of another attribute. If these independent lists are stored together, the table creates unnecessary combinations. A familiar example is a restaurant that has multiple cuisines and multiple delivery zones; cuisines and zones are independent facts, so combining them in one row set creates a Cartesian explosion. In healthcare, a doctor may speak multiple languages and practise at multiple clinics; language ability and clinic availability should be stored separately unless a specific language is available only at a specific clinic. 4NF requires BCNF plus no non-trivial multivalued dependency unless the determinant is a superkey.
Code Example
9.Fifth Normal Form
5NF, also called Project-Join Normal Form, deals with join dependencies that cannot be handled by functional or multivalued dependencies alone. It applies when a valid fact is truly determined by the lossless combination of three or more projections. A familiar example is an event vendor marketplace where a caterer can serve certain dishes, operate in certain venues, and specific dish-venue combinations are allowed; storing every vendor-dish-venue triple may introduce false assumptions unless the business rules are decomposed correctly. In manufacturing, a supplier may be approved for parts, plants, and part-plant combinations; a supply record is valid only when all required pairwise approvals exist. 5NF is less common in day-to-day CRUD design but appears in advanced database theory and high-integrity rule systems.
Code Example
10.Domain-Key Normal Form
Domain-Key Normal Form is a theoretical ideal where every constraint on the database follows from domain constraints and key constraints. A domain constraint limits valid values, such as account status being only ACTIVE, FROZEN, or CLOSED. A key constraint ensures uniqueness and identity. A familiar example is a PAN verification table where PAN format, uniqueness, and taxpayer category domains capture most rules. In banking, a loan account table may enforce account number uniqueness, allowed loan types, sanctioned amount ranges, and valid risk grades. DKNF is difficult because many real rules are cross-row, temporal, or procedural, but thinking in DKNF helps developers move business rules from application code into database constraints where possible.
Code Example
11.Sixth Normal Form
6NF decomposes relations so that each relation represents an irreducible fact, often with time validity. It is mainly associated with temporal databases and data vault-style modeling, not standard undergraduate normalization problems. A familiar example is a prepaid mobile plan where plan price, data quota, and validity days can change independently over time; storing all attributes in one row forces unnecessary versioning. In IoT manufacturing, a machine sensor may change calibration value, firmware version, and location at different times; separate temporal tables record each changing fact accurately. 6NF can increase join complexity, so it is chosen when history precision is more important than simple querying.
Code Example
12.Lossless Decomposition
Decomposition means splitting a relation into smaller relations. A decomposition must be lossless, meaning the original relation can be reconstructed by joining the decomposed tables without losing rows or creating spurious rows. A familiar example is splitting a vehicle registration table into Vehicle and Owner tables using RegistrationNumber and OwnerId; joining through the key should recover valid ownership facts. In ed-tech, splitting enrollment payments into Learner, CourseBatch, and PaymentReceipt tables must preserve which learner paid for which batch and receipt. Dependency preservation is a separate property: after decomposition, you should still be able to enforce original dependencies without joining too many tables. In theory exams, lossless join is mandatory; dependency preservation is preferred but BCNF may sacrifice it.
Code Example
13.Controlled Denormalization
Denormalization is the deliberate introduction of redundancy after understanding the normalized design. It is not a substitute for normalization; it is a performance or reporting decision with synchronization rules. A familiar example is a food delivery app storing order_total in the order header even though it can be calculated from item rows, because the checkout and support screens need fast access. In analytics, a banking dashboard may store daily branch-level transaction totals instead of scanning millions of transaction rows for every report. Denormalization should be measured, documented, and protected with triggers, materialized views, batch jobs, or application-level consistency checks.
Code Example
Learning Path
Use this sequence to move from recognizing bad spreadsheet-style tables to solving advanced dependency questions confidently.
Frequently Asked Questions
What is normalization in DBMS with examples?
Normalization in DBMS is the process of organizing tables to reduce redundancy and prevent anomalies. For example, instead of storing customer city and state repeatedly in every order row, store pincode details in a separate postal table and reference it.
What are the types of normalization in DBMS?
The standard types are 1NF, 2NF, 3NF, BCNF, 4NF, and 5NF. Advanced discussions also include DKNF and 6NF, while denormalization is a separate controlled performance technique rather than a normal form.
What is the difference between 3NF and BCNF?
3NF allows a functional dependency X → A if X is a superkey or A is a prime attribute. BCNF is stricter: X must be a superkey for every non-trivial dependency, so every BCNF relation is in 3NF, but every 3NF relation is not necessarily in BCNF.
How do you implement normalization in SQL?
Start by listing dependencies, identifying candidate keys, and finding repeating, partial, transitive, or multivalued dependencies. Then create separate tables with primary keys, foreign keys, unique constraints, check constraints, and junction tables where needed.
When should you use 3NF instead of BCNF?
Use 3NF when dependency preservation is important and the remaining redundancy is acceptable. BCNF gives stronger anomaly removal, but some BCNF decompositions make it harder to enforce original dependencies without joins.
Is normalization always good for performance?
Normalization is excellent for data integrity in OLTP systems, but it can increase joins for read-heavy reports. Performance-critical analytics systems often use controlled denormalization, summary tables, or star schemas after the normalized source model is understood.
What is the most common mistake in normalization?
The most common mistake is splitting tables based on nouns rather than dependencies. Normalization decisions should be based on which attributes determine which other attributes, whether the join is lossless, and whether dependencies can still be enforced.
Can a table be in 2NF but not in 3NF?
Yes. A table can remove partial dependencies and still contain transitive dependencies, such as CustomerId → Pincode and Pincode → City. That table is in 2NF but violates 3NF until the pincode details are separated.
Interview Preparation
Normalization questions test whether you can reason from dependencies rather than memorize definitions. State the normal form, identify the violating dependency, and propose a lossless decomposition with keys.
Conceptual Questions
- Why is normalization needed in DBMS? It reduces redundant storage of the same fact and prevents update, insert, and delete anomalies. The goal is data integrity first, with performance tuned separately when needed.
- What is a partial dependency? A partial dependency occurs when a non-prime attribute depends on only part of a composite candidate key. It violates 2NF and is fixed by moving the partially dependent attributes into another relation.
- What is a transitive dependency? A transitive dependency occurs when a key determines a non-key attribute, and that non-key attribute determines another non-key attribute. It violates 3NF unless the dependent attribute is prime under the formal 3NF condition.
- Why is BCNF stricter than 3NF? BCNF requires every determinant in a non-trivial functional dependency to be a superkey. 3NF permits a dependency when the right-side attribute is prime, so BCNF removes more anomaly cases.
Applied / Problem-Solving Questions
- Given R(A, B, C) with AB as key and B → C, which normal form is violated? B → C is a partial dependency because C depends on part of the composite key AB. The relation violates 2NF and should be decomposed into R1(B, C) and R2(A, B).
- Given R(A, B, C) with A → B and B → C, what issue exists? A determines C through B, so there is a transitive dependency. Decompose into R1(A, B) and R2(B, C) if B is the determinant for C.
- How do you check whether a decomposition is lossless? For binary decomposition, check whether the common attributes functionally determine one of the decomposed relations. If the common attribute is a key in one side, the join is lossless.
- When would you denormalize a normalized schema? Denormalize only after measuring read bottlenecks, such as dashboard queries repeatedly aggregating large transaction tables. Use materialized views, summary tables, or derived columns with refresh rules.
Key Takeaways
Normalization in DBMS with examples becomes practical when you track dependencies carefully: 1NF removes repeating groups, 2NF removes partial dependency, 3NF removes transitive dependency, BCNF requires every determinant to be a superkey, and 4NF or 5NF handles advanced multivalued and join dependencies.
For GATE and interviews, the most tested points are candidate key calculation using closure, highest normal form identification, 3NF versus BCNF, lossless join decomposition, dependency preservation, and recognizing when a multivalued dependency violates 4NF.
The natural next step is Second Normal Form in DBMS, because 2NF is the first place where dependency reasoning becomes essential for composite-key schemas.