What are the 5 rules of data normalization?

Free Coding Questions Catalog
Boost your coding skills with our essential coding questions catalog. Take a step towards a better tech career now!

Data normalization is a fundamental process in relational database design aimed at organizing data to reduce redundancy and improve data integrity. By following normalization rules, you ensure that the database structure is efficient, scalable, and free from common anomalies that can compromise data accuracy. The process involves decomposing tables into smaller, related tables without losing essential information. Below are the five normal forms (1NF to 5NF) that outline the rules of data normalization:

1. First Normal Form (1NF)

Purpose: Eliminate duplicate columns from the same table and create separate tables for each group of related data. Ensure that each column contains atomic (indivisible) values.

Rules:

  • Atomicity: Each column must contain only atomic (indivisible) values. No multiple values or lists in a single column.
  • Uniqueness: Each column should have a unique name.
  • Order Irrelevance: The order in which data is stored does not matter.
  • Primary Key: The table must have a primary key that uniquely identifies each row.

Example:

Before 1NF:

EmployeeIDNameSkills
1John DoeSQL, Java, Python
2Jane SmithHTML, CSS

After 1NF:

EmployeeIDNameSkill
1John DoeSQL
1John DoeJava
1John DoePython
2Jane SmithHTML
2Jane SmithCSS

2. Second Normal Form (2NF)

Purpose: Remove partial dependencies; ensure that non-key attributes are fully functionally dependent on the primary key.

Rules:

  • Compliance with 1NF: The table must first satisfy all the rules of 1NF.
  • Full Functional Dependency: All non-key attributes must depend entirely on the primary key, not just part of it. This applies especially to tables with composite primary keys.

Example:

Before 2NF:

OrderIDProductIDProductNameQuantity
10012001Laptop2
10012002Mouse5
10022001Laptop1

Issue: ProductName depends only on ProductID, not on the entire primary key (OrderID, ProductID).

After 2NF:

Orders Table:

OrderIDProductIDQuantity
100120012
100120025
100220011

Products Table:

ProductIDProductName
2001Laptop
2002Mouse

3. Third Normal Form (3NF)

Purpose: Eliminate transitive dependencies; ensure that non-key attributes are not dependent on other non-key attributes.

Rules:

  • Compliance with 2NF: The table must first satisfy all the rules of 2NF.
  • No Transitive Dependency: Non-key attributes should depend only on the primary key and not on other non-key attributes.

Example:

Before 3NF:

EmployeeIDNameDepartmentIDDepartmentName
1John DoeD001Sales
2Jane SmithD002Marketing
3Alice JohnsonD001Sales

Issue: DepartmentName depends on DepartmentID, which is a non-key attribute.

After 3NF:

Employees Table:

EmployeeIDNameDepartmentID
1John DoeD001
2Jane SmithD002
3Alice JohnsonD001

Departments Table:

DepartmentIDDepartmentName
D001Sales
D002Marketing

4. Boyce-Codd Normal Form (BCNF)

Purpose: Handle cases where multiple candidate keys exist and ensure that every determinant is a candidate key.

Rules:

  • Compliance with 3NF: The table must first satisfy all the rules of 3NF.
  • Every Determinant is a Candidate Key: For every functional dependency (A → B), A must be a super key.

Example:

Before BCNF: Consider a table with composite dependencies.

CourseIDInstructorRoom
C001Dr. SmithR101
C001Dr. SmithR102
C002Dr. JohnsonR101

Issue: If an instructor can teach only one course and a room can host only one instructor, then both Instructor and Room determine CourseID, but neither is a candidate key.

After BCNF:

Courses Table:

CourseIDInstructor
C001Dr. Smith
C002Dr. Johnson

Rooms Table:

RoomCourseID
R101C002
R102C001

5. Fourth Normal Form (4NF)

Purpose: Eliminate multi-valued dependencies; ensure that no table contains two or more independent multi-valued facts about the same entity.

Rules:

  • Compliance with BCNF: The table must first satisfy all the rules of BCNF.
  • No Multi-Valued Dependencies: A table should not have more than one independent multi-valued dependency.

Example:

Before 4NF: Consider a table where an employee can have multiple skills and multiple hobbies.

EmployeeIDSkillHobby
1SQLGolf
1JavaChess
2HTMLPainting

Issue: Skills and hobbies are independent of each other, leading to redundant combinations.

After 4NF:

EmployeeSkills Table:

EmployeeIDSkill
1SQL
1Java
2HTML

EmployeeHobbies Table:

EmployeeIDHobby
1Golf
1Chess
2Painting

5. Fifth Normal Form (5NF)

Purpose: Decompose tables to eliminate join dependencies and ensure that the table can be reconstructed from smaller tables without loss of data.

Rules:

  • Compliance with 4NF: The table must first satisfy all the rules of 4NF.
  • No Join Dependencies: The table should not have any join dependencies that are not implied by the candidate keys.

Example:

Before 5NF: Consider a table that records which suppliers supply which parts for specific projects.

SupplierPartProject
S1P1PR1
S1P2PR1
S1P1PR2
S2P1PR1

Issue: The relationships between suppliers, parts, and projects can lead to complex dependencies that are not captured by candidate keys alone.

After 5NF:

SupplierParts Table:

SupplierPart
S1P1
S1P2
S2P1

PartProjects Table:

PartProject
P1PR1
P1PR2
P2PR1

SupplierProjects Table:

SupplierProject
S1PR1
S1PR2
S2PR1

By decomposing the original table into these three tables, you eliminate join dependencies, ensuring that the original table can be accurately reconstructed without redundancy.

Conclusion

Data normalization through these five normal forms ensures a well-structured, efficient, and reliable database. By systematically applying these rules:

  1. First Normal Form (1NF): Ensures atomicity and eliminates duplicate columns.
  2. Second Normal Form (2NF): Removes partial dependencies, ensuring full functional dependency on the primary key.
  3. Third Normal Form (3NF): Eliminates transitive dependencies, ensuring that non-key attributes depend only on the primary key.
  4. Boyce-Codd Normal Form (BCNF): Strengthens 3NF by ensuring every determinant is a candidate key.
  5. Fourth Normal Form (4NF): Removes multi-valued dependencies, preventing redundant data.
  6. Fifth Normal Form (5NF): Addresses join dependencies, ensuring data can be reconstructed without loss.

While achieving up to 3NF is often sufficient for most practical applications, understanding and applying higher normal forms like BCNF, 4NF, and 5NF can further enhance database integrity and performance, especially in complex systems.

Key Takeaways:

  • Normalization Reduces Redundancy: By organizing data efficiently, normalization minimizes duplicate data, saving storage space and reducing inconsistencies.
  • Enhances Data Integrity: Ensuring that data dependencies are logical and based on keys maintains the accuracy and reliability of the database.
  • Improves Query Performance: Well-normalized databases often perform better in queries due to optimized table structures and relationships.
  • Facilitates Maintenance: Simplified and organized data structures make database maintenance, updates, and scalability more manageable.

By adhering to these normalization principles, you create robust databases that are easier to manage, less prone to errors, and better suited to support the evolving needs of applications and businesses.

TAGS
Coding Interview
System Design Interview
CONTRIBUTOR
Design Gurus Team

GET YOUR FREE

Coding Questions Catalog

Design Gurus Newsletter - Latest from our Blog
Boost your coding skills with our essential coding questions catalog.
Take a step towards a better tech career now!
Explore Answers
How to crack an interview at IBM?
How do you practice behavioral interview?
What are the 7 phases of SDLC?
Related Courses
Image
Grokking the Coding Interview: Patterns for Coding Questions
Image
Grokking Data Structures & Algorithms for Coding Interviews
Image
Grokking Advanced Coding Patterns for Interviews
Image
One-Stop Portal For Tech Interviews.
Copyright © 2024 Designgurus, Inc. All rights reserved.