What are the 5 rules of data normalization?
Data normalization is a fundamental process in relational database design aimed at organizing data to reduce redundancy and improve data integrity. By following normalization rules, you ensure that the database structure is efficient, scalable, and free from common anomalies that can compromise data accuracy. The process involves decomposing tables into smaller, related tables without losing essential information. Below are the five normal forms (1NF to 5NF) that outline the rules of data normalization:
1. First Normal Form (1NF)
Purpose: Eliminate duplicate columns from the same table and create separate tables for each group of related data. Ensure that each column contains atomic (indivisible) values.
Rules:
- Atomicity: Each column must contain only atomic (indivisible) values. No multiple values or lists in a single column.
- Uniqueness: Each column should have a unique name.
- Order Irrelevance: The order in which data is stored does not matter.
- Primary Key: The table must have a primary key that uniquely identifies each row.
Example:
Before 1NF:
EmployeeID | Name | Skills |
---|---|---|
1 | John Doe | SQL, Java, Python |
2 | Jane Smith | HTML, CSS |
After 1NF:
EmployeeID | Name | Skill |
---|---|---|
1 | John Doe | SQL |
1 | John Doe | Java |
1 | John Doe | Python |
2 | Jane Smith | HTML |
2 | Jane Smith | CSS |
2. Second Normal Form (2NF)
Purpose: Remove partial dependencies; ensure that non-key attributes are fully functionally dependent on the primary key.
Rules:
- Compliance with 1NF: The table must first satisfy all the rules of 1NF.
- Full Functional Dependency: All non-key attributes must depend entirely on the primary key, not just part of it. This applies especially to tables with composite primary keys.
Example:
Before 2NF:
OrderID | ProductID | ProductName | Quantity |
---|---|---|---|
1001 | 2001 | Laptop | 2 |
1001 | 2002 | Mouse | 5 |
1002 | 2001 | Laptop | 1 |
Issue: ProductName
depends only on ProductID
, not on the entire primary key (OrderID
, ProductID
).
After 2NF:
Orders Table:
OrderID | ProductID | Quantity |
---|---|---|
1001 | 2001 | 2 |
1001 | 2002 | 5 |
1002 | 2001 | 1 |
Products Table:
ProductID | ProductName |
---|---|
2001 | Laptop |
2002 | Mouse |
3. Third Normal Form (3NF)
Purpose: Eliminate transitive dependencies; ensure that non-key attributes are not dependent on other non-key attributes.
Rules:
- Compliance with 2NF: The table must first satisfy all the rules of 2NF.
- No Transitive Dependency: Non-key attributes should depend only on the primary key and not on other non-key attributes.
Example:
Before 3NF:
EmployeeID | Name | DepartmentID | DepartmentName |
---|---|---|---|
1 | John Doe | D001 | Sales |
2 | Jane Smith | D002 | Marketing |
3 | Alice Johnson | D001 | Sales |
Issue: DepartmentName
depends on DepartmentID
, which is a non-key attribute.
After 3NF:
Employees Table:
EmployeeID | Name | DepartmentID |
---|---|---|
1 | John Doe | D001 |
2 | Jane Smith | D002 |
3 | Alice Johnson | D001 |
Departments Table:
DepartmentID | DepartmentName |
---|---|
D001 | Sales |
D002 | Marketing |
4. Boyce-Codd Normal Form (BCNF)
Purpose: Handle cases where multiple candidate keys exist and ensure that every determinant is a candidate key.
Rules:
- Compliance with 3NF: The table must first satisfy all the rules of 3NF.
- Every Determinant is a Candidate Key: For every functional dependency (A → B), A must be a super key.
Example:
Before BCNF: Consider a table with composite dependencies.
CourseID | Instructor | Room |
---|---|---|
C001 | Dr. Smith | R101 |
C001 | Dr. Smith | R102 |
C002 | Dr. Johnson | R101 |
Issue: If an instructor can teach only one course and a room can host only one instructor, then both Instructor
and Room
determine CourseID
, but neither is a candidate key.
After BCNF:
Courses Table:
CourseID | Instructor |
---|---|
C001 | Dr. Smith |
C002 | Dr. Johnson |
Rooms Table:
Room | CourseID |
---|---|
R101 | C002 |
R102 | C001 |
5. Fourth Normal Form (4NF)
Purpose: Eliminate multi-valued dependencies; ensure that no table contains two or more independent multi-valued facts about the same entity.
Rules:
- Compliance with BCNF: The table must first satisfy all the rules of BCNF.
- No Multi-Valued Dependencies: A table should not have more than one independent multi-valued dependency.
Example:
Before 4NF: Consider a table where an employee can have multiple skills and multiple hobbies.
EmployeeID | Skill | Hobby |
---|---|---|
1 | SQL | Golf |
1 | Java | Chess |
2 | HTML | Painting |
Issue: Skills and hobbies are independent of each other, leading to redundant combinations.
After 4NF:
EmployeeSkills Table:
EmployeeID | Skill |
---|---|
1 | SQL |
1 | Java |
2 | HTML |
EmployeeHobbies Table:
EmployeeID | Hobby |
---|---|
1 | Golf |
1 | Chess |
2 | Painting |
5. Fifth Normal Form (5NF)
Purpose: Decompose tables to eliminate join dependencies and ensure that the table can be reconstructed from smaller tables without loss of data.
Rules:
- Compliance with 4NF: The table must first satisfy all the rules of 4NF.
- No Join Dependencies: The table should not have any join dependencies that are not implied by the candidate keys.
Example:
Before 5NF: Consider a table that records which suppliers supply which parts for specific projects.
Supplier | Part | Project |
---|---|---|
S1 | P1 | PR1 |
S1 | P2 | PR1 |
S1 | P1 | PR2 |
S2 | P1 | PR1 |
Issue: The relationships between suppliers, parts, and projects can lead to complex dependencies that are not captured by candidate keys alone.
After 5NF:
SupplierParts Table:
Supplier | Part |
---|---|
S1 | P1 |
S1 | P2 |
S2 | P1 |
PartProjects Table:
Part | Project |
---|---|
P1 | PR1 |
P1 | PR2 |
P2 | PR1 |
SupplierProjects Table:
Supplier | Project |
---|---|
S1 | PR1 |
S1 | PR2 |
S2 | PR1 |
By decomposing the original table into these three tables, you eliminate join dependencies, ensuring that the original table can be accurately reconstructed without redundancy.
Conclusion
Data normalization through these five normal forms ensures a well-structured, efficient, and reliable database. By systematically applying these rules:
- First Normal Form (1NF): Ensures atomicity and eliminates duplicate columns.
- Second Normal Form (2NF): Removes partial dependencies, ensuring full functional dependency on the primary key.
- Third Normal Form (3NF): Eliminates transitive dependencies, ensuring that non-key attributes depend only on the primary key.
- Boyce-Codd Normal Form (BCNF): Strengthens 3NF by ensuring every determinant is a candidate key.
- Fourth Normal Form (4NF): Removes multi-valued dependencies, preventing redundant data.
- Fifth Normal Form (5NF): Addresses join dependencies, ensuring data can be reconstructed without loss.
While achieving up to 3NF is often sufficient for most practical applications, understanding and applying higher normal forms like BCNF, 4NF, and 5NF can further enhance database integrity and performance, especially in complex systems.
Key Takeaways:
- Normalization Reduces Redundancy: By organizing data efficiently, normalization minimizes duplicate data, saving storage space and reducing inconsistencies.
- Enhances Data Integrity: Ensuring that data dependencies are logical and based on keys maintains the accuracy and reliability of the database.
- Improves Query Performance: Well-normalized databases often perform better in queries due to optimized table structures and relationships.
- Facilitates Maintenance: Simplified and organized data structures make database maintenance, updates, and scalability more manageable.
By adhering to these normalization principles, you create robust databases that are easier to manage, less prone to errors, and better suited to support the evolving needs of applications and businesses.
GET YOUR FREE
Coding Questions Catalog