From Chaos to Order: Step‑by‑Step Database Normalization up to 4NF
This article explains the purpose of database normalization, outlines its benefits and drawbacks, and walks through a concrete demo that transforms an unnormalized employee table through the first, second, third, Boyce‑Codd, and fourth normal forms, illustrating each step with diagrams.
Introduction
Database normalization occupies an awkward position in practice: textbooks define the normal forms formally, but real‑world adoption is limited. This article uses plain language and a demo database to take an unnormalized table step by step from the first normal form to the fourth.
Goals of Normalization
Normalization brings several benefits. The most important are reduced data redundancy, the elimination of insert, update, and delete anomalies, and a more coherent organization of data. Normalization also has drawbacks, which are discussed later.
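To make the idea of an update anomaly concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name employee_flat and the sample values are assumptions made for illustration; only the column names come from the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flat table: the department description is repeated on every row.
conn.execute(
    "CREATE TABLE employee_flat ("
    "employeeId INTEGER, departmentName TEXT, departmentDescription TEXT)"
)
conn.executemany(
    "INSERT INTO employee_flat VALUES (?, ?, ?)",
    [(1, "Sales", "Sells products"), (2, "Sales", "Sells products")],
)

# Update only one of the two redundant copies.
conn.execute(
    "UPDATE employee_flat SET departmentDescription = ? WHERE employeeId = ?",
    ("Sells products and services", 1),
)

# The same department now has two conflicting descriptions.
descriptions = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT departmentDescription FROM employee_flat "
        "WHERE departmentName = 'Sales' ORDER BY departmentDescription"
    )
]
print(descriptions)
```

Because the description is stored once per employee rather than once per department, a partial update leaves the data contradicting itself.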
What is Normalization
Normalization is a set of rules, the normal forms, for structuring relations so that duplicate data is eliminated, redundancy is reduced, and data is organized more efficiently. The normal forms are cumulative: a relation in a higher normal form also satisfies every lower one (for example, a relation in 2NF is also in 1NF).
Demo
We start with an unnormalized table containing columns such as employeeId, departmentName, job, jobDescription, skill, departmentDescription, and address.
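As a rough sketch, the starting point might look like the following in SQLite. The table name employee_unnormalized, the column types, and all sample rows are assumptions; the original describes the table only by its column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee_unnormalized (
        employeeId INTEGER,
        departmentName TEXT,
        departmentDescription TEXT,
        job TEXT,
        jobDescription TEXT,
        skill TEXT,      -- several skills packed into one value
        address TEXT     -- several address parts packed into one value
    )
""")
# Hypothetical sample data.
rows = [
    (1, "Engineering", "Builds the product", "Developer", "Writes code",
     "C#, SQL, JavaScript", "1 Main St, Springfield, 12345"),
    (2, "Engineering", "Builds the product", "Developer", "Writes code",
     "SQL", "2 Oak Ave, Springfield, 12345"),
    (3, "Sales", "Sells the product", "Account Manager",
     "Manages customer accounts", "Negotiation", "3 Pine Rd, Shelbyville, 54321"),
]
conn.executemany(
    "INSERT INTO employee_unnormalized VALUES (?, ?, ?, ?, ?, ?, ?)", rows
)

# Finding everyone who knows SQL needs substring matching, which is fragile:
# the same pattern would also match a skill such as "NoSQL".
sql_people = [
    row[0]
    for row in conn.execute(
        "SELECT employeeId FROM employee_unnormalized "
        "WHERE skill LIKE '%SQL%' ORDER BY employeeId"
    )
]
print(sql_people)
```

The repeated department and job descriptions, and the packed skill and address values, are the problems the following steps remove one at a time.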
First Normal Form (1NF)
If every attribute of a relation is an indivisible basic data item, the relation is in 1NF. In simple terms, each attribute must be atomic. In the example, the address field can be split into separate components, so we decompose it into a new table.
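The address decomposition can be sketched as follows. The components street, city, and postalCode, the comma-separated input format, and the sample addresses are assumptions; the original states only that address is split into a new table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employeeId INTEGER PRIMARY KEY);
    -- REFERENCES documents the link; SQLite enforces it only when
    -- PRAGMA foreign_keys = ON is set.
    CREATE TABLE address (
        employeeId INTEGER PRIMARY KEY REFERENCES employee(employeeId),
        street TEXT NOT NULL,
        city TEXT NOT NULL,
        postalCode TEXT NOT NULL
    );
""")

# Hypothetical non-atomic address values keyed by employeeId.
raw_addresses = {
    1: "1 Main St, Springfield, 12345",
    2: "2 Oak Ave, Shelbyville, 54321",
}
for employee_id, text in raw_addresses.items():
    # Split the packed value into one atomic value per column.
    street, city, postal_code = [part.strip() for part in text.split(",")]
    conn.execute("INSERT INTO employee VALUES (?)", (employee_id,))
    conn.execute(
        "INSERT INTO address VALUES (?, ?, ?, ?)",
        (employee_id, street, city, postal_code),
    )

# Each component can now be queried directly, without string parsing.
springfield = [
    row[0]
    for row in conn.execute("SELECT employeeId FROM address WHERE city = 'Springfield'")
]
print(springfield)
```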
Second Normal Form (2NF)
A relation in 1NF is in 2NF when every non‑key attribute is fully functionally dependent on the whole primary key, that is, when no non‑key attribute depends on only part of a composite key. In the example, departmentDescription depends only on departmentName, not on the full composite key, so we move it to a separate table keyed by departmentName.
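A minimal sketch of this step is shown below. It assumes the composite key is (employeeId, departmentName), which the original does not spell out, and uses hypothetical table names and sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The description now depends on the whole key of its own table.
    CREATE TABLE department (
        departmentName TEXT PRIMARY KEY,
        departmentDescription TEXT NOT NULL
    );
    -- Assumed composite key: one row per (employee, department) pair.
    CREATE TABLE employee_department (
        employeeId INTEGER NOT NULL,
        departmentName TEXT NOT NULL REFERENCES department(departmentName),
        PRIMARY KEY (employeeId, departmentName)
    );
    INSERT INTO department VALUES ('Engineering', 'Builds the product');
    INSERT INTO employee_department VALUES (1, 'Engineering'), (2, 'Engineering');
""")

# The description is stored once, so a single UPDATE reaches every employee.
conn.execute(
    "UPDATE department SET departmentDescription = ? WHERE departmentName = ?",
    ("Builds and runs the product", "Engineering"),
)
result = conn.execute("""
    SELECT ed.employeeId, d.departmentDescription
    FROM employee_department AS ed
    JOIN department AS d ON d.departmentName = ed.departmentName
    ORDER BY ed.employeeId
""").fetchall()
print(result)
```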
Third Normal Form (3NF)
A relation in 2NF is in 3NF when no non‑key attribute depends transitively on the primary key. In the 2NF version, the key determines job and job determines jobDescription, so jobDescription depends on the key only through job. We remove this transitive dependency by moving jobDescription into its own table keyed by job.
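The sketch below illustrates the 3NF split and one of its practical effects, avoiding a delete anomaly. The table names and sample values are assumptions, and the employee table is simplified to a single-column key for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job (
        job TEXT PRIMARY KEY,
        jobDescription TEXT NOT NULL
    );
    CREATE TABLE employee (
        employeeId INTEGER PRIMARY KEY,
        job TEXT NOT NULL REFERENCES job(job)
    );
    INSERT INTO job VALUES ('Developer', 'Writes code');
    INSERT INTO employee VALUES (1, 'Developer');
""")

# Remove the only employee who holds the job.
conn.execute("DELETE FROM employee WHERE employeeId = 1")

# The job and its description survive, because they no longer live
# on the employee row.
remaining_jobs = conn.execute("SELECT job, jobDescription FROM job").fetchall()
print(remaining_jobs)
```

In the flat design, deleting the last developer would also have deleted the only copy of the developer job description.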
Boyce‑Codd Normal Form (BCNF)
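BCNF is a stricter version of 3NF: for every non‑trivial functional dependency X→Y, X must be a superkey. In the 3NF design, the email attribute is unique for each employee, so email determines employeeId. Uniqueness alone is not a violation; the problem arises when email sits in a table whose key is composite, because an employee (and therefore an email) can then appear on several rows, so email is a determinant without being a superkey of that table. We move email to a separate table keyed by employeeId, where email is itself a candidate key, to satisfy BCNF.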
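The BCNF condition can be checked against sample rows with a short sketch like the one below. The helper names fd_holds and is_superkey, the assumed composite key (employeeId, departmentName), and the sample rows are all hypothetical. A check on sample data can only refute a dependency or a key, not prove that it holds for the schema in general.

```python
def fd_holds(rows, lhs, rhs):
    """True if rows that agree on lhs always agree on rhs (in this sample)."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_superkey(rows, attrs):
    """True if no two rows in this sample share the same values for attrs."""
    keys = [tuple(row[col] for col in attrs) for row in rows]
    return len(keys) == len(set(keys))


# Hypothetical rows; employee 1 belongs to two departments.
rows = [
    {"employeeId": 1, "departmentName": "Engineering", "email": "ann@example.com"},
    {"employeeId": 1, "departmentName": "Research", "email": "ann@example.com"},
    {"employeeId": 2, "departmentName": "Engineering", "email": "bob@example.com"},
]

# email -> employeeId holds, but email is not a superkey: a BCNF violation.
violates_bcnf = fd_holds(rows, ["email"], ["employeeId"]) and not is_superkey(rows, ["email"])

# After moving email to its own table there is one row per employee,
# and email becomes a key of that table.
employee_email = {row["employeeId"]: row["email"] for row in rows}
decomposed = [{"employeeId": k, "email": v} for k, v in employee_email.items()]
fixed = is_superkey(decomposed, ["email"])
print(violates_bcnf, fixed)
```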
Fourth Normal Form (4NF)
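A relation is in 4NF when, for every non‑trivial multivalued dependency, the determinant is a superkey. In the BCNF design, each employee has several skills that are independent of the employee's other attributes, and the skill attribute holds them all in one value (e.g., "C#, SQL, JavaScript"). Strictly speaking, such a comma‑separated list is also an atomicity problem of the kind 1NF addresses. To achieve 4NF we create a separate skill table with one row per employee and skill.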
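One way to sketch the skill table is shown here. The table name employee_skill, its composite primary key, and the sample data are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee_skill (
        employeeId INTEGER NOT NULL,
        skill TEXT NOT NULL,
        PRIMARY KEY (employeeId, skill)   -- one row per employee and skill
    )
""")

# Hypothetical packed skill values keyed by employeeId.
raw_skills = {1: "C#, SQL, JavaScript", 2: "SQL"}
for employee_id, text in raw_skills.items():
    for skill in text.split(","):
        conn.execute(
            "INSERT INTO employee_skill VALUES (?, ?)",
            (employee_id, skill.strip()),
        )

# An exact match replaces fragile substring matching on a packed column.
sql_people = [
    row[0]
    for row in conn.execute(
        "SELECT employeeId FROM employee_skill WHERE skill = 'SQL' ORDER BY employeeId"
    )
]
print(sql_people)
```

Adding or removing a skill is now a single-row insert or delete that does not touch any other fact about the employee.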
Summary
Higher normal forms increase the number of tables, which can raise query complexity and reduce performance because queries need more joins. Because storage is now cheap, saving space is a weaker justification for strict normalization than it once was, although redundancy still exposes data to update anomalies. In most cases, 3NF reduces redundancy and anomalies sufficiently; 2NF may be acceptable in certain scenarios.
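The join cost mentioned above can be seen in a small sketch: reading back the original flat shape from a normalized design takes one join per extracted table. The schema is a simplified, hypothetical version of the demo, and the view name employee_report is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        departmentName TEXT PRIMARY KEY,
        departmentDescription TEXT NOT NULL
    );
    CREATE TABLE job (
        job TEXT PRIMARY KEY,
        jobDescription TEXT NOT NULL
    );
    CREATE TABLE employee (
        employeeId INTEGER PRIMARY KEY,
        departmentName TEXT NOT NULL REFERENCES department(departmentName),
        job TEXT NOT NULL REFERENCES job(job)
    );
    -- A view hides the joins from callers, but the database still performs them.
    CREATE VIEW employee_report AS
        SELECT e.employeeId, e.departmentName, d.departmentDescription,
               e.job, j.jobDescription
        FROM employee AS e
        JOIN department AS d ON d.departmentName = e.departmentName
        JOIN job AS j ON j.job = e.job;

    INSERT INTO department VALUES ('Engineering', 'Builds the product');
    INSERT INTO job VALUES ('Developer', 'Writes code');
    INSERT INTO employee VALUES (1, 'Engineering', 'Developer');
""")

report = conn.execute("SELECT * FROM employee_report ORDER BY employeeId").fetchall()
print(report)
```

Whether this overhead matters depends on data volume and query patterns, which is why the choice of normal form is a trade-off rather than a fixed rule.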
21CTO
