Mastering Database Schema: From Normalization to Sharding and Scaling

This comprehensive guide explores essential database design principles—including normalization, denormalization, data partitioning, routing, and scaling techniques—offering practical strategies to optimize schema structures, reduce redundancy, and improve performance for both relational and NoSQL systems.

ITFLY8 Architecture Home

Aug 5, 2017

1. Introduction

In database development, schema design and index optimization are key concerns that affect system architecture and performance.

This article introduces general principles and optimization techniques for database design, including normalization, denormalization, data partitioning, routing, and merging.

2. General Principles of Schema Design

2.1 Overview

Normalization theory is the golden rule for relational database design, providing a theoretical foundation for structuring data and ensuring consistency.

The most commonly used normal forms are the first (1NF), second (2NF), third (3NF), and Boyce-Codd (BCNF) normal forms. They tend to show up in schema designs even when developers are not applying them consciously.

2.2 First and Second Normal Forms

First Normal Form (1NF) requires each field to be atomic and indivisible, preventing complex or multi‑valued attributes that could harm abstraction and consistency.
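As a minimal sketch of atomic fields, the example below stores one phone number per row instead of packing several into a single column. The `users` and `user_phone` tables and their columns are hypothetical, and SQLite is used only because it runs in memory.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- One row per phone number instead of a comma-separated column.
    CREATE TABLE user_phone (
        user_id INTEGER NOT NULL REFERENCES users(id),
        phone   TEXT NOT NULL,
        PRIMARY KEY (user_id, phone)
    );
""")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.executemany(
    "INSERT INTO user_phone VALUES (?, ?)",
    [(1, "555-0100"), (1, "555-0101")],
)

# Each value is atomic, so a lookup by phone is an exact match
# rather than a substring search inside a packed field.
owner = conn.execute(
    "SELECT u.name FROM users u JOIN user_phone p ON p.user_id = u.id "
    "WHERE p.phone = ?",
    ("555-0101",),
).fetchone()[0]
```

With a comma-separated column, the same lookup would need pattern matching and could not use an ordinary index on the phone value.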

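Second Normal Form (2NF) builds on 1NF by requiring every non-key attribute to depend on the whole primary key rather than on only part of a composite key. Giving each record a primary-key identifier is the prerequisite: it supports the business need for unique identification and enables certain index structures.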
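2NF is usually defined in terms of partial dependencies: with a composite key, no non-key column should depend on just one part of it. The sketch below assumes a hypothetical order-line schema in which `product_name` depends only on `product_id`, so it is moved out of the table keyed by `(order_id, product_id)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- product_name depends on product_id alone, so it lives with the product.
    CREATE TABLE product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL
    );
    -- quantity depends on the whole composite key (order_id, product_id).
    CREATE TABLE order_item (
        order_id   INTEGER NOT NULL,
        product_id INTEGER NOT NULL REFERENCES product(product_id),
        quantity   INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );
    INSERT INTO product VALUES (7, 'keyboard');
    INSERT INTO order_item VALUES (100, 7, 1), (101, 7, 3);
""")

# The product name is stored once, however many orders reference it.
name_copies = conn.execute(
    "SELECT COUNT(*) FROM product WHERE product_name = 'keyboard'"
).fetchone()[0]
total_quantity = conn.execute(
    "SELECT SUM(quantity) FROM order_item WHERE product_id = 7"
).fetchone()[0]
```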

2.3 Third Normal Form

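Third Normal Form (3NF) builds on 2NF by eliminating transitive dependencies: a non-key attribute must depend directly on the key, not on another non-key attribute. In practice this means splitting such attributes into a separate entity table and linking the tables through a relationship.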

Applying 3NF reduces data redundancy and inconsistency, and most schemas should aim to satisfy it.
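The following sketch illustrates the redundancy argument with a hypothetical `employee` and `department` pair. A department name depends on the department, not on the employee, so it is stored once and referenced by ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE employee (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department(id)
    );
    INSERT INTO department VALUES (10, 'Storage');
    INSERT INTO employee VALUES (1, 'alice', 10), (2, 'bob', 10);
""")

# Renaming the department touches a single row...
renamed = conn.execute(
    "UPDATE department SET name = 'Databases' WHERE id = 10"
).rowcount

# ...and every employee sees the new name through the relationship.
names = conn.execute(
    "SELECT e.name, d.name FROM employee e "
    "JOIN department d ON d.id = e.dept_id ORDER BY e.id"
).fetchall()
```

Had the department name been copied into every employee row, the rename would have needed one update per employee, and a missed row would leave the data inconsistent.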

2.4 Boyce‑Codd Normal Form

BCNF is a stricter form of 3NF: for every non-trivial functional dependency, the determinant must be a superkey (that is, it must contain a candidate key). This removes redundancy that 3NF can leave behind, especially when a table has overlapping composite candidate keys.
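One way to check the BCNF condition mechanically is to compute attribute closures over a set of functional dependencies. The sketch below is a textbook-style check, not something taken from the article; the `enrollment(student, course, teacher)` relation and its dependencies are a hypothetical example.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already hold the left side, we also gain the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """List non-trivial dependencies whose left side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != all_attrs
    ]


# Hypothetical relation: enrollment(student, course, teacher)
attrs = frozenset({"student", "course", "teacher"})
fds = [
    (frozenset({"student", "course"}), frozenset({"teacher"})),
    (frozenset({"teacher"}), frozenset({"course"})),
]
violations = bcnf_violations(attrs, fds)
```

Here `teacher -> course` is reported because `teacher` alone does not determine `student`. The relation still satisfies 3NF, since `course` is part of a candidate key, which is the kind of case where BCNF is stricter.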

3. Denormalization Design

3.1 Data Redundancy Example

A typical user_info table stores extensive profile fields, many of which are unnecessary for login operations, leading to wasted I/O.

Creating a separate login table containing only id, nickname, password, and real name allows login queries to read a minimal record set, improving performance and reducing network traffic.

While this introduces redundancy, it is acceptable in read‑heavy scenarios where the benefits outweigh consistency costs.
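A minimal sketch of the login-table split follows. The column names beyond id, nickname, password, and real name are hypothetical, and the password is stored as a placeholder hash string, which is an assumption added here rather than something the example above specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Wide profile table (address and bio stand in for many more fields).
    CREATE TABLE user_info (
        id INTEGER PRIMARY KEY, nickname TEXT, password_hash TEXT,
        real_name TEXT, address TEXT, bio TEXT
    );
    -- Narrow, redundant copy holding only what login needs.
    CREATE TABLE user_login (
        id INTEGER PRIMARY KEY, nickname TEXT UNIQUE,
        password_hash TEXT, real_name TEXT
    );
""")


def create_user(uid, nickname, password_hash, real_name, address, bio):
    # Write both copies in one transaction so they cannot drift apart.
    with conn:
        conn.execute(
            "INSERT INTO user_info VALUES (?, ?, ?, ?, ?, ?)",
            (uid, nickname, password_hash, real_name, address, bio),
        )
        conn.execute(
            "INSERT INTO user_login VALUES (?, ?, ?, ?)",
            (uid, nickname, password_hash, real_name),
        )


def find_login(nickname):
    # Login reads only the narrow table.
    return conn.execute(
        "SELECT id, password_hash FROM user_login WHERE nickname = ?",
        (nickname,),
    ).fetchone()


create_user(1, "alice", "placeholder-hash", "Alice A.", "1 Example St", "bio text")
login = find_login("alice")
```

The consistency cost shows up in `create_user`: every write to the shared fields now has to touch both tables.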

3.2 De‑association

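Conceptually, a join filters the Cartesian product of its inputs. Real engines avoid materializing that product by using nested-loop, hash, or merge algorithms, but joins over large tables can still be costly. Schema design can reduce the number of joins by consolidating fields or accepting some redundancy.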

3.3 Removing Consistency Constraints

Traditional relational constraints such as foreign keys and uniqueness checks add work to every write. When strict consistency is not required, moving validation to the application layer can reduce database load, at the cost of weaker guarantees under concurrent writes.
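As a rough sketch of application-layer validation, the example below omits the foreign key and checks the reference in code before inserting. The `users` and `orders` tables and the `add_order` helper are hypothetical. Note the assumption that a single connection does all the writing: with concurrent writers, a check-then-insert like this can race unless the database's isolation level prevents it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    -- No FOREIGN KEY on user_id: the application is responsible for it.
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1);
""")


def add_order(order_id, user_id):
    """Insert an order only if the referenced user exists."""
    with conn:
        exists = conn.execute(
            "SELECT 1 FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        if exists is None:
            return False
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, user_id))
        return True


accepted = add_order(1, 1)
rejected = add_order(2, 99)
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```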

3.4 Reducing SQL Dependence

3.4.1 Underlying Key‑Value Storage

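Row-oriented relational engines ultimately behave much like key-value stores: the primary key (or an internal row ID) locates a full row. A simple SELECT usually reads the whole row and then projects the requested columns, unless an index already covers them.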
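The idea can be sketched with plain dictionaries: one map from primary key to the full row, plus a secondary index that maps a column value back to the primary key. This is a conceptual model with hypothetical data, not how any particular engine lays out its pages.

```python
# Primary storage: primary key -> full row.
rows = {
    1: {"id": 1, "nickname": "alice", "city": "Paris"},
    2: {"id": 2, "nickname": "bob", "city": "Berlin"},
}

# Secondary index: nickname -> primary key.
by_nickname = {row["nickname"]: pk for pk, row in rows.items()}


def select(columns, nickname):
    """Rough analogue of SELECT <columns> FROM user WHERE nickname = ?."""
    pk = by_nickname.get(nickname)
    if pk is None:
        return None
    row = rows[pk]  # the whole row is fetched by key...
    return {col: row[col] for col in columns}  # ...then projected


result = select(["city"], "bob")
```

Even though only `city` is requested, the lookup touches the entire row, which is why narrow tables and covering indexes can reduce I/O.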

3.4.2 MySQL Layered Architecture

MySQL consists of a client layer, a DBMS layer, and a pluggable storage‑engine layer.

3.4.3 Optional Components

Depending on the environment, components such as user authentication, access control, and even SQL parsing can be bypassed to streamline the request path, for example when a trusted internal service performs only primary-key lookups.

3.4.4 NoSQL Storage

By stripping higher-level features, a relational system can be reduced to a pure key-value store. This resembles NoSQL architectures, which give up query flexibility and often gain performance and easier horizontal scaling in return.

4. Data Expansion

4.1 Scale‑Up and Scale‑Out

Scale‑up enhances a single machine’s resources, while scale‑out adds more nodes or shards, often using replication or data partitioning to distribute load.

4.2 Data Partitioning

4.2.1 Why Partition

A single database instance has limits on data volume and transactions per second (TPS); performance degrades as the workload approaches or exceeds them.

4.2.2 Basic Principles

Distribute data evenly across nodes, keep business coupling low, maintain consistent access patterns, and ensure data safety.

4.2.3 Vertical Partitioning

Separate tables by business domain to achieve high cohesion and low coupling.

4.2.4 Horizontal Partitioning (Sharding)

Split a table into multiple tables with identical structure, assigning each row by a shard key (e.g., user ID), which reduces the load on each table.

Advantages: fixed cost, resolves single‑table bottlenecks, transaction transparency.

Disadvantages: complex routing, limited to a single shard key, join difficulties, and challenges with re‑sharding.
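A minimal sketch of modulo-based sharding on a numeric user ID follows. It shows both the even distribution and the re-sharding problem. The `user_<n>` table naming and the shard counts are assumptions made for illustration.

```python
def shard_for(user_id, shard_count):
    """Pick a shard by taking the shard key modulo the shard count."""
    return user_id % shard_count


def table_for(user_id, shard_count):
    """Hypothetical naming scheme: user_0, user_1, ..."""
    return f"user_{shard_for(user_id, shard_count)}"


def moved_fraction(user_ids, old_count, new_count):
    """Fraction of keys that land on a different shard after re-sharding."""
    moved = sum(
        1
        for uid in user_ids
        if shard_for(uid, old_count) != shard_for(uid, new_count)
    )
    return moved / len(user_ids)


ids = range(1000)

# Sequential IDs spread evenly over 4 shards.
per_shard = [sum(1 for uid in ids if shard_for(uid, 4) == s) for s in range(4)]

# Growing from 4 to 5 shards relocates most rows.
fraction = moved_fraction(ids, 4, 5)
```

With these inputs each of the four shards receives 250 IDs, and 80% of the IDs change shard when a fifth shard is added. That is why plain modulo routing makes re-sharding expensive, and why schemes such as consistent hashing or pre-allocated logical shards are often considered instead.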

4.2.5 Other Partitioning Methods

Logical partitioning isolates business logic; time‑based partitioning uses creation timestamps; hot‑cold partitioning separates frequently accessed data; volume‑based partitioning splits by table size.
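Time-based and hot-cold partitioning can both be sketched as small routing functions. The monthly table naming and the 90-day window below are assumptions, and using record age as a stand-in for access frequency is a simplification of hot-cold separation.

```python
from datetime import datetime, timedelta

HOT_WINDOW = timedelta(days=90)  # assumed cutoff for "hot" data


def monthly_table(base, created_at):
    """Time-based partitioning: one table per creation month."""
    return f"{base}_{created_at:%Y_%m}"


def tier(created_at, now):
    """Hot-cold partitioning, approximated here by record age."""
    return "hot" if now - created_at <= HOT_WINDOW else "cold"


now = datetime(2017, 8, 5)
recent_table = monthly_table("orders", datetime(2017, 8, 1))
recent_tier = tier(datetime(2017, 8, 1), now)
old_tier = tier(datetime(2016, 1, 1), now)
```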

4.3 Data Routing and Merging

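After sharding, each SQL statement must be routed to the shard or shards that hold the relevant data, and the results of queries that span several shards must be merged (sorted, aggregated, or paginated) before they are returned. Approaches include handling routing in application code, building it into the database layer, or placing a middleware proxy between the application and the databases.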
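The sketch below illustrates application-level routing and merging with two in-memory SQLite databases standing in for shards. The `orders` table, the modulo routing rule, and the helper names are all hypothetical.

```python
import heapq
import sqlite3

SHARD_COUNT = 2  # assumption for the sketch


def make_shard():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (user_id INTEGER, amount INTEGER)")
    return conn


shards = [make_shard() for _ in range(SHARD_COUNT)]


def route(user_id):
    """Routing: the shard key decides which database receives the statement."""
    return shards[user_id % SHARD_COUNT]


def insert_order(user_id, amount):
    route(user_id).execute("INSERT INTO orders VALUES (?, ?)", (user_id, amount))


def top_orders(limit):
    """Merging: no shard key in the query, so every shard is asked."""
    partials = [
        conn.execute(
            "SELECT amount, user_id FROM orders ORDER BY amount DESC LIMIT ?",
            (limit,),
        ).fetchall()
        for conn in shards
    ]
    # Each partial result is already sorted descending; merge and cut off.
    merged = heapq.merge(*partials, key=lambda row: row[0], reverse=True)
    return list(merged)[:limit]


for uid, amount in [(1, 50), (2, 70), (3, 20), (4, 90)]:
    insert_order(uid, amount)

best = top_orders(2)
```

A query that carries the shard key touches one shard. A query without it fans out to all shards and pays for the merge, which is one reason the choice of shard key matters so much.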

4.4 Scale‑Up with Flash Storage

Upgrading to SSDs eliminates the seek and rotational latency of mechanical disks, offering orders-of-magnitude higher random IOPS than HDDs. It can be a simpler performance boost than extensive horizontal scaling.
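A back-of-the-envelope estimate shows where the mechanical limit comes from. The 9 ms average seek time and 7200 RPM below are assumed figures for a typical desktop-class disk, not measurements.

```python
def hdd_random_iops(avg_seek_ms, rpm):
    """Estimate random IOPS of a spinning disk from its mechanical delays."""
    # On average the platter turns half a revolution before the sector arrives.
    rotational_latency_ms = 0.5 * 60_000 / rpm
    return 1000 / (avg_seek_ms + rotational_latency_ms)


# Assumed drive characteristics: 9 ms average seek, 7200 RPM.
hdd = hdd_random_iops(avg_seek_ms=9.0, rpm=7200)
```

Under these assumptions the disk manages on the order of 75 random operations per second. SSDs have no moving parts and are commonly rated at tens of thousands of random IOPS or more, which is the gap described above.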
