Mastering Database Schema: From Normalization to Sharding Strategies

This article explores core database design concepts, covering normalization principles, denormalization tactics, schema simplification, MySQL architecture, various data partitioning methods—including vertical, horizontal, logical, and time‑based sharding—and practical routing and storage optimizations for high‑performance systems.

ITFLY8 Architecture Home

Nov 18, 2016

1. Introduction

Database development focuses on two key areas: schema structure design and index optimization, both crucial for system architecture and performance.

This article outlines the general principles of schema design and the main optimization techniques: normal forms, denormalization, data partitioning, and query routing and result merging.

2. General Principles of Schema Design

2.1 Overview

Normalization theory is the golden rule for relational database design, ensuring data consistency by structuring data according to normal forms.

2.2 First and Second Normal Forms

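First Normal Form (1NF) requires every field to be atomic: one value per column, with no lists, repeating groups, or nested structures packed into a single field. Multi‑valued fields cannot be constrained or indexed cleanly, and they make consistent updates difficult.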
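
As an illustration of atomic fields, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (user_flat, app_user, user_interest) are made up for this example and are not from the original design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Violates 1NF: several values packed into one column.
    CREATE TABLE user_flat (id INTEGER PRIMARY KEY, name TEXT, interests TEXT);
    INSERT INTO user_flat VALUES (1, 'alice', 'chess,hiking');

    -- 1NF: one atomic value per column, one row per (user, interest).
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_interest (
        user_id  INTEGER NOT NULL,
        interest TEXT    NOT NULL,
        PRIMARY KEY (user_id, interest)
    );
    INSERT INTO app_user VALUES (1, 'alice');
    INSERT INTO user_interest VALUES (1, 'chess'), (1, 'hiking');
""")

# With atomic values, "who likes hiking?" is an exact match on a column
# rather than a substring search over a packed string.
hikers = conn.execute(
    "SELECT u.name FROM app_user u "
    "JOIN user_interest i ON i.user_id = u.id "
    "WHERE i.interest = ?",
    ("hiking",),
).fetchall()
```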

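Second Normal Form (2NF) requires the table to be in 1NF and every non‑key attribute to depend on the whole primary key, not on just part of it. This only bites when the key is composite; a 1NF table with a single‑column primary key is already in 2NF. Having a primary key at all is the precondition: it identifies each record uniquely and gives the index structure something to build on.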

2.3 Third Normal Form

Third Normal Form (3NF) additionally requires that no non‑key attribute depend on another non‑key attribute, that is, that there be no transitive dependency on the key. In practice this means separating business entities into distinct tables, which reduces redundancy.
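
The sketch below shows one way a 3NF decomposition can look, again with sqlite3. The schema (user_wide, city, app_user) is a hypothetical example chosen for illustration, not a table from the original article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Not in 3NF: city_name depends on city_id, which depends on the key
    -- (id -> city_id -> city_name), so the city name repeats per user.
    CREATE TABLE user_wide (
        id INTEGER PRIMARY KEY, name TEXT,
        city_id INTEGER, city_name TEXT
    );

    -- 3NF: the city becomes its own entity; users keep only the reference.
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE app_user (
        id INTEGER PRIMARY KEY, name TEXT,
        city_id INTEGER REFERENCES city(id)
    );
    INSERT INTO city VALUES (10, 'Hangzhou');
    INSERT INTO app_user VALUES (1, 'alice', 10), (2, 'bob', 10);
""")

# Renaming the city is now a single-row update instead of one per user.
conn.execute("UPDATE city SET name = 'Hangzhou City' WHERE id = 10")
rows = conn.execute(
    "SELECT u.name, c.name FROM app_user u "
    "JOIN city c ON c.id = u.city_id ORDER BY u.id"
).fetchall()
```

The price of the cleaner model is the join in the final query, which is the cost the denormalization section goes on to discuss.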

2.4 BC Normal Form

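Boyce‑Codd Normal Form (BCNF) is a stricter version of 3NF: for every non‑trivial functional dependency, the determinant must be a candidate key (more precisely, a superkey). It removes redundancy that 3NF can still leave behind when a table has several overlapping composite candidate keys.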

3. Denormalization Design

3.1 Data Redundancy Example

A typical user_info table stores the full profile (id, name, gender, age, contact, education, interests, and so on). This works at low traffic, but login is a high‑volume operation that needs only a few fields (id, nickname, password, real name); reading them from the wide table drags the rest of each row along.

Creating a separate login table with only the required fields reduces read latency and network bandwidth.
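
A minimal sketch of that split, assuming sqlite3 and a hypothetical user_login table. The column list, the password_hash name, and the register and find_login helpers are illustrative assumptions. Because the login columns are now stored twice, the sketch writes both tables in one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_info (
        id INTEGER PRIMARY KEY, nickname TEXT, password_hash TEXT,
        real_name TEXT, gender TEXT, age INTEGER, contact TEXT,
        education TEXT, interests TEXT
    );
    -- Narrow table holding redundant copies of the login columns only.
    CREATE TABLE user_login (
        id INTEGER PRIMARY KEY, nickname TEXT UNIQUE,
        password_hash TEXT, real_name TEXT
    );
""")

def register(conn, user):
    """Insert into both tables atomically so the copies stay in step."""
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "INSERT INTO user_info VALUES (:id, :nickname, :password_hash, "
            ":real_name, :gender, :age, :contact, :education, :interests)",
            user,
        )
        conn.execute(
            "INSERT INTO user_login VALUES "
            "(:id, :nickname, :password_hash, :real_name)",
            user,
        )

def find_login(conn, nickname):
    """The hot login path touches only the narrow table."""
    return conn.execute(
        "SELECT id, password_hash FROM user_login WHERE nickname = ?",
        (nickname,),
    ).fetchone()
```

Every later change to a duplicated column has to update both tables in the same way, which is the maintenance cost of this kind of redundancy.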

3.2 De‑association

Joins are costly, especially on large tables. Schema design should aim to minimize joins by consolidating fields or accepting controlled redundancy.

3.3 Removing Consistency Constraints

In many web applications, strict foreign‑key or uniqueness constraints can be shifted to the application layer to reduce database overhead.
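
As a rough sketch of what moving a referential check into the application can look like, the example below drops the FOREIGN KEY clause and validates in code. The tables and the create_order helper are hypothetical. A check-then-insert like this is not atomic under concurrent writers, so a real system would need extra care where the database constraint used to provide the guarantee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT);
    -- No FOREIGN KEY clause: the database will not check user_id.
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, item TEXT);
    INSERT INTO app_user VALUES (1, 'alice');
""")

def create_order(conn, user_id, item):
    """Application-level referential check followed by the insert."""
    exists = conn.execute(
        "SELECT 1 FROM app_user WHERE id = ?", (user_id,)
    ).fetchone()
    if exists is None:
        raise ValueError(f"unknown user_id {user_id}")
    with conn:  # commit the insert, or roll back if it fails
        cur = conn.execute(
            "INSERT INTO orders (user_id, item) VALUES (?, ?)",
            (user_id, item),
        )
    return cur.lastrowid
```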

3.4 De‑SQL

3.4.1 Underlying Storage

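At the storage layer, a relational table can be viewed as a set of <k,v> pairs: the key is the primary key and the value is the row, while column names and types are metadata kept separately. A lookup therefore typically fetches the whole row first and only then projects the requested columns (a query answered entirely from a covering index is the usual exception).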
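
The idea can be pictured with a toy model in Python. This is a simplification for illustration only, not how any particular engine lays out its pages; COLUMNS, store, and select are hypothetical names.

```python
# Column names are metadata, kept apart from the stored values.
COLUMNS = ("id", "nickname", "age")

# Hypothetical storage layer: primary key -> whole row.
store = {
    1: (1, "alice", 30),
    2: (2, "bob", 25),
}

def select(pk, wanted):
    """Fetch the full row by key, then project the requested columns."""
    row = store[pk]                    # the whole row comes back
    by_name = dict(zip(COLUMNS, row))  # metadata gives the values names
    return tuple(by_name[c] for c in wanted)
```

Calling select(2, ("nickname",)) still reads the entire row for key 2 before discarding id and age, which is why narrow tables help hot paths.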

3.4.2 MySQL Layered Architecture

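MySQL is usually described in three layers: a connection layer that handles client connections and authentication, a server (DBMS) layer that parses, optimizes, and executes SQL, and a pluggable storage‑engine layer such as InnoDB or MyISAM.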

3.4.3 What Can Be Trimmed

In trusted environments, user authentication, SQL parsing, and access‑control modules can be simplified or removed.

3.4.4 NoSQL Storage

Once the higher‑level features are stripped away, what remains of a relational system is close to a key‑value store. NoSQL systems make the same trade: they give up those features in exchange for lower per‑request overhead and easier horizontal scaling.

4. Data Expansion

4.1 Scale‑Up and Scale‑Out

Scale‑up enhances a single node’s resources (CPU, memory, SSD), while scale‑out adds more nodes or shards, often using replication or data partitioning.

4.2 Data Partitioning

4.2.1 Why Partition

Single‑node capacity is limited; large tables degrade performance, so partitioning distributes load.

4.2.2 Basic Principles

Distribute data evenly across nodes, keep business coupling low, ensure balanced access, and maintain consistency and safety.

4.2.3 Vertical Partitioning

Separate tables by business domain to achieve high cohesion and low coupling.
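
A small sketch of how an application might record such a split, assuming each business domain gets its own database. The domain names, database names, and table assignments are all hypothetical.

```python
# Hypothetical layout: one database per business domain.
DOMAIN_DB = {
    "account": "db_account",
    "order": "db_order",
    "catalog": "db_catalog",
}

# Each table belongs to exactly one domain.
TABLE_DOMAIN = {
    "user_info": "account",
    "user_login": "account",
    "orders": "order",
    "order_item": "order",
    "product": "catalog",
}

def database_for(table):
    """Return the database that holds a given table."""
    return DOMAIN_DB[TABLE_DOMAIN[table]]

def can_join_locally(*tables):
    """Tables can be joined in SQL only if they share a database."""
    return len({database_for(t) for t in tables}) == 1
```

Here can_join_locally("orders", "order_item") is true while can_join_locally("orders", "user_info") is false, which is why tables that are queried together should land in the same domain.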

4.2.4 Horizontal Partitioning (Sharding)

Split one table into multiple tables with an identical structure, distributing rows by a shard key (e.g., user_id). Each shard holds a disjoint subset of the rows, and a query that carries the shard key needs to touch only one shard.
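
A minimal sketch of routing by shard key, assuming modulo hashing on user_id. SHARD_COUNT, the user_info_N naming scheme, and the helper names are assumptions for illustration.

```python
from collections import defaultdict

SHARD_COUNT = 4  # assumed number of shards

def shard_for(user_id):
    """Map a shard key to a shard index with a simple modulo."""
    return user_id % SHARD_COUNT

def table_for(user_id):
    """Physical table name for the shard that owns this user."""
    return f"user_info_{shard_for(user_id)}"

def group_by_shard(user_ids):
    """Batch a multi-key lookup so each shard is queried once."""
    groups = defaultdict(list)
    for uid in user_ids:
        groups[table_for(uid)].append(uid)
    return dict(groups)
```

For example, table_for(10) gives "user_info_2", and group_by_shard([1, 5, 6]) sends users 1 and 5 to user_info_1 and user 6 to user_info_2.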

4.2.4.1 Advantages

Fixed cost regardless of shard count.

Solves single‑table bottlenecks.

Transaction handling stays relatively simple, provided each transaction touches a single shard.

4.2.4.2 Disadvantages

SQL routing becomes complex.

A table can be sharded on only one key; queries that filter on any other field cannot be routed and may have to scan every shard.

Joins across shards are hard.

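Re‑sharding (splitting again after the initial split, for example to change the shard count) is painful because existing data has to be migrated.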
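
To put a number on the re‑sharding cost, the sketch below counts how many keys change shard when the shard count changes, assuming plain modulo routing on an integer key. The moved_fraction helper and the key range are illustrative.

```python
def moved_fraction(keys, old_count, new_count):
    """Fraction of keys whose shard changes under modulo routing."""
    keys = list(keys)
    moved = sum(1 for k in keys if k % old_count != k % new_count)
    return moved / len(keys)

keys = range(1, 10_001)

# Growing from 4 to 5 shards relocates most rows ...
grow_by_one = moved_fraction(keys, 4, 5)

# ... while doubling from 4 to 8 relocates only half of them.
doubling = moved_fraction(keys, 4, 8)
```

With these 10,000 keys, grow_by_one is 0.8 and doubling is 0.5. This is one reason shard counts are often doubled rather than incremented, and why schemes such as consistent hashing exist.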

4.2.5 Other Partitioning Methods

4.2.5.1 Logical Partitioning

Separate data by business logic to allow tailored indexing and query strategies.

4.2.5.2 Time Partitioning

Store records in different tables based on creation time (e.g., daily or weekly partitions).
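
A minimal sketch of time‑based table naming, assuming daily tables named base_YYYYMMDD and weekly tables keyed by ISO week. Both naming schemes and the helper names are assumptions.

```python
from datetime import date, timedelta

def daily_table(base, day):
    """Table that holds the records created on a given day."""
    return f"{base}_{day:%Y%m%d}"

def weekly_table(base, day):
    """Table that holds the records created in a given ISO week."""
    iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
    return f"{base}_{iso[0]}w{iso[1]:02d}"

def tables_for_range(base, start, end):
    """All daily tables a query over [start, end] must read."""
    days = (end - start).days
    return [daily_table(base, start + timedelta(days=i))
            for i in range(days + 1)]
```

A query for a date range then reads only the tables returned by tables_for_range, and expiring old data becomes a matter of dropping whole tables.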

4.2.5.3 Hot‑Cold Partitioning

Isolate frequently accessed (“hot”) data from rarely accessed (“cold”) data to improve performance.
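
One possible way to maintain the split is a periodic archive job, sketched below with sqlite3. It uses record age as a stand‑in for "cold", which is an assumption; the orders_hot and orders_cold tables and the archive_before helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_hot  (id INTEGER PRIMARY KEY, created_day TEXT, item TEXT);
    CREATE TABLE orders_cold (id INTEGER PRIMARY KEY, created_day TEXT, item TEXT);
""")

def archive_before(conn, cutoff_day):
    """Move rows older than cutoff_day (ISO 'YYYY-MM-DD') to cold storage."""
    with conn:  # copy and delete succeed or fail together
        conn.execute(
            "INSERT INTO orders_cold "
            "SELECT * FROM orders_hot WHERE created_day < ?",
            (cutoff_day,),
        )
        cur = conn.execute(
            "DELETE FROM orders_hot WHERE created_day < ?", (cutoff_day,)
        )
    return cur.rowcount  # number of rows archived
```

ISO date strings sort in chronological order, so the plain string comparison is enough here. The hot table stays small, and its indexes stay cheap to maintain.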

4.2.5.4 Volume Partitioning

Split tables by size or row count, often used for log tables.
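
A minimal sketch of splitting by row count, assuming monotonically increasing integer ids starting at 1 and a fixed capacity per table. ROWS_PER_TABLE and the naming scheme are hypothetical.

```python
ROWS_PER_TABLE = 1_000_000  # assumed capacity of each table

def table_for_row(base, row_id):
    """Rows 1..1,000,000 go to base_0, the next million to base_1, and so on."""
    return f"{base}_{(row_id - 1) // ROWS_PER_TABLE}"
```

Because new rows always land in the newest table, this suits append‑only data such as logs, but it concentrates all writes on one table at a time.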

4.3 Data Routing and Merging

After sharding, SQL routing is required. Three common solutions are:

Modify application code to include routing logic.

Extend the database with plugins or custom modules.

Introduce a middleware proxy that transparently routes queries.

Each approach balances transparency, development effort, and maintenance complexity.
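
Whichever layer hosts it, the routing and merging logic has roughly the shape sketched below. The shards are in‑memory lists standing in for databases; the row layout (user_id, created_at, title) and the function names are assumptions made for this example.

```python
import heapq
from itertools import islice

# Hypothetical shards, keyed by user_id % 2. Rows: (user_id, created_at, title).
SHARDS = [
    [(2, 5, "a"), (4, 9, "b")],   # shard 0: even user_ids
    [(1, 7, "c"), (3, 3, "d")],   # shard 1: odd user_ids
]

def route(user_id):
    """Routing: a query that carries the shard key goes to one shard."""
    return SHARDS[user_id % len(SHARDS)]

def query_by_user(user_id):
    return [row for row in route(user_id) if row[0] == user_id]

def latest_across_shards(limit):
    """Merging: without the shard key, ask every shard, then combine."""
    newest_first = lambda row: row[1]
    # Each shard returns its own top `limit` rows, newest first.
    per_shard = [
        sorted(shard, key=newest_first, reverse=True)[:limit]
        for shard in SHARDS
    ]
    # Merge the already-sorted partial results and keep the global top.
    merged = heapq.merge(*per_shard, key=newest_first, reverse=True)
    return list(islice(merged, limit))
```

Each shard only needs to return its own top `limit` rows, since no row outside a shard's local top can appear in the global top.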

4.4 Scale‑Up with Flash Storage

When hardware upgrades (e.g., SSDs) can meet performance needs, they provide a simpler alternative to complex sharding, dramatically reducing I/O latency compared to mechanical disks.

For teams with limited resources, leveraging flash storage may be a more pragmatic performance boost.
