Mastering Database Schema: From Normalization to Sharding Strategies
This article covers core database design concepts: normalization principles, denormalization tactics, trimming unneeded database layers, MySQL's layered architecture, data partitioning methods (vertical, horizontal, logical, and time-based), and practical routing and storage optimizations for high-performance systems.
1. Introduction
Database development focuses on two key areas: schema structure design and index optimization, both crucial for system architecture and performance.
The article outlines general principles and optimization techniques such as normal forms, denormalization, data partitioning, routing, and merging.
2. General Principles of Schema Design
2.1 Overview
Normalization theory is the golden rule for relational database design, ensuring data consistency by structuring data according to normal forms.
2.2 First and Second Normal Forms
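First Normal Form (1NF) requires every field to be atomic: a column holds a single value, not a list or nested structure, so each value can be queried and constrained on its own.
Second Normal Form (2NF) requires the table to be in 1NF and every non-key attribute to depend on the whole primary key, not on just part of a composite key. In practice this also means each table needs a primary key, which uniquely identifies a record and supports index structures.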
2.3 Third Normal Form
Third Normal Form (3NF) requires the table to be in 2NF and removes transitive dependencies: a non-key attribute must depend on the key directly, not on another non-key attribute. This encourages separating business entities into distinct tables and reduces redundancy.
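As an illustration of the 3NF rule, here is a minimal sketch using Python's built-in sqlite3 module. The employee and department tables and their columns are hypothetical examples, not taken from the article. In the flat table, dept_name depends on dept_id rather than on the key emp_id; splitting it out means a department rename touches one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Before: dept_name depends on dept_id (a non-key attribute),
-- so it is repeated for every employee in the department.
CREATE TABLE employee_flat (
    emp_id    INTEGER PRIMARY KEY,
    emp_name  TEXT,
    dept_id   INTEGER,
    dept_name TEXT
);

-- After (3NF): each entity gets its own table.
CREATE TABLE department (
    dept_id   INTEGER PRIMARY KEY,
    dept_name TEXT NOT NULL
);
CREATE TABLE employee (
    emp_id   INTEGER PRIMARY KEY,
    emp_name TEXT NOT NULL,
    dept_id  INTEGER NOT NULL REFERENCES department(dept_id)
);
""")

conn.execute("INSERT INTO department VALUES (10, 'Storage')")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Ann", 10), (2, "Bo", 10)],
)

# Renaming the department updates a single row, however many employees it has.
cur = conn.execute(
    "UPDATE department SET dept_name = 'Storage Platform' WHERE dept_id = 10"
)
rows_changed = cur.rowcount

names = conn.execute(
    "SELECT e.emp_name, d.dept_name "
    "FROM employee e JOIN department d USING (dept_id) "
    "ORDER BY e.emp_id"
).fetchall()
```

The price of the split is the join in the final query, which is the cost the denormalization section weighs against redundancy.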
2.4 Boyce-Codd Normal Form (BCNF)
BCNF is a stricter form of 3NF: every determinant must be a candidate key. It removes redundancy that 3NF can still allow, typically in tables with composite keys.
3. Denormalization Design
3.1 Data Redundancy Example
A typical user_info table stores extensive profile data (id, name, gender, age, contact, education, interests, etc.). This design works at low traffic, but login is a high-volume operation that needs only a few fields (id, nickname, password, real name), so every login reads far more data than it uses.
Creating a separate login table with only the required fields reduces read latency and network bandwidth.
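The split described above might look like the following sketch, again using sqlite3. The table name user_login, the password_hash column (standing in for the article's "password" field), and the register and find_login helpers are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Wide profile table, read rarely.
CREATE TABLE user_info (
    id INTEGER PRIMARY KEY, name TEXT, gender TEXT, age INTEGER,
    contact TEXT, education TEXT, interests TEXT
);
-- Narrow table holding only what login needs (real_name is stored redundantly).
CREATE TABLE user_login (
    id            INTEGER PRIMARY KEY,
    nickname      TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL,
    real_name     TEXT
);
""")


def register(conn, user_id, nickname, password_hash, real_name, **profile):
    """Write both tables in one transaction so the redundant copy stays in sync."""
    with conn:
        conn.execute(
            "INSERT INTO user_info "
            "(id, name, gender, age, contact, education, interests) "
            "VALUES (?, ?, ?, ?, ?, ?, ?)",
            (
                user_id, real_name,
                profile.get("gender"), profile.get("age"),
                profile.get("contact"), profile.get("education"),
                profile.get("interests"),
            ),
        )
        conn.execute(
            "INSERT INTO user_login VALUES (?, ?, ?, ?)",
            (user_id, nickname, password_hash, real_name),
        )


def find_login(conn, nickname):
    """Login path: touches only the narrow table."""
    return conn.execute(
        "SELECT id, password_hash FROM user_login WHERE nickname = ?",
        (nickname,),
    ).fetchone()
```

The trade-off is that real_name now lives in two places, so every write path has to update both copies.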
3.2 De‑association
Joins are costly, especially on large tables. Schema design should aim to minimize joins by consolidating fields or accepting controlled redundancy.
3.3 Removing Consistency Constraints
In many web applications, strict foreign‑key or uniqueness constraints can be shifted to the application layer to reduce database overhead.
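A minimal sketch of moving a foreign-key check into application code follows. The users and orders tables and the create_order helper are hypothetical. Note that the schema declares no REFERENCES clause, so the database itself no longer protects the relationship; under concurrent writers a check like this can race, which is part of the trade-off.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
-- No FOREIGN KEY clause: the application is responsible for user_id validity.
CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount INTEGER);
""")


def create_order(conn, order_id, user_id, amount):
    """Insert an order only if the referenced user exists (checked in app code)."""
    with conn:  # commits on success, rolls back if an exception escapes
        exists = conn.execute(
            "SELECT 1 FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        if exists is None:
            raise ValueError(f"unknown user_id {user_id}")
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, ?)", (order_id, user_id, amount)
        )
```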
3.4 De‑SQL
3.4.1 Underlying Storage
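Beneath the SQL layer, a row-oriented storage engine stores data much like <k,v> pairs: the key locates a row and the value is the row itself, while field definitions are metadata. Even a simple query typically fetches the full row before projecting the needed columns.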
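The idea can be pictured with a toy model in Python. This is a deliberate simplification, not how any real engine lays out its pages; COLUMNS, store, and select are names invented for the sketch.

```python
# Column definitions are metadata kept apart from the stored values.
COLUMNS = ("id", "name", "age")

# The "storage engine": primary key -> full row.
store = {
    1: (1, "Ann", 30),
    2: (2, "Bo", 41),
}


def select(store, key, wanted):
    """Fetch the whole row by key, then project the requested columns."""
    row = store.get(key)  # the full row is read even if one column is wanted
    if row is None:
        return None
    return tuple(row[COLUMNS.index(col)] for col in wanted)
```

For example, select(store, 2, ("name",)) reads the entire row for key 2 and only then narrows it to the name column.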
3.4.2 MySQL Layered Architecture
MySQL consists of a client layer, a DBMS (server) layer that handles connections, SQL parsing, optimization, and execution, and a pluggable storage-engine layer.
3.4.3 What Can Be Trimmed
In trusted environments, user authentication, SQL parsing, and access‑control modules can be simplified or removed.
3.4.4 NoSQL Storage
By stripping higher‑level features, a relational system essentially becomes a key‑value store, similar to NoSQL solutions, offering better performance and horizontal scalability.
4. Data Expansion
4.1 Scale‑Up and Scale‑Out
Scale‑up enhances a single node’s resources (CPU, memory, SSD), while scale‑out adds more nodes or shards, often using replication or data partitioning.
4.2 Data Partitioning
4.2.1 Why Partition
Single‑node capacity is limited; large tables degrade performance, so partitioning distributes load.
4.2.2 Basic Principles
Distribute data evenly across nodes, keep business coupling low, ensure balanced access, and maintain consistency and safety.
4.2.3 Vertical Partitioning
Separate tables by business domain to achieve high cohesion and low coupling.
4.2.4 Horizontal Partitioning (Sharding)
Split a table into multiple tables with identical structure based on a shard key (e.g., user_id), so that each row lives in exactly one shard.
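A minimal sketch of shard-key routing, assuming a simple modulo scheme over four in-memory SQLite databases that stand in for separate nodes. The orders table, SHARD_COUNT, and the helper names are hypothetical; real deployments often use other mappings, such as ranges or lookup tables.

```python
import sqlite3

SHARD_COUNT = 4  # assumed fixed for this sketch

# Each in-memory database stands in for one shard.
shards = [sqlite3.connect(":memory:") for _ in range(SHARD_COUNT)]
for db in shards:
    db.execute(
        "CREATE TABLE orders "
        "(order_id INTEGER PRIMARY KEY, user_id INTEGER, amount INTEGER)"
    )


def shard_for(user_id):
    """Route by shard key: all rows of one user land on the same shard."""
    return shards[user_id % SHARD_COUNT]


def insert_order(order_id, user_id, amount):
    db = shard_for(user_id)
    with db:
        db.execute(
            "INSERT INTO orders VALUES (?, ?, ?)", (order_id, user_id, amount)
        )


def orders_for_user(user_id):
    """Query by shard key: exactly one shard is touched."""
    return shard_for(user_id).execute(
        "SELECT order_id, amount FROM orders WHERE user_id = ? ORDER BY order_id",
        (user_id,),
    ).fetchall()


def orders_over(amount):
    """Query without the shard key: every shard must be scanned and merged."""
    results = []
    for db in shards:
        results.extend(
            db.execute(
                "SELECT order_id, user_id, amount FROM orders WHERE amount > ?",
                (amount,),
            ).fetchall()
        )
    return sorted(results)
```

orders_for_user touches one shard, while orders_over has to visit all of them, which is the single-shard-key limitation in practice.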
4.2.4.1 Advantages
Fixed cost regardless of shard count.
Solves single‑table bottlenecks.
Transactions within a single shard remain transparent to the application.
4.2.4.2 Disadvantages
SQL routing becomes complex.
Only one shard key; other queries may require scanning all shards.
Joins across shards are hard.
Re-sharding an already sharded table (a second split) is painful, because existing rows must move between shards.
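To make the re-sharding cost concrete, the sketch below counts how many keys change shard when the shard count changes. It assumes the shard is chosen as key modulo shard count, which is only one possible scheme; moved_fraction is a helper name invented here.

```python
def moved_fraction(keys, old_count, new_count):
    """Fraction of keys whose shard changes under modulo routing."""
    moved = sum(1 for k in keys if k % old_count != k % new_count)
    return moved / len(keys)


keys = range(1, 10001)  # 10,000 sequential ids as a stand-in for user_id values

frac_4_to_5 = moved_fraction(keys, 4, 5)  # add one shard
frac_4_to_8 = moved_fraction(keys, 4, 8)  # double the shard count
```

Under these assumptions, going from 4 to 5 shards relocates 80% of the keys, while doubling from 4 to 8 relocates 50%. This is one reason shard counts are often grown by doubling when modulo routing is used.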
4.2.5 Other Partitioning Methods
4.2.5.1 Logical Partitioning
Separate data by business logic to allow tailored indexing and query strategies.
4.2.5.2 Time Partitioning
Store records in different tables based on creation time (e.g., daily or weekly partitions).
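One way to route by creation time is to derive the table name from the record's date. The naming scheme below (a base name plus a date or ISO-week suffix) and the function names are assumptions for illustration.

```python
from datetime import date, timedelta


def daily_table(base, day):
    """Table name for a daily partition, e.g. log_20240229."""
    return f"{base}_{day:%Y%m%d}"


def weekly_table(base, day):
    """Table name for a weekly partition, keyed by ISO year and ISO week."""
    iso = day.isocalendar()
    return f"{base}_{iso[0]}w{iso[1]:02d}"


def tables_for_range(base, start, end):
    """All daily partitions a query over [start, end] has to read."""
    days = (end - start).days
    return [daily_table(base, start + timedelta(days=i)) for i in range(days + 1)]
```

A query over a date range then reads only the tables returned by tables_for_range, and old partitions can be dropped as whole tables.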
4.2.5.3 Hot‑Cold Partitioning
Isolate frequently accessed (“hot”) data from rarely accessed (“cold”) data to improve performance.
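A minimal sketch of hot-cold separation moves rows that have not been accessed since a cutoff date out of the hot table. The orders_hot and orders_cold tables, the last_access column (stored as an ISO date string), and archive_cold_rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders_hot  (id INTEGER PRIMARY KEY, last_access TEXT, payload TEXT);
CREATE TABLE orders_cold (id INTEGER PRIMARY KEY, last_access TEXT, payload TEXT);
""")


def archive_cold_rows(conn, cutoff):
    """Move rows last accessed before `cutoff` (ISO date) to the cold table."""
    with conn:  # copy and delete happen in one transaction
        conn.execute(
            "INSERT INTO orders_cold "
            "SELECT * FROM orders_hot WHERE last_access < ?",
            (cutoff,),
        )
        cur = conn.execute(
            "DELETE FROM orders_hot WHERE last_access < ?", (cutoff,)
        )
    return cur.rowcount  # number of rows moved
```

Run periodically, a job like this keeps the hot table small enough to stay in memory, while cold rows remain queryable from their own table.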
4.2.5.4 Volume Partitioning
Split tables by size or row count, often used for log tables.
4.3 Data Routing and Merging
After sharding, SQL routing is required. Three common solutions are:
Modify application code to include routing logic.
Extend the database with plugins or custom modules.
Introduce a middleware proxy that transparently routes queries.
Each approach balances transparency, development effort, and maintenance complexity.
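Whichever layer does the routing, results from several shards have to be merged. For a query with ORDER BY and LIMIT, one common approach is to have each shard return its own sorted rows and then merge the streams. The sketch below uses the standard library's heapq.merge; merge_top_n is a name invented here, and the input lists stand in for per-shard result sets.

```python
import heapq
from itertools import islice


def merge_top_n(shard_results, n, key=None, reverse=False):
    """Merge per-shard results that are already sorted, keeping the first n rows.

    Each element of shard_results must be sorted in the same order
    (ascending, or descending when reverse=True).
    """
    merged = heapq.merge(*shard_results, key=key, reverse=reverse)
    return list(islice(merged, n))


# Rows as (order_id, label), each shard sorted by order_id.
shard_a = [(1, "a"), (4, "d")]
shard_b = [(2, "b"), (3, "c"), (5, "e")]
top3 = merge_top_n([shard_a, shard_b], 3)
```

Because heapq.merge is lazy, only as many rows as needed are pulled from each shard's result, which matters when the per-shard result sets are large.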
4.4 Scale‑Up with Flash Storage
When hardware upgrades (e.g., SSDs) can meet performance needs, they provide a simpler alternative to complex sharding, dramatically reducing I/O latency compared to mechanical disks.
For teams with limited resources, leveraging flash storage may be a more pragmatic performance boost.
