Mastering Database Schema: From Normalization to Sharding and Scaling
This comprehensive guide explores essential database design principles—including normalization, denormalization, data partitioning, routing, and scaling techniques—offering practical strategies to optimize schema structures, reduce redundancy, and improve performance for both relational and NoSQL systems.
1. Introduction
In database development, schema design and index optimization are key concerns that affect system architecture and performance.
This article introduces general principles and optimization techniques for database design, including normalization, denormalization, data partitioning, routing, and merging.
2. General Principles of Schema Design
2.1 Overview
Normalization theory is the golden rule for relational database design, providing a theoretical foundation for structuring data and ensuring consistency.
The normal forms used most often in practice are the first (1NF), second (2NF), third (3NF), and Boyce-Codd (BCNF) normal forms. Many schemas follow them implicitly, even when their designers are unaware of the theory.
2.2 First and Second Normal Forms
First Normal Form (1NF) requires each field to be atomic and indivisible, preventing complex or multi‑valued attributes that could harm abstraction and consistency.
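As a minimal sketch of what "atomic" means, the snippet below takes rows whose phone column packs several values into one field and splits them so that each field holds a single value. The table layout and the names raw_rows and to_first_normal_form are hypothetical and chosen only for illustration.

```python
# Hypothetical rows: the third field packs several phone numbers into one
# comma-separated string, which violates 1NF.
raw_rows = [
    (1, "alice", "555-0100,555-0101"),
    (2, "bob", "555-0200"),
]


def to_first_normal_form(rows):
    """Return (users, phones) where every field holds one atomic value."""
    users, phones = [], []
    for user_id, name, phone_list in rows:
        users.append((user_id, name))
        # One row per phone number instead of one multi-valued field.
        for phone in phone_list.split(","):
            phones.append((user_id, phone.strip()))
    return users, phones


users, phones = to_first_normal_form(raw_rows)
```

After the split, a query such as "which user owns this phone number" becomes an exact match on one column rather than a substring search.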
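Second Normal Form (2NF) requires that a table already be in 1NF and that every non-key attribute depend on the whole primary key, not on just part of a composite key. Each record is therefore uniquely identified by its primary key, which supports business requirements for unique identification and enables key-based index structures.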
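The following sketch illustrates the usual textbook reading of 2NF, in which an attribute must not depend on only part of a composite key. It assumes a hypothetical order_items table keyed by (order_id, product_id). A product name depends on product_id alone, so it is moved into its own products table. All table and column names are illustrative, and SQLite (via Python's sqlite3 module) is used only because it needs no setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- product_name depends only on product_id, so it gets its own table.
CREATE TABLE products (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL
);
-- quantity depends on the whole composite key (order_id, product_id).
CREATE TABLE order_items (
    order_id   INTEGER NOT NULL,
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
""")
conn.execute("INSERT INTO products VALUES (10, 'keyboard')")
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [(1, 10, 2), (2, 10, 1)],
)

# Renaming the product touches exactly one row, however many orders use it.
conn.execute(
    "UPDATE products SET product_name = 'mechanical keyboard' WHERE product_id = 10"
)
names = conn.execute("""
    SELECT DISTINCT p.product_name
    FROM order_items AS oi
    JOIN products AS p ON p.product_id = oi.product_id
""").fetchall()
```

Had product_name been stored in order_items, the rename would have needed one update per order line, with the risk of missing some.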
2.3 Third Normal Form
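Third Normal Form (3NF) builds on 2NF by eliminating transitive dependencies: a non-key attribute must depend directly on the key, not on another non-key attribute. In practice, attributes that are determined by some non-key column are split out into a separate table keyed by that column, and the tables are linked via relationships.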
Applying 3NF reduces data redundancy and inconsistency, and most schemas should aim to satisfy it.
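To make the redundancy argument concrete, here is a small sketch in plain Python. It assumes a hypothetical flat employee table in which dept_name is determined by dept_id rather than by the employee key (a transitive dependency), and splits it into two structures. The names flat, decompose, and rejoin are illustrative only.

```python
# Hypothetical flat table: (emp_id, emp_name, dept_id, dept_name).
# dept_name is repeated for every employee in the same department.
flat = [
    (1, "alice", 10, "Engineering"),
    (2, "bob", 10, "Engineering"),
    (3, "carol", 20, "Sales"),
]


def decompose(rows):
    """Split out the attributes that depend on dept_id, not on emp_id."""
    employees = [(emp_id, name, dept_id) for emp_id, name, dept_id, _ in rows]
    departments = {dept_id: dept_name for _, _, dept_id, dept_name in rows}
    return employees, departments


def rejoin(employees, departments):
    """Rebuild the flat view by following the dept_id relationship."""
    return [(e, n, d, departments[d]) for e, n, d in employees]


employees, departments = decompose(flat)
lossless = rejoin(employees, departments) == flat

# A department rename is now a single change, visible to every employee row.
departments[10] = "Platform"
renamed = rejoin(employees, departments)
```

The decomposition loses no information (the flat view can be rebuilt), and each department name is stored exactly once.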
2.4 Boyce‑Codd Normal Form
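BCNF is a stricter form of 3NF: every relation in BCNF is also in 3NF, but not every 3NF relation is in BCNF. It requires that every determinant (the left-hand side of a non-trivial functional dependency) be a candidate key, or more precisely a superkey. This further reduces redundancy, especially when composite keys are involved.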
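The determinant rule can be checked mechanically. The sketch below uses the standard attribute-closure algorithm to find dependencies whose left-hand side does not determine the whole relation. The relation (student, course, teacher) and its two dependencies form a common textbook case that is in 3NF but not in BCNF. It is an illustration, not an example taken from this article.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return non-trivial dependencies whose determinant is not a superkey."""
    relation = set(relation)
    violations = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        is_superkey = relation <= closure(lhs, fds)
        if not trivial and not is_superkey:
            violations.append((lhs, rhs))
    return violations


# Illustrative assumptions: each teacher teaches exactly one course, and a
# student has exactly one teacher per course.
relation = ("student", "course", "teacher")
fds = [
    (("student", "course"), ("teacher",)),
    (("teacher",), ("course",)),
]
violations = bcnf_violations(relation, fds)
```

Here teacher determines course but is not a key on its own, so the second dependency is reported. Splitting out a (teacher, course) table would remove the violation.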
3. Denormalization Design
3.1 Data Redundancy Example
A typical user_info table stores extensive profile fields, many of which are unnecessary for login operations, leading to wasted I/O.
Creating a separate login table containing only id, nickname, password, and real name allows login queries to read a minimal record set, improving performance and reducing network traffic.
While this introduces redundancy, it is acceptable in read‑heavy scenarios where the benefits outweigh consistency costs.
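A minimal sketch of this split, using SQLite through Python's sqlite3 module: the wide user_info table keeps the full profile, while a narrow user_login table duplicates only the login fields. The column names, the extra profile columns, and the helper functions are assumptions made for illustration. The password column is named password_hash on the assumption that a real system stores a salted hash rather than the password itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_info (            -- wide profile table
    id            INTEGER PRIMARY KEY,
    nickname      TEXT NOT NULL,
    password_hash TEXT NOT NULL,
    real_name     TEXT,
    address       TEXT,
    bio           TEXT
);
CREATE TABLE user_login (           -- narrow, redundant copy used for login
    id            INTEGER PRIMARY KEY,
    nickname      TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL,
    real_name     TEXT
);
""")


def create_user(conn, user_id, nickname, password_hash, real_name, address, bio):
    """Write both copies in one transaction so they cannot diverge on insert."""
    with conn:
        conn.execute(
            "INSERT INTO user_info VALUES (?, ?, ?, ?, ?, ?)",
            (user_id, nickname, password_hash, real_name, address, bio),
        )
        conn.execute(
            "INSERT INTO user_login VALUES (?, ?, ?, ?)",
            (user_id, nickname, password_hash, real_name),
        )


def check_login(conn, nickname, password_hash):
    """Return the user id on a match, reading only the narrow table."""
    row = conn.execute(
        "SELECT id FROM user_login WHERE nickname = ? AND password_hash = ?",
        (nickname, password_hash),
    ).fetchone()
    return row[0] if row else None


create_user(conn, 1, "alice", "hash-1", "Alice A.", "1 Main St", "long bio text")
```

The consistency cost mentioned above shows up in create_user: every write to a duplicated field now has to update two tables.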
3.2 De‑association
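Logically, a join is a filtered Cartesian product of the tables involved. Database engines use indexes and join algorithms to avoid materializing the full product, but joins on large tables can still be costly. Schema design can minimize joins by consolidating fields into one table or by accepting some redundancy.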
3.3 Removing Consistency Constraints
Traditional relational constraints (foreign keys, uniqueness) add overhead; moving validation to the application layer can reduce database load when strict consistency is not required.
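The sketch below shows one way the check might move into application code: the orders table declares no foreign key, and a hypothetical insert_order helper verifies that the referenced user exists before writing. Table and function names are illustrative. Note the trade-off: without a database-level constraint, a concurrent delete of the user between the check and the insert would go undetected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY, nickname TEXT NOT NULL);
-- No FOREIGN KEY clause: the database does not enforce user_id.
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     user_id INTEGER NOT NULL,
                     amount  INTEGER NOT NULL);
""")
conn.execute("INSERT INTO users VALUES (1, 'alice')")


def insert_order(conn, order_id, user_id, amount):
    """Insert an order only if the referenced user exists (app-level check)."""
    exists = conn.execute(
        "SELECT 1 FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if exists is None:
        raise ValueError(f"unknown user_id {user_id}")
    with conn:
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, ?)", (order_id, user_id, amount)
        )


insert_order(conn, 100, 1, 50)
```

Calling insert_order with a user_id that does not exist raises ValueError instead of relying on the database to reject the row.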
3.4 Reducing SQL Dependence
3.4.1 Underlying Key‑Value Storage
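Row-oriented relational engines ultimately store each row as a key-value pair, with the primary key as the key and the row as the value (InnoDB, for example, keeps rows in a clustered index on the primary key). Even a simple SELECT typically reads the full row from storage before projecting the requested columns.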
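As a toy model of this idea, the sketch below represents a table as a dictionary from primary key to full row and implements a point SELECT as "fetch the whole value, then project". It is deliberately simplified and is not how any particular engine is implemented. The names table and select are hypothetical.

```python
# Toy model: primary key -> complete row.
table = {
    1: {"id": 1, "nickname": "alice", "address": "1 Main St", "bio": "..."},
    2: {"id": 2, "nickname": "bob", "address": "2 Side St", "bio": "..."},
}


def select(table, primary_key, columns):
    """Model `SELECT columns FROM table WHERE id = primary_key`."""
    row = table.get(primary_key)  # the whole row is read first
    if row is None:
        return None
    # Projection happens only after the full row has been fetched.
    return {column: row[column] for column in columns}


result = select(table, 1, ["nickname"])
```

In this model the cost of the lookup is set by the size of the stored row, not by the number of columns requested, which is the motivation for the narrow login table in section 3.1.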
3.4.2 MySQL Layered Architecture
MySQL consists of a client layer, a server (DBMS) layer that handles connections, SQL parsing, and optimization, and a pluggable storage-engine layer that reads and writes the data.
3.4.3 Optional Components
Depending on the environment, components such as user authentication, SQL parsing, and access control can be omitted to streamline the system.
3.4.4 NoSQL Storage
By stripping higher-level features, a relational system can be reduced to a pure key-value store, resembling NoSQL architectures that can offer better performance and easier horizontal scaling for simple access patterns.
4. Data Expansion
4.1 Scale‑Up and Scale‑Out
Scale‑up enhances a single machine’s resources, while scale‑out adds more nodes or shards, often using replication or data partitioning to distribute load.
4.2 Data Partitioning
4.2.1 Why Partition
A single database instance has limits on both data volume and throughput (transactions per second, TPS); exceeding either limit degrades performance.
4.2.2 Basic Principles
Distribute data evenly across nodes, keep business coupling low, maintain consistent access patterns, and ensure data safety.
4.2.3 Vertical Partitioning
Separate tables by business domain to achieve high cohesion and low coupling.
4.2.4 Horizontal Partitioning (Sharding)
Split a table into multiple tables with identical structure, assigning each row to one of them based on a shard key (e.g., user ID), which reduces the load on each table.
Advantages: fixed cost, resolves single‑table bottlenecks, transaction transparency.
Disadvantages: complex routing, limited to a single shard key, join difficulties, and challenges with re‑sharding.
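A minimal sketch of shard-key routing, assuming integer user IDs and simple modulo placement (one common choice, not necessarily the one the article has in mind). The function names and the user_info_N table naming are hypothetical. The last helper measures how many keys change shard when the shard count changes, which illustrates why re-sharding is listed as a challenge.

```python
def shard_for(user_id: int, shard_count: int) -> int:
    """Map a shard key to a shard index by modulo."""
    return user_id % shard_count


def table_for(user_id: int, shard_count: int, base: str = "user_info") -> str:
    """Name of the physical table holding this user's row."""
    return f"{base}_{shard_for(user_id, shard_count)}"


def moved_fraction(user_ids, old_count: int, new_count: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    ids = list(user_ids)
    moved = sum(
        1 for uid in ids if shard_for(uid, old_count) != shard_for(uid, new_count)
    )
    return moved / len(ids)


example_table = table_for(42, 4)                 # "user_info_2"
fraction = moved_fraction(range(1000), 4, 5)     # 0.8
```

With modulo placement, growing from 4 to 5 shards relocates 80% of the keys in this example. Schemes such as consistent hashing or a lookup table are often used to reduce that movement, at the price of more complex routing.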
4.2.5 Other Partitioning Methods
Logical partitioning isolates business logic; time‑based partitioning uses creation timestamps; hot‑cold partitioning separates frequently accessed data; volume‑based partitioning splits by table size.
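Two of these methods reduce to small placement functions. The sketch below assumes monthly partitions named after the creation month and a 30-day window for "hot" data. Both choices, and all names, are illustrative assumptions rather than recommendations from the article.

```python
from datetime import datetime, timedelta


def monthly_partition(created_at: datetime, base: str = "orders") -> str:
    """Time-based partitioning: one table per creation month."""
    return f"{base}_{created_at:%Y%m}"


def temperature(
    last_access: datetime,
    now: datetime,
    hot_window: timedelta = timedelta(days=30),
) -> str:
    """Hot-cold partitioning: recently accessed rows count as hot."""
    return "hot" if now - last_access <= hot_window else "cold"


partition = monthly_partition(datetime(2024, 3, 15))                  # "orders_202403"
recent = temperature(datetime(2024, 2, 20), now=datetime(2024, 3, 1))  # "hot"
stale = temperature(datetime(2024, 1, 1), now=datetime(2024, 3, 1))    # "cold"
```

Time-based placement keeps old partitions static, which makes them easy to archive, while hot-cold placement needs a background job to move rows as they age.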
4.3 Data Routing and Merging
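After sharding, each SQL statement must be routed to the shard or shards that hold the relevant data, and the partial results of queries that span several shards must be merged (for example, re-sorted or aggregated) before they are returned. Approaches include modifying application code, altering the database, or using a middleware proxy.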
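Routing and merging can be sketched with in-memory lists standing in for shards. A point lookup by shard key goes to exactly one shard, while an "ORDER BY score DESC LIMIT n" query is sent to every shard and the sorted partial results are merged. The data, the modulo routing rule, and the function names are assumptions for illustration. A real proxy would issue SQL to each shard instead of sorting Python lists.

```python
import heapq
from itertools import islice

# Hypothetical shards; rows are placed by user_id % 3.
shards = {
    0: [{"user_id": 3, "score": 70}, {"user_id": 6, "score": 95}],
    1: [{"user_id": 1, "score": 88}, {"user_id": 4, "score": 60}],
    2: [{"user_id": 2, "score": 91}],
}


def route(user_id: int) -> int:
    """Routing: pick the single shard that can hold this key."""
    return user_id % len(shards)


def get_user(user_id: int):
    """Point lookup: touches one shard only."""
    rows = shards[route(user_id)]
    return next((row for row in rows if row["user_id"] == user_id), None)


def top_scores(limit: int):
    """Scatter-gather: per-shard top-N, then merge the sorted partial lists."""
    partials = [
        sorted(rows, key=lambda r: r["score"], reverse=True)[:limit]
        for rows in shards.values()
    ]
    merged = heapq.merge(*partials, key=lambda r: r["score"], reverse=True)
    return list(islice(merged, limit))


top3 = [row["user_id"] for row in top_scores(3)]   # [6, 2, 1]
```

Each shard only needs to return its own top n rows, because no row outside a shard's local top n can appear in the global top n.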
4.4 Scale‑Up with Flash Storage
Upgrading from HDDs to SSDs eliminates mechanical seek and rotational latency, offering orders-of-magnitude higher random IOPS, and can be a simpler performance boost than extensive horizontal scaling.
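The mechanical limit can be estimated with simple arithmetic: a random read on a spinning disk waits, on average, for one seek plus half a rotation. The sketch below computes that bound. The 7200 RPM speed and the 8.5 ms average seek time are illustrative assumptions, not measurements, and the function names are hypothetical.

```python
def rotational_latency_ms(rpm: int) -> float:
    """Average rotational latency: half of one full rotation, in ms."""
    ms_per_rotation = 60_000 / rpm
    return ms_per_rotation / 2


def hdd_random_iops(rpm: int, avg_seek_ms: float) -> float:
    """Rough upper bound on random IOPS for a single spinning disk."""
    service_time_ms = avg_seek_ms + rotational_latency_ms(rpm)
    return 1000 / service_time_ms


# Assumed drive: 7200 RPM with an 8.5 ms average seek.
latency = rotational_latency_ms(7200)              # about 4.17 ms
iops = hdd_random_iops(7200, avg_seek_ms=8.5)      # about 79
```

Under these assumptions a single disk serves on the order of a hundred random reads per second. Flash storage has no seek or rotation term in this formula, which is where the large IOPS gap comes from.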
ITFLY8 Architecture Home
