Mastering Database Schema Design: From Normalization to Sharding and Scaling

This article explains essential database design principles—including normalization, denormalization, join avoidance, and various sharding techniques—while also covering scaling strategies such as vertical upgrades, horizontal partitioning, and the use of flash storage to boost performance.

ITPUB

1. Schema Design Principles

Relational database design relies on normal forms to guarantee data consistency and minimize redundancy.

1.1 First Normal Form (1NF)

All column values must be atomic; multi‑valued or composite attributes are prohibited.
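
As a minimal sketch of what atomic values mean in practice, the snippet below flattens a row whose tel column holds several comma-separated numbers into one row per value. The sample rows and the to_1nf helper are assumptions made for illustration; they do not come from the article's schema.

```python
def to_1nf(rows, multi_valued_column, separator=","):
    """Expand a multi-valued column into one row per atomic value."""
    flat = []
    for row in rows:
        for value in row[multi_valued_column].split(separator):
            atomic = dict(row)  # copy so the input rows are not mutated
            atomic[multi_valued_column] = value.strip()
            flat.append(atomic)
    return flat


# Hypothetical non-1NF data: one cell stores two phone numbers.
rows = [{"id": 1, "tel": "555-0100, 555-0101"}, {"id": 2, "tel": "555-0200"}]
flat_rows = to_1nf(rows, "tel")
```

After the call, flat_rows holds three rows, each with exactly one phone number.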

1.2 Second Normal Form (2NF)

Every non-key attribute must be fully functionally dependent on the whole primary key, not on only part of a composite key. A 1NF table whose primary key is a single column therefore satisfies 2NF automatically.
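
A partial dependency can be spotted in sample data with a small check like the one below. This is only a sketch: the order_items rows and the determines helper are hypothetical, and sample data can suggest a dependency but never prove it for all future rows.

```python
def determines(rows, lhs, rhs):
    """Return True if the columns in lhs functionally determine rhs in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


# Hypothetical table with the composite key (order_id, product_id).
order_items = [
    {"order_id": 1, "product_id": 10, "product_name": "pen", "quantity": 2},
    {"order_id": 1, "product_id": 11, "product_name": "ink", "quantity": 1},
    {"order_id": 2, "product_id": 10, "product_name": "pen", "quantity": 5},
]

# product_name depends on product_id alone: a partial dependency, so not 2NF.
partial = determines(order_items, ["product_id"], "product_name")
# quantity needs the whole key, which is what 2NF requires.
quantity_needs_whole_key = not determines(order_items, ["product_id"], "quantity")
```

Moving product_name into a separate products table keyed by product_id would remove the partial dependency.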

1.3 Third Normal Form (3NF)

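No non-key attribute may depend transitively on the primary key, that is, depend on another non-key attribute that in turn depends on the key. Attributes that describe a different business entity are moved into that entity's own table.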
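
The decomposition can be sketched with plain Python data structures. The employee and department rows below are hypothetical; the point is only that splitting on the transitive dependency loses no information.

```python
# Hypothetical rows: emp_id -> dept_id -> dept_name is a transitive dependency.
employees = [
    {"emp_id": 1, "dept_id": 7, "dept_name": "Sales"},
    {"emp_id": 2, "dept_id": 7, "dept_name": "Sales"},
    {"emp_id": 3, "dept_id": 9, "dept_name": "Support"},
]

# 3NF decomposition: each department name is stored once, keyed by dept_id.
employee_table = [{"emp_id": e["emp_id"], "dept_id": e["dept_id"]} for e in employees]
department_table = {e["dept_id"]: e["dept_name"] for e in employees}

# Joining the two tables reproduces the original rows.
rebuilt = [dict(row, dept_name=department_table[row["dept_id"]]) for row in employee_table]
```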

1.4 Boyce‑Codd Normal Form (BCNF)

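Every determinant of a non-trivial functional dependency must be a candidate key (more precisely, a superkey). BCNF is stricter than 3NF and removes redundancy that 3NF can leave behind when a table has overlapping composite candidate keys.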
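
One way to test a design against this rule is to compute attribute closures. The sketch below uses the textbook student/course/teacher relation as an assumed example; closure and bcnf_violations are illustrative helper names, not part of any library.

```python
def closure(attributes, fds):
    """Attribute closure of `attributes` under the functional dependencies `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the non-trivial dependencies whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)                 # skip trivial dependencies
        and not closure(lhs, fds) >= set(relation)  # determinant is not a superkey
    ]


relation = ("student", "course", "teacher")
fds = [
    (("student", "course"), ("teacher",)),  # each student has one teacher per course
    (("teacher",), ("course",)),            # each teacher teaches one course
]
violations = bcnf_violations(relation, fds)
```

Here teacher determines course but is not a superkey, so the relation is in 3NF yet violates BCNF.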

2. Denormalization Techniques

2.1 Reducing Data Redundancy for Login

A typical user_info table stores a wide user profile, yet a login request needs only id, nickname, password and real_name. Keeping these columns in a dedicated login table means each login query reads much narrower rows, which reduces I/O and network traffic.

Table: user_info (≈1,000,000 rows)
- id BIGINT(20) NOT NULL
- name VARCHAR(32) NOT NULL
- gender TINYINT(4) NOT NULL
- age INT(8) NOT NULL
- tel VARCHAR(16) NULL
- email VARCHAR(64) NULL
- school VARCHAR(32) NULL
- company VARCHAR(32) NULL
- interest VARCHAR(512) NULL
- gmt_create DATETIME NOT NULL
- gmt_modified DATETIME NOT NULL

Table: login (≈1,000,000 rows)
- id BIGINT(20) NOT NULL
- nickname VARCHAR(32) NOT NULL
- password VARCHAR(128) NOT NULL   -- stored as MD5 hash
- real_name VARCHAR(32) NOT NULL

Read‑only login queries now touch only the slim login table, cutting row size and bandwidth.
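
The split can be tried out with an in-memory SQLite database. This is a sketch under assumptions: SQLite stands in for MySQL, user_info is abbreviated to a few columns, the UNIQUE constraint on nickname and the check_login helper are additions for the example, and MD5 is used only because the table comment above says so.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_info (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                        gender INTEGER NOT NULL, age INTEGER NOT NULL, interest TEXT);
CREATE TABLE login (id INTEGER PRIMARY KEY, nickname TEXT NOT NULL UNIQUE,
                    password TEXT NOT NULL, real_name TEXT NOT NULL);
""")


def md5_hex(text):
    """Hash a password the way the login table comment describes."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


conn.execute("INSERT INTO user_info VALUES (1, 'Alice Example', 0, 30, 'chess, hiking')")
conn.execute("INSERT INTO login VALUES (1, 'alice', ?, 'Alice Example')", (md5_hex("s3cret"),))


def check_login(nickname, password):
    """Return the user id on success, else None; only the login table is read."""
    row = conn.execute(
        "SELECT id FROM login WHERE nickname = ? AND password = ?",
        (nickname, md5_hex(password)),
    ).fetchone()
    return row[0] if row else None
```

The wide profile row in user_info is never touched on the login path; it is read only when a profile page actually needs it.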

2.2 Avoiding Expensive Joins

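MySQL has traditionally executed joins as nested-loop joins (hash joins were added in MySQL 8.0.18); other engines also use hash joins and sort-merge joins. Whatever the algorithm, join cost grows with the size of the inputs: a nested-loop join without a usable index compares every row of one table with every row of the other, and hash and sort-merge joins must still read both inputs in full. Designing the schema to minimise joins, by merging frequently co-accessed columns into one table or accepting controlled redundancy, improves performance.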
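
The difference between two of these algorithms can be sketched over in-memory lists of rows. The functions below are simplified teaching versions, not how any particular engine implements them, and the users and orders rows are made up.

```python
def nested_loop_join(left, right, key):
    """Compare every left row with every right row: O(len(left) * len(right))."""
    return [(l, r) for l in left for r in right if l[key] == r[key]]


def hash_join(left, right, key):
    """Build a hash table on one input, then probe it with the other."""
    buckets = {}
    for r in right:
        buckets.setdefault(r[key], []).append(r)
    return [(l, r) for l in left for r in buckets.get(l[key], [])]


users = [{"user_id": 1, "name": "a"}, {"user_id": 2, "name": "b"}]
orders = [
    {"user_id": 1, "total": 10},
    {"user_id": 1, "total": 20},
    {"user_id": 3, "total": 5},
]
```

Both functions return the same pairs for these inputs, but both still have to read every row of both inputs, which is why avoiding the join altogether pays off on large tables.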

2.3 Moving Consistency Checks to the Application Layer

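Foreign-key and unique constraints add overhead to INSERT, UPDATE and DELETE statements, because the DBMS must look up the referenced or potentially conflicting rows. When every writer goes through trusted application code, these checks can be dropped from the DBMS and enforced in the application instead. This reduces write latency, but correctness then depends on the application performing each check and its write atomically, for example inside one transaction.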
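
A minimal sketch of application-side enforcement, using an in-memory SQLite database as a stand-in. The users and orders tables and the add_user and add_order helpers are hypothetical; the tables deliberately carry no UNIQUE or FOREIGN KEY constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY, email TEXT NOT NULL);      -- no UNIQUE
CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL); -- no FOREIGN KEY
""")


def add_user(email):
    """Enforce email uniqueness in application code."""
    with conn:  # one transaction: check, then insert
        if conn.execute("SELECT 1 FROM users WHERE email = ?", (email,)).fetchone():
            raise ValueError("email already registered")
        return conn.execute("INSERT INTO users (email) VALUES (?)", (email,)).lastrowid


def add_order(user_id):
    """Enforce the users -> orders reference in application code."""
    with conn:
        if not conn.execute("SELECT 1 FROM users WHERE id = ?", (user_id,)).fetchone():
            raise ValueError("unknown user")
        return conn.execute("INSERT INTO orders (user_id) VALUES (?)", (user_id,)).lastrowid
```

With several concurrent writers, a check-then-insert sequence like this is only safe if the transactions are serialised or the rows are locked; that part is left out of the sketch.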

2.4 Minimal SQL Layer

At the storage-engine level, a relational database stores data as <k,v> pairs: a key (typically the primary key) maps to a full row. A query such as SELECT user_id, user_name FROM user_info WHERE age = 8; therefore makes the engine fetch every full row that satisfies the predicate and only then project the requested columns. Stripping the higher layers (authentication, SQL parsing, complex access control) and keeping only indexing, transactions and locking effectively turns the system into a key-value store with NoSQL-like performance.
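
The key-value view of that query can be modelled with two dictionaries. This is a simplified sketch with made-up rows; real storage engines use B-trees or similar structures rather than Python dicts.

```python
# Primary index: key -> full row (hypothetical sample data).
rows_by_id = {
    1: {"user_id": 1, "user_name": "ann", "age": 8, "school": "north"},
    2: {"user_id": 2, "user_name": "bob", "age": 9, "school": "south"},
    3: {"user_id": 3, "user_name": "cat", "age": 8, "school": "north"},
}

# Secondary index: age -> primary keys of the matching rows.
ids_by_age = {}
for pk, row in rows_by_id.items():
    ids_by_age.setdefault(row["age"], []).append(pk)


def select_id_and_name_where_age(age):
    """Emulates: SELECT user_id, user_name FROM user_info WHERE age = ?"""
    result = []
    for pk in ids_by_age.get(age, []):
        full_row = rows_by_id[pk]  # the whole row is fetched first
        result.append((full_row["user_id"], full_row["user_name"]))  # then projected
    return result
```

An application that talks to the dictionaries directly skips the parsing and access-control layers, which is the idea behind the minimal SQL layer.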

3. Scaling Strategies

3.1 Scale‑Up vs. Scale‑Out

Scale‑up adds resources (CPU, memory, SSD, network) to a single server. It is simple to implement and does not require data migration.

Scale‑out adds more database instances or nodes. It typically involves replication or data partitioning (sharding) to distribute load horizontally.
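
One common replication-based arrangement sends writes to a primary and spreads reads across replicas. The sketch below assumes that setup; the ReplicatedCluster class and the node names are hypothetical, and the SELECT-prefix test is a deliberate simplification of real statement classification.

```python
import itertools


class ReplicatedCluster:
    """Route writes to the primary and spread reads across the replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replica is configured.
        self._replicas = itertools.cycle(replicas or [primary])

    def node_for(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._replicas) if is_read else self.primary


cluster = ReplicatedCluster("db-primary", ["db-replica-1", "db-replica-2"])
```

Replication scales reads only; every write still lands on the primary, which is where sharding comes in.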

3.2 Data Sharding

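Why shard? A single MySQL instance eventually hits capacity and throughput limits. As a rough guide these are around tens of terabytes of data and a few thousand TPS, although the actual ceiling depends on hardware and workload. Sharding spreads data across multiple nodes so that throughput can keep up as the data grows.

A sharding scheme should aim for:

- Low business coupling between nodes
- Consistent business type per node
- Balanced data volume and access frequency
- Consistency and safety guarantees that are maintained across nodes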

Vertical sharding separates tables or databases by business function (e.g., user profile vs. order data), reducing cross‑table joins and simplifying schema evolution.

Horizontal sharding (partitioning) splits a single logical table into many identically structured tables based on a sharding key such as user_id or order_id. Advantages include fixed cost, elimination of single-table bottlenecks, and transactions that need no changes as long as they stay within one shard. Drawbacks are routing complexity, limited query flexibility (only queries that filter on the sharding key can be routed to a single shard), difficult cross-shard joins, and costly re-sharding.
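
A modulo scheme is one simple way to map a sharding key to a table, and it also shows why re-sharding is costly. The sketch assumes integer keys, four tables named orders_0 to orders_3, and the helper names shard_table and moved_fraction, none of which come from the article.

```python
SHARD_COUNT = 4  # assumed number of physical tables


def shard_table(user_id, shard_count=SHARD_COUNT):
    """Map a sharding key to one of the identically structured tables."""
    return f"orders_{user_id % shard_count}"


def moved_fraction(keys, old_count, new_count):
    """Fraction of keys whose shard changes when the shard count changes."""
    keys = list(keys)
    moved = sum(1 for k in keys if k % old_count != k % new_count)
    return moved / len(keys)


# Growing from 4 to 5 shards relocates most of the keys under plain modulo routing.
fraction = moved_fraction(range(1000), 4, 5)
```

For keys 0 to 999, going from 4 to 5 shards moves 80% of the keys. Schemes such as consistent hashing or doubling the shard count reduce that movement, at the price of more routing logic.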

Other variants include:

Logical sharding – isolates business logic rather than pure data volume.

Time‑based partitioning – stores each time period (day, week, month) in a separate table.

Hot‑cold sharding – separates frequently accessed (“hot”) rows from archival (“cold”) rows.

Volume‑based sharding – splits tables once a row‑count threshold is reached, common for log tables.
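
Three of these variants reduce to a small naming or classification rule. The functions below are a sketch; the table-name formats, the 90-day hot window and the one-million-row threshold are assumptions chosen for the example.

```python
from datetime import date, timedelta


def monthly_table(base, day):
    """Time-based partitioning: one table per calendar month."""
    return f"{base}_{day:%Y%m}"


def hot_or_cold(last_access, today, hot_days=90):
    """Hot-cold split: rows untouched for more than hot_days count as cold."""
    return "hot" if today - last_access <= timedelta(days=hot_days) else "cold"


def volume_table(base, row_number, rows_per_table=1_000_000):
    """Volume-based split: start a new table every rows_per_table rows."""
    return f"{base}_{row_number // rows_per_table:04d}"
```

For example, monthly_table("orders", date(2024, 3, 15)) yields orders_202403, and row number 2,500,000 of a log lands in app_log_0002 when the base name is app_log.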

3.3 Routing After Sharding

When data is distributed, the application must locate the correct shard. Common approaches:

Embed routing logic directly in application code.

Extend the database with plugins or middleware that perform transparent routing.

Deploy a middle‑layer proxy that intercepts SQL statements and forwards them to the appropriate shard without modifying the application.

Each method trades off development effort, transparency to the application, and operational complexity differently.
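
The first approach, routing inside the application, can be sketched as follows. In-memory SQLite databases stand in for the shards, and the ShardRouter class, the orders table and the modulo rule on user_id are all assumptions for the example.

```python
import sqlite3


class ShardRouter:
    """Application-level routing over N shards keyed by user_id."""

    def __init__(self, shard_count):
        self.shards = [sqlite3.connect(":memory:") for _ in range(shard_count)]
        for conn in self.shards:
            conn.execute("CREATE TABLE orders (order_id INTEGER, user_id INTEGER, total REAL)")

    def shard_for(self, user_id):
        return self.shards[user_id % len(self.shards)]

    def insert_order(self, order_id, user_id, total):
        with self.shard_for(user_id) as conn:  # commits on success
            conn.execute("INSERT INTO orders VALUES (?, ?, ?)", (order_id, user_id, total))

    def orders_for_user(self, user_id):
        # The sharding key is known, so exactly one shard is queried.
        sql = "SELECT order_id FROM orders WHERE user_id = ? ORDER BY order_id"
        return [r[0] for r in self.shard_for(user_id).execute(sql, (user_id,))]

    def orders_over(self, total):
        # No sharding key in the predicate: query every shard and merge the results.
        sql = "SELECT order_id FROM orders WHERE total > ?"
        return sorted(r[0] for conn in self.shards for r in conn.execute(sql, (total,)))
```

A middleware or proxy layer performs the same two steps (pick a shard when the key is present, otherwise scatter and gather), but does so outside the application code.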

3.4 Flash Storage for Scale‑Up

Replacing mechanical disks with SSDs removes mechanical seek and rotational delay: average random-access latency falls from roughly 5 ms to well under 1 ms, and random IOPS rise from about 100 per disk to several thousand or more. This delivers a substantial performance boost with minimal architectural changes.
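
A back-of-the-envelope calculation shows what the IOPS gap means for a workload bound by random reads. The figure of 100 IOPS for a mechanical disk comes from the text; 5,000 IOPS for the SSD is an assumed, conservative value, and the model ignores caching and queueing.

```python
def seconds_for_random_reads(read_count, iops):
    """Time to serve read_count random reads on a device sustaining iops."""
    return read_count / iops


hdd_seconds = seconds_for_random_reads(10_000, 100)    # mechanical disk, ~100 IOPS
ssd_seconds = seconds_for_random_reads(10_000, 5_000)  # SSD, assumed 5,000 IOPS
speedup = hdd_seconds / ssd_seconds
```

Under these assumptions, 10,000 random reads take about 100 seconds on the mechanical disk and about 2 seconds on the SSD.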
