
Mastering Sharding: How to Ace Interview Questions on Database Partitioning and Scaling

This article explains why interviewers probe sharding, walks through the typical chain of questions, weighs partition tables and read/write splitting as alternatives, and shows how to choose a sharding strategy, estimate capacity, plan expansion, and migrate data safely with traffic-replication validation.

Su San Talks Tech

Why interviewers ask about sharding

For senior backend roles, interviewers often ask "Do you use sharding? How did you split the data? At what scale?" to test systematic thinking and forward-looking architectural judgment, not just hands-on experience.

Typical interview question chain

Why do you need sharding? Can partition tables or read‑only replicas solve the problem?

When did you decide to shard? What threshold triggered the decision?

Did you split databases, tables, or both? What were the criteria?

How many databases/tables did you end up with? How was the number calculated?

What is the fallback plan when capacity runs out?

Answering confidently requires a solid grasp of the system’s data volume, TPS/QPS, and growth trends.

Understanding partition tables

In MySQL, a partitioned table is still a single logical table, but its data is stored in separate physical files. For example, an order table partitioned by month keeps each month's rows in its own partition, such as p_2025_01, p_2025_02, …

Performance boost: queries that hit a single partition scan only that partition.

Reduced lock contention: writes to different partitions do not compete.

Easy data management: dropping old partitions with DROP PARTITION is far faster than DELETE.

Limitations include extra management overhead, poor performance for cross‑partition queries, and inability to use foreign keys.

Read/Write splitting

Adding read-only replicas offloads read traffic but does nothing for write-side bottlenecks. Once the primary's write TPS, CPU, or network hits a ceiling, adding more read replicas no longer helps.
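A toy read/write router makes the point concrete; `primary` and `replicas` stand in for already-opened database connections and are hypothetical names, not part of any specific driver.

```python
# Toy read/write-splitting router. `primary` and `replicas` are assumed to be
# already-opened database connections from whatever driver the application uses.
import random

class ReadWriteRouter:
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas

    def connection_for(self, sql: str):
        # Plain SELECTs can be load-balanced across read replicas; every write
        # (and anything inside a transaction) must still go to the primary,
        # which is exactly why replicas cannot relieve a write bottleneck.
        if sql.lstrip().upper().startswith("SELECT") and self.replicas:
            return random.choice(self.replicas)
        return self.primary
```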

When sharding becomes necessary

If both partition tables and read/write splitting fail, the only remaining solution is sharding (splitting databases, tables, or both) to achieve horizontal scaling of compute and storage.

Choosing the right sharding approach

Only table sharding: suitable when the bottleneck is a single huge table (large indexes, slow queries) while the instance's hardware is still adequate.

Only database sharding: appropriate when the instance's CPU, I/O, or network is saturated.

Both database and table sharding: typical for large-scale internet services; e.g., 8 databases each with 128 tables (8 × 128 = 1024 tables), as in the routing sketch below.
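A minimal routing sketch for that 8 × 128 layout might look like the following; the hash function (CRC32) and the database/table naming are assumptions for illustration, not something the article prescribes.

```python
# Route a sharding key to one of 8 databases x 128 tables (1024 physical tables).
import zlib

DB_COUNT = 8            # power of two
TABLES_PER_DB = 128     # power of two
TOTAL_TABLES = DB_COUNT * TABLES_PER_DB   # 1024

def route(sharding_key: str):
    h = zlib.crc32(sharding_key.encode("utf-8"))    # stable hash of the key
    global_index = h & (TOTAL_TABLES - 1)           # bit-wise equivalent of h % 1024
    db_index = global_index // TABLES_PER_DB        # which database instance
    table_index = global_index % TABLES_PER_DB      # which table inside it
    return f"order_db_{db_index}", f"t_order_{table_index}"

print(route("user_10086"))   # -> ('order_db_<i>', 't_order_<j>') for this key
```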

Capacity estimation

Estimate active data (hot data) and growth trend. For an e‑commerce system, archive years‑old orders, then size the hot dataset for the next 3‑5 years, adding a safety buffer.

Estimating growth

Look at historical data growth (first‑order derivative).

Consider business plans (second‑order derivative) such as user‑base doubling or major marketing campaigns.

Use the formula:

future_hot_data = current_hot_data × growth_factor × buffer
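A worked instance of the formula in Python follows. Every number here (the current volume, growth factor, buffer, and per-table row ceiling) is an invented example, not a figure from the article.

```python
# Capacity estimate: future hot data and the table count it implies.
current_hot_data = 200_000_000      # rows of hot data today (example value)
growth_factor = 4                   # expected growth over the next 3-5 years
buffer = 1.5                        # safety margin

future_hot_data = current_hot_data * growth_factor * buffer     # 1.2 billion rows

rows_per_table_limit = 5_000_000    # target ceiling per physical table (example)
raw_table_count = -(-int(future_hot_data) // rows_per_table_limit)   # ceiling division -> 240

table_count = 1
while table_count < raw_table_count:    # round up to the nearest power of two
    table_count *= 2

print(future_hot_data, raw_table_count, table_count)    # 1200000000.0 240 256
```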

Why sharding counts are powers of two

Bit-wise hash calculation (hash(value) & (2^n − 1)) is faster than modulo.

Doubling the shard count later requires minimal data movement.

Example: start with 2 tables (id % 2), later expand to 4 tables (id % 4) and migrate half the rows accordingly.
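A quick check of that claim: when the modulus doubles from 2 to 4, only the rows whose new slot differs from their old one, exactly half of them, have to move.

```python
# When expanding from id % 2 to id % 4, a row moves only if the two results differ.
ids = range(16)

moved  = [i for i in ids if i % 2 != i % 4]
stayed = [i for i in ids if i % 2 == i % 4]

print(moved)    # [2, 3, 6, 7, 10, 11, 14, 15]  -> exactly half of the rows
print(stayed)   # [0, 1, 4, 5, 8, 9, 12, 13]
```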

Elegant expansion planning

When capacity planning is uncertain, adopt an over-provisioning strategy and round the shard count up to the nearest power of two.

Data migration steps

Dual-write: the application writes to both the old and new shards (a minimal sketch follows this list).

Bulk migration: scripts copy historical data to the new shards according to the new sharding rule.

Data verification: ensure the new and old data are identical.

Traffic cut-over: gradually shift reads and writes to the new shards.
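A minimal dual-write sketch, assuming hypothetical old_shard / new_shard data-access objects; the real error-handling policy depends on the application.

```python
# Dual-write during migration: the old path stays authoritative, the new path
# is best-effort and repaired later by the bulk-migration/verification jobs.
import logging

logger = logging.getLogger("dual_write")

def save_order(order: dict, old_shard, new_shard) -> None:
    old_shard.insert(order)          # failure on the old path is still a hard error
    try:
        new_shard.insert(order)      # failure on the new path must not reach the user
    except Exception:
        logger.exception("dual-write to new shard failed, order_id=%s", order.get("id"))
```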

The verification step is the hardest; tools like tcpcopy or goreplay can help.

Traffic replication validation

During dual‑write, replicate each read request to the new shard asynchronously and compare the responses. A mismatch triggers an alarm. This method validates consistency at the business‑logic level.

HTTPS traffic can be duplicated after TLS termination at the gateway, avoiding encryption issues.

Handling concurrency pitfalls

Concurrent writes between the original read and the replayed read can cause false‑positive mismatches. Sampling a small percentage of traffic reduces overhead while still providing confidence.
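A sketch of the sampled shadow-read comparison described above, using asyncio; query_old, query_new, and alarm are hypothetical hooks, and the 1% sample rate is an arbitrary example.

```python
# Replay a sampled fraction of reads against the new shards and alarm on mismatch.
import asyncio
import random

SAMPLE_RATE = 0.01    # compare ~1% of read traffic to keep the overhead low

async def read_with_shadow(key, query_old, query_new, alarm):
    old_result = await query_old(key)       # the response the caller actually gets
    if random.random() < SAMPLE_RATE:
        # Fire-and-forget: the comparison must never add latency for the caller.
        asyncio.create_task(_compare(key, old_result, query_new, alarm))
    return old_result

async def _compare(key, old_result, query_new, alarm):
    new_result = await query_new(key)
    if new_result != old_result:
        # May also be a false positive from a concurrent write between the two
        # reads, as noted above, so the alarm warrants investigation rather than
        # an automatic rollback.
        await alarm(f"shard mismatch for key={key}: {old_result!r} != {new_result!r}")
```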

This decision logic mirrors the earlier analysis of when to shard versus when to only split tables; shrinking shards is theoretically possible but rarely done in practice.

Key takeaways

Effective sharding decisions stem from thorough capacity assessment, clear identification of the primary bottleneck, and forward‑looking growth forecasts. Proper planning, using powers‑of‑two shard counts, and systematic migration with dual‑write and traffic‑replication validation turn sharding from a firefighting measure into a sustainable scaling strategy.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: data migration, sharding, capacity planning, read/write splitting, horizontal scaling, database partitioning
Written by Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.