Mastering Sharding: How to Ace Interview Questions on Database Partitioning and Scaling
This article explains why interviewers probe sharding, walks through the typical chain of questions, compares partition tables with read/write splitting, and shows how to choose a sharding strategy, estimate capacity, plan expansion, and migrate data safely with traffic-replication validation.
Why interviewers ask about sharding
When interviewing candidates for senior backend roles, interviewers often ask "Do you use sharding? How did you split the data? At what scale?" to probe systematic thinking and forward-looking architectural judgment, not just hands-on experience.
Typical interview question chain
Why do you need sharding? Can partition tables or read‑only replicas solve the problem?
When did you decide to shard? What threshold triggered the decision?
Did you split databases, tables, or both? What were the criteria?
How many databases/tables did you end up with? How was the number calculated?
What is the fallback plan when capacity runs out?
Answering confidently requires a solid grasp of the system’s data volume, TPS/QPS, and growth trends.
Understanding partition tables
In MySQL, a partitioned table is still a single logical table, but its data is stored in separate physical files. Example: a monthly-partitioned order table stores each month's data in its own partition, e.g. db_2025_01, db_2025_02, …
Performance boost: queries that hit a single partition scan only that partition.
Reduced lock contention: writes to different partitions do not compete.
Easy data management: dropping old partitions with DROP PARTITION is far faster than DELETE.
Limitations include extra management overhead, poor performance for cross‑partition queries, and inability to use foreign keys.
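Partition pruning is what drives the single-partition performance boost: a query filtered on the partitioning column only touches one partition. The routing can be sketched in a few lines; the `db_YYYY_MM` naming follows the example above, while the function name and scheme are illustrative, not MySQL's actual on-disk file naming.

```python
from datetime import date

def partition_for(created: date) -> str:
    """Return the monthly partition an order row would land in,
    mirroring the db_YYYY_MM naming from the example above."""
    return f"db_{created.year}_{created.month:02d}"

# A query filtered on the order's creation date can be pruned
# to exactly one partition instead of scanning the whole table:
print(partition_for(date(2025, 1, 15)))  # db_2025_01
```

The same mapping explains why DROP PARTITION is cheap: removing a month of data is just detaching one physical file, not deleting rows one by one.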
Read/Write splitting
Adding read‑only replicas offloads read traffic but does not solve write‑side bottlenecks. When the primary’s write TPS, CPU, or network hits a ceiling, read replicas become ineffective.
When sharding becomes necessary
If both partition tables and read/write splitting fail, the only remaining solution is sharding (splitting databases, tables, or both) to achieve horizontal scaling of compute and storage.
Choosing the right sharding approach
Only table sharding: suitable when the bottleneck is a single huge table (large index, slow queries) while the instance's hardware is still adequate.
Only database sharding: appropriate when the instance's CPU, I/O, or network is saturated.
Both database and table sharding: typical for large-scale internet services; e.g., 8 databases each with 128 tables (8 × 128 = 1024 tables).
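With the combined approach, a row must be routed to both a database and a table within it. One common convention, sketched here with the 8 × 128 layout from the example (the `route` function and the `db_N.orders_M` naming are illustrative assumptions), maps the key to one of 1024 slots and then splits the slot into the two indices:

```python
DB_COUNT, TABLE_COUNT = 8, 128          # 8 x 128 = 1024 physical tables
SLOTS = DB_COUNT * TABLE_COUNT

def route(user_id: int) -> tuple[int, int]:
    # Map the key to one of 1024 slots, then split the slot into
    # a database index and a table index within that database.
    slot = user_id % SLOTS               # or hash(user_id) & (SLOTS - 1)
    return slot // TABLE_COUNT, slot % TABLE_COUNT

db, tbl = route(123456)
print(f"db_{db}.orders_{tbl}")           # db_4.orders_64
```

Because 1024 is a power of two, the modulo can later be replaced by the bit mask discussed below without changing where any row lives.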
Capacity estimation
Estimate active data (hot data) and growth trend. For an e‑commerce system, archive years‑old orders, then size the hot dataset for the next 3‑5 years, adding a safety buffer.
Estimating growth
Look at historical data growth (first‑order derivative).
Consider business plans (second‑order derivative) such as user‑base doubling or major marketing campaigns.
Use the formula: future_hot_data = current_hot_data × growth_factor × buffer.
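The formula is simple enough to sanity-check in a few lines. The numbers below are made up for illustration, not taken from any real system:

```python
def future_hot_data(current_rows: int, growth_factor: float, buffer: float) -> int:
    """future_hot_data = current_hot_data x growth_factor x buffer."""
    return int(current_rows * growth_factor * buffer)

# Hypothetical example: 200M hot rows today, 3x expected growth
# over the planning horizon, and a 1.5x safety buffer.
print(future_hot_data(200_000_000, 3.0, 1.5))  # 900000000
```

The result (here 900M rows) is then divided by the per-table row budget to derive the shard count, which is rounded up to a power of two as discussed next.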
Why sharding counts are powers of two
When the shard count is a power of two (2^n), the bit-wise calculation hash(value) & (2^n − 1) is faster than a modulo operation.
Doubling the shard count later requires minimal data movement.
Example: start with 2 tables (id % 2), later expand to 4 tables (id % 4) and migrate half the rows accordingly.
Elegant expansion planning
When capacity planning is uncertain, adopt an "over-provision" strategy and round the shard count up to the nearest power of two.
Data migration steps
Dual-write: the application writes to both old and new shards.
Bulk migration: scripts copy historical data to the new shards according to the new sharding rule.
Data verification: ensure new and old data are identical.
Traffic cut-over: gradually shift reads and writes to the new shards.
The verification step is the hardest; tools like tcpcopy or goreplay can help.
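The first three steps can be sketched with in-memory dicts standing in for the old and new shards; `DualWriter`, `bulk_migrate`, and `verify` are illustrative names, not a real migration framework:

```python
class DualWriter:
    """Dual-write sketch: every write goes to the old shard first, then is
    mirrored to the new shard; reads stay on the old side until cut-over."""

    def __init__(self, old_store: dict, new_store: dict):
        self.old, self.new = old_store, new_store

    def write(self, key, value):
        self.old[key] = value   # old shard remains the source of truth
        self.new[key] = value   # mirror write to the new shard

    def read(self, key):
        return self.old.get(key)

def bulk_migrate(old_store: dict, new_store: dict) -> None:
    # Copy historical rows that predate dual-write; setdefault avoids
    # overwriting fresher values already mirrored by dual-write.
    for key, value in old_store.items():
        new_store.setdefault(key, value)

def verify(old_store: dict, new_store: dict) -> list:
    # Return the keys whose values diverge between old and new shards.
    return [k for k in old_store if new_store.get(k) != old_store[k]]
```

Once `verify` returns an empty list over repeated runs, reads can be cut over gradually while dual-write continues as a safety net.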
Traffic replication validation
During dual‑write, replicate each read request to the new shard asynchronously and compare the responses. A mismatch triggers an alarm. This method validates consistency at the business‑logic level.
HTTPS traffic can be duplicated after TLS termination at the gateway, avoiding encryption issues.
Handling concurrency pitfalls
A write that lands between the original read and the replayed read can make the two responses differ, producing a false-positive mismatch. Sampling a small percentage of traffic keeps the replay overhead low while still providing statistical confidence.
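Both concerns can be handled cheaply. The sketch below samples only a fraction of reads, and on a mismatch re-reads both sides once before alarming; the retry is one common mitigation for concurrent-write false positives, assumed here rather than taken from the article, and `serve_old`/`serve_new` are hypothetical callables:

```python
import random

SAMPLE_RATE = 0.01  # replay only ~1% of reads to keep overhead low

def should_replay(rate: float = SAMPLE_RATE) -> bool:
    return random.random() < rate

def compare_with_retry(serve_old, serve_new, request, retries: int = 1) -> bool:
    """Return True if old and new shards agree. On a mismatch, re-read
    both sides: if a concurrent write caused the difference, the second
    pass usually converges, filtering out the false positive."""
    for _ in range(retries + 1):
        if serve_old(request) == serve_new(request):
            return True
    return False
```

Only requests that fail even after the retry are reported, so alarms point at genuine divergence in the sharding logic rather than timing noise.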
Shrinking the shard count is theoretically possible but rarely done in practice; the decision logic mirrors the earlier analysis of when to shard versus when to split tables only.
Key takeaways
Effective sharding decisions stem from thorough capacity assessment, clear identification of the primary bottleneck, and forward‑looking growth forecasts. Proper planning, using powers‑of‑two shard counts, and systematic migration with dual‑write and traffic‑replication validation turn sharding from a firefighting measure into a sustainable scaling strategy.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
