Designing a Million‑QPS Database Architecture: Sharding, Caching, and High Availability
This article explains how to architect a database system that can sustain tens of millions of queries per second. The approach combines sharding, read‑write separation, multi‑layer caching, traffic shaping, and robust high‑availability strategies, with one overriding goal: keep most requests off the database while storing data reliably.
Overall Architecture Design
When designing a database capable of handling tens of millions of QPS, you must consider overall architecture, data distribution, performance optimization, and operability as a unified whole.
Common Misconception
No single monolithic database can reliably sustain such massive QPS over the long term; the goal is therefore to keep the database's actual QPS far below the front‑end request rate.
Request Flow
The traffic path typically follows:
Client
  ↓
CDN / Edge Cache
  ↓
Access Layer (Nginx / API Gateway)
  ↓
Application Layer (stateless horizontal scaling)
  ↓
Cache Layer (multi‑level)
  ↓
Database Layer (sharding + master‑slave + multi‑cluster)

In mature systems, 90%–99.9% of requests never reach the database; the database only stores the final consistent core data.
Architecture and Scaling Strategies
Horizontal Sharding
Data is partitioned by business key or range across multiple nodes to avoid single‑point bottlenecks. The sharding strategy should support online migration and load balancing for elastic scaling.
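A common way to get both key‑based placement and online migration is a consistent‑hash ring with virtual nodes: adding or removing a shard moves only a fraction of keys. The sketch below is a minimal illustration; the shard names and virtual‑node count are assumptions, not part of the original text.

```python
import hashlib
from bisect import bisect_right

class HashRing:
    """Consistent-hash ring: each shard owns many virtual nodes, so
    rebalancing moves only a small slice of the key space."""

    def __init__(self, shards, vnodes=100):
        self.ring = []  # sorted list of (hash, shard)
        for shard in shards:
            for i in range(vnodes):
                h = int(hashlib.md5(f"{shard}#{i}".encode()).hexdigest(), 16)
                self.ring.append((h, shard))
        self.ring.sort()

    def shard_for(self, key):
        # Route the key to the first virtual node clockwise of its hash.
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        idx = bisect_right(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]
```

In production the same idea usually lives in a routing middleware or proxy (e.g. a sharding layer in front of MySQL) rather than in application code.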
Read‑Write Separation
Use master‑slave replication or multi‑master setups to route read traffic to read‑only replicas. For write‑heavy scenarios, employ ordered write queues or partitioned writes to reduce contention.
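The routing decision itself can be very simple. Below is a hedged sketch of a read/write router that sends writes to the primary and round‑robins reads across replicas; the connection names are placeholders, and a real router would also handle replication lag and transactions.

```python
import itertools

class ReadWriteRouter:
    """Route SELECTs to read replicas (round-robin); everything else
    goes to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def route(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._replicas) if is_read else self.primary
```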
Storage Engine and Indexing
Select appropriate storage engines for hot workloads (in‑memory databases, high‑performance KV stores, columnar databases, etc.). Separate hot and cold data, moving cold data to low‑cost media. Keep indexes minimal—use covering indexes or pre‑computed fields to reduce random I/O, and consider asynchronous index updates or LSM‑Tree structures for write‑intensive workloads.
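The LSM‑Tree idea mentioned above can be shown in miniature: writes land in an in‑memory buffer (memtable) and are periodically flushed as sorted, immutable runs, converting random writes into sequential I/O. This is a toy sketch with an artificially small flush threshold, not a real storage engine.

```python
class LSMStore:
    """Toy LSM-style store: buffered writes, sorted immutable runs."""

    def __init__(self, flush_threshold=4):
        self.memtable = {}
        self.runs = []  # flushed, sorted (key, value) lists
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write out a sorted run and start a fresh memtable.
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):  # newest run wins
            for k, v in run:
                if k == key:
                    return v
        return None
```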
Multi‑Layer Caching System
Combine local caches (in‑process) with distributed caches such as Redis clusters, and offload static content to CDNs. Cache consistency policies should match business tolerance: strong consistency via write‑through/invalidation, eventual consistency via TTL or asynchronous updates.
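The read path through such a hierarchy can be sketched as follows. A plain dict stands in for the shared Redis cluster and a loader function stands in for the database; both are assumptions for illustration, and the local TTL value is arbitrary.

```python
import time

class TwoLevelCache:
    """Check the in-process cache, then the shared cache, and hit the
    database only on a double miss (TTL-based eventual consistency)."""

    def __init__(self, shared, loader, local_ttl=1.0):
        self.local = {}          # key -> (value, expires_at)
        self.shared = shared     # e.g. a Redis client in production
        self.loader = loader     # database read on a full miss
        self.local_ttl = local_ttl

    def get(self, key):
        hit = self.local.get(key)
        if hit and hit[1] > time.monotonic():
            return hit[0]
        value = self.shared.get(key)
        if value is None:
            value = self.loader(key)   # fall through to the database
            self.shared[key] = value   # repopulate the shared layer
        self.local[key] = (value, time.monotonic() + self.local_ttl)
        return value
```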
Traffic Shaping (Peak‑Cutting)
By intercepting and handling 70%+ of requests at the CDN, edge cache, and application layers, the system can endure millions to tens of millions of QPS without overwhelming the database.
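At the application layer, the shaping itself is often a token‑bucket limiter: traffic above the sustained rate is shed (or queued) before it can reach the database. A minimal sketch, with illustrative rate and burst numbers:

```python
import time

class TokenBucket:
    """Allow `rate` requests/second on average, with bursts up to
    `capacity`; excess requests are rejected up front."""

    def __init__(self, rate, capacity):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The same algorithm is what Nginx's `limit_req` and most API gateways implement natively, so in practice this usually lives at the access layer rather than in application code.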
High Availability and Fault Tolerance
Deploy multiple replicas across different availability zones or data centers for rapid failover. Use automatic retries and circuit breakers to prevent fault propagation. Implement gray‑release, traffic steering, and service degradation strategies to prioritize core functionality under resource constraints. Perform regular snapshots, incremental backups, and disaster‑recovery drills to ensure data recoverability at scale.
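The circuit‑breaker pattern mentioned above can be sketched in a few lines: after a run of consecutive failures the breaker opens and fails fast, shielding a struggling replica, then allows a probe after a timeout. The thresholds are illustrative assumptions.

```python
import time

class CircuitBreaker:
    """Open after `max_failures` consecutive errors; fail fast while
    open; allow one probe call after `reset_timeout` seconds."""

    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: let one probe through
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # success resets the failure count
        return result
```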
Mike Chen's Internet Architecture
Over ten years of BAT architecture experience, shared generously!