
Designing a Million‑QPS Database Architecture: Sharding, Caching, and High Availability

This article explains how to architect a database system that can sustain tens of millions of queries per second by combining sharding, read‑write separation, multi‑layer caching, traffic shaping, and robust high‑availability strategies — keeping most requests off the database while ensuring reliable data storage.

Mike Chen's Internet Architecture

Overall Architecture Design

When designing a database capable of handling tens of millions of QPS, you must consider overall architecture, data distribution, performance optimization, and operability as a unified whole.

Common Misconception

No single monolithic database can reliably sustain such massive QPS over the long term; the goal is to keep the database's actual QPS far below the front‑end request rate.

Request Flow

The traffic path typically follows:

Client
  ↓
CDN / Edge Cache
  ↓
Access Layer (Nginx / API Gateway)
  ↓
Application Layer (stateless horizontal scaling)
  ↓
Cache Layer (multi‑level)
  ↓
Database Layer (sharding + master‑slave + multi‑cluster)

In mature systems, 90%–99.9% of requests never reach the database; the database only stores the final consistent core data.
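The arithmetic behind that interception rate is worth making explicit. A quick sketch (the numbers are illustrative, not from any specific system):

```python
def db_qps(frontend_qps: float, interception_rate: float) -> float:
    """QPS that actually reaches the database after the upstream
    layers (CDN, edge cache, application cache) absorb their share."""
    return frontend_qps * (1.0 - interception_rate)

# 10 million front-end QPS with 99% of requests intercepted upstream
# leaves roughly 100,000 QPS for the database tier -- a load that a
# well-sharded cluster can handle comfortably.
print(db_qps(10_000_000, 0.99))
```

The takeaway: raising the interception rate from 99% to 99.9% cuts the database load by another factor of ten, which is why cache hit ratio is usually the highest-leverage metric in such systems.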

Architecture and Scaling Strategies

Horizontal Sharding

Data is partitioned by business key or range across multiple nodes to avoid single‑point bottlenecks. The sharding strategy should support online migration and load balancing for elastic scaling.
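A minimal sketch of key-based shard routing. The shard count and key format here are assumptions for illustration; a production system would layer consistent hashing or a slot map on top to make online migration cheap:

```python
import hashlib

NUM_SHARDS = 16  # illustrative; real clusters size this to the workload

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a business key (e.g. a user ID) to a shard via a stable hash.

    md5 is used instead of Python's built-in hash(), which is salted
    per process and would route the same key differently on each
    application server.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards
```

Every application server computes the same placement for a given key, so no central lookup is needed on the hot path.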

Read‑Write Separation

Use master‑slave replication or multi‑master setups to route read traffic to read‑only replicas. For write‑heavy scenarios, employ ordered write queues or partitioned writes to reduce contention.
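The routing decision itself can be very simple. A sketch of a read/write splitter with round-robin replica selection (host names are hypothetical):

```python
import itertools

class ReadWriteRouter:
    """Send writes to the primary; spread reads across replicas."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)  # round-robin reads

    def route(self, is_write: bool) -> str:
        return self.primary if is_write else next(self._replicas)

router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])
assert router.route(is_write=True) == "db-primary"
```

Real deployments also account for replication lag, routing read-your-own-writes traffic back to the primary for a short window after a write.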

Storage Engine and Indexing

Select appropriate storage engines for hot workloads (in‑memory databases, high‑performance KV stores, columnar databases, etc.). Separate hot and cold data, moving cold data to low‑cost media. Keep indexes minimal—use covering indexes or pre‑computed fields to reduce random I/O, and consider asynchronous index updates or LSM‑Tree structures for write‑intensive workloads.
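The hot/cold split can start as nothing more than an access-recency rule. A sketch, where the seven-day window is an assumed business threshold, not a universal constant:

```python
from datetime import datetime, timedelta

HOT_WINDOW = timedelta(days=7)  # assumed threshold; tune per workload

def storage_tier(last_access: datetime, now: datetime) -> str:
    """Classify a record: recently touched data stays on the fast
    engine; everything else migrates to low-cost cold storage."""
    return "hot" if now - last_access <= HOT_WINDOW else "cold"
```

A background job applying this rule can move cold rows out in batches, keeping the hot working set small enough to fit the fast tier.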

Multi‑Layer Caching System

Combine local caches (in‑process) with distributed caches such as Redis clusters, and offload static content to CDNs. Cache consistency policies should match business tolerance: strong consistency via write‑through/invalidation, eventual consistency via TTL or asynchronous updates.
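A read-through two-level cache can be sketched in a few lines. Plain dicts stand in for the in-process cache and the Redis cluster; the TTL gives the eventual-consistency behavior described above:

```python
import time

class TwoLevelCache:
    """Read-through cache: local dict first, then a shared store,
    then the database loader. TTL-based expiry bounds staleness."""

    def __init__(self, loader, shared: dict, ttl: float = 30.0):
        self._loader = loader   # fallback, e.g. a database query
        self._shared = shared   # stand-in for a distributed cache
        self._local = {}        # in-process cache: {key: (value, expiry)}
        self._ttl = ttl

    def get(self, key):
        value, expiry = self._local.get(key, (None, 0.0))
        if time.monotonic() < expiry:
            return value                 # L1 hit: no network at all
        if key in self._shared:
            value = self._shared[key]    # L2 hit: one cache round trip
        else:
            value = self._loader(key)    # miss: finally hit the database
            self._shared[key] = value
        self._local[key] = (value, time.monotonic() + self._ttl)
        return value
```

For strong consistency, writes would invalidate both levels (write-through/invalidation) instead of waiting out the TTL.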

Traffic Shaping (Peak‑Cutting)

By intercepting and handling 70%+ of requests at the CDN, edge cache, and application layers, the system can endure millions to tens of millions of QPS without overwhelming the database.
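One common shaping primitive at the access layer is the token bucket, which admits short bursts while capping the sustained rate. A minimal single-node sketch (distributed limiters share the bucket state in something like Redis):

```python
import time

class TokenBucket:
    """Token-bucket limiter: bursts up to `capacity` requests,
    sustained throughput of `rate` requests/second; excess traffic
    is shed before it reaches the database tier."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Requests rejected here can be queued, degraded, or answered from stale cache rather than forwarded downstream.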

High Availability and Fault Tolerance

Deploy multiple replicas across different availability zones or data centers for rapid failover. Use automatic retries and circuit breakers to prevent fault propagation. Implement gray‑release, traffic steering, and service degradation strategies to prioritize core functionality under resource constraints. Perform regular snapshots, incremental backups, and disaster‑recovery drills to ensure data recoverability at scale.
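The circuit-breaker idea can be captured in a minimal sketch: after a run of consecutive failures, stop calling the sick dependency and fail fast, so one slow shard cannot exhaust every upstream thread pool. (Real breakers also add a half-open state that probes for recovery after a cool-down.)

```python
class CircuitBreaker:
    """Open the circuit after `threshold` consecutive failures,
    failing fast instead of piling load onto a struggling backend."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0

    @property
    def open(self) -> bool:
        return self.failures >= self.threshold

    def call(self, fn, *args):
        if self.open:
            raise RuntimeError("circuit open: failing fast")
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1   # count consecutive failures
            raise
        self.failures = 0        # any success resets the counter
        return result
```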

Written by

Mike Chen's Internet Architecture

Over ten years of BAT architecture experience, shared generously!
