How to Design a High‑Throughput Database Architecture for a 100‑Million‑Row‑per‑Day Log System
This guide breaks down a real‑world interview scenario in which a backend engineer must design a scalable database architecture for a billing‑log or dynamic‑feed system handling 100 million daily inserts and 100k read QPS. It covers partitioning vs. sharding, sharding‑key selection, shard count, read‑write separation, multi‑level caching, consistency patterns, hot‑key mitigation, and online schema changes.
Backend engineers often face a gap between writing SQL and designing systems for high concurrency and massive data volumes; interviewers increasingly test architectural judgment rather than isolated knowledge points.
Scenario Overview
The interview presents a core system responsible for a national‑level app’s billing‑log or dynamic feed, with the following characteristics:
Write load: 100 million new rows per day in a single table.
Read load: Peak QPS can reach 100 000, with reads dominating.
Data shape: Time‑series data, continuously growing, rarely updated or deleted; queries are typically by user ID and time range.
The candidate is asked to propose a complete database architecture and justify each decision.
1. Partitioning vs Sharding
Given the volume, a single MySQL instance cannot keep up for long. The recommended answer is horizontal sharding rather than partitioning: partitioning keeps all data and indexes on the same server, so it still hits that server's I/O and storage ceilings, whereas sharding distributes both data and load across multiple machines.
2. Sharding Strategy
Sharding key: user_id is natural since most queries filter by user ID; it keeps a user’s data on the same shard and avoids cross‑shard joins, though one must consider data skew for very large users.
Shard count: Estimate from MySQL's comfort zone of roughly 50–100 million rows per table. At 100 million rows per day, one year accumulates 365 × 100 million ≈ 36.5 billion rows, so about 730 shards are needed at 50 million rows each; rounding up to 1024 (or 2048) shards provides headroom and keeps hash‑mod routing on a power of two.
Shard implementation: Choose between client‑side sharding (e.g., Sharding‑JDBC) for lower latency but tighter coupling, or middleware sharding (e.g., MyCAT) for transparency at the cost of added complexity.
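Whichever implementation is chosen, the routing logic itself is simple. A minimal sketch of hash‑mod routing on `user_id`, assuming 1024 shards and a hypothetical `billing_log_NNNN` physical‑table naming scheme (neither is prescribed by the source):

```python
import zlib

NUM_SHARDS = 1024  # power of two leaves headroom over the ~730 shards estimated above

def shard_for(user_id: int) -> int:
    """Stable hash-mod routing: the same user always lands on the same shard,
    so per-user time-range queries never need to cross shards."""
    return zlib.crc32(str(user_id).encode()) % NUM_SHARDS

def table_for(user_id: int) -> str:
    # Hypothetical naming: logical table 'billing_log' split into
    # physical tables billing_log_0000 .. billing_log_1023.
    return f"billing_log_{shard_for(user_id):04d}"
```

Using CRC32 rather than Python's built‑in `hash()` keeps the mapping stable across processes, which matters because every application server must route a given user to the same shard.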
3. Handling 100 k QPS Reads
Two classic techniques are required:
Read‑write separation: Deploy a master‑slave cluster per shard; writes go to the master, reads are served by multiple slaves, allowing horizontal scaling of read capacity.
Multi‑level caching: Implement an L1 local cache (e.g., Caffeine) in the application server, followed by an L2 distributed cache (e.g., Redis). Most reads should be satisfied by the cache; only cache misses fall through to the database.
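The read path above can be sketched as follows; plain dicts stand in for Caffeine (L1) and Redis (L2), and `db_fetch` is a hypothetical loader for the sharded MySQL cluster:

```python
# In-memory stand-ins for the real cache tiers (illustrative only).
l1_cache: dict = {}  # would be a Caffeine cache local to each app server
l2_cache: dict = {}  # would be a shared Redis cluster

def db_fetch(key):
    """Placeholder for a real query against the sharded MySQL cluster."""
    return {"key": key, "rows": []}

def get(key):
    """Read path: L1 -> L2 -> database, back-filling each level on a miss."""
    if key in l1_cache:
        return l1_cache[key]
    if key in l2_cache:
        value = l2_cache[key]
        l1_cache[key] = value        # promote to the local cache
        return value
    value = db_fetch(key)            # only cache misses reach MySQL
    l2_cache[key] = value
    l1_cache[key] = value
    return value
```

In production each tier would carry a TTL; the back‑fill on miss is what keeps the database behind the vast majority of the 100k QPS.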
4. Cache Consistency & Hot‑Key Problems
Use the Cache‑Aside pattern: on read, check the cache first and fall back to the database on a miss; on write, update the database and then delete the cache entry. Deleting rather than updating the cache avoids writing stale values when concurrent writes race; the next read simply repopulates the entry from the committed row.
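A minimal sketch of the Cache‑Aside pattern, again with dicts standing in for Redis and MySQL:

```python
cache: dict = {}  # stand-in for Redis
db: dict = {}     # stand-in for the sharded MySQL cluster

def read(key):
    """Cache-aside read: check the cache first, fall back to the DB and back-fill."""
    if key in cache:
        return cache[key]
    value = db.get(key)
    if value is not None:
        cache[key] = value
    return value

def write(key, value):
    """Cache-aside write: update the DB, then DELETE (not update) the cache
    entry so the next read repopulates it with the committed value."""
    db[key] = value
    cache.pop(key, None)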
Address three common cache pitfalls:
Cache penetration: Filter nonexistent keys with a Bloom filter.
Cache breakdown (hot‑key miss): Guard cache rebuild with a distributed lock so only one request hits the DB.
Cache avalanche: Stagger TTLs with random offsets to prevent massive simultaneous expirations.
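Two of these mitigations fit in a few lines. The sketch below shows TTL jitter for the avalanche case and a lock‑guarded rebuild for the breakdown case; a `threading.Lock` stands in for a Redis distributed lock, and the base TTL and ±20% spread are illustrative values:

```python
import random
import threading

BASE_TTL = 3600  # one hour, illustrative

def jittered_ttl(base: int = BASE_TTL, spread: float = 0.2) -> int:
    """Avalanche guard: spread expirations over +/-20% of the base TTL so
    keys written together do not all expire in the same instant."""
    return int(base * (1 + random.uniform(-spread, spread)))

rebuild_lock = threading.Lock()  # stand-in for a Redis distributed lock

def rebuild_hot_key(key, cache, loader):
    """Breakdown guard: only the lock holder queries the DB; everyone else
    finds the cache already repopulated after acquiring the lock."""
    with rebuild_lock:
        if key not in cache:          # double-check after acquiring the lock
            cache[key] = loader(key)
    return cache[key]
```

The double‑check inside the lock is the essential detail: without it, every waiting request would still hit the database once the lock was released.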
5. Long‑Term Data Management & Online DDL
Cold‑data separation: Archive data older than a threshold (e.g., one year) to cheaper storage such as HBase, ClickHouse, or cloud object storage (OSS/S3), reducing load on the primary MySQL cluster.
Online schema changes: Never run ALTER TABLE directly on large sharded tables. Use tools like gh‑ost or pt‑online‑schema‑change to create a shadow table, migrate data incrementally, and switch without locking the master.
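A hedged example of what such a migration looks like with gh‑ost; the host, database, table, and column names are hypothetical, and in a sharded setup the command would be run once per physical table:

```shell
gh-ost \
  --host=replica-host --user=ghost --password="${GHOST_PASSWORD}" \
  --database=billing --table=billing_log_0000 \
  --alter="ADD COLUMN trace_id VARCHAR(64) NULL" \
  --chunk-size=1000 \
  --max-load="Threads_running=25" \
  --execute
```

Without `--execute`, gh‑ost performs a dry run, which is the safe way to validate the migration plan before touching production.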
Conclusion
Successfully answering all “levels” demonstrates a shift from being a code implementer to an architect with a holistic view, balancing performance, scalability, cost, and operational risk.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
