Case Study: Scaling Zhihu’s Moneta Application with TiDB
This article details how Zhihu's Moneta service, which handles over a trillion rows of user-read data, migrated from sharded MySQL with MHA (Master High Availability) to the open-source NewSQL database TiDB, achieving millisecond-level query latency, high write throughput, and improved scalability through a layered architecture and TiDB's feature set.
Zhihu's Moneta application stores roughly 1.3 trillion rows of data recording which posts users have already read, accruing on the order of 100 billion new rows per month, and faces strict latency and throughput requirements.
The team identified several pain points: highly available data, massive write ingestion (over 40 k records per second at peak), long-term historical storage, high-throughput queries (checking up to 12 million posts per second), sub-90 ms query response times, and tolerance for false positives.
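A quick back-of-envelope check of the write figures above (a sketch; the constants simply restate the article's numbers and assume a 30-day month):

```python
# Sanity-check the ingest numbers: how many rows per month could the
# cluster absorb if the 40k rows/sec peak were sustained around the clock?
PEAK_WRITES_PER_SEC = 40_000            # peak ingest cited above
SECONDS_PER_MONTH = 30 * 24 * 3600      # 2,592,000 (30-day month)

# Upper bound on monthly rows at a permanently sustained peak rate.
max_rows_per_month = PEAK_WRITES_PER_SEC * SECONDS_PER_MONTH
print(f"{max_rows_per_month:.3e}")      # ~1.04e11 rows/month at best
```

Even sustained nonstop, the peak write rate caps monthly ingest on the order of 10^11 rows, which is why monthly growth sits in the hundred-billion-row range while the total table holds 1.3 trillion rows.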
To meet these demands, they evaluated three key architectural components—proxy, cache, and storage—and found MySQL sharding with MHA insufficient due to complexity, limited scalability, and operational risks.
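To make the sharding pain point concrete, here is a hypothetical sketch (not Zhihu's actual code) of the application-level routing that manual MySQL sharding forces onto every query path:

```python
import hashlib

SHARD_COUNT = 64  # hypothetical number of MySQL shards


def shard_for(user_id: int) -> str:
    """Route a user's read-history rows to a fixed MySQL shard.

    Every read and write path must repeat this mapping, and resharding
    (say, 64 -> 128 shards) requires migrating data and redeploying the
    new mapping everywhere at once.
    """
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return f"moneta_shard_{int(digest, 16) % SHARD_COUNT:02d}"


print(shard_for(12345))
```

Changing the shard count means rewriting this mapping and migrating data by hand; TiDB removes the whole layer by distributing data across TiKV automatically.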
They adopted TiDB, an open‑source MySQL‑compatible NewSQL HTAP database, whose core components include stateless TiDB servers, distributed TiKV storage with Raft consensus, PD meta‑service, and TiSpark for analytical workloads, plus an ecosystem of tools such as Ansible deployment scripts, Syncer, and TiDB Binlog.
TiDB's main features (horizontal scalability, strong consistency, cloud-native design, and HTAP capabilities) addressed the Moneta system's requirements. The new architecture consists of a top layer of stateless API/proxy services, a middle layer of soft-state components backed by Redis caching, and a bottom-layer TiDB cluster holding all stateful data, with the whole stack orchestrated by Kubernetes for high availability.
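The middle caching layer can be illustrated with a minimal cache-aside sketch. This is a hypothetical illustration: plain dicts stand in for Redis and for a TiDB table keyed by `(user_id, post_id)`.

```python
class ReadStateService:
    """Cache-aside lookup sketch: a Redis-style cache in front of TiDB."""

    def __init__(self):
        self.cache = {}  # stands in for Redis
        self.db = {}     # stands in for the TiDB cluster

    def has_read(self, user_id: int, post_id: int) -> bool:
        key = (user_id, post_id)
        if key in self.cache:            # cache hit: no TiDB round trip
            return self.cache[key]
        value = self.db.get(key, False)  # cache miss: fall through to TiDB
        self.cache[key] = value          # populate the cache for next time
        return value

    def mark_read(self, user_id: int, post_id: int) -> None:
        self.db[(user_id, post_id)] = True     # write to TiDB first
        self.cache[(user_id, post_id)] = True  # then refresh the cache
```

A read consults the cache first and only falls through to TiDB on a miss, which is what keeps hot read-state lookups off the database.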
Performance metrics after the migration showed significant improvements: peak write throughput exceeded 40 k TPS, 99th-percentile (P99) query latency dropped to roughly 25 ms, and 99.9th-percentile (P999) latency to roughly 50 ms, with average latency far lower.
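For readers reproducing such measurements, P99/P999 figures are read off sorted latency samples with the nearest-rank method. A minimal sketch (percentiles are passed in basis points to keep the arithmetic in exact integers; the names are illustrative):

```python
def percentile(samples, p_bps):
    """Nearest-rank percentile of latency samples.

    p_bps is the percentile in basis points (9900 -> P99, 9990 -> P999),
    which keeps the rank computation in exact integer arithmetic.
    """
    ordered = sorted(samples)
    rank = -(-len(ordered) * p_bps // 10_000)  # ceil(n * p_bps / 10000)
    return ordered[max(rank, 1) - 1]


latencies_ms = list(range(1, 1001))    # synthetic 1..1000 ms samples
p99 = percentile(latencies_ms, 9900)   # 990
p999 = percentile(latencies_ms, 9990)  # 999
```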
Key lessons learned include the importance of separating latency‑sensitive queries onto dedicated TiDB instances, using SQL hints and low‑precision timestamps to optimize execution plans, and provisioning sufficient hardware for TiDB’s Raft‑based replication (minimum three replicas).
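As an illustration of the query-side tuning above: TiDB accepts optimizer hints in comment syntax, and the session variable `tidb_low_resolution_tso` trades read freshness for latency. The table and column names below are hypothetical:

```python
# Hypothetical queries illustrating the tuning described above; the
# HASH_JOIN hint and the tidb_low_resolution_tso variable are TiDB
# features, while the schema names are made up for this example.

# Pin the join algorithm so the optimizer cannot drift to a slow plan.
hinted_query = """
SELECT /*+ HASH_JOIN(r, p) */ r.user_id, p.id
FROM read_records r JOIN posts p ON r.post_id = p.id
WHERE r.user_id = %s
"""

# Reuse a slightly stale timestamp instead of fetching a fresh one from
# PD on every read: lower latency at the cost of read freshness.
enable_low_precision_tso = "SET SESSION tidb_low_resolution_tso = 1"
```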
Future work focuses on TiDB 3.0 features such as the Titan storage engine (reducing write amplification), table partitioning (improving query performance), gRPC batch messaging, multi‑threaded Raftstore, SQL plan management, and TiFlash columnar analytics, all expected to further enhance Moneta and anti‑spam services.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large-scale distributed, and high-availability architectures, as well as architecture evolution driven by internet technologies. Idea-driven, sharing-minded architects are welcome to exchange experience and learn together.