How Zhihu Scaled to Trillions of Records with TiDB: A Real‑World Case Study
Zhihu’s Moneta service, handling over a trillion rows of user‑read data, migrated from MySQL sharding to the open‑source HTAP database TiDB, achieving millisecond query latency, high availability, and horizontal scalability while detailing architecture changes, performance metrics, and lessons learned.
Introduction
Zhihu, the largest knowledge‑sharing platform in China, serves 220 million registered users and stores about 1.3 trillion rows of data in its Moneta application, which records posts users have already read.
With a monthly growth of roughly 100 billion rows, the data volume is expected to reach 3 trillion rows within two years, posing serious challenges for backend scalability while maintaining a good user experience.
This article explores how to keep query response times in the millisecond range on such massive data, and how TiDB—a MySQL‑compatible NewSQL HTAP database—provides real‑time insights.
Pain Points and Requirements
The Moneta service must satisfy:
High‑availability data: Post Feed is the first screen users see, crucial for traffic.
Huge write throughput: Over 40 k records per second during peak hours, adding nearly 3 billion records daily.
Long‑term historical storage: Currently ~1.3 trillion records, projected to reach 3 trillion in two years.
High‑throughput queries: Around 12 million post queries per second at peak.
Response time ≤ 90 ms: Even the longest‑tail queries must meet this.
Tolerate false positives: The system can surface many interesting posts even if some are incorrectly filtered.
Consequently, the desired architecture must provide high availability, excellent performance, and easy scalability.
Exploration of Architecture
To achieve the goals, three key components were integrated into the previous architecture:
Proxy: Forwards user requests to available nodes, ensuring high availability.
Cache: Handles requests in memory, reducing load on the database and improving performance.
Storage: Previously MySQL with sharding and MHA, which became insufficient as data grew.
Drawbacks of MySQL Sharding and MHA
MySQL sharding introduced complex, hard‑to‑maintain code, difficult schema changes, and upgrade‑induced downtime.
MHA required custom scripts for virtual IPs, only monitored the master, needed password‑less SSH (a security risk), lacked read load balancing, and could not monitor standby servers.
What Is TiDB?
TiDB is an open‑source HTAP NewSQL database compatible with MySQL. Its core components are:
TiDB server: Stateless SQL layer that processes queries and accesses the storage layer.
TiKV server: Distributed transactional key‑value store using Raft for strong consistency and high availability.
TiSpark: Apache Spark plugin for complex OLAP queries.
PD (Placement Driver): Metadata cluster backed by etcd that manages and schedules TiKV.
TiDB also provides an ecosystem of tools such as Ansible deployment scripts, Syncer for MySQL migration, TiDB Lightning for fast data import, and TiDB Binlog for incremental backup.
Key features include horizontal scalability, MySQL‑compatible syntax, distributed transactions with strong consistency, cloud‑native architecture, HTAP‑enabled ETL, fault tolerance via Raft, and online schema changes.
TiDB in Moneta’s Architecture
The new architecture consists of three layers:
Top layer: Stateless, scalable client API and proxy.
Middle layer: Soft‑state components and layered Redis cache; can recover from failures using data persisted in TiDB.
Bottom layer: TiDB cluster storing all stateful data, self‑healing upon node failures.
Kubernetes orchestrates the entire system, providing global fault monitoring and ensuring high availability.
Performance Metrics
After deploying TiDB in production, Moneta achieved significant improvements:
Peak write throughput: 40 k rows/second.
Peak query load: 30 k queries/second across 12 million posts.
99th‑percentile response time ≈ 25 ms; 99.9th‑percentile ≈ 50 ms, with average latency far lower.
Lessons Learned
Fast Data Import: Using TiDB DM to capture MySQL binlogs and TiDB Lightning, 1.1 trillion records were imported in four days, far faster than a month with traditional methods.
Reducing Query Latency: Separate TiDB instances for latency‑sensitive queries, SQL hints for optimal plans, and low‑precision TSO timestamps with prepared statements reduced round‑trip time.
Resource Evaluation: TiDB’s Raft‑based replication requires at least three replicas, demanding more hardware than a single‑master MySQL setup, but the trade‑off yields better scalability and fault tolerance.
Expectations for TiDB 3.0
Experiments with TiDB 3.0 (RC versions) introduced:
Titan engine: Reduces write amplification for large values, dramatically lowering write and query latency.
Table partitioning: Improves query performance by scanning only relevant partitions.
Additional features such as batch Raft messages via gRPC, multithreaded Raftstore, SQL plan management, and TiFlash (columnar storage for analytical workloads).
These enhancements further boost Moneta’s write throughput (over 40 k TPS) and enable fewer physical nodes while maintaining performance.
Next Steps
TiDB’s MySQL compatibility allows seamless migration, and its horizontal scalability lets Zhihu handle over a trillion records without bottlenecks.
Zhihu plans to continue contributing to open‑source tools and collaborate with the PingCAP community to strengthen TiDB.
Source: http://itindex.net/
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITFLY8 Architecture Home
ITFLY8 Architecture Home - focused on architecture knowledge sharing and exchange, covering project management and product design. Includes large-scale distributed website architecture (high performance, high availability, caching, message queues...), design patterns, architecture patterns, big data, project management (SCRUM, PMP, Prince2), product design, and more.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
