How Zhihu Scaled to Trillions of Rows with TiDB: Lessons and Best Practices
Zhihu's Moneta service stores over a trillion rows of user‑read posts and, facing massive write throughput and strict sub‑90 ms query latency requirements, migrated to the open‑source HTAP database TiDB, achieving horizontal scalability, high availability, and dramatically improved performance.
Zhihu, China’s largest knowledge‑sharing platform with 220 million users and 30 million questions, stores about 1.3 trillion rows of data in its Moneta service that tracks posts users have read. Monthly data growth reaches roughly 100 billion rows and is projected to hit 3 trillion rows within two years, creating severe challenges for maintaining millisecond‑level query response times.
Our Pain Points
High availability data: The post feed must reliably filter already‑read posts for a smooth user experience.
Massive write volume: Peak periods see over 40 000 writes per second, adding nearly 30 billion new records daily.
Long‑term storage: Currently about 1.3 trillion rows are stored, expected to reach 3 trillion in two years.
High‑throughput queries: During peaks the system processes roughly 30 000 queries per second across 1.2 million posts.
Response time limit: Queries must complete within 90 ms, even for long‑tail cases.
Tolerate false positives: The system can afford occasional over‑filtering of interesting posts.
System Architecture Requirements
High availability to avoid poor user experience when reading the recommendation page.
Outstanding performance to meet high throughput and strict latency.
Easy scalability to support future growth.
Exploration of Prior Architecture
Previously Moneta relied on three key components: a proxy to forward requests, an in‑memory cache to reduce database hits, and a storage layer built on standalone MySQL. MySQL sharding and Master High Availability (MHA) were used, but they introduced several drawbacks:
Application code became complex and hard to maintain.
Changing shard keys was cumbersome.
Upgrading logic impacted availability.
MHA required custom VIP scripts, monitored only the master, lacked read‑load balancing, and introduced security risks.
What Is TiDB?
TiDB is an open‑source NewSQL database that is MySQL‑compatible and provides HTAP (Hybrid Transaction/Analytical Processing) capabilities. Its core components are:
TiDB server – a stateless SQL layer that processes queries and interacts with storage.
TiKV – a distributed transactional key‑value store using Raft for strong consistency and high availability.
TiSpark – an Apache Spark plugin for complex OLAP queries.
PD (Placement Driver) – an etcd‑backed metadata service that schedules TiKV.
Additional tools include Ansible deployment scripts, Syncer for MySQL migration, TiDB Binlog for incremental backup, and TiFlash for columnar analytics.
Key TiDB features: horizontal scalability, MySQL‑compatible syntax, distributed strong‑consistent transactions, cloud‑native architecture, HTAP‑enabled ETL, fault tolerance with Raft recovery, and online schema changes.
TiDB in Our Architecture
The revised Moneta architecture consists of three layers:
Top layer: Stateless, scalable client APIs and proxies.
Middle layer: Soft‑state components and a layered Redis cache; these can recover service state from TiDB if a failure occurs.
Bottom layer: A TiDB cluster storing all stateful data, providing self‑healing through Raft replication.
Kubernetes orchestrates the entire system, ensuring global fault monitoring and high availability.
TiDB Performance Metrics
After deploying TiDB in production, the system showed significant improvements:
Peak write throughput: 40 000 rows per second.
Peak query load: 30 000 queries per second across 1.2 million posts.
99th percentile latency ≈ 25 ms; 999th percentile latency ≈ 50 ms.
What We Learned
Faster data import: Using TiDB Data Migration (DM) and TiDB Lightning, 1.1 trillion rows were imported in four days, far faster than the month‑long estimate with MySQL.
Reduced query latency: SQL hints, low‑precision TSO timestamps, and prepared statements helped meet latency targets.
Resource evaluation: TiDB’s Raft protocol requires at least three replicas, meaning more hardware is needed compared to a single‑master MySQL setup.
Expectations for TiDB 3.0
Titan engine: Reduces write amplification for large values, cutting both write and query latency.
Table partitioning: Improves query performance by limiting scans to relevant time‑range partitions.
gRPC batch messages & multi‑threaded Raftstore: Boosts concurrency and reduces network round‑trips.
SQL plan management: Allows binding queries to specific execution plans without modifying SQL text.
TiFlash: Columnar storage engine enabling efficient analytical queries on massive datasets.
Next Steps
TiDB’s MySQL compatibility lets us adopt it with minimal changes. Its horizontal scalability ensures we can continue to grow beyond a trillion records while maintaining low latency. We plan to contribute to the open‑source ecosystem and work closely with PingCAP to further strengthen TiDB.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Java Backend Technology
Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
