How Zhihu Scaled to Trillions of Rows with TiDB: Lessons from Moneta
Zhihu’s Moneta service, handling over 1.3 trillion rows and billions of daily writes, migrated from MySQL sharding to TiDB, achieving millisecond query latency, high availability, and horizontal scalability, while sharing architectural choices, performance metrics, migration challenges, and future expectations for TiDB 3.0.
Background
Zhihu, China’s largest knowledge‑sharing platform, serves 220 million registered users, 30 million questions and over 130 billion answers. Its Moneta service stores the posts each user has already read, amounting to roughly 1.3 trillion rows and growing by about 100 billion rows per month. Within two years the dataset is expected to exceed 3 trillion rows.
Key Pain Points
High availability: Users must not see already‑read posts on the recommendation feed.
Massive write volume: Peak traffic exceeds 40 k writes per second, adding nearly 30 billion new rows daily.
Long‑term storage: Historic data must be retained for years.
High‑throughput queries: Up to 12 million post‑level queries per second during peaks.
Strict latency: Query response time must stay under 90 ms, even for long‑tail queries.
Tolerate false positives: The system may occasionally surface an already‑read post.
Architecture Requirements
High availability to avoid a poor user experience.
Outstanding system performance to meet high throughput and latency goals.
Easy scalability as business and data volume grow.
Exploration of Solutions
We evaluated three core components for the ideal architecture:
Proxy: Routes user requests to healthy nodes, ensuring HA.
Cache: Handles frequent reads in memory, reducing database load.
Storage: Initially a standalone MySQL cluster, later replaced by TiDB.
Drawbacks of MySQL Sharding & MHA
MySQL Sharding
Application code becomes complex and hard to maintain.
Changing the shard key is cumbersome.
Upgrading logic impacts availability.
MHA
Requires custom scripts or third‑party tools for virtual IP configuration.
Only monitors the primary node.
Needs password‑less SSH, introducing security risks.
No read‑load‑balancing for replicas.
Only checks primary server health.
Why TiDB?
TiDB is an open‑source, MySQL‑compatible NewSQL database with HTAP capabilities. Its main components are:
TiDB Server: Stateless SQL layer that processes queries and talks to TiKV.
TiKV: Distributed transactional key‑value store using Raft for strong consistency and HA.
TiSpark: Apache Spark plugin for OLAP queries on TiKV data.
PD (Placement Driver): Metadata service (etcd‑backed) that schedules TiKV regions.
Additional tools include Ansible deployment scripts, Syncer for MySQL‑to‑TiDB migration, and TiDB Binlog for incremental backup.
Key TiDB features:
Horizontal scalability
MySQL‑compatible syntax
Strongly consistent distributed transactions
Cloud‑native architecture
HTAP for ETL‑free analytics
Fault tolerance with Raft recovery
Online schema changes
TiDB in Moneta’s Architecture
The new stack is divided into three layers:
Top layer: Stateless, horizontally scalable client API and proxy.
Middle layer: Soft‑state services and a layered Redis cache; can self‑recover using data persisted in TiDB.
Bottom layer: TiDB cluster storing all stateful data; highly available and self‑healing.
Kubernetes orchestrates the entire system, providing global fault monitoring and ensuring high availability.
Performance Improvements
After deploying TiDB, the system showed dramatic gains:
Write throughput > 40 k rows/s.
During peaks, 30 k queries and 1.2 M post checks per second.
99th‑percentile latency ≈ 25 ms; 999th‑percentile ≈ 50 ms.
Relevant charts:
Lessons Learned
Fast data import: Using TiDB DM and Lightning, 1.1 trillion rows were imported in four days (versus a month with naive writes).
Reducing query latency: Separate TiDB instances for latency‑sensitive queries; added SQL hints; employed low‑precision TSO and prepared statements to cut round‑trips.
Resource evaluation: TiDB’s Raft replication needs at least three replicas, requiring more hardware than a single‑master MySQL setup, but the extra resources enable easier scaling.
Expectations for TiDB 3.0
TiDB 3.0 introduces several enhancements that we have already tested in our anti‑spam service:
Titan storage engine: Reduces write amplification for large values, cutting both write and query latency (see comparison chart).
Table partitioning: Allows time‑range partitions, dramatically improving query performance.
Batch Raft messages & multi‑threaded Raftstore: Increases write throughput and reduces node count (from seven to five nodes for similar load).
SQL plan management: Binds queries to optimal execution plans without modifying SQL text.
TiFlash: Columnar analytics engine enabling fast OLAP on massive data without separate ETL pipelines.
Next Steps
TiDB’s MySQL compatibility lets us adopt it without rewriting existing code. Its horizontal scalability prepares us for future growth beyond a trillion records. We plan to continue contributing to the open‑source ecosystem and work with PingCAP to further strengthen TiDB.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Java Backend Technology
Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
