How Zhihu Scaled to Trillions of Rows with TiDB – Real‑Time Query Performance Insights
Zhihu’s Moneta service stores over a trillion rows and faces massive write and read loads; this article explains why TiDB was chosen, how its architecture and features such as HTAP, Raft, Titan and table partitioning enable millisecond‑level query latency, high availability, and seamless scaling.
Introduction
Zhihu, whose name means “Do you know?” in classical Chinese, is China’s Quora‑style Q&A platform. It has 220 million registered users and 30 million questions with more than 130 million answers. The Moneta service stores roughly 1.3 trillion rows of data about posts users have already read. The data volume grows by about 100 billion rows each month and is expected to reach 3 trillion rows within two years. Maintaining millisecond‑level query response time at this scale is a severe challenge.
Pain Points and Requirements
High‑availability data : Post Feed is the first screen users see and must be reliable.
Huge write throughput : Peak write traffic exceeds 40 000 rows per second, adding nearly 3 billion rows daily.
Long‑term historical storage : Currently about 1.3 trillion rows are stored; this will grow to 3 trillion.
High‑throughput queries : During peak periods the system's queries cover about 12 million posts per second.
Strict latency bound : Query latency must stay under 90 ms even for long‑tail queries.
Tolerate false positives : The system should still return many interesting posts even if some are mistakenly filtered.
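The tolerance for false positives is what makes compact probabilistic filters usable for "has this user already read this post?" checks. The sketch below is a minimal Bloom filter that illustrates the trade-off. It is an assumption for illustration only: the source does not state which data structure Moneta uses, and the names `BloomFilter` and `filter_unread` are hypothetical.

```python
import hashlib
from typing import Iterator, List


class BloomFilter:
    """Probabilistic set: never a false negative, occasionally a false positive."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions from salted SHA-256 digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def filter_unread(user_id: int, candidates: List[int], seen: BloomFilter) -> List[int]:
    """Keep only posts the filter is certain the user has not read."""
    return [p for p in candidates if not seen.might_contain(f"{user_id}:{p}")]


# Hypothetical usage: user 42 has read posts 101 to 103.
seen = BloomFilter()
for post_id in (101, 102, 103):
    seen.add(f"42:{post_id}")
unread = filter_unread(42, [101, 102, 103, 104, 105], seen)
```

A post the user has read is never shown again, because a Bloom filter has no false negatives. An unread post may occasionally be dropped, which is the false positive the requirement accepts.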
Desired Architecture
To meet the above needs the architecture must provide high availability, excellent performance, and easy scalability.
High availability : A stateless, scalable client API and proxy layer.
Outstanding system performance : Ability to handle high write and read throughput with strict latency.
Ease of scaling : The system should expand smoothly as business grows.
What is TiDB?
TiDB is an open‑source, MySQL‑compatible NewSQL database that supports Hybrid Transactional/Analytical Processing (HTAP) workloads. It consists of several core components:
TiDB server – a stateless SQL layer that processes user queries and talks to the storage layer.
TiKV – a distributed transactional key‑value store that uses the Raft consensus protocol for strong consistency and high availability.
TiFlash – a column‑store analytical engine for fast OLAP queries.
Placement Driver (PD) – an etcd‑backed metadata service that schedules and manages TiKV nodes.
TiDB also provides an ecosystem of tools, such as Ansible scripts for deployment, Syncer for MySQL‑to‑TiDB migration, and TiDB Binlog for incremental backup.
Deploying TiDB in Moneta
The Moneta architecture after TiDB integration consists of three layers:
Top layer : Stateless, horizontally scalable client API and proxy components.
Middle layer : Soft‑state services and a layered Redis cache. In case of service interruption, these components can recover state from TiDB.
Bottom layer : The TiDB cluster stores all stateful data; its components are highly available and can self‑recover if a node crashes.
Kubernetes orchestrates the whole system to ensure global fault monitoring and high availability.
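The middle layer's "soft state" can be pictured as a read-through cache that is safe to lose because the durable copy lives in the bottom layer. The following is a minimal sketch under stated assumptions: a plain dict stands in for the TiDB cluster, and `ReadThroughCache` and its methods are hypothetical names, not part of Moneta or TiDB.

```python
from typing import Dict, Optional


class ReadThroughCache:
    """Soft-state cache in front of a durable store (a dict stands in for TiDB)."""

    def __init__(self, store: Dict[str, str]) -> None:
        self.store = store                  # durable source of truth (stand-in)
        self.cache: Dict[str, str] = {}     # soft state, may vanish at any time
        self.store_reads = 0                # counts cache misses served by the store

    def get(self, key: str) -> Optional[str]:
        if key in self.cache:
            return self.cache[key]
        # Cache miss: fall back to the durable store and repopulate the cache.
        self.store_reads += 1
        value = self.store.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key: str, value: str) -> None:
        # Write the durable copy first so losing the cache never loses data.
        self.store[key] = value
        self.cache[key] = value

    def crash(self) -> None:
        """Simulate a cache restart: all soft state is lost."""
        self.cache.clear()


# Hypothetical usage: state survives a cache restart.
durable: Dict[str, str] = {}
layer = ReadThroughCache(durable)
layer.put("user:42:post:101", "read")
layer.crash()
value_after_restart = layer.get("user:42:post:101")
```

After the simulated restart the first read misses the cache and is served from the durable store, after which the entry is cached again.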
Performance Metrics
After moving to TiDB, the system achieved high availability, easy scalability, and significantly improved performance. Key metrics include:
Peak write throughput of 40 000 rows per second.
During peak periods, 30 000 queries per second covering 12 million posts.
99th‑percentile response time ≈ 25 ms; 99.9th‑percentile ≈ 50 ms, with average latency far lower.
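To see why tail percentiles can sit far above the average, here is a small sketch that computes them with the nearest-rank method. The latency samples are made up for illustration and are not measurements from Moneta; `percentile` is a hypothetical helper.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up latencies in ms: 1000 requests, mostly fast, with a slow tail.
latencies = [5.0] * 985 + [25.0] * 13 + [50.0] * 2
p99 = percentile(latencies, 99)      # 25.0
p999 = percentile(latencies, 99.9)   # 50.0
mean = sum(latencies) / len(latencies)  # 5.35
```

In this synthetic distribution a handful of slow requests set the 99th and 99.9th percentiles, while the mean stays close to the typical request.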
What We Learned
Data migration to TiDB took only four days for 1.1 trillion rows, far faster than the month‑long estimate for a naïve approach.
Separating latency‑sensitive queries into a dedicated TiDB instance prevents large queries from affecting them.
SQL hints help the optimizer pick the best execution plan, while low‑precision timestamps from the timestamp oracle (TSO) and prepared statements reduce network round‑trips.
Accurate hardware sizing is crucial; TiDB’s Raft‑based replication requires at least three replicas, increasing resource needs compared to a single‑master MySQL setup.
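The sizing point about replicas follows from majority-quorum arithmetic, sketched below. The function names are hypothetical, and the storage figure counts row copies only; it ignores compression, indexes, and other overhead.

```python
def raft_quorum(replicas: int) -> int:
    """Votes needed to commit a write or elect a leader: a strict majority."""
    if replicas < 1:
        raise ValueError("need at least one replica")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Replicas that can be lost while a majority is still reachable."""
    return replicas - raft_quorum(replicas)


def stored_row_copies(logical_rows: int, replicas: int) -> int:
    """Row copies kept on disk across all replicas."""
    return logical_rows * replicas


quorum = raft_quorum(3)                                # 2
survivable = tolerated_failures(3)                     # 1
copies = stored_row_copies(1_300_000_000_000, 3)       # 3.9 trillion row copies
```

With three replicas a group keeps working after losing one node, which is why three is the practical minimum, and why capacity has to be planned for roughly three times the logical data set.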
Expectations for TiDB 3.0
Titan engine : Reduces write amplification for large values, dramatically lowering both write and query latency.
Table partitioning : Splits tables by time range, allowing queries to hit only relevant partitions and improving performance.
gRPC batch messages & multi‑threaded Raftstore : Increase write throughput and concurrency, which matters because Moneta's write load exceeds 40 k TPS.
SQL plan management : Binds queries to optimal execution plans without embedding hints in SQL text.
TiFlash : Column‑store engine enables efficient analytical queries on massive datasets without separate ETL pipelines.
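The benefit of time-range partitioning comes from partition pruning: a query with a time predicate only needs to touch partitions whose range overlaps it. The sketch below is a conceptual model only, not TiDB's implementation. It assumes MySQL-style RANGE partitions with exclusive upper bounds, and the partition names and the `prune` helper are hypothetical.

```python
from bisect import bisect_right
from datetime import date
from typing import List, Tuple

# Each partition holds rows with timestamps strictly below its upper bound
# and at or above the previous partition's upper bound.
Partition = Tuple[str, date]

PARTITIONS: List[Partition] = [
    ("p2019_01", date(2019, 2, 1)),
    ("p2019_02", date(2019, 3, 1)),
    ("p2019_03", date(2019, 4, 1)),
    ("p2019_04", date(2019, 5, 1)),
]


def prune(partitions: List[Partition], start: date, end: date) -> List[str]:
    """Return names of partitions that may hold rows with start <= ts <= end."""
    if start > end:
        return []
    bounds = [upper for _, upper in partitions]
    first = bisect_right(bounds, start)  # first partition whose bound is > start
    last = bisect_right(bounds, end)     # partition that would contain `end`
    return [name for name, _ in partitions[first:last + 1]]


# A query over mid-February to early March touches two of four partitions.
hit = prune(PARTITIONS, date(2019, 2, 10), date(2019, 3, 5))
```

Because the partitions are sorted by upper bound, the relevant ones are found with two binary searches, and the remaining partitions are never scanned.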
Next Steps
Because TiDB is MySQL‑compatible, existing tools and workflows can be reused. Its horizontal scalability allows Zhihu to continue expanding the database beyond a trillion records while maintaining low latency and high availability. Ongoing collaboration with the open‑source community and PingCAP will further strengthen TiDB’s capabilities.
