Design and Migration of Zhihu's Read Service: From Bloom Filter to TiDB
This article details Zhihu's read‑service architecture, its massive data scale and performance challenges, early Bloom‑filter and HBase solutions, the design goals of high availability, high performance and scalability, and the subsequent migration from MySQL to TiDB with cloud‑native practices.
Zhihu operates a large‑scale knowledge platform with over 30 million questions and more than 1.3 billion answers, requiring an efficient read‑filter service to avoid recommending already‑seen content to users.
The business model is simple: query whether a user has read a specific piece of content. Yet the service must handle trillions of records, roughly 3 billion new entries per day (about 35 K rows per second on average, peaking at 40 K), while maintaining sub‑100 ms response times.
Key challenges include high availability, massive write volume, long‑term storage of 12 trillion records, high query throughput (30 K QPS), and strict latency limits.
Early implementations used a Bloom‑filter stored in a Redis cluster, which suffered from high CPU cost, memory pressure, and difficulty sizing the filter. A subsequent HBase‑based solution mapped user‑id to row‑key and document‑id to qualifier, but cache‑miss latency and GC pauses made it unsuitable for the required response times.
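The Redis-era design hinged on Bloom-filter membership tests, and the sizing difficulty the article mentions follows directly from the math: the bit-array size must be chosen up front from an estimate of item count and target false-positive rate. A minimal sketch (hypothetical; the class, key format, and parameters are illustrative, not Zhihu's implementation):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch (hypothetical, not Zhihu's code).

    Sizing is the hard part: for n items at false-positive rate p you need
    m = -n * ln(p) / (ln 2)^2 bits, which is awkward to predict when
    per-user read counts vary by orders of magnitude.
    """

    def __init__(self, m_bits: int, k_hashes: int):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k positions from salted SHA-256 digests of the item.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely unread"; True means "probably read".
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

bf = BloomFilter(m_bits=1 << 20, k_hashes=7)
bf.add("user42:answer1001")
```

The one-sided error is what makes the filter usable here: a "definitely unread" answer is always safe to act on, while a rare false "read" only suppresses one recommendation.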
To meet the three design goals—high availability, high performance, and easy scalability—Zhihu built a new cache‑through buffering system called RBase. The architecture separates stateless client APIs and proxies (which perform slot‑based load balancing and fault isolation) from a MySQL cluster managed by MHA, with weak‑state buffering layers that can be rebuilt from replicas.
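The slot-based load balancing in the proxy tier can be sketched as a fixed slot space keyed on user id, with slots assigned to nodes; node names, slot count, and hash choice below are assumptions for illustration:

```python
import zlib

# Hypothetical sketch of slot-based routing in the proxy layer: user ids
# hash into a fixed slot space, slots map to cache nodes, so a failed
# node blacks out only its own slots (fault isolation) and its slots can
# be reassigned without rehashing everything.

SLOT_COUNT = 1024
NODES = ["cache-a", "cache-b", "cache-c"]  # placeholder node names

def slot_for(user_id: str) -> int:
    return zlib.crc32(user_id.encode()) % SLOT_COUNT

def node_for(user_id: str, slot_table) -> str:
    return slot_table[slot_for(user_id)]

# Round-robin initial assignment of slots to nodes.
slot_table = {s: NODES[s % len(NODES)] for s in range(SLOT_COUNT)}
```

Because routing is deterministic per user, all of one user's traffic lands on the same node, which is what lets the weak-state buffer for that user be rebuilt in one place after a failure.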
RBase’s cache layer uses Bloom‑filters to reduce memory usage and a write‑through, read‑through strategy to improve hit rates and avoid unnecessary invalidations.
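The write-through/read-through pattern can be shown with a toy wrapper (a sketch under assumed names; the real cache tier holds Bloom filters, not a plain dict):

```python
# Hypothetical read-through / write-through wrapper. Reads populate the
# cache on a miss; writes update both the store and the cache in one
# path, so entries never need wholesale invalidation (which would
# crater the hit rate the article is optimizing for).

class ReadThroughCache:
    def __init__(self, store: dict):
        self.store = store   # stands in for the MySQL/TiDB cluster
        self.cache = {}      # stands in for the Bloom-filter tier

    def has_read(self, user_id: str, doc_id: str) -> bool:
        key = (user_id, doc_id)
        if key not in self.cache:                 # miss: read through
            self.cache[key] = self.store.get(key, False)
        return self.cache[key]

    def mark_read(self, user_id: str, doc_id: str) -> None:
        key = (user_id, doc_id)
        self.store[key] = True                    # write through to store
        self.cache[key] = True                    # ...and to cache

cache = ReadThroughCache(store={("u1", "a1"): True})
```

The key property is that a write is immediately visible through the cache, so the hot path never has to choose between stale data and an invalidation storm.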
The system employs multi‑level buffering inspired by CPU cache hierarchies, allowing different cache tiers to apply distinct policies and reducing cross‑data‑center traffic. Cache tags isolate offline push workloads from online homepage queries.
Initially, the service relied on MySQL with sharding and TokuDB compression, handling about 40 K writes per second and 30 K queries per second with P99 latency around 25 ms.
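Hash sharding by user id keeps all of one user's read records in a single physical table; the shard count and table-naming scheme below are assumptions, not Zhihu's actual layout:

```python
# Hypothetical sketch of hash sharding by user id. Every query in this
# workload is scoped to one user, so routing on user_id keeps each
# lookup on a single shard.

SHARDS = 512  # assumed shard count for illustration

def shard_table(user_id: int) -> str:
    return f"read_record_{user_id % SHARDS:03d}"
```

This is also the scheme's weakness, hinted at by the migration that follows: resharding (changing `SHARDS`) forces a bulk data move, which a distributed database like TiDB handles transparently via region splitting.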
To overcome MySQL’s operational limits, Zhihu migrated the service to TiDB, a MySQL‑compatible distributed database. Data migration used TiDB Lightning for the initial bulk load (≈45 TB) and DM for incremental binlog sync. After several failed attempts due to resource constraints, a conservative plan using eight dedicated machines completed the full migration of 1.11 × 10¹³ rows in four days.
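A back-of-envelope check of the quoted migration figures (1.11 × 10¹³ rows in four days on eight machines) gives a sense of the sustained ingest rate involved:

```python
# Rough arithmetic on the migration numbers quoted above.
rows = 1.11e13                       # total rows migrated
seconds = 4 * 24 * 3600              # four days
rows_per_second = rows / seconds     # overall ingest rate
per_machine = rows_per_second / 8    # eight dedicated Lightning machines

print(f"{rows_per_second:,.0f} rows/s overall")
print(f"{per_machine:,.0f} rows/s per machine")
```

That works out to roughly 32 million rows per second overall, which is only plausible because TiDB Lightning bypasses the SQL layer and ingests pre-built SST files directly into TiKV.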
Post‑migration, traffic was gradually shifted to TiDB. Initial latency spikes were mitigated by splitting cache‑miss handling into a fast blocking query and an asynchronous full‑data rebuild, and by prioritizing SQL statements with hints (Low/Normal/High) to isolate workloads. TiDB Binlog was adapted to use multiple Kafka partitions for better throughput.
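The cache-miss split described above, a cheap blocking point query on the request path plus an asynchronous full rebuild, can be sketched like this (function names, data shapes, and the queue-based hand-off are assumptions for illustration):

```python
import queue
import threading

# Hypothetical sketch of the cache-miss split: the request path issues a
# fast point query and returns immediately, while a background worker
# rebuilds the user's full cache entry off the hot path.

rebuild_queue: "queue.Queue" = queue.Queue()

def on_cache_miss(store: dict, user_id: str, doc_id: str) -> bool:
    answer = store.get((user_id, doc_id), False)  # fast blocking point query
    rebuild_queue.put(user_id)                    # defer the expensive rebuild
    return answer

def rebuild_worker(store: dict, cache: dict) -> None:
    while True:
        user_id = rebuild_queue.get()
        if user_id is None:          # shutdown sentinel
            break
        # Full scan of the user's rows, done off the request path.
        cache[user_id] = {doc for (u, doc), read in store.items()
                          if u == user_id and read}
        rebuild_queue.task_done()
```

Latency on a miss is then bounded by one indexed point query rather than by rebuilding the whole per-user entry, which is what flattened the post-migration latency spikes.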
After two months of gray‑scale rollout, the TiDB‑based service matched MySQL’s performance metrics while providing cloud‑native high‑availability and elastic scaling.
Finally, the article notes that TiDB 3.0’s new features (Titan engine, gRPC batch messages, multi‑threaded Raft) further improve resource efficiency and are planned to be adopted for the read service.
Overall, the case study demonstrates how careful architectural evolution, cache‑through design, and a disciplined migration to a distributed SQL database can sustain massive read‑filter workloads with stringent latency requirements.
Architecture Digest
A publication focusing on Java backend development, covering application architecture at top-tier internet companies (high availability, high performance, high stability), big data, machine learning, and other popular fields.