Databases · 13 min read

Scaling Zhihu's Moneta Application with TiDB: Architecture, Performance, and Lessons Learned

This article details how Zhihu tackled the massive data and latency challenges of its Moneta service by migrating from MySQL sharding and MHA to the distributed NewSQL database TiDB, describing the new three‑tier architecture, performance gains, migration tactics, and expectations for TiDB 3.0.

Architecture Digest

Zhihu's Moneta application, which stores the read‑history of billions of posts, grew to about 1.3 trillion rows and generates roughly 1 trillion new rows each month, creating severe scalability and latency problems for its Post Feed service.

The main pain points: the service needed highly available data, massive write throughput (over 40,000 rows per second at peak), long-term retention of historical data, high-throughput queries (up to 12 million post checks per second), query response times under 90 ms, and tolerance for false positives when filtering already-read posts.
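The tolerance for false positives points to a Bloom-filter-style membership check: "possibly read" answers are acceptable, "definitely not read" answers must be exact. A minimal sketch of that trade-off, assuming nothing about Zhihu's actual filter implementation:

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: answers "possibly seen" or "definitely not seen".

    False positives are possible (a post may be reported as read when it
    was not); false negatives are not. This mirrors the tolerance the
    read-filtering workload allows.
    """

    def __init__(self, size_bits=1 << 20, num_hashes=5):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key: str):
        # Derive num_hashes independent bit positions from one key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))
```

The key design point is asymmetry: a filter like this can cheaply answer "the user has definitely not read this post" without touching storage, at the cost of occasionally claiming a post was read when it was not.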

To meet these requirements Zhihu defined three architectural goals: high availability, excellent system performance, and easy scalability.

They evaluated their existing architecture—proxy, cache, and storage layers—and identified the drawbacks of MySQL sharding and Master‑High‑Availability (MHA) solutions, such as complex application code, difficult shard‑key changes, and lack of read load balancing.
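The shard-key pain is easy to see in a toy routing function: once `hash(key) % N` is baked into application code, changing the shard count (or the key itself) remaps most rows and forces a bulk data migration. A minimal sketch, not Zhihu's actual proxy logic:

```python
import zlib


def shard_for(user_id: int, num_shards: int) -> int:
    # Typical application-side routing: the shard index is derived from
    # the shard key, so every query path must embed this function.
    return zlib.crc32(str(user_id).encode()) % num_shards


# Growing from 8 to 16 shards remaps roughly half of all keys,
# which means moving that data by hand before queries route correctly.
moved = sum(1 for uid in range(10_000)
            if shard_for(uid, 8) != shard_for(uid, 16))
```

This is the drawback the article names: the routing logic complicates application code, and any change to the shard key or shard count is a disruptive, manual operation, which is what a distributed database's automatic region rebalancing removes.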

TiDB, an open‑source MySQL‑compatible NewSQL database with HTAP capabilities, was chosen. Its core components include the stateless TiDB SQL layer, the distributed transactional key‑value store TiKV (which uses Raft for strong consistency), TiSpark for analytical workloads, and the Placement Driver (PD) for cluster metadata and scheduling.

In Moneta, TiDB was deployed in a three‑layer architecture: the top layer provides stateless, scalable API and proxy services; the middle layer consists of soft‑state components and a Redis cache that can recover from TiDB data; the bottom layer is a highly available TiDB cluster that self‑heals on node failures, all orchestrated by Kubernetes.
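The "soft state recoverable from TiDB" middle layer is essentially the cache-aside pattern: a cache miss, or an entirely wiped cache, is repopulated from the database of record. A minimal sketch with in-memory stand-ins for Redis and TiDB (the class and method names are illustrative, not Zhihu's code):

```python
class ReadHistoryService:
    """Cache-aside: the cache layer can be lost entirely and rebuilt from TiDB."""

    def __init__(self, tidb: dict):
        self.tidb = tidb   # stand-in for the durable TiDB cluster
        self.cache = {}    # stand-in for Redis; safe to flush at any time

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.tidb.get(key)     # fall through to the source of truth
        if value is not None:
            self.cache[key] = value    # repopulate the soft state
        return value

    def flush_cache(self) -> None:
        self.cache.clear()             # simulates losing the cache layer
```

Because every cached value can be rederived from TiDB, the middle layer carries no durability responsibility, which is what makes the top two layers freely restartable under Kubernetes.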

After migration, performance improved dramatically: write throughput exceeded 40,000 rows per second, query throughput reached 30,000 queries per second and 1.2 million post checks per second, with 99th‑percentile latency around 25 ms and 99.9th‑percentile latency around 50 ms, far below the 90 ms target.
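Tail-latency figures like p99 and p99.9 come from ranking raw latency samples. A sketch of how such a report is computed from measurements, using the nearest-rank definition (this is standard percentile arithmetic, not the original monitoring code):

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample value such that
    at least p percent of all samples are at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]


# Toy latency distribution in milliseconds: mostly fast, with a tail.
latencies_ms = [5] * 980 + [25] * 15 + [50] * 5
p99 = percentile(latencies_ms, 99)      # 99% of requests finish within this
p999 = percentile(latencies_ms, 99.9)   # the slowest 0.1% sit above this
```

The point of quoting p99.9 alongside p99 is that averages hide the tail: a service can average 5 ms while one request in a thousand takes ten times longer, and it is that tail which must stay under the 90 ms target.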

Key lessons: TiDB Data Migration (DM) and TiDB Lightning imported 1.1 trillion rows in just four days; latency‑sensitive queries were isolated on dedicated TiDB instances; and SQL hints, low‑precision timestamps, and prepared statements reduced network round trips.
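The low-precision-timestamp trick trades freshness for fewer writes: if the stored read time is rounded down to a coarse bucket, a repeat read within the same bucket changes nothing and the UPDATE can be skipped. A hedged sketch of the idea (the bucket size here is an assumption for illustration, not Zhihu's actual setting):

```python
BUCKET_SECONDS = 60  # assumed coarseness; the real system picks its own


def bucketed(ts: float) -> int:
    """Round a Unix timestamp down to its bucket boundary."""
    return int(ts) - int(ts) % BUCKET_SECONDS


def needs_update(stored_bucket: int, now: float) -> bool:
    # Skip the write entirely when the coarse timestamp would not change.
    return stored_bucket != bucketed(now)
```

For a read-history workload where a user may re-open the same post many times in quick succession, collapsing those events into one write per bucket directly cuts write throughput pressure on the storage layer.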

Resource planning revealed that Raft’s three‑replica requirement demands more hardware than a single‑master MySQL setup, so capacity must be provisioned in advance.
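The three-replica overhead is straightforward to quantify: every logical byte is stored three times across TiKV nodes, so node counts must be planned from roughly 3× the logical footprint plus headroom. A back-of-the-envelope helper (all numbers are illustrative, not Zhihu's actual capacity plan):

```python
import math


def nodes_needed(logical_tb: float, node_capacity_tb: float,
                 replicas: int = 3, fill_ratio: float = 0.7) -> int:
    """Estimate storage node count: logical data x replica factor,
    keeping each node below fill_ratio to leave compaction headroom."""
    raw_tb = logical_tb * replicas
    return math.ceil(raw_tb / (node_capacity_tb * fill_ratio))


# Example: 100 TB of logical data on 4 TB nodes kept at 70% full
# needs about three times the nodes of a hypothetical single-copy layout.
tikv_nodes = nodes_needed(100, 4)                # Raft, 3 replicas
single_copy = nodes_needed(100, 4, replicas=1)   # no replication
```

This is the provisioning lesson in the article: unlike a single-master MySQL setup, a Raft-replicated cluster's hardware budget scales with the replica factor, so capacity has to be reserved before the data arrives.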

Looking ahead to TiDB 3.0, features such as the Titan storage engine (reducing write amplification), table partitioning (improving query performance), gRPC batch messaging, multithreaded Raftstore, SQL plan management, and the column‑store TiFlash engine are expected to further boost throughput, lower latency, and enable efficient analytical queries on massive datasets.

Overall, TiDB’s MySQL compatibility, horizontal scalability, and open‑source ecosystem allow Zhihu to continue expanding its data platform while contributing back to the community.

Tags: distributed systems, performance optimization, TiDB, HTAP, NewSQL, database scalability, large-scale data
Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
