
Scaling Zhihu’s Moneta Service with TiDB: Architecture, Performance, and Lessons Learned

This article describes how Zhihu’s Moneta application migrated billions of rows of user‑read data to the open‑source MySQL‑compatible NewSQL database TiDB, detailing the architectural redesign, performance improvements, migration challenges, and future expectations for TiDB 3.0.

Top Architect

Zhihu, a Chinese question-and-answer platform similar to Quora, stores about 1.3 trillion rows recording which posts each user has read in its Moneta service. The data grows by roughly 100 billion rows per month, and the service must meet a strict latency bound (≤90 ms) under high throughput.

To meet these challenges, the team evaluated TiDB, an open-source, MySQL-compatible NewSQL database with HTAP capabilities, and chose it for its strong consistency, horizontal scalability, and cloud-native design.

System architecture requirements included high availability, handling >40 k writes per second, storing massive historical data, processing millions of queries per second, and tolerating false positives in content filtering.
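To put the write requirement in perspective, the arithmetic below converts a per-second write rate into rows per day. It is only a rough sketch: the function name is made up for illustration, and it treats the 40 k peak rate as if it were sustained around the clock, so the result is an upper bound rather than a measured figure.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def rows_written(writes_per_second: int, days: int = 1) -> int:
    """Rows accumulated if the given write rate were sustained for `days` days."""
    return writes_per_second * SECONDS_PER_DAY * days


if __name__ == "__main__":
    # 40,000 writes/s held for a full day
    print(f"{rows_written(40_000):,} rows per day")
```

At 40 k writes per second this comes to about 3.5 billion rows per day, which is why storage for historical data is listed as a separate requirement.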

The new architecture consists of three layers: a stateless, scalable client API and proxy at the top; a soft-state layer with Redis caching in the middle; and a TiDB cluster (TiDB servers, TiKV storage, the PD (Placement Driver) metadata service, and TiSpark) at the bottom. All components are orchestrated by Kubernetes for self-healing and global fault monitoring.
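The interaction between the soft-state cache layer and the storage layer can be sketched as a read-through cache. This is an illustration under stated assumptions, not Moneta's actual code: an in-memory dict stands in for Redis, the `load_from_store` callable stands in for a TiDB query, and the class and method names are hypothetical.

```python
from typing import Callable, Dict, Set


class ReadThroughCache:
    """Soft-state cache: entries can be dropped at any time and rebuilt from the store."""

    def __init__(self, load_from_store: Callable[[int], Set[int]]) -> None:
        self._load = load_from_store            # stand-in for a TiDB query
        self._cache: Dict[int, Set[int]] = {}   # stand-in for Redis
        self.store_reads = 0                    # how often we fell through to the store

    def has_read(self, user_id: int, post_id: int) -> bool:
        """Return True if the user has already read the post."""
        if user_id not in self._cache:
            # Cache miss: load the user's read set from the store and keep it.
            self.store_reads += 1
            self._cache[user_id] = set(self._load(user_id))
        return post_id in self._cache[user_id]

    def evict(self, user_id: int) -> None:
        """Drop a user's entry; the next lookup reloads it from the store."""
        self._cache.pop(user_id, None)


if __name__ == "__main__":
    store = {1: {10, 11}}
    cache = ReadThroughCache(lambda uid: store.get(uid, set()))
    print(cache.has_read(1, 10), cache.has_read(1, 99), cache.store_reads)
```

Because the cache holds no authoritative state, losing a cache node costs only extra reads against the storage layer, not data.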

After migration, Moneta achieved significant performance gains: peak write throughput exceeded 40 k TPS, the 99th-percentile response time dropped to about 25 ms, and the 99.9th-percentile response time dropped to about 50 ms. The average response time is far lower than these figures, even for long-tail queries.
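For readers less familiar with tail-latency metrics, the sketch below computes percentiles from a list of latency samples using the nearest-rank method. The article does not say how Zhihu computed its figures, so the method and the function name are assumptions chosen for illustration.

```python
import math
from fractions import Fraction
from typing import Sequence, Union


def percentile(samples: Sequence[float], p: Union[int, float, str]) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    # Convert via str so that 99.9 becomes exactly 999/10, avoiding float rounding.
    exact_p = Fraction(str(p))
    if not 0 < exact_p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(exact_p * len(ordered) / 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    latencies_ms = list(range(1, 1001))  # synthetic data, not measurements
    print(percentile(latencies_ms, 99), percentile(latencies_ms, 99.9))
```

A 99.9th-percentile value of 50 ms means that only one request in a thousand takes longer than 50 ms, which is a much stricter statement than an average.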

Key lessons learned include separating latency‑sensitive queries into dedicated TiDB instances, using SQL hints and low‑precision timestamps to improve execution plans, and leveraging TiDB’s distributed transaction layer to reduce network round‑trips.
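As an illustration of the SQL-hint lesson, the helper below builds a query that pins the optimizer to a specific index using the MySQL-style `USE INDEX` clause, which TiDB also accepts. The article does not show Zhihu's actual queries, so the table name, index name, and helper function here are all hypothetical.

```python
import re
from typing import Sequence

# Only plain identifiers are allowed, since they are interpolated into the SQL text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def select_with_index_hint(
    table: str, index: str, columns: Sequence[str], where: str
) -> str:
    """Build a SELECT that asks the optimizer to use `index` on `table`.

    `where` should contain driver placeholders (e.g. %s), never raw user input.
    """
    for name in (table, index, *columns):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"invalid identifier: {name!r}")
    column_list = ", ".join(columns)
    return f"SELECT {column_list} FROM {table} USE INDEX ({index}) WHERE {where}"


if __name__ == "__main__":
    # Hypothetical schema: read_history(user_id, post_id) with index idx_user_post.
    print(select_with_index_hint(
        "read_history", "idx_user_post", ["post_id"], "user_id = %s"
    ))
```

A hint like this trades optimizer flexibility for a stable plan, which matters most for the latency-sensitive queries the team isolated on dedicated instances.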

Future work focuses on TiDB 3.0 features such as the Titan storage engine (reducing write amplification), table partitioning (improving query performance), gRPC batch messaging, multi-threaded Raftstore, SQL plan management, and TiFlash for column-oriented analytical workloads.
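To make the table-partitioning item concrete, the sketch below generates range-partition clauses that split a table by month, so that a query restricted to recent months only has to touch the matching partitions. The schema is entirely hypothetical (an integer `read_month` column in `YYYYMM` form); the article does not describe how Zhihu planned to partition its data.

```python
from typing import List, Tuple


def next_month(year: int, month: int) -> Tuple[int, int]:
    """Return the (year, month) following the given one."""
    return (year + 1, 1) if month == 12 else (year, month + 1)


def monthly_partition_clauses(start_year: int, start_month: int, count: int) -> List[str]:
    """One RANGE partition per month; each upper bound is the next month as YYYYMM."""
    clauses = []
    year, month = start_year, start_month
    for _ in range(count):
        upper_year, upper_month = next_month(year, month)
        clauses.append(
            f"PARTITION p{year:04d}{month:02d} "
            f"VALUES LESS THAN ({upper_year:04d}{upper_month:02d})"
        )
        year, month = upper_year, upper_month
    return clauses


def create_table_sql(start_year: int, start_month: int, count: int) -> str:
    """Hypothetical DDL: the partition column must be part of the primary key."""
    partitions = ",\n  ".join(monthly_partition_clauses(start_year, start_month, count))
    return (
        "CREATE TABLE read_history (\n"
        "  user_id BIGINT NOT NULL,\n"
        "  post_id BIGINT NOT NULL,\n"
        "  read_month INT NOT NULL,\n"
        "  PRIMARY KEY (user_id, post_id, read_month)\n"
        ")\n"
        "PARTITION BY RANGE (read_month) (\n"
        f"  {partitions}\n"
        ")"
    )


if __name__ == "__main__":
    print(create_table_sql(2018, 11, 3))
```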

These enhancements are expected to further lower latency, simplify cluster management, and enable seamless horizontal scaling as data volumes exceed a trillion rows.
