Design and Scaling of Xianyu Messaging System (1.0 → 4.0)

The Xianyu instant-messaging system grew from a 2014 MVP into a stable architecture over four versions. Along the way it added distributed storage, hybrid synchronization, ACK-based delivery, and monitoring so that it could handle billions of messages, improve stability, and raise user satisfaction, cutting technical complaints by half.

Xianyu Technology

The Xianyu (Idle Fish) instant-messaging system has evolved through several generations to support billions of messages. This article records the technical changes from the initial MVP to the current stable architecture.

Version 1.0 – Minimum Viable Product: Launched in 2014 to enable basic chat for a new e-commerce app. It needed storage for conversations, conversation summaries, and messages; push and pull synchronization; and a long-connection or vendor push channel. The existing Taobao private-message system could not express Xianyu's item-centric conversation model (user + user + item) as-is, so the team reused its data model and relied on the Taobao SDK/mtop for communication.
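The item-centric conversation model (user + user + item) can be illustrated with a small sketch. The class and field names below are hypothetical and are not taken from the Xianyu or Taobao code base; the point is only that a conversation is identified by both participants and the item they are discussing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversationKey:
    """Identifies one conversation: two users talking about one item.

    Hypothetical structure for illustration only.
    """
    user_a: int
    user_b: int
    item_id: int

    @staticmethod
    def of(buyer_id: int, seller_id: int, item_id: int) -> "ConversationKey":
        # Order the two user IDs so both participants derive the same key.
        low, high = sorted((buyer_id, seller_id))
        return ConversationKey(low, high, item_id)
```

With a key like this, the same two users get a separate conversation for each item, which a plain user-to-user private-message model does not distinguish.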

Version 2.0 – Rebuild for Rapid User Growth: Once the user count passed 1 million, full-pull synchronization and a read-heavy centralized store caused severe latency and server load. The redesign introduced a domain-ring (personal inbox) backed by Tair, Alibaba's distributed KV store; a hybrid sync model that combines full pull with incremental sync; and separate channels for online delivery (ACCS long connection) and offline delivery (AGOO push).
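A minimal sketch of the hybrid sync idea follows, assuming each inbox assigns increasing sequence numbers and the client reports the last one it has seen. The in-memory dictionary stands in for the KV store, and the `MAX_INCREMENTAL` threshold and all names are assumptions for illustration; the article does not describe the actual protocol.

```python
from collections import defaultdict

MAX_INCREMENTAL = 100  # assumed gap threshold, not from the article


class Inbox:
    """In-memory stand-in for a per-user inbox held in a KV store."""

    def __init__(self):
        self._messages = defaultdict(list)  # user_id -> [(seq, payload)]

    def append(self, user_id, payload):
        # Sequence numbers start at 1 and increase by one per message.
        seq = len(self._messages[user_id]) + 1
        self._messages[user_id].append((seq, payload))
        return seq

    def sync(self, user_id, last_seq):
        """Return (mode, messages) for a client that has seen up to last_seq."""
        box = self._messages[user_id]
        latest = len(box)
        # New clients, or clients that fell too far behind, do a full pull.
        if last_seq <= 0 or latest - last_seq > MAX_INCREMENTAL:
            return "full", list(box)
        # Otherwise send only the messages after last_seq.
        return "incremental", box[last_seq:]
```

A client that is only a few messages behind receives just the delta, so the expensive full pull is reserved for first logins and long absences.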

Version 3.0 – Stability under Massive Scale: With DAU exploding, problems such as long-connection loss, fake connections, and message loss emerged. Solutions included ACK-based delivery confirmation, adaptive delayed retry, a client-side message queue to isolate push/pull processing, and isolation of IM and marketing messages using MySQL for IM and Lindorm for marketing data.
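ACK-based delivery with delayed retry can be sketched as below. This is not Xianyu's implementation: the class name, the caller-supplied clock, and the use of exponential backoff as a stand-in for the article's "adaptive" delay are all assumptions made for illustration.

```python
import heapq


class AckDelivery:
    """Tracks unacknowledged messages and schedules delayed retries."""

    def __init__(self, send, base_delay=1.0, max_attempts=3):
        self._send = send              # callable(msg_id, payload)
        self._base_delay = base_delay
        self._max_attempts = max_attempts
        self._pending = {}             # msg_id -> (payload, attempts so far)
        self._timers = []              # heap of (due_time, msg_id, attempt_no)

    def deliver(self, msg_id, payload, now):
        # First attempt: send and arm a retry timer.
        self._pending[msg_id] = (payload, 1)
        self._send(msg_id, payload)
        heapq.heappush(self._timers, (now + self._base_delay, msg_id, 1))

    def ack(self, msg_id):
        # The client confirmed receipt, so stop retrying.
        self._pending.pop(msg_id, None)

    def tick(self, now):
        """Resend messages whose timer expired; return IDs that were given up."""
        failed = []
        while self._timers and self._timers[0][0] <= now:
            _, msg_id, attempt = heapq.heappop(self._timers)
            entry = self._pending.get(msg_id)
            if entry is None or entry[1] != attempt:
                continue  # already acknowledged, or a stale timer
            payload, attempts = entry
            if attempts >= self._max_attempts:
                del self._pending[msg_id]
                failed.append(msg_id)  # caller may fall back to offline push
                continue
            self._pending[msg_id] = (payload, attempts + 1)
            self._send(msg_id, payload)
            # Exponential backoff: an assumed policy, doubling per attempt.
            delay = self._base_delay * (2 ** attempts)
            heapq.heappush(self._timers, (now + delay, msg_id, attempts + 1))
        return failed
```

Passing `now` in explicitly keeps the sketch deterministic; a real server would drive `tick` from a timer and hand the given-up IDs to another channel.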

Version 4.0 – NPS-Driven Improvements: User surveys revealed a low NPS caused by missing features (search, grouping) and message delays. The team added monitoring (UT, SLS, Blink), command-based remote diagnostics, and further storage isolation to protect IM stability. After these measures, technical complaints dropped by 50% and the overall user experience improved.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
