
How We Scaled a Live Chatroom to 15 Million Concurrent Users

This article details the evolution of a WeChat live‑room chat component from its 1.0 high‑performance design to a 2.0 architecture that overcomes scalability, reliability, and traffic‑isolation challenges, enabling a single room to support up to 15 million simultaneous online users.

WeChat Backend Team

Mar 6, 2021


Chatroom Overview

With the growth of live‑streaming scenarios in WeChat, a temporary message channel called the chatroom component was created to provide message exchange and online‑status statistics.

The 15 Million‑User Challenge

After live streaming launched on video accounts, the product team required a single room to support 15 million concurrent users. That requirement prompted the question of whether a group of 1.3 billion people could, in principle, be gathered in a single room.

Chatroom 1.0 Architecture

Born in 2017 for WeChat e‑sports live rooms, the 1.0 version focused on high‑performance, low‑latency, and highly scalable message delivery.

Message Framework Choice: Read Diffusion

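WeChat groups use write diffusion, in which each message is copied into every member's inbox. Chatrooms have no persistent relationship chain and very high member churn, so they adopt read diffusion instead: each room keeps a single message list, and every member reads from it.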
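The following minimal Python sketch makes the distinction concrete. It assumes an in‑memory list stands in for the room's stored message list. The `Room` class and its `post`/`pull` methods are hypothetical names, not the chatroom's actual interface. A message is written once, and each reader advances its own sequence cursor.

```python
from dataclasses import dataclass, field


@dataclass
class Room:
    """Hypothetical room with one shared message list (read diffusion)."""

    messages: list = field(default_factory=list)

    def post(self, text: str) -> int:
        # One write per message, however many members the room has.
        self.messages.append(text)
        return len(self.messages)  # 1-based sequence number of the new message

    def pull(self, since_seq: int, limit: int = 100):
        # Each reader keeps its own cursor (the last sequence number it has seen),
        # so members can join or leave without any per-member inbox to update.
        batch = self.messages[since_seq:since_seq + limit]
        return batch, since_seq + len(batch)


room = Room()
room.post("hello")
room.post("gift: rocket")
msgs, cursor = room.pull(0)
print(msgs, cursor)  # ['hello', 'gift: rocket'] 2
```

Under write diffusion, the same `post` would instead append to one inbox per member. That per‑member fan‑out is what becomes expensive when membership changes constantly.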

Long‑Polling Mechanism

Long‑polling is used instead of WebSocket for three reasons: (1) push mode can lose messages, so a pull path is needed as a fallback anyway; (2) maintaining an accurate online list is difficult in push mode; (3) each long‑poll is a short‑lived request, which keeps the client implementation simple.
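A long poll holds a request until new data arrives or a timeout expires. That behaviour can be sketched with `threading.Condition`. This single‑process Python sketch assumes one in‑memory list holds the room's messages. `LongPollChannel`, `publish`, and `poll` are hypothetical names. Every response carries the client's next cursor, so a missed wake‑up costs at most one timeout rather than a lost message.

```python
import threading


class LongPollChannel:
    """Hypothetical server-side long-poll endpoint for one room."""

    def __init__(self):
        self._cond = threading.Condition()
        self._messages = []

    def publish(self, msg):
        with self._cond:
            self._messages.append(msg)
            self._cond.notify_all()  # wake every request currently being held

    def poll(self, since_seq, timeout):
        # Hold the request until something newer than since_seq exists or the
        # timeout expires. An empty batch means "nothing new, poll again".
        with self._cond:
            self._cond.wait_for(lambda: len(self._messages) > since_seq, timeout)
            batch = self._messages[since_seq:]
            return batch, since_seq + len(batch)


ch = LongPollChannel()
# Simulate a message arriving while the first poll is being held.
threading.Timer(0.1, ch.publish, args=("co-host joined",)).start()
batch, seq = ch.poll(0, timeout=2.0)
print(batch, seq)  # ['co-host joined'] 1
empty, seq2 = ch.poll(seq, timeout=0.05)  # nothing new: returns after ~50 ms
print(empty, seq2)  # [] 1
```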

Stateless Cache Design

A stateless cache layer (recvsvr) is introduced to relieve disk read pressure. It provides real‑time notifications, asynchronous pulling, fallback polling, lock‑free reads, and deployment in sects (independent service groups) for horizontal scaling.
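Copy‑on‑write snapshots are one way to approximate a lock‑free read path. A refresh builds a new immutable tuple and publishes it with a single assignment. Readers never take a lock and never see a partially built list (in CPython, a single dict get or set is atomic). This minimal Python sketch assumes a plain dict stands in for the storage layer. `RecvCache`, `refresh`, and `read` are hypothetical names, and the article does not describe recvsvr's internals at this level. All cached state can be rebuilt from storage, so any instance can serve any room. That is one reasonable reading of "stateless" here.

```python
class RecvCache:
    """Hypothetical recvsvr-like cache: per-room snapshots rebuilt from storage."""

    def __init__(self, store):
        self._store = store    # room_id -> list of messages (stands in for the KV layer)
        self._snapshots = {}   # room_id -> immutable tuple published to readers

    def refresh(self, room_id):
        # Triggered by a change notification, or periodically as a fallback poll.
        # Build the new snapshot first, then publish it with a single assignment.
        snapshot = tuple(self._store.get(room_id, ()))
        self._snapshots[room_id] = snapshot

    def read(self, room_id, since_seq):
        # No lock: a reader sees either the old tuple or the new one, never a mix.
        snapshot = self._snapshots.get(room_id, ())
        return list(snapshot[since_seq:])


store = {"room1": ["a", "b"]}
cache = RecvCache(store)
cache.refresh("room1")
store["room1"].append("c")     # written to storage, not yet visible in the cache
print(cache.read("room1", 0))  # ['a', 'b']
cache.refresh("room1")
print(cache.read("room1", 1))  # ['b', 'c']
```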

Pain Points of 1.0

Critical signals (e.g., co‑hosting, gift animations) may be lost.

Online list aggregation has a single‑point bottleneck.

No historical online‑user statistics.

Long‑polling cannot control request volume under continuous updates.

Chatroom 2.0 Architecture

To address the above issues, 2.0 focuses on reliable critical signaling, scalable online statistics, efficient historical online counting, and flexible traffic isolation.

Priority Message List

Important messages are marked with priority, stored separately in cache, and fetched before normal messages, ensuring zero loss for co‑hosting and large‑gift animations.
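The following minimal sketch illustrates the priority list. It assumes a bounded buffer for normal messages, which may evict old entries under load, and a separate list for important messages that a burst of normal traffic cannot evict. `RoomMessages` and the capacity of 3 are illustrative choices, not values from the article. A single global sequence number lets one client cursor cover both lists, and `pull` returns important messages first.

```python
from collections import deque


class RoomMessages:
    """Hypothetical per-room store with a separate list for important messages."""

    def __init__(self, normal_capacity=3):
        self.priority = []                            # co-hosting, large gifts, ...
        self.normal = deque(maxlen=normal_capacity)   # oldest entries dropped when full
        self.seq = 0                                  # one sequence shared by both lists

    def post(self, text, important=False):
        self.seq += 1
        target = self.priority if important else self.normal
        target.append((self.seq, text))

    def pull(self, since_seq):
        # Important messages newer than the cursor come first, then normal ones.
        high = [m for m in self.priority if m[0] > since_seq]
        low = [m for m in self.normal if m[0] > since_seq]
        return high + low


room = RoomMessages(normal_capacity=3)
room.post("co-host request", important=True)  # seq 1
for i in range(5):
    room.post(f"comment {i}")                 # seq 2..6; only seq 4..6 are retained
batch = room.pull(0)
print([text for _, text in batch])
# ['co-host request', 'comment 2', 'comment 3', 'comment 4']
```

The comment burst evicts older normal messages, but the co‑host request survives because it never shares their buffer.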

Distributed Online Statistics

Two approaches are explored: (1) shared‑memory master‑slave with sect deployment; (2) table‑kv storing user‑id and heartbeat time. The final solution combines key‑splitting, read‑write separation, and asynchronous aggregation to achieve lock‑free, high‑performance online queries.
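The key‑splitting and asynchronous‑aggregation ideas can be sketched as follows. The sketch assumes a room's heartbeat table is split into a fixed number of sub‑keys by a hash of the user ID. A background job periodically expires stale users and publishes a total, and readers consume that total without touching the sub‑keys. The shard count, the 30‑second TTL, and all names are hypothetical.

```python
import zlib


class OnlineCounter:
    """Hypothetical per-room online counter with split keys and async aggregation."""

    def __init__(self, shards=4, ttl=30.0):
        self.shards = [dict() for _ in range(shards)]  # sub-key -> {user_id: last heartbeat}
        self.ttl = ttl
        self.cached_total = 0                          # the only value readers touch

    def heartbeat(self, user_id, now):
        # A write touches a single sub-key, so writers in a huge room rarely contend.
        idx = zlib.crc32(user_id.encode()) % len(self.shards)
        self.shards[idx][user_id] = now

    def aggregate(self, now):
        # Meant to run periodically in the background: expire stale users,
        # then publish a fresh total for readers.
        total = 0
        for shard in self.shards:
            for uid in [u for u, t in shard.items() if now - t > self.ttl]:
                del shard[uid]
            total += len(shard)
        self.cached_total = total

    def online(self):
        # Read path: no scan, no lock, just the last published total.
        return self.cached_total


c = OnlineCounter(shards=4, ttl=30.0)
for i in range(10):
    c.heartbeat(f"user{i}", now=100.0)
c.heartbeat("user0", now=125.0)
before = c.online()   # 0: nothing has been aggregated yet
c.aggregate(now=135.0)
print(before, c.online())  # 0 1  (only user0 heartbeated within the last 30 s)
```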

HyperLogLog‑Based Historical Online Counting

HyperLogLog provides approximate cardinality with minimal space. For low counts, a dual write to table‑kv and HyperLogLog ensures accuracy; for high counts, HyperLogLog alone is used, achieving >95% accuracy above 10,000 users.
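Below is a compact HyperLogLog in Python: the standard estimator, with linear counting for small cardinalities. It is wrapped in the dual‑write scheme described above. Every user ID goes into the sketch, and an exact set, standing in for table‑kv, is kept until the room passes a threshold. `HistoricalOnline`, the threshold, and the precision `p=14` (16,384 registers) are assumptions for illustration. A production implementation would pack each register into a few bits rather than use a Python list.

```python
import hashlib
import math


class HyperLogLog:
    def __init__(self, p=14):
        self.p = p
        self.m = 1 << p                              # number of registers
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)   # bias constant for m >= 128

    def add(self, item):
        # 64-bit hash: the first p bits choose a register, the rest give the rank.
        x = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = x >> (64 - self.p)
        rest = x & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - rest.bit_length() + 1  # position of the leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        est = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if est <= 2.5 * self.m and zeros:
            est = self.m * math.log(self.m / zeros)   # linear counting for small sets
        return round(est)


class HistoricalOnline:
    """Hypothetical dual write: exact set while small, HyperLogLog only once large."""

    def __init__(self, exact_limit=10_000):
        self.hll = HyperLogLog()
        self.exact = set()          # stands in for the table-kv rows
        self.exact_limit = exact_limit
        self.exact_valid = True

    def record(self, user_id):
        self.hll.add(user_id)
        if self.exact_valid:
            self.exact.add(user_id)
            if len(self.exact) > self.exact_limit:
                self.exact_valid = False
                self.exact.clear()  # past the threshold, stop paying for exact storage

    def count(self):
        return len(self.exact) if self.exact_valid else self.hll.count()


h = HistoricalOnline(exact_limit=1_000)
for i in range(500):
    h.record(f"user{i}")
    h.record(f"user{i}")            # repeat visits are not double-counted
small = h.count()                   # exact path: 500
for i in range(500, 100_000):
    h.record(f"user{i}")
large = h.count()                   # HyperLogLog estimate, close to 100000
print(small, large)
```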

Traffic Isolation (VIP Sect)

Large live rooms are routed to a dedicated VIP sect, while normal rooms use ordinary sects. This reduces pressure on the KV layer and keeps heavy traffic from large rooms from affecting normal rooms.
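Sect routing can be as simple as the following sketch. It assumes the set of VIP rooms is configured explicitly, since the article lists automatic VIP‑sect switching as future work. Normal rooms are spread over ordinary sects by a stable hash. The sect names and the count of four ordinary sects are hypothetical.

```python
import zlib

NORMAL_SECTS = ["sect-0", "sect-1", "sect-2", "sect-3"]  # hypothetical sect names
VIP_SECT = "sect-vip"


def route(room_id: str, vip_rooms: set) -> str:
    # Large rooms get their own sect so their load cannot spill onto normal rooms.
    if room_id in vip_rooms:
        return VIP_SECT
    # A stable hash keeps each normal room pinned to the same ordinary sect.
    return NORMAL_SECTS[zlib.crc32(room_id.encode()) % len(NORMAL_SECTS)]


vip_rooms = {"big-live"}                # configured explicitly in this sketch
print(route("big-live", vip_rooms))     # sect-vip
print(route("small-room", vip_rooms))   # one of the ordinary sects, always the same one
```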

Adaptive Traffic Control

Request intervals are adjusted according to the room's online user count, client request timestamps are recorded, and the proxy holds requests that arrive too early. Together these limit long‑polling frequency, cutting request volume by about 58% in high‑load tests.
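The following rough sketch combines the three mechanisms. It assumes poll intervals are tiered by room size and that the proxy holds any request arriving before the client's interval has elapsed since its previously recorded request timestamp. The tier boundaries and intervals are invented for illustration, because the article does not publish them. The sketch does not attempt to reproduce the ~58% figure, which is the authors' measurement.

```python
def poll_interval(online_users: int) -> float:
    # Hypothetical tiers: the larger the room, the longer clients wait between polls.
    if online_users < 10_000:
        return 1.0
    if online_users < 1_000_000:
        return 3.0
    return 5.0


def hold_seconds(now: float, last_request: float, online_users: int) -> float:
    """How long the proxy holds a request before forwarding it to the backend.

    `last_request` is the recorded timestamp of the client's previous poll.
    A client polling too early is held until its interval has elapsed,
    instead of reaching the backend immediately.
    """
    elapsed = now - last_request
    return max(0.0, poll_interval(online_users) - elapsed)


print(hold_seconds(10.0, 9.0, 2_000_000))  # 4.0: huge room, polled after only 1 s
print(hold_seconds(10.0, 4.0, 2_000_000))  # 0.0: interval already elapsed
print(hold_seconds(10.0, 9.5, 500))        # 0.5: small room, 1 s interval
```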

Results

The redesigned system supports multiple business lines, passes stress tests for 15 million concurrent users, and meets performance, reliability, scalability, and disaster‑recovery requirements.


Conclusion & Outlook

Through careful problem abstraction, precise analysis, and deliberate design, the chatroom 2.0 iteration meets the requirements of massive concurrent online audiences. Future work includes automatic switching into the VIP sect and further strengthening of the important‑message channel.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
