
How Momo Live’s IM Architecture Scaled from V1 to V2

This article examines the challenges of high-traffic instant messaging for live streaming, presents the original TCP-based V1 architecture, explains its performance bottlenecks, and details the redesigned V2 solution, which adopts Netty, protobuf, compression, and RPC to cut downstream bandwidth by roughly 30-40% and lower message latency under high concurrency.

MOMOLive Tech Team

Sep 24, 2019


Overview

In the era of ubiquitous internet, live streaming and real‑time interactive features such as bullet comments and gift effects have become essential, making instant messaging (IM) a core component of most applications. This article reviews the architecture of Momo Live’s IM system and its evolution over recent years.

Live IM Characteristics

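Consider a room with 10,000 concurrent viewers in which, every second, 1,000 users send chat messages and another 1,000 send gifts, each message averaging 50 bytes. Each of those 2,000 messages per second must be broadcast to all 10,000 viewers, which works out to roughly 20,000,000 message deliveries per second. At 50 bytes each, that is about 1 GB/s, or roughly 8 Gbps of downstream bandwidth (about 7.45 Gbps when counted in binary units), for a single room.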
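The fan-out arithmetic can be checked with a short sketch. The function name and parameters are illustrative only; the inputs are the figures from the example above.

```python
def fanout_load(viewers, messages_per_sec, bytes_per_msg):
    """Return (deliveries per second, downstream bits per second) for one room."""
    # Every incoming message is delivered to every viewer in the room.
    deliveries = viewers * messages_per_sec
    bits_per_sec = deliveries * bytes_per_msg * 8
    return deliveries, bits_per_sec


# 1,000 chat senders plus 1,000 gift senders per second, 50 bytes per message.
deliveries, bps = fanout_load(viewers=10_000,
                              messages_per_sec=1_000 + 1_000,
                              bytes_per_msg=50)

print(deliveries)             # 20000000
print(bps / 1e9)              # 8.0  (decimal Gbps)
print(round(bps / 2**30, 2))  # 7.45 (binary units)
```

The inbound rate is only 2,000 messages per second; the cost comes almost entirely from multiplying it by the audience size, which is why the downstream side dominates.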

A workload like this has three defining characteristics:

Massive numbers of concurrent online users produce a huge broadcast volume.

Outbound bandwidth consumption is high.

Interactive scenarios impose strict real-time requirements.

The article focuses on addressing these pain points without delving into deep technical details.

V1 Architecture

The early V1 design follows a TCP model with a push‑pull hybrid messaging pattern. It is built on Java, using Mina for networking and Redis for storage. The service is divided into three layers:

Connection layer (connector) – maintains long‑lived connections for each room and handles message I/O.

Logic layer (router) – implements business logic such as multi‑device login kicking and message lifecycle management.

Message storage – two Redis instances used for pub/sub and caching.

Key components include:

User upstream handling (connection, authentication, heartbeat).

Custom JSON‑based protocol parsing and routing to the logic service.

In‑memory message cache backed by Redis pub/sub.

Room session management keyed by room ID.

Message forwarding that notifies clients of new messages.

User downstream handling.

Kafka for asynchronous tasks (e.g., logging, analytics).

Message broadcasting assembly and publishing.

Router‑side message queue for internal communication.

Redis pub/sub cluster for distributing new messages to all connector nodes.

External services (gift, chat, etc.) that produce messages.
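To make the push-pull hybrid concrete, here is a minimal sketch of one way such a flow can work. It is an assumption for illustration, not Momo's actual implementation: the class names, the sequence-number scheme, and the cache size are all hypothetical. The connector keeps a per-room message cache, pushes only a small "new messages" notification, and the client then pulls everything after the last sequence number it has seen.

```python
from collections import defaultdict, deque


class RoomCache:
    """Hypothetical per-room in-memory cache with increasing sequence numbers."""

    def __init__(self, max_messages=1000):
        self.messages = deque(maxlen=max_messages)  # (seq, payload) pairs
        self.next_seq = 1

    def append(self, payload):
        seq = self.next_seq
        self.messages.append((seq, payload))
        self.next_seq += 1
        return seq

    def since(self, last_seq):
        # Everything the client has not yet seen.
        return [(s, p) for s, p in self.messages if s > last_seq]


class Connector:
    """Hypothetical connector node: notifies on new messages, serves pulls."""

    def __init__(self):
        self.rooms = defaultdict(RoomCache)
        self.notifications = []  # stand-in for small notify frames sent over TCP

    def on_new_message(self, room_id, payload):
        seq = self.rooms[room_id].append(payload)
        # Push phase: tell clients the room has messages up to `seq`.
        self.notifications.append((room_id, seq))
        return seq

    def pull(self, room_id, last_seq):
        # Pull phase: the client asks for what it missed.
        return self.rooms[room_id].since(last_seq)


connector = Connector()
connector.on_new_message("room-1", "hello")
connector.on_new_message("room-1", "gift:rose")
print(connector.pull("room-1", last_seq=0))  # [(1, 'hello'), (2, 'gift:rose')]
print(connector.pull("room-1", last_seq=1))  # [(2, 'gift:rose')]
```

Even in this toy form, two costs are visible: every message sits in a per-room cache until clients fetch it, and each delivery needs a notification plus a follow-up pull.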

While V1 was quickly deployed and initially stable, it encountered several bottlenecks as user volume grew:

High bandwidth usage due to verbose JSON.

Memory‑heavy message cache causing GC pressure.

Complex Redis maintenance for the router.

Awkward inter-service communication over Redis queues.

Significant latency from the push‑pull model.

V2 Architecture

V2 retains TCP but abandons the push-pull hybrid in favor of a pure push model. On the technical side, Mina is replaced by the more actively maintained Netty framework, and RPC replaces Redis queues for inter-service communication. The component layout remains similar, with Kafka used for asynchronous processing.
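A pure push model can be sketched as follows. This is a simplified illustration under stated assumptions, not the production design: `PushConnector`, its methods, and the bounded per-user queue are hypothetical names and choices, and the `broadcast` call stands in for what an RPC from the logic layer might trigger.

```python
from collections import defaultdict, deque


class PushConnector:
    """Hypothetical connector that pushes messages straight to per-user queues."""

    def __init__(self, max_queue=100):
        self.room_members = defaultdict(set)  # room_id -> set of user_ids
        self.user_queues = {}                 # user_id -> bounded outbound queue
        self.max_queue = max_queue

    def join(self, room_id, user_id):
        self.room_members[room_id].add(user_id)
        self.user_queues.setdefault(user_id, deque(maxlen=self.max_queue))

    def broadcast(self, room_id, payload):
        """Enqueue the payload for every member; return the recipient count."""
        members = self.room_members.get(room_id, ())
        for user_id in members:
            # No per-room cache and no pull round trip: the message goes
            # directly onto each user's outbound queue.
            self.user_queues[user_id].append(payload)
        return len(members)

    def drain(self, user_id):
        """Return and clear what would be written to this user's TCP connection."""
        queue = self.user_queues[user_id]
        pending = list(queue)
        queue.clear()
        return pending


connector = PushConnector()
connector.join("room-1", "alice")
connector.join("room-1", "bob")
print(connector.broadcast("room-1", "hello"))  # 2
print(connector.drain("alice"))                # ['hello']
print(connector.drain("alice"))                # []
```

The bounded queue is one possible way to keep a slow client from growing memory without limit: with `deque(maxlen=...)`, the oldest pending message is dropped when the queue is full. Whether the real system drops, blocks, or disconnects in that case is not stated in the source.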

Bandwidth reduction: switching from JSON to protobuf and applying message compression saved roughly 30‑40% of downstream bandwidth.

Latency improvement: eliminating the pull phase and pushing messages directly to TCP connections reduced message delay.

Memory optimization: messages are no longer cached per‑room in memory; they are delivered directly to user queues.

Stability boost: internal RPC replaced Redis queues, simplifying service interaction.

Additional enhancements: encryption performance, monitoring, rate limiting, service degradation handling, and Java GC tuning.
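The bandwidth point can be illustrated with the standard library alone. The sketch below compares a JSON encoding with a compact binary layout and then compresses the result. The binary layout built with `struct` is only a stand-in for protobuf (it is not the protobuf wire format), and the message fields and sample data are made up, so the sizes it prints will not reproduce the 30-40% figure above.

```python
import json
import struct
import zlib


def encode_json(messages):
    """Text encoding: field names are repeated in every message."""
    return json.dumps(messages).encode("utf-8")


def encode_binary(messages):
    """Compact stand-in layout: uint32 room, uint32 user, uint16 length, UTF-8 text."""
    out = bytearray()
    for m in messages:
        text = m["text"].encode("utf-8")
        out += struct.pack("!IIH", m["room_id"], m["user_id"], len(text))
        out += text
    return bytes(out)


# Hypothetical sample: 100 short chat messages in one room.
messages = [{"room_id": 1001, "user_id": 42 + i, "text": "hello everyone"}
            for i in range(100)]

as_json = encode_json(messages)
as_binary = encode_binary(messages)
compressed = zlib.compress(as_binary)

print(len(as_json), len(as_binary), len(compressed))
```

The binary form is smaller mainly because it omits field names and punctuation, and compression then removes the repetition that remains across similar messages. Compressing a batch of messages, as done here, generally helps more than compressing each small message on its own.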

The V2 architecture has proven stable under long‑duration, high‑volume traffic and is better suited for live‑streaming scenarios with massive message throughput.

Conclusion

This introductory article presented the two versions of Momo Live’s IM architecture, highlighted the shortcomings of the original design, and explained the key optimizations implemented in V2. Future posts will dive deeper into specific technical details.
