Backend Development · 12 min read

Design and Optimization of a High‑Performance Danmu System for Southeast Asian Live Streaming

This article is a detailed case study of designing a high‑throughput danmu (bullet‑screen) system for Southeast Asian live streaming. It analyzes bandwidth constraints and latency issues, evaluates long‑polling, WebSocket, and short‑polling approaches, and then describes the chosen short‑polling solution, together with service partitioning, caching, and ring‑buffer techniques, to achieve stable performance for up to a million concurrent users.

Architecture Digest

Background

To better support Southeast Asian live streaming, the product added a danmu (bullet‑screen) feature. The first phase ran on Tencent Cloud, but performance was poor, with frequent stutters and dropped danmu, which prompted us to build a custom system capable of supporting up to one million concurrent users per room.

Problem Analysis

Based on the background, the system faces three main challenges:

Bandwidth pressure. If each user receives an update every 3 seconds, at least 15 danmu messages per response are needed to avoid visual stutter. Fifteen messages plus HTTP headers exceed 3 KB per response; at one million concurrent users that amounts to roughly 8 Gbps of aggregate egress, while the available bandwidth is only 10 Gbps.

Stutter and loss caused by weak networks. This issue has already appeared in the production environment.

Performance and reliability. With a million concurrent users, QPS can exceed 300 k. Ensuring stability during peak events such as Double‑Eleven is critical.
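The 8 Gbps figure in the first challenge follows from simple arithmetic. A minimal sketch (the class and method names are illustrative; only the numbers come from the estimate above):

```java
// Back-of-the-envelope check of the bandwidth figures above:
// 1,000,000 users, one ~3 KB response every 3 seconds.
public class BandwidthEstimate {
    /** Aggregate egress in bits per second for evenly spread polling. */
    public static long aggregateBitsPerSecond(long users, long bytesPerResponse, long intervalSeconds) {
        return users * bytesPerResponse * 8 / intervalSeconds;
    }

    public static void main(String[] args) {
        long bps = aggregateBitsPerSecond(1_000_000L, 3 * 1024L, 3);
        System.out.printf("~%.1f Gbps%n", bps / 1e9); // ~8.2 Gbps against a 10 Gbps pipe
    }
}
```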

Bandwidth Optimization

To reduce bandwidth pressure, we adopted the following measures:

Enable HTTP compression. Gzip can achieve a compression ratio of over 40 % (about 4‑5 % higher than deflate).

Simplify response structure

Optimize content ordering. Because gzip compresses better when redundancy is higher, grouping strings with strings and numbers with numbers improves the ratio.

Frequency control. Bandwidth control: add a request‑interval parameter so the server can limit client request frequency, providing a lossy service during traffic spikes. Sparse control: during periods with few danmu, push back the next request time to avoid unnecessary client requests.
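The frequency-control idea above can be sketched as a pull response that carries a server‑assigned next‑request interval. Field names and thresholds here are illustrative, not the production schema:

```java
// Sketch of a pull response carrying a server-controlled next-request
// interval; the client honors nextIntervalMs before polling again.
public class PullResponse {
    public final long serverTimeMs;
    public final int nextIntervalMs;
    public final java.util.List<String> danmu;

    public PullResponse(long serverTimeMs, int nextIntervalMs, java.util.List<String> danmu) {
        this.serverTimeMs = serverTimeMs;
        this.nextIntervalMs = nextIntervalMs;
        this.danmu = danmu;
    }

    /** Server-side policy: stretch the interval under load or when the room is quiet. */
    public static int chooseInterval(boolean overloaded, int recentDanmuCount) {
        if (overloaded) return 10_000;           // lossy service: shed load during spikes
        if (recentDanmuCount == 0) return 6_000; // sparse control: quiet room, poll less
        return 3_000;                            // normal 3-second cadence
    }
}
```

Because the interval travels with every response, the server can throttle the entire audience within one polling cycle, without any push channel to the clients.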

Danmu Stutter and Loss Analysis

When developing the danmu system, the key question is the delivery mechanism: push vs pull?

Long Polling via AJAX

The client opens an AJAX request to the server and waits for a response. The server must support request suspension and return data as soon as an event occurs. Enabling HTTP Keep‑Alive can also save handshake time.

Advantages: reduces polling frequency, low latency, good browser compatibility. Disadvantages: the server must maintain many connections.

WebSockets

Long polling still requires many connections, so a truly bidirectional, low‑overhead solution was sought. WebSocket allows the server to push data and the client to send data, supporting binary frames, compression, and other extensions.

Advantages: • Minimal protocol overhead (2‑10 bytes for server‑to‑client frames, plus 4 bytes mask for client‑to‑server). • Strong real‑time capability due to full‑duplex communication. • Persistent connection.

Long Polling vs WebSockets

Both rely on TCP long connections. How does TCP detect a broken connection?

TCP Keep‑Alive probes the connection state, controlled by three parameters:

keepalive_time: idle time before probing starts (default 2 hours)
keepalive_intvl: interval between probes (default 75 s)
keepalive_probes: number of unanswered probes before the connection is declared dead (default 9 on Linux)

In weak Southeast Asian networks, TCP long connections often drop. The shortest detection interval for Long Polling is min(keepalive_intvl, polling_interval), while for WebSockets it is min(keepalive_intvl, client_sending_interval). Because a connection may already be broken when the next packet is sent, TCP long connections are of limited value, and WebSockets also become unsuitable under weak networks.

Even if the server detects a broken WebSocket, it cannot push data until the client reconnects, causing potential data loss.

Each reconnection requires a new application‑level handshake.

According to Tencent Cloud’s danmu system, push is used for rooms with fewer than 300 users and polling for larger audiences, the push side likely implemented with WebSocket. However, both Long Polling and WebSockets are unsuitable for our scenario, so we finally adopted short polling for danmu delivery.

Reliability and Performance

To ensure service stability, we split the system: complex logic is confined to the danmu‑sending side, while the high‑frequency, simple pull service is separated. This prevents the pull service from overwhelming the send service and vice versa, facilitating scale‑up and scale‑out and clarifying business boundaries.

On the pull side, we introduced a local cache. The service periodically RPC‑calls the danmu service to refresh the cache, allowing subsequent requests to read directly from memory, reducing latency and external dependency impact.
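A minimal sketch of that pull‑side cache, assuming a hypothetical `DanmuRpcClient` interface to the danmu service: a timer refreshes an immutable snapshot, and read requests never block on the RPC.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LocalDanmuCache {
    /** RPC client to the danmu service (illustrative, not the real interface). */
    public interface DanmuRpcClient { List<String> fetchRecent() throws Exception; }

    private volatile List<String> snapshot = List.of();

    /** Pull a fresh snapshot; on failure keep serving the last good one. */
    public void refresh(DanmuRpcClient rpc) {
        try {
            snapshot = rpc.fetchRecent();
        } catch (Exception e) {
            // Stale data beats an error: keep the previous snapshot.
        }
    }

    /** Wire the refresh to a 1-second timer; readers are served from memory. */
    public void start(DanmuRpcClient rpc) {
        Executors.newSingleThreadScheduledExecutor()
                 .scheduleAtFixedRate(() -> refresh(rpc), 0, 1, TimeUnit.SECONDS);
    }

    public List<String> read() { return snapshot; }
}
```

Swapping in a whole immutable list per refresh keeps readers lock‑free: they only ever see the old snapshot or the new one, never a half‑written state.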

Data is sharded by time and stored in a RingBuffer that retains only the last 60 seconds. The tail pointer moves forward each second, and read requests compute an index from the client’s timestamp, traversing backward up to 30 seconds of data. Because writes are single‑threaded and reads are limited to recent data, the buffer can operate lock‑free.
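The time‑sharded buffer above can be sketched as follows, with one slot per second of danmu; the structure and constants mirror the description (60‑second retention, 30‑second read window), while the class and method names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

public class DanmuRingBuffer {
    private static final int SLOTS = 60;          // retain the last 60 seconds
    @SuppressWarnings("unchecked")
    private final List<String>[] slots = new List[SLOTS];
    private volatile long tailSecond;             // latest second owned by the single writer

    /** Single-threaded writer: publish one second's worth of danmu, then advance the tail. */
    public void write(long epochSecond, List<String> danmu) {
        slots[(int) (epochSecond % SLOTS)] = danmu;
        tailSecond = epochSecond;                 // volatile write publishes the slot
    }

    /** Reader: index from the client's timestamp, walking back at most 30 seconds. */
    public List<String> readSince(long clientSecond) {
        long tail = tailSecond;
        long from = Math.max(clientSecond, tail - 30);
        List<String> out = new ArrayList<>();
        for (long s = from; s <= tail; s++) {
            List<String> slot = slots[(int) (s % SLOTS)];
            if (slot != null) out.addAll(slot);
        }
        return out;
    }
}
```

The single writer and the 30‑second read window are what make the lock‑free design safe: readers stay at least 30 slots away from the cell the writer will overwrite next.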

On the send side, we limit the total danmu volume per user, discarding excess messages. Optional branches such as avatar fetching or sensitive‑word filtering are designed to fail gracefully, ensuring core danmu delivery continues even when auxiliary services fail.
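The send‑side cap can be sketched as a per‑user counter over a fixed window, with the excess silently discarded. The limit and window here are illustrative, not the production values:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class SendLimiter {
    private final int maxPerWindow;
    private final ConcurrentHashMap<String, AtomicInteger> counts = new ConcurrentHashMap<>();

    public SendLimiter(int maxPerWindow) { this.maxPerWindow = maxPerWindow; }

    /** Returns true if the danmu is accepted, false if it should be discarded. */
    public boolean tryAccept(String userId) {
        AtomicInteger c = counts.computeIfAbsent(userId, k -> new AtomicInteger());
        return c.incrementAndGet() <= maxPerWindow;
    }

    /** Called by a timer at each window boundary (e.g. every second). */
    public void resetWindow() { counts.clear(); }
}
```

Dropping over‑limit messages instead of queueing them is the same graceful‑degradation stance applied to avatar fetching and sensitive‑word filtering: shedding optional work keeps core delivery alive.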

Conclusion

During the Double‑12 promotion, despite a brief Redis outage, the service efficiently supported 700 k concurrent users, meeting the target with high stability and performance.

Real-time Messaging · backend architecture · scalability · WebSocket · bandwidth optimization · short polling · long polling
Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
