
Design and Optimization of a High‑Throughput Live‑Streaming Danmaku System

This article describes the challenges of delivering real‑time danmaku for a Southeast Asian live‑streaming service, analyzes bandwidth pressure, network instability, and reliability issues, and presents a series of backend optimizations—including HTTP compression, response simplification, short‑polling, service splitting, caching, and a custom ring‑buffer—to reliably support up to one million concurrent users.


Background

To better support Southeast Asian live‑streaming, the product added a danmaku (bullet‑screen) feature. The first version, built on Tencent Cloud, suffered from frequent stutter and insufficient comment density, prompting the development of a custom danmaku system capable of handling up to one million simultaneous users per room.

Problem Analysis

The system faces three main problems:

Bandwidth pressure: assuming a delivery interval of 3 seconds, at least 15 comments per interval are needed for a smooth visual experience. Fifteen comments plus HTTP headers exceed 3 KB per response; across one million concurrent users this amounts to roughly 8 Gbps of traffic, while the available bandwidth is only 10 Gbps.
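The arithmetic behind the 8 Gbps figure can be sketched as a back-of-envelope estimate, assuming one million concurrent viewers, ~3 KB per response, and a 3-second polling interval:

```java
public class BandwidthEstimate {
    // Estimated downstream bandwidth in Gbps for a polling workload.
    // users: concurrent viewers; bytesPerResponse: payload incl. HTTP headers;
    // intervalSeconds: polling interval per client.
    static double requiredGbps(long users, long bytesPerResponse, long intervalSeconds) {
        double bitsPerSecond = (double) users * bytesPerResponse * 8 / intervalSeconds;
        return bitsPerSecond / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        // 1,000,000 users x 3 KB every 3 s -> 1 GB/s -> 8 Gbps
        System.out.println(requiredGbps(1_000_000L, 3_000L, 3L)); // prints 8.0
    }
}
```

At 8 Gbps against a 10 Gbps ceiling, there is very little headroom, which is why the compression and frequency controls below matter.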

Weak networks causing stutter and loss: this issue has already been observed in production.

Performance and reliability: with a projected QPS above 300 k, the service must remain stable during peak events such as Double‑Eleven sales.

Bandwidth Optimization

We adopted the following measures to reduce bandwidth consumption:

Enable HTTP compression: gzip can achieve a compression ratio of over 40 % (4‑5 % higher than deflate).

Simplify the response structure: trim redundant fields so each comment carries only what the client needs to render.

Reorder content for better compression: grouping strings with strings and numbers with numbers increases local redundancy, improving gzip efficiency.
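A minimal sketch of these two measures, using `java.util.zip.GZIPOutputStream` and a hypothetical payload (the field names, values, and byte layouts are illustrative, not the production format; actual ratios depend on the real data):

```java
import java.io.ByteArrayOutputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

public class CompressionSketch {
    // Gzip-compress a byte payload and return the compressed size.
    static int gzipSize(byte[] data) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
                gz.write(data);
            }
            return out.size();
        } catch (java.io.IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Row-oriented layout: each comment's fields are interleaved.
        StringBuilder interleaved = new StringBuilder();
        // Column-oriented layout: like-typed fields grouped together
        // (illustrative byte layout, not strictly valid JSON).
        StringBuilder users = new StringBuilder(), texts = new StringBuilder();
        for (int i = 0; i < 15; i++) {
            interleaved.append("{\"uid\":").append(10000 + i)
                       .append(",\"text\":\"gg ").append(i).append("\"}");
            users.append(10000 + i).append(",");
            texts.append("\"gg ").append(i).append("\",");
        }
        String grouped = "[" + users + texts + "]";
        System.out.printf("raw=%d gz(interleaved)=%d gz(grouped)=%d%n",
                interleaved.length(),
                gzipSize(interleaved.toString().getBytes()),
                gzipSize(grouped.getBytes()));
    }
}
```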

Frequency control

Bandwidth control: add a request‑interval parameter so the server can throttle client request rates during traffic spikes.

Sparse control: during periods of low comment density, delay the next request to avoid unnecessary traffic.
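The two frequency controls above can be sketched as a single server-side policy that attaches a next-poll interval to each response. The constants and the load signal here are illustrative assumptions, not the production values:

```java
public class PollIntervalPolicy {
    static final int BASE_MS = 3_000;   // normal polling interval (assumed)
    static final int MAX_MS  = 15_000;  // upper bound during spikes (assumed)

    // loadFactor: 1.0 at normal load, >1.0 during traffic spikes.
    // commentsInWindow: comments produced since the client's last poll.
    static int nextIntervalMs(double loadFactor, int commentsInWindow) {
        // Bandwidth control: stretch the interval as load rises.
        int interval = (int) (BASE_MS * Math.max(1.0, loadFactor));
        if (commentsInWindow == 0) {
            interval *= 2; // sparse control: the room is quiet, poll less often
        }
        return Math.min(interval, MAX_MS);
    }

    public static void main(String[] args) {
        System.out.println(nextIntervalMs(1.0, 10)); // 3000: normal load, active room
        System.out.println(nextIntervalMs(3.0, 0));  // 15000: quiet room during a spike, capped
    }
}
```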

Danmaku Stutter and Loss Analysis

The core question is whether to use push or pull delivery.

Long Polling via AJAX

The client opens an AJAX request that the server holds until an event occurs, then returns a response. Enabling HTTP Keep‑Alive can also reduce handshake latency.

Advantages: fewer polls, low latency, good browser compatibility. Disadvantages: the server must maintain many open connections.

WebSockets

WebSocket provides true bidirectional communication with minimal header overhead (2‑10 bytes for server‑to‑client frames, plus 4 bytes mask for client‑to‑server). It offers lower latency, full‑duplex data flow, and optional binary frames and compression.

Advantages: reduced control overhead, stronger real‑time performance, persistent connection.

Long Polling vs WebSockets

Both rely on TCP long connections, whose health is detected via TCP Keep‑Alive (keepalive_probes, keepalive_time, keepalive_intvl). In weak Southeast Asian networks, connections often drop, and detection intervals differ:

Long polling detection interval: min(keepalive_intvl, polling_interval)

WebSocket detection interval: min(keepalive_intvl, client_sending_interval)

Because a connection may already be broken by the time the next packet is sent, TCP keep-alive offers limited protection, and WebSockets become unsuitable under weak networks.

Even if the server detects a broken WebSocket, it cannot push data until the client reconnects, causing potential data loss.

Each reconnection requires a new application‑layer handshake.

Given the limitations of both long polling and WebSockets, we finally adopted a short‑polling strategy for danmaku delivery.
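A short-polling client can be sketched as a cursor loop: the client remembers the newest timestamp it has seen and asks only for newer comments each round. The `Source` interface and the timestamp-only payload are simplifications for illustration, not the actual protocol:

```java
public class ShortPollLoop {
    // Abstraction over one poll request: return the timestamps of
    // comments newer than the cursor (hypothetical, for illustration).
    interface Source { long[] timestampsSince(long cursor); }

    // Advance the cursor past everything the source returned this round,
    // so the next poll fetches only unseen comments.
    static long pollOnce(long cursor, Source source) {
        for (long ts : source.timestampsSince(cursor)) {
            cursor = Math.max(cursor, ts);
        }
        return cursor;
    }

    public static void main(String[] args) {
        Source demo = cursor -> new long[]{cursor + 1, cursor + 2};
        long c = pollOnce(0, demo); // sees timestamps 1 and 2
        c = pollOnce(c, demo);      // next round starts after 2
        System.out.println(c);      // prints 4
    }
}
```

Because every round is an independent request, a dropped connection costs at most one polling interval; the next request simply re-presents the cursor, which is what makes short polling resilient on weak networks.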

Reliability and Performance

To improve stability, we split the service into two parts: a high‑frequency pull service and a lower‑frequency push service. This prevents the pull side from overwhelming the push side and allows independent scaling.

Pull side: we introduced a local cache refreshed via periodic RPC calls. Cached comments are served directly from memory, drastically reducing latency and the impact of external dependencies.

Data is sharded by timestamp into a ring buffer that retains only the most recent 60 seconds of comments. Reads traverse the buffer backward from the tail pointer, yielding high read efficiency on array-based storage.

Writes are single-threaded, so no concurrency control is needed. Reads are limited to the last 30 seconds, leaving a 30-second margin before a slot is recycled and enabling lock-free operation.
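A minimal sketch of such a ring buffer, assuming one slot per second and epoch-second timestamps (the slot layout and recycling details are illustrative, not the production code):

```java
import java.util.ArrayList;
import java.util.List;

public class DanmakuRingBuffer {
    // One slot per second, retaining the most recent WINDOW seconds.
    // A single writer thread appends; readers scan backward over the
    // newest slots, so no locks are taken.
    static final int WINDOW = 60;

    private final List<String>[] slots;
    private final long[] slotSecond; // which epoch second each slot holds

    @SuppressWarnings("unchecked")
    public DanmakuRingBuffer() {
        slots = new List[WINDOW];
        slotSecond = new long[WINDOW];
        for (int i = 0; i < WINDOW; i++) slots[i] = new ArrayList<>();
    }

    // Single-threaded writer: append a comment stamped with epoch seconds.
    public void append(long epochSecond, String comment) {
        int idx = (int) (epochSecond % WINDOW);
        if (slotSecond[idx] != epochSecond) { // slot holds stale data: recycle
            slots[idx] = new ArrayList<>();   // replace rather than clear, so a
            slotSecond[idx] = epochSecond;    // reader never sees a half-emptied list
        }
        slots[idx].add(comment);
    }

    // Reader: collect comments from the last `seconds` seconds (<= 30 here,
    // which keeps reads clear of the slots the writer is about to recycle).
    public List<String> readRecent(long nowSecond, int seconds) {
        List<String> out = new ArrayList<>();
        for (long s = nowSecond; s > nowSecond - seconds; s--) {
            int idx = (int) (s % WINDOW);
            if (slotSecond[idx] == s) out.addAll(slots[idx]);
        }
        return out;
    }

    public static void main(String[] args) {
        DanmakuRingBuffer buf = new DanmakuRingBuffer();
        buf.append(1_700_000_000L, "hello");
        buf.append(1_700_000_001L, "gg");
        System.out.println(buf.readRecent(1_700_000_001L, 30)); // prints [gg, hello]
    }
}
```

Replacing a stale slot's list instead of clearing it in place is what lets readers stay lock-free: a concurrent reader either sees the old list intact or the new one, never a partially emptied list.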

Push side: applies rate limiting to discard excess comments, and degrades gracefully when auxiliary steps fail (e.g., avatar fetches or profanity filtering), keeping the core pipeline functional.
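The rate-limiting step can be sketched as a token bucket that discards, rather than queues, excess comments. The capacity and refill scheme here are assumptions for illustration:

```java
public class CommentRateLimiter {
    // Token bucket: refills continuously up to `capacity`; a comment that
    // arrives when the bucket is empty is discarded, never queued.
    private final long capacity;
    private final double refillPerMs;
    private double tokens;
    private long lastRefillMs;

    public CommentRateLimiter(long permitsPerSecond, long nowMs) {
        this.capacity = permitsPerSecond;
        this.refillPerMs = permitsPerSecond / 1000.0;
        this.tokens = permitsPerSecond; // start full
        this.lastRefillMs = nowMs;
    }

    // Returns true if the comment may enter the pipeline; false = discard.
    public synchronized boolean tryAccept(long nowMs) {
        tokens = Math.min(capacity, tokens + (nowMs - lastRefillMs) * refillPerMs);
        lastRefillMs = nowMs;
        if (tokens >= 1) { tokens -= 1; return true; }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical cap matching the projected peak of ~300 k QPS.
        CommentRateLimiter limiter = new CommentRateLimiter(300_000, 0);
        System.out.println(limiter.tryAccept(1)); // prints true
    }
}
```

Dropping instead of queueing is the right trade-off for danmaku: a comment delivered seconds late is worthless on screen, and an unbounded queue would only spread the overload forward in time.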

Conclusion

During the Double‑12 promotion, even with a brief Redis outage, the system sustained 700 k concurrent users with high efficiency and stability, meeting the target objectives.

Tags: performance, backend architecture, live streaming, bandwidth optimization, danmaku, short polling
Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
