
Design and Optimization of a High‑Performance Live‑Streaming Danmaku System

This article details the challenges and solutions for building a scalable, low‑latency danmaku service for live streaming, covering background requirements, bandwidth constraints, protocol choices, short‑polling implementation, reliability measures, and performance results that supported 700,000 concurrent users during a major event.

Top Architect

Background

To better support live streaming in Southeast Asia, a danmaku (real-time scrolling comment) feature was added. The first version, built on Tencent Cloud, suffered frequent stutter and lost comments, prompting the development of a custom danmaku system capable of handling up to one million concurrent users in a single room.

Problem Analysis

Bandwidth pressure: with one million viewers each pulling at least 15 comments every 3 seconds, and each batch exceeding 3 KB once HTTP headers are included, egress traffic works out to roughly 8 Gbps, uncomfortably close to the 10 Gbps limit.
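The figure above can be sanity-checked with a little arithmetic. This is a sketch using only the numbers stated in the text (one million viewers, a 3-second poll interval, just over 3 KB per batch):

```python
# Back-of-the-envelope check of the bandwidth pressure described above.
viewers = 1_000_000       # peak concurrent users per room (from the text)
poll_interval_s = 3       # one pull request every 3 seconds
batch_bytes = 3 * 1024    # >3 KB per batch, HTTP headers included

bytes_per_second = viewers * batch_bytes / poll_interval_s
gbps = bytes_per_second * 8 / 1e9

print(f"{gbps:.1f} Gbps")  # ~8 Gbps, close to the 10 Gbps egress limit
```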

Weak networks: unstable mobile networks, common in Southeast Asia, cause stutter and lost comments.

Performance and reliability: expected QPS can exceed 300,000, requiring robust handling during peak events such as Double Eleven.

Bandwidth Optimization

Enable HTTP gzip compression, achieving >40 % reduction.

Simplify the response structure, trimming redundant fields to shrink each batch.

Reorder response content so that similar values sit next to each other, increasing redundancy and improving gzip compression ratios.

Frequency control: include a request-interval parameter in each response so the server can throttle client request rates, stretching the interval (sparse control) during low-traffic periods.
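The gzip and reordering points can be illustrated with a small sketch. The comment schema below is hypothetical, not the production format; it only shows that repetitive JSON compresses well and that grouping similar values (a columnar layout) increases redundancy further:

```python
import gzip
import json

# Hypothetical comment batch: 15 comments, as in the sizing above.
# The field names are illustrative, not the production schema.
comments = [
    {"uid": 1000 + i, "nick": f"user{i}", "text": "666", "ts": 1700000000 + i}
    for i in range(15)
]

# Row-oriented JSON, as a naive API would return it.
raw = json.dumps(comments).encode()
row_gz = gzip.compress(raw)
reduction = 1 - len(row_gz) / len(raw)

# Columnar reordering: each key appears once and similar values sit
# next to each other, which gives the compressor more redundancy.
columnar = {k: [c[k] for c in comments] for k in comments[0]}
col_raw = json.dumps(columnar).encode()
col_gz = gzip.compress(col_raw)

print(f"row:      {len(raw)}B -> {len(row_gz)}B ({reduction:.0%} saved)")
print(f"columnar: {len(col_raw)}B -> {len(col_gz)}B")
```

On batches like this, gzip alone typically clears the >40 % reduction cited above, and the columnar form is smaller even before compression because each field name appears once instead of fifteen times.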

Danmaku Stutter and Loss Analysis

The main dilemma was choosing between push and pull delivery. Long polling via AJAX reduces request overhead but forces the server to hold many concurrent connections. WebSockets offer bidirectional communication with lower header overhead, but they too require persistent connections and recover poorly on weak networks.

Because both long polling and WebSockets were unsuitable at this scale, short polling was adopted: clients simply re-request on a controlled interval, achieving timely delivery while keeping per-connection overhead low.
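A minimal sketch of the short-polling contract (hypothetical function names, with an in-memory list standing in for the real service). The server returns everything after the client's cursor together with a server-chosen interval, which doubles as the frequency-control knob described earlier:

```python
import time

# In-memory store standing in for the danmaku backend; each entry is
# (server timestamp, comment text).
STORE = []

def post_comment(text, now=None):
    STORE.append((now if now is not None else time.time(), text))

def pull(cursor, low_traffic=False):
    """Short-poll handler: return comments newer than `cursor` plus the
    interval (seconds) the client should wait before polling again."""
    fresh = [(ts, c) for ts, c in STORE if ts > cursor]
    return {
        "comments": [c for _, c in fresh],
        "cursor": max((ts for ts, _ in fresh), default=cursor),
        # Frequency control: stretch the interval when traffic is low.
        "interval": 6 if low_traffic else 3,
    }

# Client side of one polling round.
post_comment("hello", now=1.0)
post_comment("666", now=2.0)
resp = pull(cursor=0.0)
print(resp["comments"], "next poll in", resp["interval"], "s")
```

The cursor makes each poll idempotent and cheap to serve; no connection state survives between rounds, which is exactly what makes short polling tolerant of flaky networks.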

Reliability and Performance

Service splitting isolates the heavy send‑danmaku logic from the high‑frequency pull‑danmaku service, allowing independent scaling and preventing one service from overwhelming the other.

The pull side serves requests from a local in-memory cache that a background task refreshes via RPC, reducing read latency and insulating reads from failures in external dependencies.
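A sketch of such a pull-side cache, with a background thread standing in for the RPC refresh (all names hypothetical). A failed refresh keeps serving the previous snapshot, so external outages cost freshness rather than latency:

```python
import threading
import time

class DanmakuCache:
    """Pull-side local cache (hypothetical). A background thread refreshes
    a snapshot via an RPC callable; readers only touch local memory."""

    def __init__(self, fetch_rpc, refresh_interval=1.0):
        self._fetch = fetch_rpc            # callable: () -> list of comments
        self._interval = refresh_interval
        self._snapshot = []                # replaced wholesale on refresh
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        while not self._stop.wait(self._interval):
            try:
                self._snapshot = self._fetch()   # single reference swap
            except Exception:
                pass  # RPC failed: keep serving the stale snapshot

    def read(self):
        return self._snapshot  # plain attribute read, no lock needed

    def close(self):
        self._stop.set()

# Demo with a stub RPC; the real fetch would call the danmaku backend.
cache = DanmakuCache(lambda: ["666", "gg"], refresh_interval=0.05)
time.sleep(0.3)   # let a few refresh cycles run
print(cache.read())
cache.close()
```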

The send side employs rate limiting and graceful degradation: failures in non-core steps such as avatar fetching or profanity filtering fall back to defaults rather than blocking the core flow.
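The send path can be sketched as a token bucket plus fallbacks (hypothetical helpers; the limiter parameters are chosen to make the demo deterministic):

```python
import time

class TokenBucket:
    """Token-bucket limiter: `burst` immediate sends, refilled at
    `rate` tokens per second."""

    def __init__(self, rate, burst):
        self.rate, self.capacity = rate, burst
        self.tokens, self.last = float(burst), time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

def send_danmaku(uid, text, limiter, fetch_avatar, filter_profanity):
    if not limiter.allow():
        return None                      # core protection: shed excess load
    try:
        avatar = fetch_avatar(uid)
    except Exception:
        avatar = "default.png"           # degrade: missing avatar is not fatal
    try:
        text = filter_profanity(text)
    except Exception:
        pass                             # degrade: deliver unfiltered text
    return {"uid": uid, "text": text, "avatar": avatar}

# Demo: the avatar service is down, yet the comment still goes through;
# a second send is rejected by the limiter (burst=1, negligible refill).
def broken_avatar(uid):
    raise RuntimeError("avatar service down")

limiter = TokenBucket(rate=0.001, burst=1)
first = send_danmaku(1, "hello", limiter, broken_avatar, lambda t: t)
second = send_danmaku(1, "again", limiter, broken_avatar, lambda t: t)
print(first, second)
```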

A time-based ring buffer stores only the latest 60 seconds of comments; since writers touch only the newest slots, reads covering up to 30 seconds of history can proceed lock-free.
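One plausible shape for such a buffer (an assumed design consistent with the description, not the actual implementation): 60 one-second slots indexed by timestamp modulo 60. The writer mutates only the current second's slot, and readers ask only for completed seconds, so a 30-second read never races the writer:

```python
SLOTS = 60  # one slot per second, covering the latest 60 seconds

class TimeRingBuffer:
    def __init__(self):
        # Each slot holds (second, comments); stale slots are detected
        # by comparing the stored second against the requested one.
        self.slots = [(None, []) for _ in range(SLOTS)]

    def append(self, second, comment):
        idx = second % SLOTS
        sec, items = self.slots[idx]
        if sec != second:
            self.slots[idx] = (second, [comment])  # reclaim an expired slot
        else:
            items.append(comment)

    def read(self, now_second, window=30):
        # Only completed seconds are read -- the writer mutates just the
        # current second's slot, so this path needs no lock.
        out = []
        for s in range(now_second - window, now_second):
            sec, items = self.slots[s % SLOTS]
            if sec == s:      # skip slots already recycled for newer data
                out.extend(items)
        return out

rb = TimeRingBuffer()
rb.append(100, "a")
rb.append(101, "b")
rb.append(101, "c")
print(rb.read(102))   # comments from the last 30 completed seconds
```

Capping history at 60 seconds bounds memory per room regardless of comment volume, and the stamp check makes expiry free: old data is simply overwritten when its slot comes around again.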

Summary

During the Double Twelve event, despite a brief Redis outage, the system reliably supported 700,000 concurrent users, meeting its performance goals.

Tags: backend, performance, live streaming, bandwidth optimization, danmaku, short polling
Written by Top Architect

Top Architect shares practical architecture knowledge, covering enterprise, system, website, large-scale distributed, and high-availability architectures, as well as re-architecting with internet technologies. Idea-driven, sharing-oriented architects are welcome to exchange and learn together.
