
Design and Optimization of a High‑Performance Live‑Streaming Danmaku System

This article describes the design, challenges, and optimization strategies of a custom live‑streaming danmaku system for Southeast Asian markets, covering bandwidth constraints, latency issues, long‑polling versus WebSocket approaches, service splitting, caching, and a ring‑buffer implementation that supported 700 k concurrent users during a major sales event.


The live‑streaming product added a danmaku feature to better support Southeast Asian users, but the initial Tencent Cloud solution suffered from stutter and low message volume, prompting the development of a custom system capable of handling up to one million concurrent users per room.

Problem analysis identified three main challenges: bandwidth pressure (an estimated 8 Gbps needed to deliver 15 messages every 3 seconds, close to saturating the 10 Gbps of available bandwidth), network instability causing message loss, and the need to sustain over 300 k QPS during peak events while ensuring reliability.
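The 8 Gbps figure can be sanity-checked with back-of-the-envelope arithmetic. The article states only the result, so the per-message payload size below (about 200 bytes) is an assumption chosen to reproduce it:

```python
# Rough egress estimate for danmaku delivery. PAYLOAD_BYTES is an
# assumption; the article only gives the 8 Gbps total.
CONCURRENT_USERS = 1_000_000   # target concurrency per room
MSGS_PER_WINDOW = 15           # messages delivered per window per viewer
WINDOW_SECONDS = 3
PAYLOAD_BYTES = 200            # assumed serialized size of one message

msgs_per_second = MSGS_PER_WINDOW / WINDOW_SECONDS   # 5 msg/s per viewer
bits_per_second = CONCURRENT_USERS * msgs_per_second * PAYLOAD_BYTES * 8
gbps = bits_per_second / 1e9

print(f"Estimated egress: {gbps:.1f} Gbps")  # → 8.0 Gbps
```

At this scale even small per-message savings matter, which is why the compression and payload-trimming work below pays off.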

Bandwidth optimization was achieved by enabling HTTP gzip compression (reducing payload by >40 %), simplifying response structures, reordering content to improve compressibility, and implementing frequency controls such as request interval throttling and sparse‑message suppression.
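The gzip win is easy to demonstrate: danmaku batch responses are highly repetitive JSON, which compresses far better than the >40 % the article cites. A minimal sketch (the field names here are invented, not the real API schema):

```python
import gzip
import json

# Hypothetical danmaku batch response; repeated keys and similar values
# make this kind of payload compress extremely well.
messages = [
    {"uid": 10000 + i, "nick": f"user{i}", "text": "GG!", "ts": 1700000000 + i}
    for i in range(100)
]
raw = json.dumps(messages).encode("utf-8")
compressed = gzip.compress(raw)

ratio = len(compressed) / len(raw)
print(f"{len(raw)} B -> {len(compressed)} B ({1 - ratio:.0%} smaller)")
```

This also illustrates why reordering fields helps: placing similar values next to each other gives the compressor longer runs of repeated bytes to exploit.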

Analysis of danmaku stutter and message loss compared push versus pull delivery, long polling versus WebSocket, and TCP keep-alive behavior. In weak-network environments both long polling and WebSocket suffered frequent disconnections, which led to the adoption of a short-polling approach.

The long-polling versus WebSocket comparison showed that while both rely on long-lived TCP connections, WebSocket incurs higher connection-maintenance overhead and performs poorly on unstable networks. Short polling tolerates transient failures better because each request is independent: a dropped request loses only one round, not an entire connection that must be rebuilt.
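The short-polling client reduces to a simple loop. A minimal sketch, where `fetch(cursor)` stands in for the real HTTP call and the function names are assumptions, not the article's code:

```python
import time

def poll_danmaku(fetch, cursor=0, rounds=3, interval=0.0):
    """Short-polling loop: each round is an independent request for
    messages newer than `cursor`, so one failed round costs only that
    round. `fetch(cursor)` must return (messages, new_cursor)."""
    collected = []
    for _ in range(rounds):
        msgs, cursor = fetch(cursor)
        collected.extend(msgs)
        time.sleep(interval)  # request-interval throttling from the article
    return collected, cursor

# Fake server-side store so the loop can run standalone.
store = ["hello", "gg", "nice stream", "lol"]

def fake_fetch(cursor):
    batch = store[cursor:cursor + 2]  # server caps the batch size
    return batch, cursor + len(batch)

msgs, cur = poll_danmaku(fake_fetch)
print(msgs)  # → all four messages after three rounds
```

The cursor also gives the server a natural place to apply the sparse-message suppression mentioned above: when nothing new exists past the cursor, it returns an empty batch.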

Reliability and performance were ensured by splitting the service into a high‑frequency pull service and a lower‑frequency push service, isolating load and allowing independent scaling. The pull side introduced local caching with periodic RPC updates, drastically reducing latency.
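The pull-side local cache can be sketched as a small wrapper that serves reads from memory and refreshes from upstream at most once per interval. The class name, interval, and injected clock are assumptions for illustration:

```python
import time

class LocalDanmakuCache:
    """Serves reads from memory; refreshes from an upstream source (the
    periodic RPC in the article) at most once per `refresh_interval`."""
    def __init__(self, rpc_fetch, refresh_interval=1.0, clock=time.monotonic):
        self._rpc_fetch = rpc_fetch
        self._interval = refresh_interval
        self._clock = clock
        self._data = rpc_fetch()
        self._last_refresh = clock()

    def get(self):
        now = self._clock()
        if now - self._last_refresh >= self._interval:
            self._data = self._rpc_fetch()  # periodic RPC update
            self._last_refresh = now
        return self._data                   # hot path: in-memory, no RPC

# Demo with a fake clock so the refresh is deterministic.
calls = {"n": 0}
def fake_rpc():
    calls["n"] += 1
    return [f"msg-{calls['n']}"]

t = {"now": 0.0}
cache = LocalDanmakuCache(fake_rpc, refresh_interval=1.0, clock=lambda: t["now"])
a = cache.get()      # within the interval: served from memory
t["now"] = 1.5
b = cache.get()      # interval elapsed: exactly one refresh RPC
print(a, b, calls["n"])
```

Because every viewer in a room sees the same message stream, one refresh per interval amortizes the RPC cost across all concurrent readers of that instance.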

The pull service also employed a time-based ring-buffer cache storing only the latest 60 seconds of messages. Because the buffer is a fixed array advanced by a single tail pointer, reads need no locks and can efficiently retrieve up to 30 seconds of recent data.
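One way to realize such a buffer is to index slots by second, so slot `ts % 60` always holds the messages for second `ts` and stale slots are recycled in place. This is a sketch under those assumptions, not the article's actual code:

```python
import time

WINDOW = 60  # seconds of history kept, per the article

class DanmakuRingBuffer:
    """Time-indexed ring buffer: slot i holds messages for the second ts
    where ts % WINDOW == i. With a single writer, readers only scan the
    array backwards and need no lock."""
    def __init__(self, clock=time.time):
        self._slots = [(None, []) for _ in range(WINDOW)]  # (second, messages)
        self._clock = clock

    def append(self, msg):
        sec = int(self._clock())
        idx = sec % WINDOW
        slot_sec, msgs = self._slots[idx]
        if slot_sec != sec:
            self._slots[idx] = (sec, [msg])  # recycle a stale slot
        else:
            msgs.append(msg)

    def read_last(self, seconds=30):
        """Return messages from the last `seconds` seconds (<= WINDOW)."""
        now = int(self._clock())
        out = []
        for sec in range(now - seconds + 1, now + 1):
            slot_sec, msgs = self._slots[sec % WINDOW]
            if slot_sec == sec:  # skip slots not belonging to this window
                out.extend(msgs)
        return out

# Demo with a fake clock.
t = {"now": 1000}
buf = DanmakuRingBuffer(clock=lambda: t["now"])
buf.append("a"); buf.append("b")
t["now"] = 1031          # 31 s later: "a"/"b" fall outside a 30 s read
buf.append("c")
print(buf.read_last(30))  # → ['c']
```

Storing the second alongside each slot is what makes overwriting safe: a reader that races a recycle simply skips the slot because the stored second no longer matches.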

On the push side, rate‑limiting and graceful degradation (e.g., optional avatar fetching or profanity filtering) ensured core functionality remained stable even when auxiliary services failed.
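Both push-side protections fit in a few lines. Below, a token-bucket limiter stands in for the rate limiting, and a fallback avatar stands in for graceful degradation; the parameters and function names are assumptions, not the production values:

```python
import time

class TokenBucket:
    """Simple token-bucket rate limiter (rate/burst values are assumed)."""
    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate, self.burst, self._clock = rate, burst, clock
        self._tokens, self._last = burst, clock()

    def allow(self):
        now = self._clock()
        # Refill tokens for the elapsed time, capped at the burst size.
        self._tokens = min(self.burst, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False

def render_message(text, fetch_avatar):
    """Graceful degradation: if the auxiliary avatar service fails,
    fall back to a default instead of failing the whole message."""
    try:
        avatar = fetch_avatar()
    except Exception:
        avatar = "default.png"  # degrade; keep the core path alive
    return {"text": text, "avatar": avatar}

def failing_avatar():
    raise TimeoutError("avatar service down")

t = {"now": 0.0}
bucket = TokenBucket(rate=1, burst=2, clock=lambda: t["now"])
decisions = [bucket.allow() for _ in range(3)]  # burst of 2 passes, 3rd dropped
msg = render_message("hi", failing_avatar)
print(decisions, msg)
```

The same try/except-with-fallback shape applies to the profanity filter: when the auxiliary call fails, the message is delivered unfiltered or held, but the send path itself never errors out.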

Summary – The final architecture, validated during a Double‑12 promotion, sustained 700 k concurrent users despite a brief Redis outage, demonstrating that the combination of bandwidth reduction, short‑polling delivery, service splitting, and ring‑buffer caching can meet high‑scale, low‑latency requirements for live‑streaming danmaku.

Tags: backend, performance, live streaming, caching, WebSocket, danmaku, long polling
Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
