How Alibaba Cut Live Stream Latency Below 300ms with a New Architecture

Facing pandemic-driven remote teaching, Alibaba’s live streaming team redesigned their media pipeline, combining CDN, custom real-time protocols, WebRTC, and cloud-native techniques to control transmission and playback buffers, achieving sub-300 ms host-to-host latency and under-600 ms host-to-viewer latency while maintaining smooth playback.

Alibaba Cloud Developer

1. Industry User Experience Analysis

Interactive live streaming involves three roles: the main-mic host, the auxiliary-mic host, and the viewers. The host-to-host audio/video call usually stays under 300 ms, but host-to-viewer interaction (text, images, gifts) can lag by more than 3000 ms. That gap shows up in several ways:

A viewer's entrance effect finishes, but the host's greeting has not yet arrived.

When hosts and viewers play games together, actions arrive late.

During a host-to-host PK (head-to-head battle), votes requested in the last seconds arrive too late to count.

The goal is to make interaction consistent across all three roles: host-to-host latency below 300 ms and host-to-viewer latency below 600 ms.

2. Traditional Solutions

Traditional live streaming keeps host‑to‑host latency under 300 ms, but the path to viewers goes through RTMP transmission, CDN distribution, and player buffering, resulting in roughly 3000 ms delay.

Each of these three stages (RTMP transmission, CDN distribution, and player caching) adds delay. The CDN is the hardest to improve because it is outside the team's control:

A video CDN greatly reduces developer workload.

Achievable performance depends on how much of the transmission chain you control.

The CDN is not under your control, so waiting for the CDN to change is unrealistic.

For the same reason, building a low-latency player that talks directly to the server is infeasible.

The traditional solution is therefore comfortable but risky: easy to adopt, yet it leaves latency out of the team's hands.

3. Low‑Latency Live System

The solution is a fully controllable media transmission system.

3.1 Project Challenges

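The design had to combine the massive concurrency of a CDN with the low latency of real-time communication. The two goals pull against each other: a smaller buffer lowers latency but makes playback more likely to stall, while a larger buffer keeps playback smooth at the cost of delay.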

3.2 Solution

The low‑latency system integrates CDN, a private real‑time protocol, WebRTC, and cloud‑native technologies, converting the problem into controllable transmission and playback layers.

3.2.1 Latency Sources and Controllability

Of all the latency sources, only the transmission layer and the player buffer are controllable. By managing the entire transmission process, the team reduced transmission delay to under 118 ms, and the low-latency player dynamically adjusts its buffer to keep its delay within 415 ms while preserving smoothness. Two buffer strategies, smoothness-first and latency-first, cover both host-to-host and host-to-viewer interaction.
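The figures above can be checked against the 600 ms host-to-viewer goal with a small budget calculation. The sketch below is illustrative only: it assumes the transmission and player-buffer figures are additive and ignores capture, encoding, and rendering time. The `Strategy` class, its multipliers and floors, and the `target_buffer_ms` rule are hypothetical and are not taken from the original system.

```python
from dataclasses import dataclass

# Figures quoted in the article.
TRANSMISSION_MS = 118    # upper bound on transmission delay
PLAYER_BUFFER_MS = 415   # upper bound on player buffer delay
VIEWER_TARGET_MS = 600   # host-to-viewer latency goal


@dataclass(frozen=True)
class Strategy:
    """Hypothetical buffer policy; the tuning values are illustrative only."""
    name: str
    jitter_multiplier: float  # how many multiples of measured jitter to buffer
    floor_ms: int             # never buffer less than this


SMOOTHNESS_FIRST = Strategy("smoothness-first", jitter_multiplier=4.0, floor_ms=120)
LATENCY_FIRST = Strategy("latency-first", jitter_multiplier=2.0, floor_ms=40)


def target_buffer_ms(jitter_ms: float, strategy: Strategy,
                     cap_ms: int = PLAYER_BUFFER_MS) -> int:
    """Pick a buffer size from measured jitter, clamped to [floor_ms, cap_ms]."""
    wanted = max(strategy.floor_ms, strategy.jitter_multiplier * jitter_ms)
    return int(min(wanted, cap_ms))


def end_to_end_ms(buffer_ms: int) -> int:
    """Assumed budget: transmission delay plus player buffer delay."""
    return TRANSMISSION_MS + buffer_ms


if __name__ == "__main__":
    worst_case = end_to_end_ms(PLAYER_BUFFER_MS)
    print(f"worst case: {worst_case} ms (goal {VIEWER_TARGET_MS} ms)")
    for strategy in (SMOOTHNESS_FIRST, LATENCY_FIRST):
        buf = target_buffer_ms(30, strategy)
        print(f"{strategy.name}: buffer {buf} ms, total {end_to_end_ms(buf)} ms")
```

Under these assumptions the worst case (118 ms + 415 ms) stays below the 600 ms goal, and the latency-first policy simply trades a thinner jitter margin for a shorter buffer.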

3.2.2 CDN Evolved into RDN System

RDN merges CDN architecture with a media‑server node. Streams are lazily loaded to edge nodes; viewers are directed to the nearest edge via GSLB, enabling fast delivery.
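The lazy-loading and nearest-edge ideas can be sketched in a few lines. This is a toy model, not the RDN implementation: `Origin`, `EdgeNode`, and `pick_edge` are hypothetical names, and "nearest" is approximated by straight-line distance on made-up coordinates, whereas a real GSLB would typically weigh network location, load, and health.

```python
from dataclasses import dataclass, field
from math import hypot
from typing import Dict, List


@dataclass
class Origin:
    """Stands in for the node that receives the host's stream."""
    streams: Dict[str, str]
    pulls: int = 0  # how many times an edge fetched a stream from here

    def pull(self, stream_id: str) -> str:
        self.pulls += 1
        return self.streams[stream_id]


@dataclass
class EdgeNode:
    name: str
    x: float
    y: float
    origin: Origin
    cache: Dict[str, str] = field(default_factory=dict)

    def serve(self, stream_id: str) -> str:
        # Lazy loading: fetch from the origin only when the first viewer asks.
        if stream_id not in self.cache:
            self.cache[stream_id] = self.origin.pull(stream_id)
        return self.cache[stream_id]


def pick_edge(edges: List[EdgeNode], viewer_x: float, viewer_y: float) -> EdgeNode:
    """Toy GSLB: choose the edge with the smallest straight-line distance."""
    return min(edges, key=lambda e: hypot(e.x - viewer_x, e.y - viewer_y))


if __name__ == "__main__":
    origin = Origin(streams={"room-1": "<media for room-1>"})
    edges = [EdgeNode("east", 10, 0, origin), EdgeNode("west", -10, 0, origin)]
    edge = pick_edge(edges, viewer_x=8, viewer_y=1)
    edge.serve("room-1")   # first viewer triggers the pull
    edge.serve("room-1")   # later viewers hit the edge cache
    print(edge.name, origin.pulls)
```

Only edges that actually have viewers for a stream ever pull it, which is what keeps the origin's fan-out small.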

3.2.3 Transmission Delay Minimization

Efforts focus on reducing transmission latency to the lowest possible level.

3.2.4 Low‑Latency Player

The player uses a customized NetEq (WebRTC's audio jitter buffer) to absorb network jitter and packet loss. It also filters the audio-video sync adjustment, so the user experience holds up even when playback speed changes rapidly.
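Three ideas in this paragraph lend themselves to a short sketch: estimating jitter, nudging playback speed toward a target buffer level, and filtering the audio-video offset so it does not jump when speed changes. The code below is a simplified illustration and not NetEq's actual algorithm. The jitter update follows the interarrival-jitter formula from RFC 3550; `playout_rate`, `smooth_av_offset`, and their default parameters are hypothetical.

```python
def update_jitter(jitter_ms: float, prev_transit_ms: float, transit_ms: float) -> float:
    """RFC 3550 interarrival jitter estimate: J += (|D| - J) / 16."""
    d = abs(transit_ms - prev_transit_ms)
    return jitter_ms + (d - jitter_ms) / 16.0


def playout_rate(buffered_ms: float, target_ms: float,
                 gain: float = 0.001, max_change: float = 0.1) -> float:
    """Hypothetical proportional controller for playback speed.

    Plays faster than 1.0x when the buffer holds more than the target
    (to shed latency) and slower when it holds less (to avoid a stall).
    The adjustment is clamped to +/- max_change.
    """
    error_ms = buffered_ms - target_ms
    adjustment = max(-max_change, min(max_change, gain * error_ms))
    return 1.0 + adjustment


def smooth_av_offset(prev_offset_ms: float, measured_offset_ms: float,
                     alpha: float = 0.1) -> float:
    """Exponential filter on the audio-video offset.

    A small alpha makes the video clock follow audio speed changes
    gradually instead of jumping with every measurement.
    """
    return prev_offset_ms + alpha * (measured_offset_ms - prev_offset_ms)


if __name__ == "__main__":
    jitter = 0.0
    transits = [100, 116, 104, 130, 101]  # made-up per-packet transit times, ms
    for prev, now in zip(transits, transits[1:]):
        jitter = update_jitter(jitter, prev, now)
    print(f"jitter estimate: {jitter:.2f} ms")
    print(f"rate with 500 ms buffered, 300 ms target: {playout_rate(500, 300):.2f}x")
    print(f"rate with 250 ms buffered, 300 ms target: {playout_rate(250, 300):.2f}x")
```

Clamping the speed change and filtering the sync offset are two sides of the same design choice: latency is corrected gradually so that the correction itself is not noticeable.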

3.2.5 Full‑Link Monitoring (Bian Que)

The Bian Que system collects logs from client and server sides, monitoring service quality across the entire chain and aiding online issue diagnosis.
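One way to picture full-link monitoring is to join client and server log events for the same frame and measure the time spent between stages. The sketch below is an assumption about how such a join could work, not a description of Bian Que itself: the event tuple layout, the stage names, and both helper functions are hypothetical, and it assumes client and server clocks are synchronized.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str, int]   # (trace_id, stage, timestamp_ms)
Hop = Tuple[str, int]          # ("stage_a->stage_b", elapsed_ms)


def hop_latencies(events: Iterable[Event]) -> Dict[str, List[Hop]]:
    """Group events by trace ID and compute the delay between consecutive stages."""
    by_trace: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for trace_id, stage, ts in events:
        by_trace[trace_id].append((ts, stage))

    result: Dict[str, List[Hop]] = {}
    for trace_id, points in by_trace.items():
        points.sort()  # order stages by timestamp
        result[trace_id] = [
            (f"{a_stage}->{b_stage}", b_ts - a_ts)
            for (a_ts, a_stage), (b_ts, b_stage) in zip(points, points[1:])
        ]
    return result


def slowest_hop(hops: List[Hop]) -> Hop:
    """Return the hop that contributed the most delay."""
    return max(hops, key=lambda hop: hop[1])


if __name__ == "__main__":
    # Made-up events from a pusher client, two servers, and a player client.
    logs = [
        ("frame-1", "edge", 95),
        ("frame-1", "push", 0),
        ("frame-1", "player", 130),
        ("frame-1", "origin", 40),
    ]
    hops = hop_latencies(logs)["frame-1"]
    print(hops)
    print("slowest:", slowest_hop(hops))
```

Breaking latency down per hop is what makes diagnosis practical: a regression can be attributed to one segment of the chain rather than to the stream as a whole.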

4. Data Report

With smoothness and start-up rates comparable to the traditional solution, interactive latency dropped by 86%. The system has run stably since 2017 and supports a range of interactive live scenarios, such as PK competitions and real-time audience engagement.

5. Reflections

The low-latency live system fuses traditional streaming, real-time communication, and WebRTC so that text interaction stays in step with audio and video. This removes the delays in greetings, voting, and game actions described earlier. The author also reflects on how his technical thinking evolved, summarizing the key technologies of each field:

Traditional streaming = RTMP, CDN, ijkplayer

Real‑time communication = MCU, SIP, RTP/RTCP

WebRTC = JSEP, P2P, SFU, ICE, SDP, NetEq
