How We Achieved Low‑Latency, High‑Definition Multi‑Angle Live Streaming with WebRTC

This article details the design and implementation of a low‑latency, high‑definition multi‑angle live streaming solution using WebRTC, covering protocol selection, system architecture, edge commands, client integration, performance optimizations, and lessons learned from deploying the feature in a large‑scale live event.

Alibaba Terminal Technology

Background

Live streaming differs from video-on-demand in its real-time, highly interactive nature: users can like, comment, and send rewards while watching. Content formats have also evolved to include multi-angle live streams, which greatly enhance the viewing experience at large events such as street dance competitions.

Solution Selection

The key criteria were low-latency angle switching, high-definition quality, and full device coverage. The application-layer protocols considered were HLS (including low-latency LHLS), RTMP, and RTP. RTMP was selected for stream ingest despite firewall concerns, while RTP was chosen for playback because it meets the latency and flexibility requirements.

Three implementation approaches were compared:

Single‑stream pseudo‑multi‑angle (HLS) – each switch reloads a new stream URL, interrupting playback.

Multi‑stream multi‑angle (HLS) – all angle streams are pulled simultaneously, allowing seamless switching at the cost of extra bandwidth.

Single‑stream multi‑angle (RTP) – only one stream is pulled at a time, and the edge switches angles server‑side without loading a new URL.

After a detailed comparison, the single‑stream multi‑angle solution based on RTP (edge solution) was chosen.

WebRTC Overview

WebRTC (Web Real‑Time Communication) is a set of APIs and protocols that enables browsers and native applications to exchange real‑time audio and video. It transports media over RTP/RTCP on UDP, uses ICE with STUN and TURN for NAT traversal, DTLS for encryption, and SCTP for data channels.

WebRTC internal architecture

System Design

The overall pipeline includes stream production, domain scheduling, edge service, mixing service, and playback control. The edge service caches the mixed streams, encodes them, and delivers them to clients via RTP.

Overall system architecture

Detailed Design

The client adds a multi‑angle player that reuses the main playback interface. It creates a sub‑window for each angle (a GLKView wrapped by RTCEAGLVideoView on iOS), managed through a scrollable list (UITableView on iOS, RecyclerView on Android). The player can operate in mix, cover, or source modes.

Core Process

When a user switches angles, the client sends an RTP switch command; the edge node switches to the requested stream ID and embeds SEI metadata in the stream. On receiving the SEI, the client updates the rendering windows, keeping playback across angles synchronized.

Edge Commands

Connect command: blocking, establishes RTP connection and waits for a response.

Disconnect command: non‑blocking, tears down the RTP connection.

Play command: non‑blocking, includes stream ID and OSS configuration.

Switch command: non‑blocking, carries the original frame's timestamp so the new angle can be aligned for synchronized playback.
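The four commands above could be framed roughly as follows. The wire format (JSON payloads) and field names are assumptions for the sketch; the real protocol is internal to the edge service.

```python
import json
from enum import Enum

class EdgeCommand(Enum):
    CONNECT = "connect"        # blocking: waits for a response
    DISCONNECT = "disconnect"  # non-blocking
    PLAY = "play"              # non-blocking
    SWITCH = "switch"          # non-blocking

# Only connect blocks the caller until the edge responds.
BLOCKING = {EdgeCommand.CONNECT}

def build_command(cmd: EdgeCommand, **params) -> bytes:
    """Serialize a command; the caller checks is_blocking() to decide whether to wait."""
    return json.dumps({"cmd": cmd.value, **params}).encode()

def is_blocking(cmd: EdgeCommand) -> bool:
    return cmd in BLOCKING
```

For example, the switch command would carry the original frame timestamp: `build_command(EdgeCommand.SWITCH, stream_id="angle-2", origin_pts=123456)`.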

Project Deployment

Several playback-capability adjustments were made: the audio sample rate was raised, AAC decoding support was added, H.264 encoding was enabled (replacing VP8/VP9), and transport moved from a P2P topology to edge-delivered RTP to suit large-scale live events.
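Preferring H.264 over VP8/VP9 in SDP negotiation amounts to reordering payload types on the `m=video` line. A minimal sketch, with an abbreviated, hypothetical SDP sample:

```python
def prefer_codec(m_line: str, rtpmap: dict[str, str], codec: str) -> str:
    """Move payload types whose rtpmap entry matches `codec` to the front
    of the m= line, which signals codec preference during negotiation."""
    parts = m_line.split()
    header, payloads = parts[:3], parts[3:]   # "m=video <port> <proto>" + payload types
    preferred = [p for p in payloads if rtpmap.get(p, "").startswith(codec)]
    others = [p for p in payloads if p not in preferred]
    return " ".join(header + preferred + others)
```

In browser WebRTC the same effect is achieved with `RTCRtpTransceiver.setCodecPreferences`; the string manipulation above just makes the mechanism visible.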

Integration with Existing Playback

The multi‑angle player wraps WebRTC, shares the same data request flow, playback APIs, and error‑code scheme with the main player, while exposing additional interfaces for multi‑angle features.

Client Issues and Solutions

1. Playback flicker: after a switch command is sent, a short frame-drop window is applied until the SEI confirms the switch, preventing stale frames from the previous angle from being rendered.

2. Memory leak: the leak came from OpenGL rendering instances that were not released when list cells were recreated. Properly destroying the underlying C++ objects via bridge-retained cleanup eliminated the leak.
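The flicker fix (issue 1 above) reduces to a simple gate on the render path: drop frames while a switch is in flight, resume once SEI confirms. Class and method names are illustrative.

```python
class FrameGate:
    """Gates rendering during an angle switch so no stale frames appear."""

    def __init__(self) -> None:
        self.switching = False

    def on_switch_sent(self) -> None:
        self.switching = True   # open the frame-drop window

    def on_sei_confirmed(self) -> None:
        self.switching = False  # close the window; rendering resumes

    def should_render(self, frame) -> bool:
        return not self.switching
```

The window is short in practice (one SEI round-trip), so the cost is a brief freeze on the last good frame rather than visible residue from the old angle.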

Service Concurrency Optimization

Pre‑encoding multiple angles at the edge reduces per‑switch CPU consumption. Streams are aligned by absolute timestamps; when no GOP boundary is available at the switch point, the edge service generates a new GOP so the decoder can start cleanly.

Dynamic Decoding and Caching

YUV frames are decoded on demand to avoid unnecessary CPU load. A dynamic client‑side buffer adjusts its water level: low during angle switches for fast response, higher otherwise to smooth network jitter.
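The dynamic water level can be sketched as two target depths: a low one right after a switch for fast start, and a higher steady-state one to absorb jitter. The millisecond thresholds here are illustrative, not from the article.

```python
class JitterBuffer:
    """Client-side buffer with a switchable water level (thresholds assumed)."""

    LOW_MS = 200    # target depth during an angle switch: fast response
    HIGH_MS = 800   # steady-state depth: absorbs network jitter

    def __init__(self) -> None:
        self.target_ms = self.HIGH_MS

    def on_switch(self) -> None:
        self.target_ms = self.LOW_MS

    def on_playback_stable(self) -> None:
        self.target_ms = self.HIGH_MS

    def should_start_playback(self, buffered_ms: int) -> bool:
        return buffered_ms >= self.target_ms
```

The trade-off is explicit: a low level risks a rebuffer right after the switch, but the user perceives the angle change as instant, which matters more at that moment.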

Conclusion and Outlook

The multi‑angle capability launched during the finals of “This! Is Street Dance” and received strong positive feedback. Future work will focus on interaction improvements and further latency reduction.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
