Machine‑Learning Based Bandwidth Prediction and Adaptive Streaming for Taobao Live: Concerto, OnRL, and Loki

Alibaba’s Taobao Live team replaced rule‑based bandwidth estimators with three machine‑learning solutions (Concerto, OnRL, and Loki) trained on over a million hours of global live‑stream data. Reported results include a throughput improvement of up to 13%, a threefold reduction in stalls, and a 95th‑percentile stall rate up to 44% lower. All three are now deployed commercially.

DaTaobao Tech

Apr 13, 2022

This article shares the research and large‑scale practice (2018‑2021) of the audio‑video foundation team at Alibaba’s Taobao Live, focusing on machine‑learning driven bandwidth prediction algorithms.

Background: Traditional bandwidth‑estimation algorithms (GCC, BBR, PCC, CUBIC) rely on handcrafted rules and cannot cope with complex network dynamics or distinguish congestion loss from random loss. The team therefore explored a black‑box ML model trained on massive online network data to replace these rule‑based methods.

Data collection: Over 1 million hours of live streams from June 2018 were analyzed, covering 57 countries, 749 cities, 5 network types (WiFi, 4G, 3G, LTE, 2G), 512 operators and 934 device models. About 20% of sessions had packet loss above 1% and 10% had RTT above 300 ms; the team describes such sessions as having “sub‑healthy” transmission quality.
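The two thresholds above can be sketched as a simple session filter. This is a minimal illustration, not the team's pipeline: the `SessionStats` record and the choice to treat either condition alone as sufficient are assumptions made for the example.

```python
from dataclasses import dataclass

# Thresholds quoted in the text.
LOSS_THRESHOLD = 0.01      # packet loss above 1%
RTT_THRESHOLD_MS = 300.0   # RTT above 300 ms


@dataclass
class SessionStats:
    """Hypothetical per-session record; field names are illustrative."""
    loss_rate: float  # fraction of packets lost, 0.0 to 1.0
    rtt_ms: float     # round-trip time in milliseconds


def is_sub_healthy(s: SessionStats) -> bool:
    # Assumption: a session counts as sub-healthy if EITHER threshold is exceeded.
    return s.loss_rate > LOSS_THRESHOLD or s.rtt_ms > RTT_THRESHOLD_MS


def sub_healthy_share(sessions) -> float:
    """Fraction of sessions flagged as sub-healthy (0.0 for an empty list)."""
    if not sessions:
        return 0.0
    return sum(is_sub_healthy(s) for s in sessions) / len(sessions)


# Made-up sample data, for illustration only.
sessions = [
    SessionStats(loss_rate=0.002, rtt_ms=80.0),
    SessionStats(loss_rate=0.030, rtt_ms=120.0),  # lossy
    SessionStats(loss_rate=0.001, rtt_ms=450.0),  # high RTT
    SessionStats(loss_rate=0.000, rtt_ms=40.0),
]
print(sub_healthy_share(sessions))
```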

Concerto: A deep imitation‑learning model predicts the bandwidth for the next second from both transport‑layer features (loss, delay) and codec‑layer features (encoding and receiving bitrate). A data‑driven simulator reproduces network traces, and a small‑scale testbed (three laptops) validates the design. In real‑world deployment, Concerto reduced throughput loss by ~13% and cut the stall rate threefold compared with GCC.
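The imitation‑learning idea can be illustrated with a toy behavior‑cloning loop: a model is fit so that its output matches an "expert" bandwidth label for each feature vector. The sketch below swaps the deep network for a linear model and uses synthetic traces, so the feature scaling, the expert formula, and all function names are assumptions for illustration rather than Concerto's actual design.

```python
import random


def predict(weights, bias, features):
    """Linear stand-in for the policy network: features -> bandwidth estimate."""
    return bias + sum(w * x for w, x in zip(weights, features))


def mse(weights, bias, data):
    """Mean squared error between predictions and expert labels."""
    return sum((predict(weights, bias, x) - y) ** 2 for x, y in data) / len(data)


def fit_behavior_cloning(data, epochs=200, lr=0.05):
    """Plain SGD on squared error, imitating the expert's bandwidth labels."""
    weights, bias = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            err = predict(weights, bias, x) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def make_synthetic_traces(n=200, seed=0):
    """Hypothetical traces: four features normalized to [0, 1] plus an expert label."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        loss, delay, enc_rate, recv_rate = (rng.random() for _ in range(4))
        # Made-up "expert" rule: bandwidth tracks the receiving bitrate and
        # drops with loss and delay. Not derived from real data.
        expert_bw = 0.2 + 0.8 * recv_rate - 0.5 * loss - 0.3 * delay
        data.append(([loss, delay, enc_rate, recv_rate], expert_bw))
    return data


traces = make_synthetic_traces()
weights, bias = fit_behavior_cloning(traces)
print("training MSE:", mse(weights, bias, traces))
```

In a real system the labels would come from recorded traces (for example, the bandwidth actually available in the following second), and the model would be a neural network over a history of measurements rather than a single linear layer.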

OnRL: To bridge the gap between offline training and online deployment, an online reinforcement‑learning framework (OnRL) was built. It combines PPO with federated‑learning‑style aggregation across thousands of concurrent video calls, handles encoder‑rate mismatches, and falls back to rule‑based control when RL behaves anomalously.
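Two of the ideas in this paragraph, aggregating models learned in many concurrent calls and falling back to a rule‑based controller, can be sketched as follows. This is a rough illustration under stated assumptions: the sample‑weighted average is the generic federated‑averaging scheme, and the anomaly checks (`rtt_limit_ms`, `max_ratio`) are hypothetical, since the summary does not say how OnRL detects anomalous behavior.

```python
def aggregate(models, sample_counts):
    """Sample-weighted average of per-call parameter vectors (FedAvg-style)."""
    total = sum(sample_counts)
    dim = len(models[0])
    return [
        sum(m[i] * n for m, n in zip(models, sample_counts)) / total
        for i in range(dim)
    ]


def choose_rate(rl_rate_kbps, rule_rate_kbps, recent_rtts_ms,
                rtt_limit_ms=300.0, max_ratio=2.0):
    """Return (rate, source); fall back to the rule-based rate on anomalies.

    The anomaly criteria below are illustrative assumptions:
      - the RL output is non-positive,
      - the mean recent RTT exceeds rtt_limit_ms,
      - the RL rate is more than max_ratio times the rule-based rate.
    """
    mean_rtt = sum(recent_rtts_ms) / len(recent_rtts_ms)
    anomalous = (
        rl_rate_kbps <= 0
        or mean_rtt > rtt_limit_ms
        or rl_rate_kbps > max_ratio * rule_rate_kbps
    )
    if anomalous:
        return rule_rate_kbps, "rule"
    return rl_rate_kbps, "rl"


# Two calls contribute parameter vectors; the second saw three times more samples.
global_model = aggregate([[1.0, 2.0], [3.0, 4.0]], sample_counts=[1, 3])
print(global_model)

print(choose_rate(1200.0, 1000.0, [50.0, 60.0]))    # healthy: keep the RL rate
print(choose_rate(5000.0, 1000.0, [50.0, 60.0]))    # overshoot: use the rule rate
```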

Loki: To address long‑tail QoE degradation, Loki fuses a rule‑based baseline (e.g., GCC) with a learned model. It converts the rule‑based algorithm into an equivalent neural network, then mixes high‑level features from both models using a dual‑weight mechanism. Loki achieves 13.98‑27.27% lower stall rates, 1.37‑5.71% higher video quality, and reduces the 95th‑percentile stall rate by up to 44%.
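The fusion step can be pictured as a weighted mix of two feature vectors, one from the rule‑derived network and one from the learned model. The sketch below is only an illustration: the summary does not specify how Loki computes its two weights, so the softmax over two confidence scores, and the names `fuse`, `rule_score`, and `learned_score`, are assumptions.

```python
import math


def softmax2(a, b):
    """Numerically stable softmax over two scores; the results sum to 1."""
    m = max(a, b)
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb), eb / (ea + eb)


def fuse(rule_features, learned_features, rule_score, learned_score):
    """Mix two equal-length feature vectors with a pair of normalized weights.

    rule_score and learned_score are hypothetical confidence scores; a higher
    score gives that branch more influence on the fused features.
    """
    w_rule, w_learned = softmax2(rule_score, learned_score)
    return [
        w_rule * r + w_learned * l
        for r, l in zip(rule_features, learned_features)
    ]


# Equal scores give an even mix of the two branches.
print(fuse([1.0, 0.0], [0.0, 1.0], rule_score=0.0, learned_score=0.0))

# A much higher rule score makes the output follow the rule-based features,
# which is one way a conservative baseline could dominate in risky conditions.
print(fuse([1.0, 0.0], [0.0, 1.0], rule_score=10.0, learned_score=0.0))
```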

Results & impact: The three solutions (Concerto, OnRL, Loki) have been deployed in Taobao Live, improving user experience and generating commercial value. The work has produced three CCF‑A conference papers (MobiCom ’19, ’20, ’21) and earned the 2020 China Institute of Electronics Science and Technology Award (First Class).

Future work: Loki’s model, currently cloud‑based due to mobile compute limits, will be distilled and moved to edge devices to reduce cloud training costs.


