BIGO RTC: High‑Quality, Low‑Cost Real‑Time Communication through Core Operators and Scene Adaptation
The article explains how BIGO RTC achieves high‑quality, low‑cost real‑time audio‑video communication by optimizing core video operators such as HEVC encoding, AI‑driven super‑resolution and HDR, and by employing scene‑adaptive techniques like device performance tuning, content‑adaptive encoding and AI‑based pre‑processing to meet diverse latency constraints.
BIGO RTC provides high‑quality, low‑cost real‑time communication (RTC) services to nearly 400 million monthly active users in more than 150 countries. This article covers its research and results in two areas: core operators and scene‑adaptive techniques.
RTC covers audio‑video calls and live streaming, both of which require real‑time processing. For calls, latency under 400 ms is acceptable and under 200 ms feels pleasant; live streaming tolerates higher delays, depending on how interactive the stream is.
RTC services face four main challenges: (1) real‑time processing, where every stage of the link must keep pace with the frame rate; (2) ultra‑low latency, on the order of hundreds of milliseconds, which leaves room for only minimal buffering; (3) low bitrate, to keep bandwidth costs down; and (4) high visual quality despite limited bandwidth, compute, and latency.
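The frame-rate requirement in challenge (1) comes down to simple arithmetic: at a given frame rate, each frame has a fixed time budget. The sketch below illustrates this under the assumption of a strictly serial pipeline. The stage names and per-stage costs are hypothetical and are not BIGO measurements.

```python
from typing import Mapping


def frame_budget_ms(fps: float) -> float:
    """Time available to process one frame at the given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def meets_realtime(stage_ms: Mapping[str, float], fps: float) -> bool:
    """True if the summed per-frame stage costs fit within the frame budget.

    Assumes the stages run one after another on each frame; a pipelined
    or multi-threaded design would only need the slowest stage to fit.
    """
    return sum(stage_ms.values()) <= frame_budget_ms(fps)


# Hypothetical per-frame costs in milliseconds (illustrative only).
stages = {"preprocess": 5.0, "encode": 18.0, "packetize": 2.0}
fits_30 = meets_realtime(stages, fps=30)  # 25 ms against a ~33.3 ms budget
fits_60 = meets_realtime(stages, fps=60)  # 25 ms against a ~16.7 ms budget
```

With these illustrative numbers the same pipeline fits at 30 fps but not at 60 fps, which is why operator cost matters as frame rates rise.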
BIGO RTC addresses these challenges with three strengths: high visual quality from advanced core operators, low operating cost from low bandwidth and compute consumption, and latency matched to each business scenario (e.g., ≤300 ms for multi‑party calls, ≤2 s for low‑latency interactive live streaming).
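To see how a per-scenario latency constraint limits buffering, here is a minimal sketch that subtracts fixed pipeline delays from the scenario budget. The two budgets come from the text above. The scenario keys, stage names, and delay values are assumptions made for illustration.

```python
from typing import Mapping

# Budgets stated in the article; the dictionary keys are hypothetical names.
LATENCY_BUDGET_MS = {
    "multi_party_call": 300.0,   # <= 300 ms
    "interactive_live": 2000.0,  # <= 2 s
}


def max_buffer_ms(scenario: str, fixed_delays_ms: Mapping[str, float]) -> float:
    """Latency left for buffering once fixed delays are subtracted.

    Returns 0.0 when the fixed delays alone already exceed the budget.
    """
    budget = LATENCY_BUDGET_MS[scenario]
    remaining = budget - sum(fixed_delays_ms.values())
    return max(0.0, remaining)


# Hypothetical fixed delays in milliseconds (illustrative only).
delays = {"capture": 20.0, "encode": 30.0, "network": 120.0,
          "decode": 15.0, "render": 15.0}
call_buffer = max_buffer_ms("multi_party_call", delays)
live_buffer = max_buffer_ms("interactive_live", delays)
```

Under these assumed delays, a multi-party call leaves about 100 ms for buffering, while interactive live streaming leaves far more room for buffering against network jitter.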
The core operators are a state‑of‑the‑art HEVC encoder that ranked second in the MSU international competition; a deep‑learning‑based super‑resolution (SR) algorithm accelerated by BigoNN, BIGO's proprietary inference framework; and an HDR pipeline (de‑hazing and contrast enhancement) that improves visual quality on low‑end cameras, with blind tests showing an 81 % preference.
Scene‑adaptive strategies consist of (a) device‑performance adaptation using white‑/black‑lists to select optimal parameters per phone model, (b) Content Adaptive Encoding (CAE) that predicts appropriate encoding settings from recent frames using control‑theoretic methods such as Kalman filtering, and (c) Content Adaptive Pre‑Processing (CAP) that applies AI‑driven classification to decide when to enable enhancement operators, thereby saving compute while preserving subjective quality.
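The article names Kalman filtering as one control-theoretic method behind CAE but gives no implementation details. The sketch below is therefore only an illustration of the idea: a scalar Kalman filter smooths a noisy per-frame complexity measurement, and the smoothed value selects an encoding setting. The complexity scale, noise variances, and bitrate ladder are all assumptions, not BIGO's parameters.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """1-D Kalman filter with a random-walk state model."""
    estimate: float              # current estimate of content complexity
    variance: float = 1.0        # uncertainty of that estimate
    process_var: float = 0.01    # assumed drift of true complexity per frame
    measure_var: float = 0.25    # assumed noise of one frame's measurement

    def update(self, measurement: float) -> float:
        # Predict: the state stays put under a random walk, uncertainty grows.
        prior_var = self.variance + self.process_var
        # Correct: blend prediction and measurement using the Kalman gain.
        gain = prior_var / (prior_var + self.measure_var)
        self.estimate += gain * (measurement - self.estimate)
        self.variance = (1.0 - gain) * prior_var
        return self.estimate


def pick_bitrate_kbps(complexity: float) -> int:
    """Map smoothed complexity in [0, 1] to a hypothetical bitrate ladder."""
    if complexity < 0.3:
        return 600
    if complexity < 0.6:
        return 1000
    return 1500


# Simulate a scene change from simple content (0.2) to complex content (0.8).
kf = ScalarKalman(estimate=0.2)
history = [kf.update(0.8) for _ in range(20)]
chosen_kbps = pick_bitrate_kbps(kf.estimate)
```

Because the filter weighs each new measurement against its own uncertainty, a single noisy frame moves the estimate only part of the way. This keeps encoding settings from oscillating while still following sustained content changes.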
Looking ahead, as network infrastructure and high‑end smartphones improve, BIGO expects 1080p 60 fps live streaming to become mainstream, with future support for HDR, 4K, and even VR‑integrated audio‑video experiences that blur the line between virtual and real worlds.
High Availability Architecture
Official account for High Availability Architecture.