Inside Ant's Real-Time Video Call System: Architecture & Optimizations
This article explores Ant Financial's real-time video call platform, detailing its technical choices, system architecture, signaling reliability design, network optimization strategies, and future directions for multi‑party video conferencing and interactive live streaming.
Introduction
From movies and TV to smartphones, video consumption has become increasingly convenient, and real‑time video calls and interactive live streams now occupy a large portion of young users' daily lives. Ant Financial senior technical expert Zhang Song reveals the architecture and characteristics of the Ant Real‑Time Video Call System (ARTVCS), explaining its underlying technologies and applications.
Technical Selection
Low‑latency transmission is the core requirement, which leads to a technology stack based on UDP and RTP/RTCP. WebRTC, an open‑source framework built on UDP, provides end‑to‑end audio/video capture, encoding and decoding, processing, transport, and rendering, and its permissive BSD‑style license makes it a practical base for the interactive live‑streaming scenario.
System Architecture
ARTVCS combines a video‑conference system and a live‑streaming system. The video‑conference component can use P2P mesh or P2SP architectures, with mixing either at the client (P) or server (S) side, and supports both simple relay and merged‑stream forwarding. The live‑streaming side pushes streams via RTMP to a streaming service.
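The trade-off between these topologies can be made concrete by counting media streams per client. The sketch below is an illustration only: the function name and the topology labels ("mesh", "relay", "merged") are hypothetical and simply mirror the three options described above.

```python
def streams_per_client(participants: int, topology: str) -> tuple[int, int]:
    """Return (upstream, downstream) media streams handled by each client."""
    if participants < 2:
        raise ValueError("a call needs at least two participants")
    others = participants - 1
    if topology == "mesh":    # every client sends to every other client
        return others, others
    if topology == "relay":   # the server forwards each stream unchanged
        return 1, others
    if topology == "merged":  # the server mixes all streams into one
        return 1, 1
    raise ValueError(f"unknown topology: {topology}")
```

With four participants, a mesh costs each client three uploads and three downloads, simple relay cuts the uploads to one, and merged-stream forwarding cuts both to one at the price of server-side mixing.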
The overall system consists of three parts: a video‑conference system, a live‑streaming system, and an operations support platform. The video‑conference system includes mobile/web SDKs, a room signaling service, NAT‑traversal assistance, media relay, and a control service that handles multi‑stream audio/video processing, encoding and decoding, ICE, session management, and optional recording. The live‑streaming system comprises room management, central push nodes, and edge pull nodes. A quality‑monitoring subsystem provides health checks, media parameter control, real‑time quality assessment, and business data statistics.
Architecture Features
Signaling logic and transport channels are loosely coupled, allowing the use of WebSocket, RPC, or custom messaging channels. The modular design supports various real‑time video scenarios, including one‑to‑one calls, multi‑party conferences, and interactive live streams.
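One way to picture this loose coupling is a signaling class that depends only on a small channel interface. The following is a minimal sketch, not ARTVCS code: `SignalingChannel`, `InMemoryChannel`, `RoomSignaling`, and the message format are all hypothetical names chosen for illustration.

```python
from typing import Protocol


class SignalingChannel(Protocol):
    """Hypothetical transport interface; a WebSocket, RPC, or custom
    messaging channel could each provide its own implementation."""

    def send(self, payload: str) -> None: ...


class InMemoryChannel:
    """Stand-in transport for illustration; it only records what was sent."""

    def __init__(self) -> None:
        self.sent: list[str] = []

    def send(self, payload: str) -> None:
        self.sent.append(payload)


class RoomSignaling:
    """Signaling logic that knows nothing about the underlying transport."""

    def __init__(self, channel: SignalingChannel) -> None:
        self._channel = channel

    def join(self, room_id: str, user_id: str) -> None:
        self._channel.send(f"JOIN {room_id} {user_id}")

    def leave(self, room_id: str, user_id: str) -> None:
        self._channel.send(f"LEAVE {room_id} {user_id}")
```

Swapping the transport then means passing a different channel object to `RoomSignaling`; the join and leave logic stays unchanged.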
Key Technologies and Optimizations
Mobile SDK Size Reduction – By switching from static libraries to dynamic frameworks, the binary size impact is minimized, avoiding Apple’s 60 MB review limit and enabling on‑demand loading of video‑call features.
High‑Reliability Signaling – A three‑layer reliability design addresses timeout, loss, out‑of‑order, and duplicate messages. The upper layer implements timeout retransmission, duplicate filtering using unique message IDs, and out‑of‑order correction via a signaling relationship graph. The middle layer ensures reliable request/response between adjacent nodes (e.g., client ↔ room service). The lower layer handles channel‑level link failures with reconnection and heartbeat mechanisms.
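The upper layer's duplicate filtering and timeout retransmission can be sketched as follows. This is a simplified illustration under assumed names and parameters (`DuplicateFilter`, `RetransmitQueue`, `timeout_s`, `max_retries`); it leaves out the out-of-order correction and does not reflect the actual ARTVCS implementation.

```python
class DuplicateFilter:
    """Receiver side: drops messages whose unique ID was already seen."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def accept(self, message_id: str) -> bool:
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        return True


class RetransmitQueue:
    """Sender side: tracks unacknowledged messages and reports which
    ones have timed out and should be sent again."""

    def __init__(self, timeout_s: float, max_retries: int) -> None:
        self._timeout_s = timeout_s
        self._max_retries = max_retries
        # message ID -> (time of last send, retries so far)
        self._pending: dict[str, tuple[float, int]] = {}

    def sent(self, message_id: str, now: float) -> None:
        self._pending[message_id] = (now, 0)

    def acked(self, message_id: str) -> None:
        self._pending.pop(message_id, None)

    def due(self, now: float) -> list[str]:
        """Return IDs to resend now; give up on IDs that ran out of retries."""
        resend = []
        for message_id, (last_sent, retries) in list(self._pending.items()):
            if now - last_sent < self._timeout_s:
                continue
            if retries >= self._max_retries:
                del self._pending[message_id]
                continue
            self._pending[message_id] = (now, retries + 1)
            resend.append(message_id)
        return resend
```

Because retransmission can deliver the same message twice, the two pieces work as a pair: the sender resends until acknowledged, and the receiver uses the message ID to discard the extra copies.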
Load Balancing and Network Acceleration – A software‑defined network of relay clusters selects optimal nodes based on network type and distance, forwarding media streams through the most efficient paths.
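A node-selection rule based on network type and distance might look like the sketch below. The article does not describe the real selection logic, so the `RelayNode` fields, the `pick_relay` function, and the ordering rule (same network type first, then shortest distance) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelayNode:
    name: str
    network_type: str   # hypothetical label, e.g. a carrier or ISP
    distance_km: float  # hypothetical distance estimate to the client


def pick_relay(nodes: list[RelayNode], client_network: str) -> RelayNode:
    """Prefer nodes on the client's network type, then the nearest one."""
    if not nodes:
        raise ValueError("no relay nodes available")
    # False sorts before True, so matching network types win first.
    return min(
        nodes,
        key=lambda n: (n.network_type != client_network, n.distance_km),
    )
```

A production system would more likely score nodes on measured latency, loss, and load rather than on static distance alone.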
Improved Congestion Control – Customized UDP congestion control combines delay‑based and loss‑based bandwidth estimation. For the delay‑based part, the TrendlineFilter algorithm replaces KalmanFilter, giving faster, more accurate bandwidth estimation in weak‑network conditions.
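The core idea of a trendline-style estimator is to fit a straight line to the accumulated one-way delay variation over a sliding window and read congestion from its slope. The sketch below is a simplified illustration of that idea, not the WebRTC or ARTVCS code: the class name, the window size, and the fixed threshold in `classify` are assumptions, and the smoothing and adaptive threshold used in practice are omitted.

```python
from collections import deque


class TrendlineEstimator:
    """Least-squares slope of accumulated delay versus arrival time."""

    def __init__(self, window_size: int = 20) -> None:
        # Each sample is (arrival_time_ms, accumulated_delay_ms).
        self._samples = deque(maxlen=window_size)
        self._accumulated_delay_ms = 0.0

    def update(self, arrival_time_ms: float, delay_delta_ms: float) -> float:
        """Add one delay-variation sample and return the current slope."""
        self._accumulated_delay_ms += delay_delta_ms
        self._samples.append((arrival_time_ms, self._accumulated_delay_ms))
        return self.slope()

    def slope(self) -> float:
        n = len(self._samples)
        if n < 2:
            return 0.0
        mean_x = sum(x for x, _ in self._samples) / n
        mean_y = sum(y for _, y in self._samples) / n
        num = sum((x - mean_x) * (y - mean_y) for x, y in self._samples)
        den = sum((x - mean_x) ** 2 for x, _ in self._samples)
        return num / den if den else 0.0


def classify(slope: float, threshold: float = 0.01) -> str:
    """Map the slope to a coarse network state (threshold is illustrative)."""
    if slope > threshold:
        return "overuse"
    if slope < -threshold:
        return "underuse"
    return "normal"
```

A positive slope means queuing delay is building up, so the sender should lower its bitrate; a slope near zero means the current rate is sustainable.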
Bitrate Adaptive Strategy – In weak networks, the system balances clarity and smoothness, dynamically adjusting resolution, frame rate, and bitrate. A quality‑first policy reserves bandwidth for audio and maintains a minimum resolution to keep video clear.
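A quality-first policy of this kind can be sketched as a lookup over a quality ladder after setting aside the audio budget. Every number below (the audio reserve, the ladder thresholds, the 640x360 resolution floor) is a hypothetical value picked for illustration, as are the names `VideoSettings` and `choose_video_settings`.

```python
from dataclasses import dataclass

AUDIO_RESERVE_KBPS = 32  # hypothetical bandwidth always kept for audio

# (minimum video kbps, width, height, fps), highest quality first.
# Resolution never drops below 640x360; below that point only the
# frame rate is reduced. All values are illustrative.
LADDER = [
    (1200, 1280, 720, 30),
    (600, 640, 360, 30),
    (300, 640, 360, 15),
    (0, 640, 360, 8),
]


@dataclass(frozen=True)
class VideoSettings:
    width: int
    height: int
    fps: int
    video_kbps: int


def choose_video_settings(estimated_kbps: int) -> VideoSettings:
    """Reserve audio bandwidth first, then pick the best ladder rung."""
    video_kbps = max(estimated_kbps - AUDIO_RESERVE_KBPS, 0)
    for min_kbps, width, height, fps in LADDER:
        if video_kbps >= min_kbps:
            return VideoSettings(width, height, fps, video_kbps)
    raise AssertionError("ladder must end with a 0 kbps rung")
```

A smoothness-first policy would use the same structure with a different ladder, lowering resolution before frame rate.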
Future Outlook
Planned enhancements include H.265 encoding, Opus FEC, super‑resolution processing, integration with AR/VR/AI/IoT, precise QoS models for network fluctuations, and more accurate bandwidth adaptation.
Scenario Applications
ARTVCS serves a wide range of use cases such as remote item inspection in the Xianyu app, online English tutoring, financial remote services, insurance claim inspections, IoT remote control, gaming voice chat, enterprise video meetings, education live streaming, medical remote diagnosis, e‑commerce interactive live streams, smart‑home monitoring, autonomous logistics, and online arcade machines.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
