Inside Ant's Real-Time Video Call System: Architecture & Optimizations

This article explores Ant Financial's real-time video call platform, detailing its technical choices, system architecture, signaling reliability design, network optimization strategies, and future directions for multi‑party video conferencing and interactive live streaming.

Alibaba Cloud Developer

Introduction

From movies and TV to smartphones, video consumption has become increasingly convenient, and real‑time video calls and interactive live streams now take up a large part of young users' daily lives. Zhang Song, a senior technical expert at Ant Financial, describes the architecture and characteristics of the Ant Real‑Time Video Call System (ARTVCS) and explains its underlying technologies and applications.

Technical Selection

The core requirement, low‑latency transmission, leads to a technology stack based on UDP and RTP/RTCP. WebRTC, an open‑source framework that carries media over UDP, provides end‑to‑end audio/video capture, encoding and decoding, processing, transport, and rendering, and its BSD‑style license makes it suitable for the interactive live‑streaming scenario.
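To make the RTP layer concrete, here is a minimal sketch of packing and parsing the 12‑byte fixed RTP header with Python's standard library. It covers only the fixed fields (no CSRC list, padding, or header extensions), and the class and function names are illustrative assumptions rather than anything taken from ARTVCS or WebRTC.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpHeader:
    version: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def pack_rtp_header(h: RtpHeader) -> bytes:
    # Byte 0: version (2 bits), padding (1), extension (1), CSRC count (4).
    # This sketch always leaves padding, extension, and CSRC count at zero.
    byte0 = (h.version & 0x03) << 6
    # Byte 1: marker (1 bit) followed by payload type (7 bits).
    byte1 = (int(h.marker) << 7) | (h.payload_type & 0x7F)
    # Network byte order: 16-bit sequence number, 32-bit timestamp, 32-bit SSRC.
    return struct.pack(
        "!BBHII",
        byte0,
        byte1,
        h.sequence_number & 0xFFFF,
        h.timestamp & 0xFFFFFFFF,
        h.ssrc & 0xFFFFFFFF,
    )


def parse_rtp_header(data: bytes) -> RtpHeader:
    if len(data) < 12:
        raise ValueError("the fixed RTP header is 12 bytes long")
    byte0, byte1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    return RtpHeader(
        version=byte0 >> 6,
        marker=bool(byte1 >> 7),
        payload_type=byte1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )
```

The sequence number and timestamp carried in this header are what allow a receiver to detect loss and reordering and to measure jitter on top of plain UDP.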

System Architecture

ARTVCS combines a video‑conference system and a live‑streaming system. The video‑conference component can use P2P mesh or P2SP architectures, with mixing either at the client (P) or server (S) side, and supports both simple relay and merged‑stream forwarding. The live‑streaming side pushes streams via RTMP to a streaming service.

The overall system consists of three parts: video‑conference system, live‑streaming system, and an operations support platform. The video‑conference system includes mobile/web SDKs, room signaling service, NAT‑traversal assistance, media relay, and a control service that handles multi‑stream audio/video processing, codec, ICE, session management, and optional recording. The live‑streaming system comprises room management, central push nodes, and edge pull nodes. A quality‑monitoring subsystem provides health checks, media parameter control, real‑time quality assessment, and business data statistics.

Architecture Features

Signaling logic and transport channels are loosely coupled, allowing the use of WebSocket, RPC, or custom messaging channels. The modular design supports various real‑time video scenarios, including one‑to‑one calls, multi‑party conferences, and interactive live streams.
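One way to picture this loose coupling is a signaling client that depends only on a small channel interface. The sketch below is an assumption about how such a separation could look, not the ARTVCS design: `SignalingChannel`, `InMemoryChannel`, and `SignalingClient` are hypothetical names, and the in‑memory channel stands in for a WebSocket, RPC, or custom messaging transport.

```python
import json
from typing import Callable, List, Optional, Protocol


class SignalingChannel(Protocol):
    """Anything that can move opaque bytes in both directions."""

    def send(self, payload: bytes) -> None: ...

    def on_receive(self, handler: Callable[[bytes], None]) -> None: ...


class InMemoryChannel:
    """Stand-in transport for illustration; a real one would wrap a socket."""

    def __init__(self) -> None:
        self.peer: Optional["InMemoryChannel"] = None
        self._handler: Optional[Callable[[bytes], None]] = None

    def send(self, payload: bytes) -> None:
        # Hand the bytes straight to the peer's registered handler, if any.
        if self.peer is not None and self.peer._handler is not None:
            self.peer._handler(payload)

    def on_receive(self, handler: Callable[[bytes], None]) -> None:
        self._handler = handler


class SignalingClient:
    """Signaling logic that knows nothing about the underlying transport."""

    def __init__(self, channel: SignalingChannel) -> None:
        self._channel = channel
        self.inbox: List[dict] = []
        channel.on_receive(self._handle)

    def send_message(self, msg_type: str, body: dict) -> None:
        encoded = json.dumps({"type": msg_type, "body": body}).encode("utf-8")
        self._channel.send(encoded)

    def _handle(self, payload: bytes) -> None:
        self.inbox.append(json.loads(payload.decode("utf-8")))
```

Swapping the transport then means passing a different object that offers the same two methods; the signaling code itself stays unchanged.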

Key Technologies and Optimizations

Mobile SDK Size Reduction – Switching from static libraries to dynamic frameworks minimizes the SDK's impact on the app binary, which keeps the app under Apple’s 60 MB review limit and allows video‑call features to be loaded on demand.

High‑Reliability Signaling – A three‑layer reliability design addresses timeout, loss, out‑of‑order, and duplicate messages. The upper layer implements timeout retransmission, duplicate filtering using unique message IDs, and out‑of‑order correction via a signaling relationship graph. The middle layer ensures reliable request/response between adjacent nodes (e.g., client ↔ room service). The lower layer handles channel‑level link failures with reconnection and heartbeat mechanisms.
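The upper‑layer ideas of timeout retransmission and duplicate filtering by message ID can be sketched as follows. This is a simplified illustration under stated assumptions, not the ARTVCS implementation: the class names, the timeout, and the retry limit are hypothetical, time is passed in explicitly to keep the example deterministic, and out‑of‑order correction via the signaling relationship graph is not modeled.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Pending:
    payload: str
    sent_at: float
    attempts: int = 1


class ReliableSender:
    def __init__(self, transmit: Callable[[str, str], None],
                 timeout_s: float = 1.0, max_attempts: int = 3) -> None:
        self._transmit = transmit          # callback that puts (id, payload) on the wire
        self._timeout_s = timeout_s        # assumed value
        self._max_attempts = max_attempts  # assumed value
        self.pending: Dict[str, Pending] = {}

    def send(self, payload: str, now: float) -> str:
        msg_id = uuid.uuid4().hex  # unique ID used by the receiver for dedup
        self.pending[msg_id] = Pending(payload, now)
        self._transmit(msg_id, payload)
        return msg_id

    def on_ack(self, msg_id: str) -> None:
        self.pending.pop(msg_id, None)

    def tick(self, now: float) -> List[str]:
        """Retransmit timed-out messages; return IDs that exhausted their retries."""
        failed = []
        for msg_id, p in list(self.pending.items()):
            if now - p.sent_at < self._timeout_s:
                continue
            if p.attempts >= self._max_attempts:
                failed.append(msg_id)
                del self.pending[msg_id]
            else:
                p.attempts += 1
                p.sent_at = now
                self._transmit(msg_id, p.payload)
        return failed


class DedupReceiver:
    def __init__(self, deliver: Callable[[str], None]) -> None:
        self._deliver = deliver
        self._seen: Set[str] = set()  # unbounded here; a real system would expire old IDs

    def on_message(self, msg_id: str, payload: str) -> bool:
        """Deliver each ID once; the caller should acknowledge either way."""
        if msg_id in self._seen:
            return False
        self._seen.add(msg_id)
        self._deliver(payload)
        return True
```

Because a retransmitted message reuses its original ID, the receiver can acknowledge every copy while handing only the first one to the application.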

Load Balancing and Network Acceleration – A software‑defined network of relay clusters selects optimal nodes based on network type and distance, forwarding media streams through the most efficient paths.
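As a rough illustration of choosing a relay by network type and distance, the sketch below scores each node by great‑circle distance and adds a fixed penalty when the node is on a different network from the client. The scoring rule, the penalty value, and every name here are assumptions made for the example; the source does not describe the actual selection algorithm.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RelayNode:
    name: str
    lat: float
    lon: float
    network: str  # hypothetical carrier/ISP label


# Assumed cost of crossing between networks, expressed as extra kilometres.
CROSS_NETWORK_PENALTY_KM = 1000.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on a spherical Earth."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def pick_relay(nodes: Iterable[RelayNode], client_lat: float,
               client_lon: float, client_network: str) -> RelayNode:
    def cost(node: RelayNode) -> float:
        distance = haversine_km(client_lat, client_lon, node.lat, node.lon)
        penalty = 0.0 if node.network == client_network else CROSS_NETWORK_PENALTY_KM
        return distance + penalty

    # Lowest combined cost wins.
    return min(nodes, key=cost)
```

A production selector would more likely rely on measured latency and node load than on geography alone; distance is used here only because it is one of the two inputs the text names.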

Improved Congestion Control – Customized UDP congestion control uses delay‑based bandwidth estimation and loss‑rate‑based estimation. The TrendlineFilter algorithm replaces KalmanFilter for faster, more accurate bandwidth estimation in weak‑network conditions.
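The idea behind a trendline‑style delay estimator is to fit a straight line through recent smoothed queuing‑delay samples and read congestion from its slope. The sketch below is a deliberately simplified version under stated assumptions: the class name, window size, smoothing factor, and fixed threshold are illustrative, and it is neither WebRTC's nor ARTVCS's actual code (production estimators typically adapt the threshold over time).

```python
from collections import deque


class SimpleTrendline:
    def __init__(self, window_size: int = 20, smoothing: float = 0.9,
                 threshold: float = 0.01) -> None:
        self._window = deque(maxlen=window_size)  # (arrival_ms, smoothed_delay_ms)
        self._smoothing = smoothing               # assumed value
        self._threshold = threshold               # assumed fixed threshold
        self._accumulated_ms = 0.0
        self._smoothed_ms = 0.0

    def update(self, arrival_time_ms: float, delay_delta_ms: float) -> float:
        """delay_delta_ms = inter-arrival time minus inter-departure time."""
        self._accumulated_ms += delay_delta_ms
        # Exponential smoothing damps per-packet jitter before the line fit.
        self._smoothed_ms = (self._smoothing * self._smoothed_ms
                             + (1 - self._smoothing) * self._accumulated_ms)
        self._window.append((arrival_time_ms, self._smoothed_ms))
        return self.slope()

    def slope(self) -> float:
        """Least-squares slope of smoothed delay against arrival time."""
        n = len(self._window)
        if n < 2:
            return 0.0
        mean_x = sum(x for x, _ in self._window) / n
        mean_y = sum(y for _, y in self._window) / n
        numerator = sum((x - mean_x) * (y - mean_y) for x, y in self._window)
        denominator = sum((x - mean_x) ** 2 for x, _ in self._window)
        return numerator / denominator if denominator else 0.0

    def state(self) -> str:
        s = self.slope()
        if s > self._threshold:
            return "overuse"    # delay is growing: queues are building up
        if s < -self._threshold:
            return "underuse"   # delay is shrinking: queues are draining
        return "normal"
```

A sender would lower its target bitrate on "overuse" and probe upward again once the state returns to "normal".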

Bitrate Adaptive Strategy – In weak networks, the system balances clarity and smoothness, dynamically adjusting resolution, frame rate, and bitrate. A quality‑first policy reserves bandwidth for audio and maintains a minimum resolution to keep video clear.
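A quality‑first policy of this kind could be sketched as follows: reserve a slice of the estimated bandwidth for audio, pick the best video profile that fits the remainder, and once the budget drops below the lowest profile, hold the minimum resolution and give up frame rate instead. The ladder, the audio reserve, and the frame‑rate floor are made‑up values for illustration, not figures from ARTVCS.

```python
from dataclasses import dataclass
from typing import Optional

AUDIO_RESERVE_KBPS = 32  # assumed bandwidth set aside for audio
MIN_FPS = 5              # assumed lowest acceptable frame rate


@dataclass(frozen=True)
class VideoProfile:
    width: int
    height: int
    fps: int
    bitrate_kbps: int


# Hypothetical encoding ladder, best quality first.
LADDER = [
    VideoProfile(1280, 720, 30, 1500),
    VideoProfile(640, 480, 30, 800),
    VideoProfile(640, 360, 15, 400),
]


def choose_profile(estimated_kbps: int) -> Optional[VideoProfile]:
    """Return a video profile for the estimate, or None for audio-only."""
    video_budget = max(estimated_kbps - AUDIO_RESERVE_KBPS, 0)
    if video_budget == 0:
        return None  # nothing left after audio: drop video entirely

    # Take the highest rung whose bitrate fits inside the video budget.
    for profile in LADDER:
        if profile.bitrate_kbps <= video_budget:
            return profile

    # Below the lowest rung: keep the minimum resolution so frames stay
    # legible, and scale the frame rate down in proportion to the budget.
    floor = LADDER[-1]
    fps = max(MIN_FPS, floor.fps * video_budget // floor.bitrate_kbps)
    return VideoProfile(floor.width, floor.height, fps, video_budget)
```

For example, with these assumed numbers an estimate of 232 kbps leaves 200 kbps for video, which yields 640×360 at 7 fps rather than a smaller, blurrier picture.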

Future Outlook

Planned enhancements include H.265 encoding, Opus FEC, super‑resolution processing, integration with AR/VR/AI/IoT, precise QoS models for network fluctuations, and more accurate bandwidth adaptation.

Scenario Applications

ARTVCS serves a wide range of use cases such as remote item inspection in the Xianyu app, online English tutoring, financial remote services, insurance claim inspections, IoT remote control, gaming voice chat, enterprise video meetings, education live streaming, medical remote diagnosis, e‑commerce interactive live streams, smart‑home monitoring, autonomous logistics, and online arcade machines.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
