
How to Build Scalable, High‑Availability Real‑Time Audio‑Video Systems

This talk explains the evolution and practical implementation of large‑scale real‑time audio‑video communication, covering common architectures such as direct P2P, MCU, and SFU, network topologies, scalability, high‑availability techniques, edge computing, and emerging technologies like WebRTC, SDN, and AI‑driven enhancements.

Qingyun Technology Community

On July 29-30, 2021, at the CIC 2021 Cloud Computing Summit in Beijing, Shen Weifeng gave a presentation titled "Practice and Evolution of Large-Scale Real-Time Audio-Video Technology Architecture". It covered common real-time audio-video communication architectures, network topologies, and the complexity of real-world scenarios.

Common Real‑Time Audio‑Video Architectures

Direct (P2P)

In a peer‑to‑peer setup each client registers with a server for discovery, then connects directly without a media server. NAT traversal may require STUN or a relay server.
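
As a rough illustration of the STUN step, the sketch below builds a STUN Binding Request and decodes the XOR-MAPPED-ADDRESS value a server would return, following the wire format in RFC 5389. It does no network I/O, handles IPv4 only, and the function names are hypothetical; it is a minimal sketch, not a complete NAT-traversal client.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442  # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001   # STUN message type for a Binding Request


def build_binding_request(transaction_id=None):
    """Return a 20-byte STUN Binding Request with no attributes."""
    tid = transaction_id or os.urandom(12)
    if len(tid) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    # Header: type (2 bytes), attribute length (2 bytes, 0 here), magic cookie (4 bytes)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + tid


def decode_xor_mapped_ipv4(value):
    """Decode the 8-byte value of an IPv4 XOR-MAPPED-ADDRESS attribute."""
    _reserved, family, xport, xaddr = struct.unpack("!BBHI", value)
    if family != 0x01:
        raise ValueError("only IPv4 is handled in this sketch")
    port = xport ^ (MAGIC_COOKIE >> 16)  # port is XORed with the top 16 cookie bits
    addr = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
    return addr, port
```

The decoded address and port are the client's public mapping as seen from outside the NAT, which is what peers exchange through the discovery server before attempting a direct connection.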

MCU (Multipoint Control Unit)

Clients send audio‑video streams to a central server that decodes, synchronizes, mixes, and re‑encodes them before forwarding to all participants. This star topology imposes high CPU load on the server.
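
The mixing step is where most of the MCU's CPU goes. As a minimal sketch of the audio part only, assuming the streams are already decoded to equal-length 16-bit PCM frames, mixing can be approximated by summing samples and clipping to the 16-bit range. The function name is hypothetical, and real mixers add gain control and usually exclude each listener's own voice.

```python
def mix_pcm16(streams):
    """Mix equal-length frames of 16-bit PCM samples by summing and clipping."""
    if not streams:
        return []
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        raise ValueError("all frames must have the same number of samples")
    # zip(*streams) yields the samples that share the same time index
    return [max(-32768, min(32767, sum(samples))) for samples in zip(*streams)]
```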

SFU (Selective Forwarding Unit)

Clients send streams to a server that forwards them unchanged to subscribed peers. Any mixing or layout composition happens on the client, which reduces server load. With Simulcast or SVC, an SFU can forward a different video resolution to each client based on that client's bandwidth.
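
One way to picture per-client layer selection is the sketch below, which picks the highest Simulcast layer that fits a subscriber's estimated downlink. The layer names and bitrates are illustrative assumptions, not values from the talk.

```python
# Hypothetical Simulcast layers: (name, required bitrate in kbps)
LAYERS = [("low", 150), ("mid", 500), ("high", 1500)]


def pick_layer(available_kbps, layers=LAYERS):
    """Return the highest-bitrate layer that fits, or None if none fits."""
    best = None
    for name, kbps in sorted(layers, key=lambda layer: layer[1]):
        if kbps <= available_kbps:
            best = name
    return best
```

A subscriber estimated at 600 kbps would receive the "mid" layer here, while one below 150 kbps would get no video at all.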

Comparison and Summary

Direct P2P is unsuitable for large conferences and lacks content moderation. With decreasing compute and bandwidth costs, SFU becomes advantageous for massive concurrency, while MCU remains common in traditional enterprise scenarios.
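
The trade-off becomes clearer when the streams are counted. The sketch below assumes n participants, each publishing one stream and subscribing to everyone else, and ignores Simulcast layers and audio/video separation. The function name is hypothetical.

```python
def stream_counts(n):
    """Stream counts per architecture for n participants, one published stream each."""
    if n < 2:
        raise ValueError("need at least two participants")
    return {
        # Full mesh: every client sends to and receives from every other client
        "p2p": {"client_up": n - 1, "client_down": n - 1, "server_in": 0, "server_out": 0},
        # MCU: one stream up, one mixed stream down, per client
        "mcu": {"client_up": 1, "client_down": 1, "server_in": n, "server_out": n},
        # SFU: one stream up, n - 1 forwarded streams down, per client
        "sfu": {"client_up": 1, "client_down": n - 1, "server_in": n, "server_out": n * (n - 1)},
    }
```

Client uplink in the P2P mesh grows linearly with the room size, which is why it breaks down first. The SFU trades server egress bandwidth for the decode, mix, and encode work an MCU would have to do.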

Network Topology Construction

Ring Topology

Nodes form a closed loop. Routing is simple, but a single node failure can break the network.

Star/Tree Topology

A star has a central hub, which is easy to manage but makes the hub a bottleneck. A tree extends the star, allowing hierarchical scaling with multiple edge nodes.

Mesh Topology

Every device connects to every other, offering high reliability and low latency but with complex routing and traffic control.
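
The routing complexity of a mesh follows from its link count. As a small worked example, assuming n nodes in total (with the hub counted as one of them in the star case), the number of links per topology is:

```python
def link_counts(n):
    """Number of links needed to connect n nodes in each topology."""
    if n < 3:
        raise ValueError("a ring needs at least three nodes")
    return {
        "ring": n,                   # each node links to its next neighbor
        "star": n - 1,               # every non-hub node links to the hub
        "mesh": n * (n - 1) // 2,    # every pair of nodes is linked
    }
```

Mesh links grow quadratically with the node count, while ring and star links grow linearly.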

Diversity of Real‑World Scenarios

Network Access Diversity – Mobile (3G/4G/5G), wired broadband (LAN, ADSL, PON/FTTH), and Wi‑Fi each present different bandwidth and stability characteristics.

Device Diversity – Desktops, mobiles, wearables, IoT devices vary in network modules, cameras, microphones, CPUs, and GPUs.

Server Access Diversity – Multi‑line BGP, multi‑carrier dedicated lines, or single‑carrier lines affect routing and redundancy.

Conditions also change dynamically during a session. On the network side, bandwidth, packet loss, jitter, and latency fluctuate. On the capture side, noise, distortion, and jitter vary.

Architecture Evolution and Practice

High Concurrency and High Availability

To achieve high concurrency, services are clustered and load‑balanced across multiple servers.
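
The talk does not say which balancing policy is used. As one common option, the sketch below assigns a new session to the least-loaded server that still has headroom. Server names and the capacity figure are hypothetical.

```python
def pick_server(load_by_server, capacity):
    """Return the least-loaded server below capacity, or None if all are full."""
    candidates = {name: load for name, load in load_by_server.items() if load < capacity}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)
```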

Automatic fault recovery and graceful degradation ensure the system remains usable when components fail; for example, disabling video while keeping audio.
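
The audio-first degradation mentioned above can be sketched as a simple policy: audio is kept as long as its bitrate fits, and video is enabled only when there is room for it on top of audio. The thresholds are illustrative assumptions, not figures from the talk.

```python
def choose_media(available_kbps, audio_kbps=40, min_video_kbps=150):
    """Decide which media to send for a given bandwidth estimate (audio first)."""
    audio = available_kbps >= audio_kbps
    # Video is only sent when audio fits and there is still room for the lowest video layer
    video = audio and available_kbps >= audio_kbps + min_video_kbps
    return {"audio": audio, "video": video}
```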

Elastic scaling of compute and network resources is enabled by virtualization and SDN technologies.

Geographic disaster recovery deploys multiple clusters in different locations and routes traffic to healthy clusters when failures occur.
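
A minimal sketch of that routing decision, assuming a health-check result per cluster and a measured latency from the user to each cluster, might look like this. Cluster names are hypothetical.

```python
def route_to_cluster(clusters, health):
    """Pick the lowest-latency healthy cluster.

    clusters: list of (name, latency_ms) as measured for one user
    health:   {name: bool} from health checks; unknown clusters count as unhealthy
    """
    healthy = [(latency, name) for name, latency in clusters if health.get(name, False)]
    if not healthy:
        return None
    return min(healthy)[1]
```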

These techniques allow the service to achieve 99.95% availability worldwide.
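
To put the availability figure in perspective, the short calculation below converts an availability ratio into yearly downtime. It also shows how redundant clusters combine under the simplifying assumption that they fail independently, which real deployments only approximate.

```python
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(availability):
    """Expected downtime per year for an availability ratio such as 0.9995."""
    return (1 - availability) * HOURS_PER_YEAR


def combined_availability(single, clusters):
    """Availability of N redundant clusters, assuming independent failures."""
    return 1 - (1 - single) ** clusters
```

At 99.95%, the downtime budget is roughly 4.4 hours per year.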

High‑Quality Service

Quality is maintained on three fronts. On the network side: bandwidth estimation, congestion control, packet loss recovery, forward error correction, and multi-layer distribution (SVC/Simulcast). On the audio side: noise reduction, echo cancellation, and adaptive volume. On the resource side: resource reservation. Together these maintain a high-quality experience even with 70% packet loss.
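
Forward error correction can be illustrated with the simplest possible scheme: one XOR parity packet per group, which lets the receiver rebuild any single lost packet without a retransmission. This is a toy sketch assuming equal-length packets, not the scheme used by the system described in the talk. Production FEC handles variable lengths and multiple losses.

```python
def xor_parity(packets):
    """Return the byte-wise XOR of equal-length packets."""
    size = len(packets[0])
    if any(len(p) != size for p in packets):
        raise ValueError("packets must have equal length in this sketch")
    parity = bytearray(size)
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received, parity):
    """Rebuild the one missing packet of a group from the survivors and the parity."""
    return xor_parity(list(received) + [parity])
```

The cost is one extra packet per group, so smaller groups tolerate more loss at the price of more overhead.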

Massive Scale and Ultra‑High Concurrency

In SFU, selective forwarding reduces bandwidth: instead of forwarding every stream to all participants, the server forwards only the streams needed for each user’s layout (e.g., 1 large + 6 small videos).
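
The saving can be estimated with a quick calculation. The sketch below compares the per-user downlink when the server forwards every stream at high quality against the 1 large + 6 small layout. The 1500 kbps and 150 kbps bitrates are illustrative assumptions.

```python
def forward_all_kbps(participants, high_kbps=1500):
    """Downlink per user if every other participant is forwarded at high quality."""
    return (participants - 1) * high_kbps


def layout_kbps(participants, large=1, small=6, high_kbps=1500, low_kbps=150):
    """Downlink per user if only the streams visible in the layout are forwarded."""
    others = max(participants - 1, 0)
    larges = min(large, others)
    smalls = min(small, others - larges)
    return larges * high_kbps + smalls * low_kbps
```

For a 50-person conference under these assumptions, the downlink drops from 73,500 kbps to 2,400 kbps per user, and it no longer grows with the room size.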

Edge computing nodes arranged in a tree topology extend conference size and reduce latency for the last mile.
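
A rough capacity estimate shows why the tree helps. The sketch below assumes a uniform relay tree in which every node feeds the same number of children and only the leaf (edge) nodes serve clients. The fan-out, depth, and per-node client figures are hypothetical.

```python
def tree_capacity(fanout, depth, clients_per_edge):
    """Audience reachable through a relay tree with `depth` levels below the root."""
    edge_nodes = fanout ** depth
    return {
        "edge_nodes": edge_nodes,
        "max_clients": edge_nodes * clients_per_edge,
        # The root only sends one copy of each stream per direct child
        "copies_sent_by_root": fanout,
    }
```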

A CDN works for one-way live streaming, but its latency (3-10 s) is too high for interaction, so two-way communication is served from edge nodes or central data centers instead.

Paile Cloud Audio‑Video System Architecture

In the architecture diagram, the left side handles registration, authentication, configuration, discovery, and scheduling. The right side provides big-data analytics, health monitoring, alerts, and elastic scaling. Core services include voice calls, video calls, interactive whiteboard, live interaction, and cloud recording.

Industry Trends and Emerging Technologies

WebRTC

Google acquired Global IP Solutions (GIPS) in 2010 and open-sourced its media engine as WebRTC in 2011; WebRTC 1.0 became a W3C Recommendation in January 2021. WebRTC has dramatically lowered the barrier to real-time communication, spawning many services.

SDN

Software‑Defined Networking separates control and data planes, enabling programmable, virtualized networks that simplify path optimization and automation.
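
Path optimization is easier when a controller has a global view of link latencies. As an illustration only, the sketch below runs Dijkstra's algorithm over a latency map to find the lowest-latency route between two nodes. The node names and latencies are made up, and a real SDN controller would also weigh loss, capacity, and cost.

```python
import heapq


def lowest_latency_path(links, src, dst):
    """Return (total_ms, [nodes]) for the lowest-latency path, or None if unreachable.

    links: {node: {neighbor: latency_ms}} describing directed links
    """
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, latency in links.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + latency, neighbor, path + [neighbor]))
    return None
```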

Machine‑Learning‑Based Algorithms

Network: intelligent congestion control, bandwidth estimation, routing.

Video: virtual backgrounds, super‑resolution, video fusion, deepfake.

Audio: speech recognition, enhancement.

VR, AR, and 3D

Combining virtual reality, augmented reality, and 3D technologies promises immersive conference experiences where participants feel as if they share a physical meeting room.
