
Design and Performance Analysis of a Cascaded SFU Architecture for Video Conferencing (VCS)

The article presents a technical overview of a WebRTC‑based video conferencing system that employs a single‑SFU architecture, identifies scalability and latency challenges in large‑scale and global deployments, and proposes a cascaded SFU solution with detailed signaling, buffer management, and performance evaluation demonstrating improved load balancing and extensibility.

360 Smart Cloud

Video Conferencing System (VCS) is a full‑chain meeting solution launched by 360 Smart Cloud, built on WebRTC technology to provide a scalable, low‑latency, high‑performance real‑time communication platform for enterprises.

The system adopts a Selective Forwarding Unit (SFU) architecture, which forwards audio/video streams from a client to multiple receivers without transcoding, thereby significantly reducing processing delay and resource consumption while maintaining media quality.

In large‑scale scenarios such as "dual‑teacher classrooms", a single SFU faces bandwidth limits (e.g., 1080p at 30 FPS requires ~3.5 Mbps per user, leading to >700 Mbps of outbound bandwidth for 200 students) and geographic latency issues when the central server is far from users.
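The bandwidth ceiling above is simple fan-out arithmetic. The per-stream bitrate (3.5 Mbps for 1080p at 30 FPS) comes from the article; the rest is multiplication:

```python
# Back-of-envelope check of the single-SFU bandwidth limit.
PER_STREAM_MBPS = 3.5   # 1080p @ 30 FPS bitrate per stream (from the article)
STUDENTS = 200

# Every student subscribes to the same stream, so the SFU must send one
# copy per subscriber.
outbound_mbps = PER_STREAM_MBPS * STUDENTS
print(f"SFU outbound: {outbound_mbps:.0f} Mbps")  # 700 Mbps
```

At 700 Mbps of sustained egress, a single node with a 1 Gbps NIC is already near saturation, which is what motivates splitting subscribers across regional SFU nodes.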

To address these challenges, a cascaded SFU approach is introduced, where multiple SFU nodes cooperate in a distributed network, each handling a regional subset of clients, achieving load balancing and higher fault tolerance.

The current architecture connects clients to the SFU via WebSocket for signaling and establishes two PeerConnection (PC) links for upstream and downstream media. After SDP negotiation, client media is captured, encoded, encapsulated into SRTP, and sent upstream.
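The join-and-negotiate exchange can be sketched as a few JSON messages over the WebSocket signaling channel. The message types ("join", "offer") and field names below are illustrative assumptions, not the system's actual wire protocol:

```python
import json

def make_join(room_id: str, user_id: str) -> str:
    # First signaling message: request to enter a room.
    return json.dumps({"type": "join", "room": room_id, "user": user_id})

def make_offer(pc_role: str, sdp: str) -> str:
    # Two PeerConnections are negotiated separately: one for publishing
    # (upstream media) and one for subscribing (downstream media).
    assert pc_role in ("publish", "subscribe")
    return json.dumps({"type": "offer", "role": pc_role, "sdp": sdp})

msg = json.loads(make_offer("publish", "v=0\r\n..."))
print(msg["type"], msg["role"])
```

After the SDP answer comes back for each role, media flows on the corresponding PeerConnection independently of the signaling socket.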

On the server side, SRTP packets are decrypted to RTP, stored in a buffer, and their sequence numbers are indexed in an ordered queue. An RTP dispatcher extracts packets from the buffer according to the queue, encrypts them back to SRTP, and distributes them to downstream PCs based on subscription.
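The buffer-plus-ordered-queue scheme above can be sketched as follows. Class and method names are illustrative assumptions; a min-heap stands in for the ordered sequence-number queue:

```python
import heapq

class RtpBuffer:
    """Stores decrypted RTP payloads keyed by sequence number."""
    def __init__(self):
        self.packets = {}    # seq -> RTP payload
        self.seq_queue = []  # min-heap of pending sequence numbers

    def store(self, seq: int, payload: bytes) -> None:
        self.packets[seq] = payload
        heapq.heappush(self.seq_queue, seq)

    def pop_next(self):
        # Extract the lowest pending sequence number, as the dispatcher does.
        if not self.seq_queue:
            return None
        seq = heapq.heappop(self.seq_queue)
        return seq, self.packets.pop(seq)

class Dispatcher:
    """Drains the buffer in sequence order to each subscribed downstream PC."""
    def __init__(self, buffer, subscribers):
        self.buffer = buffer
        self.subscribers = subscribers  # callbacks for downstream PCs

    def dispatch(self):
        while (item := self.buffer.pop_next()) is not None:
            seq, payload = item
            for send in self.subscribers:  # re-encryption to SRTP would happen here
                send(seq, payload)
```

Even if packets arrive out of order, the dispatcher forwards them in sequence order, which is the property the indexed queue exists to guarantee.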

In the cascaded design, the original buffer is retained, but a deque is added to read RTP packets by sequence number and forward them to another SFU. The receiving SFU uses the track's unique SSRC as an identifier to map packets to the correct buffer, ensuring ordered storage and fast retrieval.
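A minimal sketch of the cascade path, assuming hypothetical names: the sending SFU drains a deque toward its peer, and the receiving SFU uses the packet's SSRC to select the per-track buffer:

```python
from collections import deque

class CascadeSender:
    """Reads RTP packets off the local buffer and forwards them to a peer SFU."""
    def __init__(self):
        self.outbox = deque()  # FIFO of (ssrc, seq, payload)

    def enqueue(self, ssrc: int, seq: int, payload: bytes) -> None:
        self.outbox.append((ssrc, seq, payload))

    def drain(self, send_to_peer) -> None:
        while self.outbox:
            send_to_peer(*self.outbox.popleft())

class CascadeReceiver:
    """Maps incoming cascade packets to per-track buffers keyed by SSRC."""
    def __init__(self):
        self.buffers = {}  # ssrc -> ordered list of (seq, payload)

    def on_packet(self, ssrc: int, seq: int, payload: bytes) -> None:
        # The SSRC uniquely identifies the track, so it selects the buffer
        # into which the packet is stored for ordered retrieval.
        self.buffers.setdefault(ssrc, []).append((seq, payload))
```

Because the SSRC travels inside every RTP header, no extra routing metadata is needed on the cascade link to keep tracks separated.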

Signaling is enhanced to model rooms and participants (local and remote). When a user joins, the system checks for the room on the current SFU, creates it if absent, synchronizes room information across SFU nodes via a message queue, and propagates participant and track metadata to all relevant nodes.
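The join flow above can be sketched as follows. The message-queue API is a stand-in callback, and event names are illustrative assumptions, not the system's actual broker schema:

```python
class SfuNode:
    """One SFU node that syncs room state with peers via a message queue."""
    def __init__(self, node_id, mq_publish):
        self.node_id = node_id
        self.rooms = {}            # room_id -> {user_id: [track ids]}
        self.mq_publish = mq_publish

    def join(self, room_id, user_id, tracks):
        # Check for the room on this SFU; create it if absent.
        created = room_id not in self.rooms
        room = self.rooms.setdefault(room_id, {})
        room[user_id] = tracks
        # Propagate participant and track metadata to the other nodes.
        self.mq_publish({
            "event": "room_created" if created else "participant_joined",
            "room": room_id, "user": user_id,
            "tracks": tracks, "origin": self.node_id,
        })

    def on_mq_event(self, evt):
        if evt["origin"] == self.node_id:
            return  # ignore our own broadcasts
        # Mirror the remote participant so local clients can subscribe.
        self.rooms.setdefault(evt["room"], {})[evt["user"]] = evt["tracks"]
```

Each node thus holds a full picture of the room, while media only crosses the cascade link for tracks that remote participants actually subscribe to.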

Performance tests compare cascaded and non‑cascaded SFU setups. End‑to‑end latency is 64 ms without cascading and 68 ms with it, a negligible difference. CPU usage shows that a single machine can handle roughly 9,100 streams in a non‑cascaded setup, while a cascaded SFU supports about 8,800 streams per machine; two cascaded machines can therefore handle ~17,500 streams, offering better scalability with minimal performance loss.
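The trade-off in those measurements is easy to quantify: cascading costs a few percent of per-machine throughput in exchange for near-linear scale-out. Using the article's stream counts:

```python
# Capacity comparison from the article's measurements.
single_node = 9100      # streams per machine, non-cascaded
cascaded_node = 8800    # streams per machine, cascaded
two_node_total = 17500  # measured total for two cascaded machines

overhead = 1 - cascaded_node / single_node
print(f"per-machine cascade overhead: {overhead:.1%}")        # ~3.3%
print(f"two-node capacity vs one SFU: {two_node_total / single_node:.2f}x")
```

A ~3% per-node cost for roughly 1.9x total capacity is why the article concludes the architecture scales with minimal performance loss.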

The cascaded architecture brings several advantages: load balancing across servers, easy scalability for growing traffic, separation of public and private networks for security, optimized routing to reduce cross‑region bandwidth costs, and a solution to the public‑domain and large‑scale challenges of RTC services.

Tags: Backend Development · Real-Time Communication · WebRTC · Video Conferencing · SFU · Cascaded Architecture
Written by

360 Smart Cloud

Official service account of 360 Smart Cloud, dedicated to building a high-quality, secure, highly available, convenient, and stable one‑stop cloud service platform.
