How NetEase Cloud Scales to 10,000 Simultaneous Mic Connections: Backend Architecture Revealed

This article details NetEase Cloud's backend engineering solutions for supporting ten‑thousand‑user concurrent mic connections, covering distributed signaling architecture, QUIC‑based weak‑network handling, server‑side audio routing, video QoS strategies, and a global transmission network (WE‑CAN) to achieve high availability and scalability.

NetEase Smart Enterprise Tech+

Introduction

This article is compiled from a live online talk by Chen Ce, senior audio/video backend engineer at NetEase Cloud, titled “MCtalk Live #5: Revealing the technology behind ten‑thousand‑person concurrent mic connections”.

Problem Overview

High‑concurrency interactive scenarios such as large‑scale video conferences, low‑latency live streams, massive online classrooms, and Clubhouse‑like voice rooms require thousands of users to join the mic simultaneously. The common industry solution of "RTC + CDN" limits the number of mic participants to a few dozen and introduces noticeable audio‑video latency, which does not meet NetEase Cloud's requirements.

Signaling Technical Challenges

When a single user in a ten‑thousand‑person room joins or leaves the mic, the server must push signaling updates to the remaining 9,999 users, creating a massive instantaneous load. A single centralized server cannot handle this, so a distributed tree architecture is employed. Each room forms an independent tree: the root node manages user states and subscription relationships, while child nodes are edge servers that users connect to by proximity and that serve only as message proxies.
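
The fan‑out described above can be sketched in a few lines. This is a minimal illustration, not NetEase Cloud's implementation: the class names `RootNode` and `EdgeNode`, the event format, and the choice of ten edge servers are all assumptions made for the example. It shows why the tree helps: the root sends one message per edge, and each edge relays it to its own local users.

```python
class EdgeNode:
    """Child node (hypothetical): a message proxy for its locally connected users."""

    def __init__(self, name):
        self.name = name
        self.local_users = set()
        self.delivered = 0  # counts simulated pushes to clients

    def push(self, event, exclude=None):
        # Relay the event to every local user except the one who caused it.
        for user_id in self.local_users:
            if user_id != exclude:
                self.delivered += 1


class RootNode:
    """Root node (hypothetical): owns room state and fans events out to edges."""

    def __init__(self):
        self.edges = {}
        self.user_edge = {}     # user_id -> edge name
        self.on_mic = set()     # users currently on the mic
        self.messages_sent = 0  # root-to-edge messages only

    def attach(self, edge):
        self.edges[edge.name] = edge

    def join_room(self, user_id, edge_name):
        self.user_edge[user_id] = edge_name
        self.edges[edge_name].local_users.add(user_id)

    def set_mic(self, user_id, active):
        if active:
            self.on_mic.add(user_id)
        else:
            self.on_mic.discard(user_id)
        event = ("mic_on" if active else "mic_off", user_id)
        for edge in self.edges.values():
            self.messages_sent += 1  # one message per edge, not per user
            edge.push(event, exclude=user_id)


# 10,000 users spread over 10 assumed edge servers; one user joins the mic.
root = RootNode()
for i in range(10):
    root.attach(EdgeNode(f"edge-{i}"))
for user_id in range(10_000):
    root.join_room(user_id, f"edge-{user_id % 10}")
root.set_mic(user_id=42, active=True)
total_delivered = sum(edge.delivered for edge in root.edges.values())
```

In this toy run the root emits 10 messages while the edges together deliver the update to the other 9,999 users, so the per‑event load on the root grows with the number of edges rather than the number of users.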

To ensure high availability, the root node persists its state in a cache and a database, while child nodes rely on the client reconnection mechanism for failover.

Weak‑Network Signaling Solution

Signaling traffic conventionally runs over TCP, which degrades quickly at 30% packet loss, whereas media streams run over UDP and cope with such loss far better. NetEase Cloud therefore adopts QUIC as an accelerated signaling channel so that signaling matches the weak‑network resilience of media, and falls back to WebSocket when QUIC is unavailable.
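
The QUIC‑first, WebSocket‑fallback behavior can be sketched as an ordered list of transports tried in turn. The connector functions below are stand‑ins invented for this example (no real QUIC or WebSocket library is called), and the "UDP blocked" failure is just one assumed reason for QUIC being unavailable.

```python
class TransportUnavailable(Exception):
    """Raised by a connector when its transport cannot be established."""


def open_signaling_channel(connectors):
    """Try each (name, connect) pair in order; return the first that succeeds."""
    errors = []
    for name, connect in connectors:
        try:
            return name, connect()
        except TransportUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise ConnectionError("no signaling transport available: " + "; ".join(errors))


def connect_quic_blocked():
    # Hypothetical stand-in: QUIC runs over UDP, so a UDP-blocking network fails here.
    raise TransportUnavailable("UDP blocked")


def connect_websocket():
    # Hypothetical stand-in for a successful WebSocket connection.
    return "websocket-connection"


# Preferred transport first, fallback second.
chosen, channel = open_signaling_channel(
    [("quic", connect_quic_blocked), ("websocket", connect_websocket)]
)
```

Keeping the preference order in data rather than in nested `try` blocks makes it easy to add or reorder transports without touching the selection logic.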

Audio Technical Challenges

In a room where every participant can speak, each client would need to subscribe to all N‑1 other audio streams. This is infeasible given bandwidth and processing limits, and in any case a listener cannot follow more than about three simultaneous speakers.

Traditional solutions, namely audio routing (selecting the loudest streams) and server‑side mixing, both have drawbacks at this scale. NetEase Cloud implements a distributed pre‑selection routing scheme: each edge server first selects a small number of its local streams (three by default) before cascading them to other servers, reducing inter‑server traffic from O(N²) to O(M²), where N is the number of participants and M is the number of edge servers.
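
A minimal sketch of two‑stage pre‑selection follows. It assumes loudness is the ranking criterion (as in the audio routing approach mentioned above) and that each stream carries a single numeric level; the function names, user IDs, and levels are illustrative only. Each edge forwards at most `k` streams, so the traffic between servers depends on the number of edges, not the number of speakers.

```python
import heapq


def top_k(streams, k=3):
    """Return the k loudest (user_id, level) pairs, loudest first."""
    return heapq.nlargest(k, streams, key=lambda s: s[1])


def select_room_streams(edges, k=3):
    """Stage 1: each edge pre-selects k local streams. Stage 2: survivors compete."""
    candidates = []
    for local_streams in edges.values():
        candidates.extend(top_k(local_streams, k))  # only these cross servers
    return top_k(candidates, k), len(candidates)


# Illustrative audio levels for 11 speakers on 3 assumed edge servers.
edges = {
    "edge-a": [("u1", 0.10), ("u2", 0.80), ("u3", 0.30), ("u4", 0.55)],
    "edge-b": [("u5", 0.90), ("u6", 0.20)],
    "edge-c": [("u7", 0.05), ("u8", 0.60), ("u9", 0.40), ("u10", 0.70), ("u11", 0.15)],
}
selected, forwarded = select_room_streams(edges)
# selected  -> [("u5", 0.9), ("u2", 0.8), ("u10", 0.7)]
# forwarded -> 8 streams cross the server boundary instead of 11
```

With distinct levels, the result equals a global top‑k over all speakers, because any stream in the global top k is necessarily in the top k of its own edge.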

Video Technical Challenges

Client decoding capacity limits the number of simultaneous video streams. NetEase Cloud uses a QoS strategy that classifies users into four bandwidth tiers and sends multiple Simulcast/SVC layers (e.g., 720p/30fps, 720p/15fps, 720p/8fps, 180p/30fps). To improve fairness, the system applies bitrate compression based on the top N% of users' bandwidth, ensuring the best possible experience for the majority.
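
The tiering idea can be illustrated as follows. The four layer names come from the text, but the kbps thresholds are invented for the example. The `bitrate_cap` function is only one possible reading of "based on the top N% of users' bandwidth": it returns the highest bitrate that the best N% of subscribers can all sustain.

```python
import math

# Hypothetical minimum downlink (kbps) per layer; thresholds are assumptions,
# not NetEase Cloud figures. Ordered from highest to lowest quality.
LAYERS = [
    (1500, "720p/30fps"),
    (900, "720p/15fps"),
    (500, "720p/8fps"),
    (0, "180p/30fps"),
]


def pick_layer(downlink_kbps):
    """Return the highest layer whose threshold the subscriber's downlink meets."""
    for min_kbps, layer in LAYERS:
        if downlink_kbps >= min_kbps:
            return layer
    return LAYERS[-1][1]  # fall back to the lowest layer


def bitrate_cap(downlinks_kbps, top_percent):
    """Bitrate that the best `top_percent` of subscribers can all sustain."""
    ranked = sorted(downlinks_kbps, reverse=True)
    count = max(1, math.ceil(len(ranked) * top_percent / 100))
    return ranked[count - 1]


# Ten subscribers with illustrative downlink estimates in kbps.
downlinks = [2000, 1800, 1200, 950, 600, 300, 250, 1600, 700, 100]
cap = bitrate_cap(downlinks, top_percent=70)    # 600
layers = [pick_layer(d) for d in downlinks]
```

Here the top 70% of subscribers all have at least 600 kbps, so capping the stream at that rate serves the majority at full rate while the remaining users receive a lower layer.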

When uplink bandwidth is insufficient for Simulcast/SVC, the client sends a single 720p stream to an MCU, which transcodes it into Simulcast/SVC layers before forwarding them to the SFU.

Server‑to‑Server Network Challenges

Cross‑operator and cross‑country transmission can be unstable. NetEase Cloud introduces WE‑CAN (Communications Acceleration Network), a global distributed real‑time transmission network that continuously measures link quality, computes optimal paths, and routes media packets accordingly, abstracting transport concerns from the business layer.
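
The measure, compute, and route loop can be illustrated with a shortest‑path search over measured link costs. This is a generic Dijkstra sketch, not WE‑CAN's actual algorithm, which the text does not specify; the node names and the single per‑link cost (think of it as latency in milliseconds) are assumptions for the example.

```python
import heapq


def best_path(links, source, target):
    """Dijkstra over directed links {node: {neighbor: cost}}; returns (cost, path)."""
    queue = [(0, source, [source])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, link_cost in links.get(node, {}).items():
            if neighbor not in settled:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None  # target unreachable


# Hypothetical relay nodes with measured link costs.
links = {
    "hangzhou": {"singapore": 180, "hongkong": 40},
    "hongkong": {"singapore": 45, "frankfurt": 210},
    "singapore": {"frankfurt": 160},
}
first = best_path(links, "hangzhou", "frankfurt")
# first -> (245, ["hangzhou", "hongkong", "singapore", "frankfurt"])

# A later measurement shows the singapore -> frankfurt link has degraded.
links["singapore"]["frankfurt"] = 400
second = best_path(links, "hangzhou", "frankfurt")
# second -> (250, ["hangzhou", "hongkong", "frankfurt"])
```

Re‑running the search after each measurement round is what lets routing move off a degraded link without the business layer being involved.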

Conclusion

Through these backend innovations (distributed signaling trees, QUIC‑based weak‑network signaling, server‑side audio pre‑selection, tiered video QoS, and the WE‑CAN transmission network), NetEase Cloud achieves stateless, unlimited‑capacity mic rooms with horizontal elastic scaling and user‑to‑network matching within seconds.

