How NetEase CloudSign Built a Real-Time Audio/Video Engine with WebRTC

This article explains how NetEase CloudSign used WebRTC to build a real‑time audio‑video engine, covering the engineering workflow, SDP signaling, and four key optimizations: Simulcast, hardware codec support, audio profiles, and transmission strategy adjustments.

NetEase Smart Enterprise Tech+

In recent years, real‑time audio‑video has become increasingly popular, and the pandemic further accelerated its adoption. NetEase CloudSign, a product of NetEase Zhiji, shares its step‑by‑step experience of building a real‑time audio‑video engine, using cooking as a metaphor.

Like many industry solutions, NetEase CloudSign’s engine is built on WebRTC. The article first introduces what WebRTC provides, then walks through engineering, productization, and optimization practices, presenting the complete engine construction process.

What is WebRTC?

WebRTC (Web Real‑Time Communication) is an API suite originally designed for browsers to enable real‑time audio and video. It uses SDP (Session Description Protocol) as a unified format for negotiating sessions and provides the core technologies for video conferencing, including an audio engine, a video engine, and transport control. It supports cross‑platform deployment on Windows, macOS, Linux, Android, and iOS.

Although WebRTC was initially browser‑only, Google open‑sourced it in 2011, allowing native developers to adopt its codebase widely.

Having WebRTC is like having kitchen tools and ingredients; the real‑time engine is the finished dish for the customer.

After mastering the tools (WebRTC source engineering), the next step is to combine them into a usable engine. NetEase CloudSign exchanges SDP through a signaling server, then sets the SDP on a PeerConnection to establish the connection. The core media operations are publish, subscribe, and subscription response, each following a defined SDP exchange workflow.

Example: publishing audio‑video sends local SDP to the media server (SFU), which returns remote SDP, enabling a complete connection. Subscribing follows a similar SDP exchange.
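The publish exchange described above can be sketched as a small in-memory model. This is only an illustration of the offer/answer sequence, not NetEase CloudSign's code: `FakePeerConnection`, `FakeSfu`, and `publish` are hypothetical names, the SDP strings are placeholders, and the call to the SFU stands in for a round trip through the signaling server.

```python
from dataclasses import dataclass


class FakeSfu:
    """Hypothetical stand-in for the media server (SFU)."""

    def answer(self, offer_sdp: str) -> str:
        # A real SFU would parse the offer and negotiate codecs and
        # transports; here we only mark that the offer was received.
        return "answer-for:" + offer_sdp


@dataclass
class FakePeerConnection:
    """Tracks only the signaling state of an offer/answer exchange."""

    signaling_state: str = "stable"
    local_sdp: str = ""
    remote_sdp: str = ""

    def set_local_description(self, sdp: str) -> None:
        if self.signaling_state != "stable":
            raise RuntimeError("an offer is already pending")
        self.local_sdp = sdp
        self.signaling_state = "have-local-offer"

    def set_remote_description(self, sdp: str) -> None:
        if self.signaling_state != "have-local-offer":
            raise RuntimeError("no local offer to answer")
        self.remote_sdp = sdp
        self.signaling_state = "stable"


def publish(pc: FakePeerConnection, sfu: FakeSfu, offer_sdp: str) -> str:
    """Send the local SDP, apply the SFU's remote SDP, return the final state."""
    pc.set_local_description(offer_sdp)
    remote_sdp = sfu.answer(offer_sdp)  # in practice: via the signaling server
    pc.set_remote_description(remote_sdp)
    return pc.signaling_state
```

Once both descriptions are set the connection is back in the `stable` state, which is the point at which media can start flowing. A subscribe call would go through the same two steps with its own SDP.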

Beyond basic connectivity, NetEase CloudSign applied several optimizations:

Optimization 1: Simulcast

With Simulcast, the sender publishes the same video at several resolutions at once, so each subscriber can choose the quality it needs. In a conference without Simulcast, sending 720p video requires high bandwidth for both sender and receiver, even when the receiver only shows a small view. With Simulcast, a subscriber can take a low‑resolution stream (e.g., 180p) by default and request the high‑resolution stream only when needed, which the article reports reduces bandwidth and CPU usage to about one‑quarter.
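A minimal sketch of how a subscriber (or the SFU on its behalf) might choose among simulcast layers. The layer names, heights, and bitrates are illustrative assumptions, not values from the article, and `pick_layer` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    height: int  # vertical resolution in pixels
    kbps: int    # assumed target bitrate, for illustration only


# Ordered from lowest to highest quality (assumed values).
LAYERS = (
    Layer("low", 180, 150),
    Layer("mid", 360, 500),
    Layer("high", 720, 1500),
)


def pick_layer(view_height: int, downlink_kbps: int, layers=LAYERS) -> Layer:
    """Pick the highest layer that fits both the view size and the downlink.

    Falls back to the lowest layer when nothing fits.
    """
    chosen = layers[0]
    for layer in layers:
        fits_view = layer.height <= view_height
        fits_link = layer.kbps <= downlink_kbps
        if fits_view and fits_link:
            chosen = layer
    return chosen
```

For example, a thumbnail tile calls `pick_layer(180, 2000)` and receives the low layer, while the active speaker's full-size view calls `pick_layer(720, 2000)` and receives the high layer.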

Optimization 2: Hardware Video Codec

Hardware encoding/decoding consumes less power than software solutions, especially for high‑resolution video (e.g., 1080p). WebRTC’s native hardware codec support is incomplete, so NetEase CloudSign implemented a full pipeline: a device whitelist to handle Android fragmentation, version‑specific fallbacks to mitigate crashes on iOS, and a fallback mechanism for occasional hardware failures.
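The whitelist-plus-fallback idea can be sketched as follows. This is an assumption-laden illustration: the device identifiers, `EncoderError`, `create_encoder`, and the stub factories are all hypothetical, and falling back to a software encoder is an assumed choice of fallback rather than something the article spells out.

```python
class EncoderError(Exception):
    """Raised by a (hypothetical) hardware encoder factory when setup fails."""


# Hypothetical whitelist of devices whose hardware codec is known to work.
HW_WHITELIST = frozenset({"vendor-a/model-1", "vendor-b/model-7"})


def create_encoder(device, make_hw, make_sw, whitelist=HW_WHITELIST):
    """Try the hardware encoder on whitelisted devices, else use software."""
    if device in whitelist:
        try:
            return make_hw()
        except EncoderError:
            # Occasional hardware failure: fall through to software.
            pass
    return make_sw()


# Stub factories used only to exercise the selection logic.
def hw_ok():
    return "hardware"


def hw_broken():
    raise EncoderError("hardware encoder failed to initialize")


def sw():
    return "software"
```

With these stubs, `create_encoder("vendor-a/model-1", hw_ok, sw)` yields the hardware encoder, while a device that is not on the whitelist, or one whose hardware factory raises, ends up on the software path.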

Optimization 3: Audio Profile

Audio requirements differ between voice chat (low bitrate) and entertainment (high fidelity). NetEase CloudSign introduced adaptive audio profiles, separating speech and music codecs, adjusting sample rates and channel counts, adding resampling support, and splitting playback/recording threads to reduce hardware demands.
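As a rough sketch, an audio profile can be modeled as a small record of sample rate, channel count, and bitrate, selected per scenario. The profile names and numbers below are illustrative assumptions, not NetEase CloudSign's settings; `select_profile` and `resample_ratio` are hypothetical helpers.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class AudioProfile:
    name: str
    sample_rate_hz: int
    channels: int
    bitrate_kbps: int
    tuned_for: str  # "speech" or "music"


# Assumed example values: low bitrate mono for voice, high fidelity stereo for music.
PROFILES = {
    "voice_chat": AudioProfile("voice_chat", 16000, 1, 20, "speech"),
    "music": AudioProfile("music", 48000, 2, 128, "music"),
}


def select_profile(scenario: str) -> AudioProfile:
    """Return the profile for a scenario, defaulting to voice chat."""
    return PROFILES.get(scenario, PROFILES["voice_chat"])


def resample_ratio(device_rate_hz: int, profile: AudioProfile) -> Fraction:
    """Output/input sample ratio needed when the device rate differs from the profile."""
    return Fraction(profile.sample_rate_hz, device_rate_hz)
```

A device capturing at 44.1 kHz under the music profile would need resampling by `resample_ratio(44100, select_profile("music"))`, which is why resampling support has to accompany configurable sample rates.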

Optimization 4: Transmission Strategy

Transmission must balance real‑time latency, clarity, and smoothness, and improving one aspect often degrades another. NetEase CloudSign therefore defines multiple strategies tailored to specific scenarios (for example, low‑latency voice calls versus high‑quality live streaming), much like offering dishes with different flavors to suit varied tastes.
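One way to picture scenario-specific strategies is as presets that fix how the three goals are traded off. The sketch below is hypothetical: the preset names, the jitter buffer values, and the `degrade_once` rule are assumptions chosen to show the trade-off, not the strategies NetEase CloudSign ships.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransportStrategy:
    name: str
    jitter_buffer_ms: int  # larger buffer: smoother playback, higher latency
    degrade_first: str     # what to give up when bandwidth drops


# Assumed presets for the two scenarios mentioned in the text.
STRATEGIES = {
    "low_latency_call": TransportStrategy("low_latency_call", 40, "resolution"),
    "high_quality_live": TransportStrategy("high_quality_live", 400, "framerate"),
}


def degrade_once(strategy: TransportStrategy, width: int, height: int, fps: int):
    """Apply one degradation step according to the strategy's priority."""
    if strategy.degrade_first == "resolution":
        # Keep motion smooth, sacrifice clarity.
        return width // 2, height // 2, fps
    # Keep clarity, sacrifice smoothness.
    return width, height, max(1, fps // 2)
```

Under the same bandwidth drop, the call preset turns 1280x720 at 30 fps into 640x360 at 30 fps, while the live preset keeps 1280x720 and halves the frame rate to 15 fps.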

These practices provide a reference for developers interested in building robust real‑time audio‑video solutions.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
