
Boosting Audio Quality in Weak Networks: RTC QoS Techniques Explained

This article introduces real‑time communication (RTC) and quality‑of‑service (QoS) concepts, then details audio QoS methods such as bandwidth estimation, DTX, FEC, RED, RTX/NACK, acceleration, deceleration, and PLC, while highlighting NetEase Cloud Communication's optimizations and practical considerations.

NetEase Smart Enterprise Tech+

Apr 21, 2023

Definitions of RTC, QoS, WebRTC

RTC (Real‑Time Communication) refers to technologies that transmit audio, video, text, images, and other media or non‑media data live, with low latency. QoS (Quality of Service) is a set of network mechanisms that improve service reliability, reduce latency, and handle congestion, so that the user experience holds up under poor network conditions. WebRTC is an open‑source project, initiated by Google, that provides real‑time audio and video communication for browsers and for native Android and iOS applications.

Audio QoS Techniques

Key audio QoS technologies include bandwidth estimation and codec rate control, DTX (Discontinuous Transmission), FEC (Forward Error Correction), RED (Redundant Audio Data), RTX/NACK (Retransmission/Negative Acknowledgement), playout acceleration and deceleration, and PLC (Packet Loss Concealment). Each technique trades off bandwidth, latency, and quality differently; combined, they keep the audio stream robust.

Bandwidth estimation and codec control: The sender estimates the available bandwidth and sets the codec bitrate accordingly. A higher bitrate gives better audio quality, provided it stays within what the network can carry.

DTX: When the input volume is low, the encoder sends only occasional silence‑indication frames instead of full audio frames, saving bitrate during silent periods.

FEC: Adds redundant data to recover lost packets; higher redundancy improves recovery but consumes more bandwidth.

RED: Sends previous audio packets as redundancy; more layers increase bandwidth usage but can reduce recovery delay in low‑loss scenarios.

RTX/NACK: Requests retransmission of lost packets; effective in high‑loss conditions but adds latency and bandwidth overhead.

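Acceleration: When the receiver has buffered more audio than it needs, it plays the audio slightly faster to drain the buffer and lower latency; the listener may perceive the speech as sped up.

Deceleration: When the receiver is under‑buffered, it plays the audio slightly slower so the buffer can refill; this raises latency, and the listener may perceive the speech as slowed down.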

PLC: Synthesizes a replacement for a lost packet from previously received audio. The result is only an approximation, so audio quality may degrade slightly.
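The three receiver‑side techniques above (acceleration, deceleration, PLC) can be viewed as one playout decision made for every frame. The following is a minimal sketch of that decision, not NetEase's or WebRTC's actual logic; the names `Action` and `choose_action` and the 25% tolerance band are assumptions made for illustration.

```python
from enum import Enum


class Action(Enum):
    NORMAL = "normal"          # play the next frame unchanged
    ACCELERATE = "accelerate"  # time-compress audio to drain the buffer
    DECELERATE = "decelerate"  # time-stretch audio so the buffer can refill
    CONCEAL = "conceal"        # no frame available: synthesize one (PLC)


def choose_action(buffered_ms: int, target_ms: int,
                  next_frame_available: bool,
                  tolerance: float = 0.25) -> Action:
    """Pick a playout action from the buffer level (hypothetical policy).

    buffered_ms: audio currently waiting in the receive buffer
    target_ms:   buffer level the receiver is aiming for
    tolerance:   assumed dead band around the target, as a fraction
    """
    if not next_frame_available:
        return Action.CONCEAL
    if buffered_ms > target_ms * (1 + tolerance):
        return Action.ACCELERATE
    if buffered_ms < target_ms * (1 - tolerance):
        return Action.DECELERATE
    return Action.NORMAL


if __name__ == "__main__":
    for level in (200, 80, 20):
        print(level, choose_action(level, target_ms=80, next_frame_available=True))
```

The dead band keeps the receiver from flipping between faster and slower playout when the buffer hovers near its target; a production jitter buffer would also adapt the target itself to measured network jitter.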

QoS Segmentation Strategy

An RTC system has three segments: sender, server, and receiver. NetEase Cloud Communication uses an SFU (Selective Forwarding Unit) architecture and applies separate QoS policies to the upstream segment (sender to server) and the downstream segment (server to receiver). This isolates the network conditions of each segment from the other, at the cost of greater design and debugging complexity.

Bandwidth Estimation

Upstream bandwidth is estimated using the GCC (Google Congestion Control) algorithm based on receiver feedback, allocating the result among codec, RED, FEC, RTX, and padding bitrates. Servers also estimate downstream bandwidth per session, aggregating multiple streams to enforce bitrate caps and prioritize VIP users, improving overall utilization. Future directions include incorporating bitrate feedback and machine‑learning‑based estimation.
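As a rough illustration of the two steps described above, the sketch below first adjusts a bitrate estimate from reported packet loss and then splits the estimate among consumers. The loss thresholds (2% and 10%) and factors follow the loss‑based controller described in the GCC Internet‑Draft and should be treated as illustrative; the delay‑based part of GCC is omitted. The priority order and the requested bitrates in `allocate` are assumptions, not NetEase's actual policy.

```python
def loss_based_update(current_bps: float, loss_fraction: float) -> float:
    """Loss-based rate update in the style of the GCC draft (simplified)."""
    if loss_fraction > 0.10:
        # Heavy loss: back off in proportion to the loss rate.
        return current_bps * (1 - 0.5 * loss_fraction)
    if loss_fraction < 0.02:
        # Negligible loss: probe upward by 5%.
        return current_bps * 1.05
    # Moderate loss: hold the current estimate.
    return current_bps


def allocate(estimate_bps: int, requests: list[tuple[str, int]]) -> dict[str, int]:
    """Grant each request in priority order until the estimate is used up.

    requests: (name, wanted_bps) pairs, highest priority first (assumed order).
    """
    remaining = estimate_bps
    grants = {}
    for name, wanted_bps in requests:
        granted = min(wanted_bps, remaining)
        grants[name] = granted
        remaining -= granted
    return grants


if __name__ == "__main__":
    estimate = int(loss_based_update(60_000, loss_fraction=0.20))
    print(estimate)
    print(allocate(estimate, [("codec", 32_000), ("red", 16_000),
                              ("fec", 8_000), ("rtx", 4_000),
                              ("padding", 4_000)]))
```

With strict priority, the codec keeps its bitrate as long as possible and protection traffic absorbs a shortfall first; a real allocator might instead lower the codec bitrate to make room for redundancy when loss is high.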

DTX Encoding

The Opus codec's DTX feature reduces bitrate during silence: instead of a packet for every frame, the encoder sends one DTX packet every 400 ms. NetEase optimizes this by preserving low‑bitrate silence transmission while maintaining stream continuity.
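The effect of the 400 ms interval can be sketched as a per‑frame send schedule. This is a simplified model: a real Opus encoder makes the silence decision internally, whereas here the caller supplies it. The function name `dtx_schedule` and the 20 ms frame size are assumptions.

```python
def dtx_schedule(frame_is_silent: list[bool], frame_ms: int = 20,
                 refresh_ms: int = 400) -> list[bool]:
    """Return, for each frame, whether a packet would be sent.

    Active frames are always sent. During silence a packet is sent only
    once refresh_ms has elapsed since the last packet.
    """
    sent = []
    since_last_send_ms = 0
    for silent in frame_is_silent:
        since_last_send_ms += frame_ms
        if not silent or since_last_send_ms >= refresh_ms:
            sent.append(True)
            since_last_send_ms = 0
        else:
            sent.append(False)
    return sent


if __name__ == "__main__":
    # 0.8 s of silence in 20 ms frames: 40 frames, but only 2 packets.
    schedule = dtx_schedule([True] * 40)
    print(sum(schedule), "packets for", len(schedule), "frames")
```

Because packets arrive so rarely during DTX, the receiver has to distinguish an intentional gap from packet loss, which is one reason stream continuity matters.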

FEC

FEC can be in‑band (carried inside the codec bitstream) or out‑of‑band (sent as separate packets). Audio may use Opus in‑band FEC or out‑of‑band FEC. In‑band FEC takes its bits from the codec bitrate and may therefore reduce audio quality, whereas out‑of‑band FEC consumes additional bandwidth but leaves audio fidelity untouched. Common FEC schemes include XOR and Reed‑Solomon; XOR is computationally lighter but less robust. Mask types (RandMask, BurstMask) determine which packets each FEC packet protects, with patterns suited to random loss and burst loss respectively.
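The XOR scheme can be illustrated with a minimal sketch: one parity packet protects a group of media packets and can rebuild any single lost packet in that group. The helper names are hypothetical, and passing the lost packet's length explicitly is a simplification; packetized FEC formats protect the length field as well.

```python
def xor_parity(packets: list[bytes]) -> bytes:
    """XOR all packets together, zero-padding shorter ones."""
    parity = bytearray(max(len(p) for p in packets))
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover_lost(received: list[bytes], parity: bytes, lost_len: int) -> bytes:
    """Rebuild the single missing packet of a protected group.

    received: every packet of the group except the lost one
    lost_len: length of the lost packet (assumed known here)
    """
    rebuilt = bytearray(parity)
    for packet in received:
        for i, byte in enumerate(packet):
            rebuilt[i] ^= byte
    return bytes(rebuilt[:lost_len])


if __name__ == "__main__":
    group = [b"alpha", b"be", b"gamma!"]
    parity = xor_parity(group)
    # Packet 1 is lost in transit; packets 0 and 2 plus the parity arrive.
    print(recover_lost([group[0], group[2]], parity, lost_len=2))
```

If two packets of the same group are lost, a single XOR parity cannot recover either of them. That is the sense in which XOR is less robust than Reed‑Solomon, and it is why the choice of mask, meaning which packets share a parity packet, matters under burst loss.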

RED

RED embeds copies of previous packets as redundancy inside each new packet. More redundancy improves loss recovery but increases bandwidth. The redundant copies usually carry consecutive sequence numbers; a non‑consecutive selection offers no clear advantage and can increase latency.
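A concrete view of how previous packets are embedded is the RED payload layout defined in RFC 2198: each redundant block gets a 4‑byte header (flag bit, 7‑bit payload type, 14‑bit timestamp offset, 10‑bit length), followed by a 1‑byte header for the primary block, then the block data in the same order. The sketch below packs and parses that payload only; RTP headers are left out, and the payload type 111 and the 960‑sample frame spacing are assumed values for illustration.

```python
import struct


def pack_red(primary: bytes, redundant: list[tuple[int, bytes]], pt: int) -> bytes:
    """Build a RED payload. redundant: (timestamp_offset, payload), oldest first."""
    headers = bytearray()
    body = bytearray()
    for ts_offset, payload in redundant:
        if ts_offset >= 1 << 14 or len(payload) >= 1 << 10:
            raise ValueError("block does not fit the RED header fields")
        # F=1 | block PT (7 bits) | timestamp offset (14 bits) | length (10 bits)
        word = (1 << 31) | ((pt & 0x7F) << 24) | (ts_offset << 10) | len(payload)
        headers += struct.pack("!I", word)
        body += payload
    headers.append(pt & 0x7F)  # final header: F=0, primary payload type
    return bytes(headers + body + primary)


def unpack_red(data: bytes) -> list[tuple[int, int, bytes]]:
    """Return (payload_type, timestamp_offset, payload); the primary block is last."""
    headers = []
    pos = 0
    while data[pos] & 0x80:  # F bit set: another redundant header follows
        (word,) = struct.unpack_from("!I", data, pos)
        headers.append(((word >> 24) & 0x7F, (word >> 10) & 0x3FFF, word & 0x3FF))
        pos += 4
    primary_pt = data[pos] & 0x7F
    pos += 1
    blocks = []
    for block_pt, ts_offset, length in headers:
        blocks.append((block_pt, ts_offset, data[pos:pos + length]))
        pos += length
    blocks.append((primary_pt, 0, data[pos:]))
    return blocks


if __name__ == "__main__":
    red = pack_red(b"CUR", [(1920, b"OLDER"), (960, b"PREV")], pt=111)
    for block in unpack_red(red):
        print(block)
```

If the packet carrying `PREV` as its primary block is lost, the receiver can still take `PREV` from this packet and place it using the timestamp offset, so recovery costs one packet interval of delay rather than a round trip.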

RTX

RTX introduces the highest latency among weak‑network resilience methods, because recovering a packet by retransmission takes at least one round‑trip time (RTT). The cost of retransmission depends on the RTT, the loss rate, when the receiver sends its requests, and how the sender responds to them. Proper tuning can significantly improve recovery effectiveness while keeping bandwidth consumption under control.
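The role of RTT and request timing can be sketched from the receiver's side: detect a gap in the 16‑bit RTP sequence numbers, then request a retransmission only if the packet can still arrive before it is due for playout. Both function names and the 10 ms safety margin are assumptions for illustration, not a description of any particular implementation.

```python
def missing_sequence_numbers(last_seq: int, new_seq: int) -> list[int]:
    """Sequence numbers skipped between two received packets.

    RTP sequence numbers are 16 bits wide, so arithmetic wraps at 65536.
    """
    gap = (new_seq - last_seq) & 0xFFFF
    if gap == 0 or gap > 0x7FFF:
        # Duplicate, or an old/reordered packet: nothing newly missing.
        return []
    return [(last_seq + i) & 0xFFFF for i in range(1, gap)]


def worth_requesting(rtt_ms: float, time_until_playout_ms: float,
                     margin_ms: float = 10.0) -> bool:
    """A retransmission helps only if it can arrive before playout."""
    return rtt_ms + margin_ms <= time_until_playout_ms


if __name__ == "__main__":
    lost = missing_sequence_numbers(last_seq=65534, new_seq=1)
    print(lost)
    print(worth_requesting(rtt_ms=50, time_until_playout_ms=80))
    print(worth_requesting(rtt_ms=120, time_until_playout_ms=80))
```

Skipping requests that cannot arrive in time saves both upstream feedback and downstream retransmission bandwidth; on long‑RTT paths the receiver would have to buffer more audio, and accept more latency, for retransmission to be useful.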

Conclusion

Optimizing audio QoS is often more cost‑effective than video because audio consumes less bandwidth. However, controlling audio bandwidth remains crucial for profitability in RTC services. The ultimate QoS goal is to enhance weak‑network resilience and reduce recovery delay within limited bandwidth, leaving ample research opportunities for QoS professionals.
