Decoupling Audio‑Video Algorithms: AVProcessEngine Reduces RTC SDK Size & Improves Performance
The article explains how NetEase Cloud Communication’s AVProcessEngine framework separates audio‑video algorithms from the NERTC SDK, addressing SDK bloat and performance drops on low‑end devices by using plugin‑based processing, dynamic algorithm adjustment, and unified interfaces.
Background
RTC technology is widely used in video conferencing, live streaming, telemedicine, and similar scenarios, and each scenario needs different audio‑video features: beauty filters for entertainment, virtual backgrounds for meetings. Shipping a single full‑feature RTC SDK therefore bundles functions that many applications never use and inflates the package size. The cost is highest on low‑end devices, where complex features lower the frame rate, add latency, and reduce smoothness.
AVProcessEngine Audio‑Video Engine
The overall RTC audio‑video processing flow includes capture, pre‑processing (beauty, virtual background, AI denoise), encoding, network transmission, decoding, post‑processing (super‑resolution, enhancement), and playback. Previously, the NERTC SDK bundled all modules, causing unnecessary size and performance overhead.
AVProcessEngine decouples the NERTC SDK from audio‑video algorithms: the SDK retains basic call functionality while algorithm libraries are provided as separate plugins that can be integrated on demand.
After integrating AVProcessEngine, the NERTC SDK architecture was adjusted so that other modules no longer call audio‑video algorithms directly; all audio‑video data operations go through AVProcessEngine, and algorithms exist as a collection of plugin libraries.
Audio‑Video Plugin Loading Module
Audio‑video algorithms are implemented as plugins following an OpenMAX‑like interface. Each plugin inherits from a base OMXComponent class that defines common interfaces for parameter setting, parameter getting, data processing, and state querying.
setParameter: configure algorithm parameters (e.g., beauty level).
getParameter: retrieve current parameter values.
processVideoFrame: process raw video frames and return the enhanced frame.
getState: obtain the plugin’s current state.
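As a rough illustration of the four interfaces listed above, the following is a minimal C++ sketch and not the actual NERTC definition. The VideoFrame layout, the PluginState values, the string-keyed integer parameters, and the BrightenPlugin class are all assumptions made for the example; the article only names the base class and its four methods.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical raw-frame type; the real frame layout is not described in the article.
struct VideoFrame {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> luma;  // one 8-bit brightness sample per pixel
};

// Hypothetical state set for getState().
enum class PluginState { Idle, Running, Error };

// Base class mirroring the four interfaces named in the text.
class OMXComponent {
public:
    virtual ~OMXComponent() = default;
    virtual bool setParameter(const std::string& key, int value) = 0;
    virtual bool getParameter(const std::string& key, int& value) const = 0;
    virtual VideoFrame processVideoFrame(const VideoFrame& in) = 0;
    virtual PluginState getState() const = 0;
};

// Toy plugin standing in for a beauty filter: it only brightens pixels.
class BrightenPlugin : public OMXComponent {
public:
    bool setParameter(const std::string& key, int value) override {
        // Reject unknown keys and out-of-range levels.
        if (key != "level" || value < 0 || value > 100) return false;
        level_ = value;
        return true;
    }

    bool getParameter(const std::string& key, int& value) const override {
        if (key != "level") return false;
        value = level_;
        return true;
    }

    VideoFrame processVideoFrame(const VideoFrame& in) override {
        state_ = PluginState::Running;
        VideoFrame out = in;  // leave the input frame untouched
        for (auto& px : out.luma) {
            const int v = px + level_;
            px = static_cast<std::uint8_t>(v > 255 ? 255 : v);  // clamp to 8 bits
        }
        return out;
    }

    PluginState getState() const override { return state_; }

private:
    int level_ = 0;
    PluginState state_ = PluginState::Idle;
};
```

Because every plugin presents the same four methods, the engine can hold plugins through OMXComponent pointers and never needs to know which algorithm sits behind them.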
Each plugin library exports CreatePlugin and DestroyPlugin functions for dynamic loading. AVProcessEngine loads the library at runtime, resolves these two symbols to function pointers, and calls them to create and release the plugin instance. If loading fails, the engine logs the error and falls back to raw frames.
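The load-or-fall-back behavior can be sketched as follows. This is an illustration under assumptions, not the engine's real loader: the PluginSlot class, the entry-point signatures, and the InvertPlugin example are hypothetical, and the platform-specific symbol lookup (for instance dlsym on POSIX or GetProcAddress on Windows) is left out so the sketch stays portable. The slot simply receives whatever pointers that lookup produced, which may be null.

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

struct VideoFrame { std::vector<std::uint8_t> luma; };

// Minimal stand-in for the plugin base class (only what this sketch needs).
class OMXComponent {
public:
    virtual ~OMXComponent() = default;
    virtual VideoFrame processVideoFrame(const VideoFrame& in) = 0;
};

// Assumed signatures for the two exported entry points.
extern "C" {
typedef OMXComponent* (*CreatePluginFn)();
typedef void (*DestroyPluginFn)(OMXComponent*);
}

// Owns one plugin instance; passes frames through untouched if loading failed.
class PluginSlot {
public:
    // `create` and `destroy` are the results of a symbol lookup; either may be null.
    PluginSlot(const std::string& name, CreatePluginFn create, DestroyPluginFn destroy)
        : destroy_(destroy) {
        if (create != nullptr && destroy != nullptr) plugin_ = create();
        if (plugin_ == nullptr) {
            std::cerr << "[AVProcessEngine] failed to load plugin '" << name
                      << "', falling back to raw frames\n";
        }
    }

    ~PluginSlot() {
        if (plugin_ != nullptr) destroy_(plugin_);  // release via the plugin's own function
    }

    PluginSlot(const PluginSlot&) = delete;
    PluginSlot& operator=(const PluginSlot&) = delete;

    bool loaded() const { return plugin_ != nullptr; }

    VideoFrame process(const VideoFrame& in) {
        if (plugin_ == nullptr) return in;  // fallback: raw frame, unmodified
        return plugin_->processVideoFrame(in);
    }

private:
    OMXComponent* plugin_ = nullptr;
    DestroyPluginFn destroy_ = nullptr;
};

// Example plugin and its entry points, as a plugin library might export them.
class InvertPlugin : public OMXComponent {
public:
    VideoFrame processVideoFrame(const VideoFrame& in) override {
        VideoFrame out = in;
        for (auto& px : out.luma) px = static_cast<std::uint8_t>(255 - px);
        return out;
    }
};

extern "C" OMXComponent* CreatePlugin() { return new InvertPlugin(); }
extern "C" void DestroyPlugin(OMXComponent* p) { delete p; }
```

Pairing creation and destruction inside the plugin library keeps allocation and deallocation on the same side of the library boundary, which is the usual reason for exporting both functions.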
Audio‑Video Algorithm Runtime Management Module
This module adjusts algorithm settings at runtime to keep RTC performance stable across devices of varying capability. Algorithms are categorized into performance‑oriented and quality‑oriented tiers. The engine monitors processing times, such as the total pre‑processing time and the time of each individual algorithm, and switches tiers against a threshold derived from the target frame rate (for example, about 33 ms per frame at 30 FPS, since 1000 ms / 30 ≈ 33.3 ms). If the total time exceeds the threshold, the engine downgrades the most time‑consuming algorithms; if it is below the threshold, the engine may upgrade less costly algorithms. Cross‑module adjustments are also handled: when encoding becomes the bottleneck, for example, the engine drops frames or reduces unnecessary processing.
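The threshold logic described above might look roughly like the sketch below. It assumes each algorithm can run in either tier and changes at most one algorithm per call; the AlgoStat struct, the adjustTiers function, and the 0.7 headroom factor that guards upgrades are hypothetical choices for the example, not values taken from AVProcessEngine.

```cpp
#include <string>
#include <vector>

enum class Tier { Performance, Quality };

// Hypothetical per-algorithm record kept by the runtime manager.
struct AlgoStat {
    std::string name;
    Tier tier;
    double last_ms;  // most recent measured processing time in milliseconds
};

// Per-frame time budget: 1000 ms / 30 FPS is about 33.3 ms.
constexpr double frameBudgetMs(double fps) { return 1000.0 / fps; }

// Assumed headroom factor: upgrade only when well under budget, so the
// engine does not flip tiers back and forth around the threshold.
constexpr double kUpgradeHeadroom = 0.7;

// Adjusts at most one algorithm per call.
// Returns the name of the algorithm whose tier changed, or "" if none did.
std::string adjustTiers(std::vector<AlgoStat>& algos, double fps) {
    const double budget = frameBudgetMs(fps);
    double total = 0.0;
    for (const auto& a : algos) total += a.last_ms;

    if (total > budget) {
        // Over budget: downgrade the slowest algorithm still on the quality tier.
        AlgoStat* worst = nullptr;
        for (auto& a : algos) {
            if (a.tier == Tier::Quality &&
                (worst == nullptr || a.last_ms > worst->last_ms)) {
                worst = &a;
            }
        }
        if (worst != nullptr) {
            worst->tier = Tier::Performance;
            return worst->name;
        }
    } else if (total < budget * kUpgradeHeadroom) {
        // Comfortably under budget: upgrade the cheapest performance-tier algorithm.
        AlgoStat* cheapest = nullptr;
        for (auto& a : algos) {
            if (a.tier == Tier::Performance &&
                (cheapest == nullptr || a.last_ms < cheapest->last_ms)) {
                cheapest = &a;
            }
        }
        if (cheapest != nullptr) {
            cheapest->tier = Tier::Quality;
            return cheapest->name;
        }
    }
    return "";
}
```

Calling a function like this once per measurement window, rather than once per frame, would give each tier change time to show up in the next round of timings before another change is made.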
Logging Module
The logging module records key events such as plugin load success or failure, providing diagnostic information for troubleshooting.
Conclusion
AVProcessEngine addresses SDK bloat and performance degradation by decoupling algorithms into plugins and adjusting them dynamically at runtime, which improves latency, jitter, and smoothness on low‑end devices.
Future Outlook
Future work includes collecting additional device metrics (CPU, GPU usage) for more informed algorithm tuning and refining algorithm tiers for smoother transitions.
