Real-Time Voice Communication Technologies and AI Enhancements in Tencent Meeting
Shang Shidong outlines the shift from the analog PSTN to IP‑based VoIP and how Tencent Meeting builds on H.323, SIP, RTP over UDP, and the Opus codec. AI‑driven super‑resolution, deep‑learning packet‑loss concealment, advanced noise reduction, and speech‑music classification improve audio quality, backed by reference‑free MOS assessment. The talk closes with a look at 5G‑enabled cloud communication, IoT, and WebRTC integration.
This article summarizes a technical talk by Shang Shidong, senior director of Tencent Multimedia Lab, covering the evolution of communication systems, from analog PSTN to digital and IP‑based telephony, and the protocols that enable modern real‑time voice services.
It describes the transition from circuit‑switched PSTN to packet‑switched IP networks, highlighting the role of ISDN, H.323, and SIP in establishing and maintaining calls. Packet switching shares bandwidth among users and lowers cost, but it also introduces challenges: packet loss, latency, jitter, echo, and bandwidth constraints.
The talk then details the specific VoIP challenges faced by Tencent Meeting, including packet loss, delay, jitter, echo, bandwidth limitations, and the complications of multiple devices joining the same room.
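To make jitter concrete, the sketch below computes a running interarrival jitter estimate in the style of RFC 3550 (section 6.4.1). It is an illustration only, not Tencent Meeting's implementation; the function name and the sample timestamps are hypothetical.

```python
def interarrival_jitter(send_ts, recv_ts):
    """Running jitter estimate in the style of RFC 3550, section 6.4.1.

    send_ts and recv_ts hold per-packet timestamps in the same unit
    (milliseconds here). Both lists are hypothetical sample data.
    """
    jitter = 0.0
    for i in range(1, len(send_ts)):
        # Change in transit time between two consecutive packets.
        d = (recv_ts[i] - recv_ts[i - 1]) - (send_ts[i] - send_ts[i - 1])
        # Exponential smoothing with gain 1/16, as the RFC specifies.
        jitter += (abs(d) - jitter) / 16.0
    return jitter


# Packets are sent every 20 ms; the third one arrives 8 ms late.
send = [0, 20, 40, 60]
recv = [50, 70, 98, 110]
print(interarrival_jitter(send, recv))  # 0.96875
```

A constant network delay yields zero jitter under this estimator. Only variation in delay raises the value, which is why a receiver sizes its jitter buffer from this kind of statistic rather than from the delay itself.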
To address these issues, Tencent Meeting employs a layered audio solution: H.323 for PSTN interworking, SIP for Internet interconnection, RTP over UDP for media transport, and the Opus codec for low‑latency, high‑quality speech and music.
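As an illustration of the media transport layer, the following sketch builds and parses the 12‑byte fixed RTP header defined in RFC 3550. The payload type value 111 is an assumption (Opus uses a dynamic payload type that is normally negotiated during call setup), and the function names are hypothetical rather than taken from the talk.

```python
import struct

RTP_VERSION = 2
OPUS_PAYLOAD_TYPE = 111  # assumption: dynamic payload type, negotiated per call


def pack_rtp_header(seq, timestamp, ssrc,
                    payload_type=OPUS_PAYLOAD_TYPE, marker=False):
    """Build the 12-byte fixed RTP header (no CSRC list, no extension)."""
    byte0 = RTP_VERSION << 6  # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    # "!" selects network byte order with no padding: 1+1+2+4+4 = 12 bytes.
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(packet):
    """Decode the fixed header fields from the first 12 bytes of a packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Opus over RTP uses a 48 kHz clock, so a 20 ms frame advances
# the timestamp by 960 ticks.
header = pack_rtp_header(seq=1, timestamp=960, ssrc=0x1234ABCD)
print(len(header), parse_rtp_header(header))
```

The sequence number lets the receiver detect loss and reordering, and the timestamp drives jitter estimation and playout scheduling. UDP provides neither, which is why RTP is layered on top of it.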
Artificial‑intelligence techniques are applied to improve audio quality: bandwidth extension from narrow‑band to wide‑band using AI‑driven super‑resolution, packet‑loss concealment via deep‑learning models, advanced noise reduction that targets non‑stationary sounds, and a speech‑music classifier to handle mixed audio scenarios.
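For contrast with the deep‑learning concealment described in the talk, the sketch below shows a classical baseline: when a frame is lost, repeat the previous output frame at reduced amplitude. This is explicitly not the talk's method. The function name, the `fade` factor, and the toy two‑sample frames are all hypothetical.

```python
def conceal_losses(frames, frame_len, fade=0.5):
    """Baseline packet-loss concealment by attenuated frame repetition.

    frames is a list in which a lost frame is represented by None.
    Returns a list of the same length with every gap filled.
    """
    output = []
    last = [0.0] * frame_len  # silence until the first frame arrives
    for frame in frames:
        if frame is None:
            # Lost: repeat the previous output, attenuated so that
            # long loss bursts fade toward silence instead of buzzing.
            last = [s * fade for s in last]
        else:
            last = list(frame)
        output.append(last)
    return output


received = [[1.0, -1.0], None, None, [0.5, 0.5]]
print(conceal_losses(received, frame_len=2))
# [[1.0, -1.0], [0.5, -0.5], [0.25, -0.25], [0.5, 0.5]]
```

Repetition of this kind tends to be acceptable for isolated losses but degrades quickly over longer bursts, which is the gap that model‑based concealment aims to close.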
The system also incorporates a comprehensive, reference‑free audio quality assessment framework based on ITU and 3GPP standards, providing objective MOS scores and QoE metrics without needing a clean reference signal.
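The talk does not specify which models the framework uses. As one example of a reference‑free ITU approach, the E‑model (ITU‑T G.107) derives a transmission rating factor R from network and codec parameters and maps it to an estimated MOS. The sketch below implements only that final mapping; the function name is hypothetical, and Tencent Meeting's actual scoring may differ.

```python
def r_to_mos(r):
    """Map an E-model rating factor R to an estimated MOS (ITU-T G.107)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    mos = 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6
    # The polynomial dips slightly below 1 for very small R,
    # so clamp to the bottom of the MOS scale.
    return max(1.0, mos)


for r in (50.0, 80.0, 93.2):
    print(r, round(r_to_mos(r), 2))
```

An R of 93.2, the G.107 default for a narrow‑band connection with no impairments, maps to a MOS of about 4.41. No clean reference recording is needed because R is computed from measured parameters such as delay and packet loss.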
Finally, the presentation looks ahead to future trends, such as 5G’s impact on bandwidth and latency, the growing importance of cloud‑based communication, and the integration of AI‑driven assistants, IoT devices, and WebRTC into the broader VoIP ecosystem.
Tencent Cloud Developer