Real-Time Voice Communication Technologies and AI Enhancements in Tencent Meeting

Shang Shidong traces the shift from the analog PSTN to IP‑based VoIP and outlines how Tencent Meeting builds on H.323, SIP, RTP over UDP, and the Opus codec. AI‑driven audio super‑resolution, deep‑learning packet‑loss concealment, advanced noise reduction, and speech‑music classification improve audio quality. These are complemented by reference‑free MOS assessment, and the talk closes with an outlook on 5G‑enabled cloud communication, IoT, and WebRTC integration.

Tencent Cloud Developer

This article summarizes a technical talk by Shang Shidong, senior director of Tencent Multimedia Lab, covering the evolution of communication systems, from analog PSTN to digital and IP‑based telephony, and the protocols that enable modern real‑time voice services.

It describes the transition from circuit‑switched PSTN to packet‑switched IP networks, highlighting the role of ISDN, H.323, and SIP in establishing and maintaining calls. The advantages of packet switching—shared bandwidth and reduced cost—are contrasted with the challenges it introduces, such as packet loss, latency, jitter, echo, and bandwidth constraints.

The talk then details the specific VoIP challenges faced by Tencent Meeting, including packet loss, delay, jitter, echo, bandwidth limitations, and the complications of multiple devices joining the same room.
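Jitter, one of the challenges listed above, is usually tracked as a running estimate rather than a single measurement. The following is a minimal sketch of the interarrival jitter estimator defined in RFC 3550 (the RTP specification); the function name and the use of millisecond timestamps are assumptions made for illustration, and nothing here is taken from Tencent Meeting's implementation.

```python
def interarrival_jitter(send_times, recv_times):
    """Return the RFC 3550 running jitter estimate after the last packet.

    send_times and recv_times are per-packet timestamps in the same unit
    (milliseconds here, as an assumption for this sketch).
    """
    if len(send_times) != len(recv_times):
        raise ValueError("send_times and recv_times must have equal length")

    jitter = 0.0
    for i in range(1, len(send_times)):
        # Difference in transit time between consecutive packets.
        transit_delta = (recv_times[i] - recv_times[i - 1]) - (
            send_times[i] - send_times[i - 1]
        )
        # Exponential smoothing with gain 1/16, as specified in RFC 3550.
        jitter += (abs(transit_delta) - jitter) / 16.0
    return jitter


# Packets sent every 20 ms; the second one arrives 10 ms late.
print(interarrival_jitter([0, 20], [5, 35]))
```

A receiver can use an estimate like this to size its jitter buffer: a larger value suggests buffering more audio before playout, at the cost of extra delay.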

To address these issues, Tencent Meeting employs a layered audio solution: H.323 for PSTN interworking, SIP for Internet interconnection, RTP over UDP for media transport, and the Opus codec for low‑latency, high‑quality speech and music.
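To make the media transport layer concrete, here is a small sketch that builds and parses the 12-byte fixed RTP header described in RFC 3550. The helper names are hypothetical, and the payload type 111 used in the example is only an assumption: Opus uses a dynamic payload type that is negotiated per session.

```python
import struct

# Fixed RTP header: 2 flag bytes, 16-bit sequence, 32-bit timestamp, 32-bit SSRC.
RTP_HEADER = struct.Struct("!BBHII")


def pack_rtp_header(payload_type, sequence, timestamp, ssrc, marker=False):
    """Build a fixed RTP header (version 2, no padding, extension, or CSRCs)."""
    byte0 = 2 << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return RTP_HEADER.pack(
        byte0, byte1, sequence & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF
    )


def parse_rtp_header(packet):
    """Decode the fixed RTP header at the start of a UDP payload."""
    if len(packet) < RTP_HEADER.size:
        raise ValueError("packet shorter than the 12-byte RTP header")
    byte0, byte1, sequence, timestamp, ssrc = RTP_HEADER.unpack_from(packet)
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "csrc_count": byte0 & 0x0F,
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Example: one 20 ms frame at a 48 kHz RTP clock advances the timestamp by 960.
header = pack_rtp_header(payload_type=111, sequence=5, timestamp=960, ssrc=0x12345678)
print(parse_rtp_header(header))
```

The sequence number and timestamp are what let a receiver detect loss and reordering and measure jitter, since UDP itself provides neither ordering nor delivery guarantees.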

Artificial‑intelligence techniques are applied to improve audio quality: AI‑driven super‑resolution that extends narrow‑band speech to wide‑band, deep‑learning models for packet‑loss concealment, advanced noise reduction that targets non‑stationary sounds, and a speech‑music classifier that handles mixed audio scenarios.
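For context on what packet-loss concealment has to do, the sketch below shows a classical baseline: when a frame is missing, repeat the last good frame with a decaying gain. This is explicitly not the deep-learning model the talk describes; it is a simple stand-in to illustrate the problem, and the function name, the `None` marker for lost frames, and the attenuation factor are all assumptions.

```python
def conceal_losses(frames, frame_len, attenuation=0.5):
    """Fill lost frames (marked as None) by repeating the last good frame.

    Each consecutive loss is scaled down by `attenuation` so a long gap
    fades toward silence instead of looping audibly.
    """
    output = []
    last_good = None
    gain = 1.0
    for frame in frames:
        if frame is not None:
            # Good frame: pass it through and reset the fade.
            last_good = frame
            gain = 1.0
            output.append(list(frame))
        elif last_good is None:
            # Nothing received yet, so the only safe fill is silence.
            output.append([0.0] * frame_len)
        else:
            gain *= attenuation
            output.append([sample * gain for sample in last_good])
    return output


received = [[1.0, -1.0], None, None, [0.5, 0.5]]
print(conceal_losses(received, frame_len=2))
```

Repetition of this kind works for short gaps but degrades quickly on longer bursts, which is the gap that learned concealment models aim to close.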

The system also incorporates a comprehensive, reference‑free audio quality assessment framework based on ITU and 3GPP standards, providing objective MOS scores and QoE metrics without needing a clean reference signal.
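As an illustration of how an objective score can be produced without a reference signal, the sketch below implements the R-factor to MOS mapping from the ITU-T G.107 E-model. The summary does not say which ITU or 3GPP models Tencent Meeting actually uses, so treat this as one well-known example rather than the framework from the talk; the function name is hypothetical.

```python
def r_factor_to_mos(r):
    """Map an E-model transmission rating R to an estimated MOS (ITU-T G.107)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


# 93.2 is the E-model's default rating with all parameters at default values.
print(round(r_factor_to_mos(93.2), 2))
```

In the E-model, impairments such as delay and packet loss lower R, so a mapping like this turns network measurements into a score on the familiar 1 to 4.5 MOS scale.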

Finally, the presentation looks ahead to future trends, such as 5G’s impact on bandwidth and latency, the growing importance of cloud‑based communication, and the integration of AI‑driven assistants, IoT devices, and WebRTC into the broader VoIP ecosystem.

Tags: AI, Audio Processing, RTP, Tencent Meeting, Real-time Audio, Speech Enhancement, VoIP