
How Real-Time Audio/Video Meets Traditional PSTN: Architecture and Low‑Latency Solutions

This article provides an in‑depth technical analysis of integrating real‑time audio/video (RTC) with legacy PSTN, covering latency sources, protocol and codec differences, adaptation layers, system architecture, and optimization techniques such as jitter buffering, ARQ/FEC, and automatic failover.

Tencent Cloud Developer

Overview of Real‑Time Audio/Video (RTC)

Streaming latency is commonly grouped into three tiers: pseudo‑real‑time (>3 s), near‑real‑time (1‑3 s) and true real‑time (<1 s). Real‑time audio/video (RTC) targets the last tier, enabling sub‑second communication; Tencent Cloud RTC achieves audio latency below 300 ms.

Sources of Latency

Voice calls involve a signaling layer (call setup, resource negotiation) and a media layer (audio capture, preprocessing, encoding, transport, decoding, playback). Every media stage adds delay, and because the media travels over the public Internet (VOIP), network transmission adds more. The delay that matters is end‑to‑end: the time from a word being spoken to that word reaching the listener’s ear.

Why Low Latency Matters and What Must Be Solved

UDP is chosen for Internet transport because it is fast, but it offers no delivery guarantees, so jitter and packet loss reach the application directly. To maintain quality, the system must provide jitter mitigation, packet‑loss concealment, adaptive bitrate control, and audio‑enhancement functions such as automatic gain control (AGC), acoustic echo cancellation (AEC) and active noise cancellation (ANC).
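
As an illustration of jitter mitigation, the sketch below shows a minimal fixed-depth jitter buffer. It is not taken from the Tencent implementation: the class name `JitterBuffer`, the `depth` parameter and its default are assumptions made for this example. It reorders packets by sequence number and reports a gap so that the caller can apply packet-loss concealment.

```python
import heapq


class JitterBuffer:
    """Reorders packets by sequence number behind a fixed pre-buffer (hypothetical sketch)."""

    def __init__(self, depth=3):
        self.depth = depth      # packets to collect before playout starts (assumed value)
        self.heap = []          # min-heap of (seq, payload)
        self.next_seq = None    # sequence number due at the next playout tick

    def push(self, seq, payload):
        # A packet whose playout slot has already passed is useless; drop it.
        if self.next_seq is not None and seq < self.next_seq:
            return
        heapq.heappush(self.heap, (seq, payload))

    def pop(self):
        """Called once per playout tick.

        Returns the payload due now, or None while pre-buffering or when the
        packet is missing (the caller would then run concealment).
        """
        if self.next_seq is None:
            if len(self.heap) < self.depth:
                return None                     # still pre-buffering
            self.next_seq = self.heap[0][0]     # start at the oldest packet held
        # Discard duplicates of packets that were already played.
        while self.heap and self.heap[0][0] < self.next_seq:
            heapq.heappop(self.heap)
        payload = None
        if self.heap and self.heap[0][0] == self.next_seq:
            payload = heapq.heappop(self.heap)[1]
        self.next_seq += 1
        return payload


if __name__ == "__main__":
    jb = JitterBuffer(depth=3)
    for seq in (2, 1, 3, 5):                    # out of order, packet 4 lost
        jb.push(seq, f"p{seq}")
    print([jb.pop() for _ in range(5)])
```

A deeper buffer absorbs more jitter at the cost of added latency, which is why production systems typically adapt the depth rather than fix it as this sketch does.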

Traditional PSTN Overview

PSTN (Public Switched Telephone Network) is a circuit‑switched system originating from the 1876 Bell telephone. Calls occupy a dedicated physical line, resulting in low utilization. Modern PSTN can be accessed via SIP‑TRUNK, transporting signaling with SIP and media with RTP over IP.

Motivation for RTC‑PSTN Fusion

Scenarios such as offline participants joining a QQ group voice call, multi‑party meetings, smart door access systems, and click‑to‑call on web pages require bridging VOIP and PSTN so that users can be reached via traditional telephone numbers.

Key Differences Between RTC and PSTN

Protocol: RTC uses a proprietary QQ protocol; PSTN uses SIP + RTP.

Codecs: RTC supports SILK, AAC, OPUS; PSTN (via SIP‑TRUNK) supports G.711A/U, G.729.

Sample rates: RTC 16 kHz/48 kHz, PSTN 8 kHz.

Packet interval: RTC can send 20‑60 ms packets; PSTN typically 20 ms.

Mixing: RTC supports server‑side mixing for multi‑party calls; a PSTN phone receives a single audio stream and cannot mix multiple streams itself.
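
The sample-rate and packet-interval differences translate into simple arithmetic that the bridge has to get right. The sketch below works through it; the function names are illustrative, and the only external fact used is that G.711 carries one 8-bit sample per byte at 8 kHz.

```python
def samples_per_packet(sample_rate_hz, packet_ms):
    """Number of audio samples covered by one packet."""
    return sample_rate_hz * packet_ms // 1000


def g711_payload_bytes(packet_ms):
    """G.711 uses 8 kHz sampling and 1 byte per sample."""
    return samples_per_packet(8000, packet_ms)


def pstn_packets_per_rtc_packet(rtc_packet_ms, pstn_packet_ms=20):
    """How many PSTN-side packets one RTC packet must be split into."""
    if rtc_packet_ms % pstn_packet_ms != 0:
        raise ValueError("RTC interval must be a multiple of the PSTN interval")
    return rtc_packet_ms // pstn_packet_ms


if __name__ == "__main__":
    print(samples_per_packet(16000, 60))      # samples in a 60 ms packet at 16 kHz
    print(g711_payload_bytes(20))             # payload bytes of a 20 ms G.711 packet
    print(pstn_packets_per_rtc_packet(60))    # 20 ms packets per 60 ms RTC packet
```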

Integration Approach: Adaptation Layer

The solution introduces an intermediate adaptation layer that performs two functions: signaling adaptation (converting between QQ private signaling and SIP) and media adaptation (transcoding codecs, resampling, and packet format conversion). This layer resides in the Internet and bridges VOIP and PSTN without requiring changes to either endpoint.
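
The media-adaptation half of this layer can be sketched as follows, assuming the RTC audio has already been decoded to 16 kHz, 16-bit PCM (decoding SILK, AAC or OPUS is out of scope here). The function names are hypothetical. The downsampler averages adjacent sample pairs, which is a deliberately crude stand-in for a proper low-pass filter and resampler; the encoder follows the widely used G.711 μ-law segment encoding.

```python
ULAW_BIAS = 0x84    # 132, added before segment search
ULAW_CLIP = 32635   # largest magnitude that survives the bias without overflow


def linear_to_ulaw(sample):
    """Encode one signed 16-bit PCM sample as one G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), ULAW_CLIP) + ULAW_BIAS
    # Find the segment (exponent): position of the highest set bit, 14 down to 7.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # Mu-law bytes are transmitted bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def downsample_16k_to_8k(pcm):
    """Halve the sample rate by averaging adjacent pairs (crude, for illustration)."""
    return [(pcm[i] + pcm[i + 1]) // 2 for i in range(0, len(pcm) - 1, 2)]


def adapt_frame(pcm_16k):
    """16 kHz PCM frame in, G.711 mu-law payload (8 kHz) out."""
    return bytes(linear_to_ulaw(s) for s in downsample_16k_to_8k(pcm_16k))


if __name__ == "__main__":
    frame = [0] * 320                 # 20 ms of silence at 16 kHz
    payload = adapt_frame(frame)
    print(len(payload), hex(payload[0]))
```

A 20 ms frame of 320 samples at 16 kHz becomes a 160-byte payload, matching the typical 20 ms PSTN packet interval.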

System Architecture

The OpenSDK exposed by Tencent Cloud provides the same core as QQ’s audio/video engine, stripped of QQ‑specific business logic, and supports Android, iOS, Windows, and Web. Call flow: client signaling → signaling processing module → flow‑control → signaling adaptation → media adaptation (mixing, codec conversion) → PSTN gateway (SIP server + RTP media server). Server‑side mixing combines multiple VOIP streams into a single RTP stream for delivery to PSTN or mobile clients.
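
Server-side mixing can be illustrated with a small sketch that sums decoded 16-bit PCM frames and saturates the result. The function names are hypothetical, and excluding the listener's own audio from their mix (often called mix-minus) is an assumption of this example, not something the source describes.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def mix_frames(frames):
    """Sum equal-rate PCM frames sample by sample, clamping to the int16 range."""
    if not frames:
        return []
    length = min(len(f) for f in frames)       # tolerate slightly uneven frames
    mixed = []
    for i in range(length):
        total = sum(f[i] for f in frames)
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed


def mix_for_listener(streams, listener):
    """Mix every participant except the listener (assumed mix-minus behaviour)."""
    others = [frame for name, frame in streams.items() if name != listener]
    return mix_frames(others)


if __name__ == "__main__":
    streams = {"a": [100, 200], "b": [10, 20], "pstn": [1, 2]}
    # The single stream that would be encoded and sent toward the PSTN leg.
    print(mix_for_listener(streams, "pstn"))
```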

Optimization Techniques

Voice enhancement on the client side is limited; most processing occurs on the server: AGC, AEC, ANC, jitter buffer, packet‑loss concealment (PLC), voice activity detection (VAD) with EOS packets, and adaptive bitrate based on 2‑second network statistics. Loss recovery uses ARQ (automatic repeat request) and FEC (forward error correction); ARQ adds latency, while FEC adds redundant packets dynamically according to network quality.
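
To make the FEC idea concrete, here is a minimal sketch of one simple scheme: an XOR parity packet per group of equal-length packets, which can rebuild any single lost packet in the group. The source does not say which FEC code is used, so this is a stand-in, and the loss-rate thresholds in `group_size_for_loss` are invented for illustration only.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets):
    """Build one parity packet covering a group of equal-length packets."""
    parity = bytes(len(packets[0]))
    for p in packets:
        parity = xor_bytes(parity, p)
    return parity


def recover(received, parity):
    """Rebuild the group if exactly one entry is missing (marked as None)."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        return list(received)          # nothing lost, or too many lost to repair
    rebuilt = parity
    for p in received:
        if p is not None:
            rebuilt = xor_bytes(rebuilt, p)
    result = list(received)
    result[missing[0]] = rebuilt
    return result


def group_size_for_loss(loss_rate):
    """Smaller groups mean more redundancy. Thresholds are hypothetical."""
    if loss_rate < 0.02:
        return 0                       # network is clean: send no parity
    if loss_rate < 0.10:
        return 4                       # one parity per 4 packets (25% overhead)
    return 2                           # one parity per 2 packets (50% overhead)


if __name__ == "__main__":
    group = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = make_parity(group)
    print(recover([group[0], None, group[2]], parity))
```

Unlike ARQ, recovery here needs no round trip to the sender, which is why FEC costs bandwidth rather than latency.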

Reliability and Availability

Large‑scale deployments rely on redundancy (multiple ISPs, data centers) and automatic failover. If a server crashes or an IDC becomes unavailable, traffic is rerouted to healthy resources without service interruption.
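
The rerouting decision can be sketched as below. The `MediaServer` type, its fields and the selection policy (prefer a healthy server in the caller's data center, otherwise any healthy server) are assumptions for this example; the source does not describe how the real scheduler chooses.

```python
from dataclasses import dataclass


@dataclass
class MediaServer:
    name: str             # hypothetical server identifier
    idc: str              # data center the server lives in
    healthy: bool = True  # result of the latest health check


def pick_server(servers, preferred_idc):
    """Prefer a healthy server in the preferred IDC, else fail over to any healthy one."""
    healthy = [s for s in servers if s.healthy]
    if not healthy:
        raise RuntimeError("no healthy media server available")
    same_idc = [s for s in healthy if s.idc == preferred_idc]
    return (same_idc or healthy)[0]


if __name__ == "__main__":
    pool = [
        MediaServer("m1", "idc-a"),
        MediaServer("m2", "idc-a"),
        MediaServer("m3", "idc-b"),
    ]
    print(pick_server(pool, "idc-a").name)   # normal case
    pool[0].healthy = False                  # one server crashes
    print(pick_server(pool, "idc-a").name)
    pool[1].healthy = False                  # the whole IDC is unavailable
    print(pick_server(pool, "idc-a").name)
```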

Q&A Highlights

Operators must hold appropriate SIP and telecom licenses; integrating with carriers may require SP qualifications and a call‑center permit. In extremely poor network environments (e.g., on ships or vehicles), the current solution has limited mitigation options.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
