How Real-Time Audio/Video Meets Traditional PSTN: Architecture and Low‑Latency Solutions
This article provides an in‑depth technical analysis of integrating real‑time audio/video (RTC) with legacy PSTN, covering latency sources, protocol and codec differences, adaptation layers, system architecture, and optimization techniques such as jitter buffering, ARQ/FEC, and automatic failover.
Overview of Real‑Time Audio/Video (RTC)
Real‑time audio/video (RTC) enables sub‑second communication; latency is classified as pseudo‑real‑time (>3 s), near‑real‑time (1‑3 s) and true real‑time (<1 s). Tencent Cloud RTC achieves audio latency below 300 ms.
Sources of Latency
Voice calls involve a signaling layer (call setup, resource negotiation) and a media layer (audio capture, preprocessing, encoding, transport, decoding, playback). Because the media travels over the public Internet (VoIP), network transmission adds delay on top of these processing stages. The figure that matters most to users is mouth‑to‑ear delay: the time from a word being spoken to the listener hearing it.
Why Low Latency Matters and What Must Be Solved
UDP is chosen for transport because it avoids the retransmission and head‑of‑line blocking delays of TCP, but it gives no delivery or ordering guarantees, so jitter and packet loss on the Internet reach the application directly. To maintain quality, the system must provide jitter mitigation, packet‑loss concealment, adaptive bitrate control, and audio‑enhancement functions such as automatic gain control (AGC), acoustic echo cancellation (AEC) and active noise cancellation (ANC).
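As a rough illustration of jitter mitigation, the sketch below shows a fixed-depth jitter buffer in Python. The class name `JitterBuffer`, the `depth` parameter, and the return conventions are assumptions made for this example, not details of the system described here. A production buffer would adapt its depth to measured jitter and handle 16-bit sequence-number wraparound, both of which are ignored below.

```python
from typing import Dict, Optional


class JitterBuffer:
    """Fixed-depth reordering buffer (illustrative, not a production design)."""

    def __init__(self, depth: int = 3) -> None:
        self.depth = depth                    # packets to collect before playout starts
        self.packets: Dict[int, bytes] = {}   # sequence number -> payload
        self.next_seq: Optional[int] = None   # next sequence number to play out
        self.primed = False                   # True once playout has started

    def push(self, seq: int, payload: bytes) -> bool:
        """Store an arriving packet; return False if it arrived too late to play."""
        if not self.primed:
            # Before playout starts, the earliest sequence number seen goes first.
            self.next_seq = seq if self.next_seq is None else min(self.next_seq, seq)
        elif seq < self.next_seq:
            return False
        self.packets[seq] = payload
        if len(self.packets) >= self.depth:
            self.primed = True
        return True

    def pop(self) -> Optional[bytes]:
        """Called once per playout tick.

        Returns the next payload in order, or None while the buffer is still
        filling or when the expected packet is missing (the caller would then
        run packet-loss concealment).
        """
        if not self.primed:
            return None
        payload = self.packets.pop(self.next_seq, None)
        self.next_seq += 1
        return payload
```

With a depth of 3, packets pushed in the order 2, 1, 4 are played out as 1, 2, a gap for the missing 3, then 4. The buffer trades a fixed amount of added delay for the ability to reorder and to detect losses.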
Traditional PSTN Overview
PSTN (Public Switched Telephone Network) is a circuit‑switched system that traces back to Bell’s telephone of 1876. Each call occupies a dedicated circuit for its entire duration, so line utilization is low. Modern PSTN can be accessed through a SIP trunk, which carries signaling over SIP and media over RTP on an IP network.
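To make the RTP side of a SIP trunk more concrete, here is a minimal Python sketch that builds the 12-byte fixed RTP header in front of an audio payload. The function name `build_rtp_packet` and its parameters are hypothetical. The header layout and the static payload type numbers (0 for G.711 μ-law, 8 for G.711 A-law, 18 for G.729) follow the RTP specifications, but header extensions, CSRC lists and padding are left out.

```python
import struct

# Static RTP payload types for the codecs commonly used on SIP trunks.
PCMU, PCMA, G729 = 0, 8, 18


def build_rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int,
                     payload_type: int = PCMU, marker: bool = False) -> bytes:
    """Prepend a fixed 12-byte RTP header (version 2, no CSRC, no extension)."""
    first_byte = 2 << 6                                   # V=2, P=0, X=0, CC=0
    second_byte = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack(
        "!BBHII",                                         # network byte order
        first_byte,
        second_byte,
        seq & 0xFFFF,                                     # 16-bit sequence number
        timestamp & 0xFFFFFFFF,                           # 32-bit media timestamp
        ssrc & 0xFFFFFFFF,                                # 32-bit stream identifier
    )
    return header + payload
```

For 8 kHz audio sent every 20 ms, each packet would carry 160 samples, so the timestamp advances by 160 per packet while the sequence number advances by 1.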
Motivation for RTC‑PSTN Fusion
Scenarios such as offline participants joining a QQ group voice call, multi‑party meetings, smart door access systems, and click‑to‑call on web pages require bridging VoIP and PSTN so that users can be reached at ordinary telephone numbers.
Key Differences Between RTC and PSTN
Protocol: RTC uses a proprietary QQ protocol; PSTN uses SIP + RTP.
Codecs: RTC supports SILK, AAC and Opus; PSTN (via SIP trunk) supports G.711 (A‑law and μ‑law) and G.729.
Sample rates: RTC uses 16 kHz or 48 kHz; PSTN uses 8 kHz.
Packet interval: RTC packets can carry 20‑60 ms of audio; PSTN packets typically carry 20 ms.
Mixing: RTC supports server‑side mixing for multi‑party calls; a PSTN phone cannot mix several streams itself, so it must receive a single pre‑mixed stream.
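The sample-rate and packet-interval differences translate directly into packet sizes. The Python sketch below works through that arithmetic and splits a longer RTC frame into 20 ms chunks. The helper names `samples_per_packet` and `repacketize` are hypothetical, and the audio is assumed to be 16-bit mono PCM that has already been decoded.

```python
from typing import List


def samples_per_packet(sample_rate_hz: int, interval_ms: int) -> int:
    """Number of audio samples covered by one packet of the given duration."""
    return sample_rate_hz * interval_ms // 1000


def repacketize(pcm16: bytes, sample_rate_hz: int, out_interval_ms: int = 20) -> List[bytes]:
    """Split 16-bit mono PCM into chunks of out_interval_ms each."""
    chunk_bytes = samples_per_packet(sample_rate_hz, out_interval_ms) * 2  # 2 bytes per sample
    if len(pcm16) % chunk_bytes != 0:
        raise ValueError("input is not a whole number of output packets")
    return [pcm16[i:i + chunk_bytes] for i in range(0, len(pcm16), chunk_bytes)]
```

For example, a 20 ms packet at 8 kHz covers 160 samples, while a 60 ms frame at 16 kHz covers 960 samples and splits into three 20 ms chunks of 640 bytes each.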
Integration Approach: Adaptation Layer
The solution introduces an intermediate adaptation layer that performs two functions: signaling adaptation (converting between QQ private signaling and SIP) and media adaptation (transcoding between codecs, resampling, and converting packet formats). This layer sits on the Internet side and bridges VoIP and PSTN without requiring changes to either endpoint.
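The media-adaptation step can be sketched in Python for one direction: taking decoded 16 kHz, 16-bit PCM and producing 8 kHz G.711 μ-law. This is an illustration under stated assumptions, not the adaptation layer's actual code. The function names are hypothetical, the input is assumed to be little-endian mono PCM, and the downsampler simply averages sample pairs, which is much cruder than the low-pass filtering a real resampler would use.

```python
import struct
from typing import List

BIAS = 0x84    # 132, added to the magnitude before locating the segment
CLIP = 32635   # largest magnitude that still fits in 15 bits after adding BIAS


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as a G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent = magnitude.bit_length() - 8          # segment number, 0..7
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def downsample_2x(samples: List[int]) -> List[int]:
    """Halve the sample rate by averaging adjacent pairs (crude low-pass)."""
    return [(samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples) - 1, 2)]


def transcode_16k_pcm_to_pcmu(pcm16: bytes) -> bytes:
    """16 kHz little-endian 16-bit mono PCM -> 8 kHz mu-law bytes."""
    samples = list(struct.unpack("<%dh" % (len(pcm16) // 2), pcm16))
    return bytes(linear_to_ulaw(s) for s in downsample_2x(samples))
```

A 20 ms frame at 16 kHz (320 samples, 640 bytes) comes out as 160 μ-law bytes, which matches the usual 20 ms G.711 payload. The reverse direction would decode μ-law and upsample before the RTC encoder.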
System Architecture
The OpenSDK exposed by Tencent Cloud provides the same core as QQ’s audio/video engine, stripped of QQ‑specific business logic, and supports Android, iOS, Windows, and Web. The call flow is: client signaling → signaling processing module → flow control → signaling adaptation → media adaptation (mixing, codec conversion) → PSTN gateway (SIP server + RTP media server). Server‑side mixing combines multiple VoIP streams into a single RTP stream for delivery to PSTN or mobile clients.
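Server-side mixing can be illustrated with a short Python sketch that sums time-aligned PCM frames and saturates the result to the 16-bit range. The function names `mix_frames` and `mix_for_listener` are hypothetical, and the frames are assumed to be already decoded, resampled to a common rate, and equal in length. Excluding the listener's own stream from the mix is a common conferencing practice and is an assumption here rather than something the source states.

```python
from typing import Dict, List

INT16_MIN, INT16_MAX = -32768, 32767


def mix_frames(frames: List[List[int]]) -> List[int]:
    """Sum equal-length PCM frames sample by sample, clamping to 16 bits."""
    if not frames:
        return []
    length = len(frames[0])
    if any(len(frame) != length for frame in frames):
        raise ValueError("all frames must have the same number of samples")
    return [max(INT16_MIN, min(INT16_MAX, sum(column))) for column in zip(*frames)]


def mix_for_listener(frames_by_user: Dict[str, List[int]], listener: str) -> List[int]:
    """Mix every participant except the listener, so nobody hears themselves."""
    others = [frame for user, frame in frames_by_user.items() if user != listener]
    return mix_frames(others)
```

The mixed frame for a PSTN participant would then be encoded once and sent as a single stream, so the phone never has to handle more than one incoming stream.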
Optimization Techniques
Voice enhancement on the client side is limited; most processing occurs on the server: AGC, AEC, ANC, jitter buffer, packet‑loss concealment (PLC), voice activity detection (VAD) with EOS packets, and adaptive bitrate driven by network statistics gathered over 2‑second windows. Loss recovery uses ARQ (automatic repeat request) and FEC (forward error correction). ARQ retransmits lost packets, which costs at least one extra round trip of latency, while FEC sends redundant packets up front and adjusts the amount of redundancy dynamically according to network quality.
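One simple form of FEC is a single XOR parity packet per group of media packets, which lets the receiver rebuild any one lost packet in the group without a retransmission. The Python sketch below shows that idea. It is not necessarily the scheme used by the system described here, the function names are hypothetical, packets within a group are assumed to have equal length, and the loss-rate thresholds in `choose_group_size` are invented purely for illustration.

```python
from typing import List, Optional


def xor_parity(packets: List[bytes]) -> bytes:
    """XOR all packets together byte by byte (shorter ones act as zero-padded)."""
    parity = bytearray(max(len(p) for p in packets))
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover_lost(received: List[Optional[bytes]], parity: bytes) -> bytes:
    """Rebuild the single missing packet (marked None) in a group."""
    missing = [i for i, packet in enumerate(received) if packet is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can repair exactly one loss per group")
    present = [packet for packet in received if packet is not None]
    return xor_parity(present + [parity])


def choose_group_size(loss_rate: float) -> int:
    """Hypothetical policy: higher loss -> smaller groups -> more redundancy.

    Returns 0 to disable FEC; 1 means one parity packet per media packet.
    """
    if loss_rate < 0.01:
        return 0
    if loss_rate < 0.05:
        return 4
    if loss_rate < 0.15:
        return 2
    return 1
```

A group of four packets plus one parity packet adds 25% overhead and survives one loss per group. Shrinking the group raises both the overhead and the protection, which is the trade-off that adaptive FEC tunes against measured network quality.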
Reliability and Availability
Large‑scale deployments rely on redundancy (multiple ISPs, data centers) and automatic failover. If a server crashes or an IDC becomes unavailable, traffic is rerouted to healthy resources without service interruption.
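The failover decision can be sketched in Python as a heartbeat-based health check. The `MediaServer` record, the `pick_server` function, the 3-second timeout, and the idea of an explicit set of unavailable IDCs are all assumptions made for illustration; the source does not describe how health is actually detected.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass
class MediaServer:
    name: str
    idc: str               # data center the server lives in
    last_heartbeat: float  # seconds, on the same clock as `now`


def pick_server(servers: Iterable[MediaServer], now: float,
                timeout_s: float = 3.0,
                down_idcs: FrozenSet[str] = frozenset()) -> Optional[MediaServer]:
    """Return the first server, in preference order, that looks healthy.

    A server is skipped if its IDC is marked unavailable or if its last
    heartbeat is older than timeout_s. Returns None if nothing is healthy.
    """
    for server in servers:
        if server.idc in down_idcs:
            continue
        if now - server.last_heartbeat <= timeout_s:
            return server
    return None
```

Calling `pick_server` again whenever the current server stops answering moves new traffic to the next healthy entry. Keeping an ongoing call uninterrupted would also require moving its session state, which this sketch does not cover.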
Q&A Highlights
Operators must hold appropriate SIP and telecom licenses; integrating with carriers may require SP qualifications and a call‑center permit. In extremely poor network environments (e.g., on ships or vehicles), the current solution has limited mitigation options.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
