How RTMPGateway Bridges WeChat Mini‑Programs with WebRTC: A Deep Dive

This article explains how the RTMPGateway media server enables audio‑video communication between WeChat mini‑programs and other platforms by converting RTMP streams to RTP, handling handshakes, media encapsulation, audio transcoding, and signaling, while addressing performance and synchronization challenges.

NetEase Smart Enterprise Tech+

As more industries move their services online, integrated communication is being applied in scenarios such as video signing in finance, remote diagnosis in healthcare, and multi‑party video meetings in enterprises.

Implementing audio‑video intercommunication through WeChat mini‑programs reduces user communication costs, improves operational efficiency, and helps organizations overcome communication barriers within the WeChat ecosystem.

RTMPGateway

WeChat 6.5.21 opened up real‑time audio‑video capabilities to mini‑programs. Developers can use the <live-pusher> component to capture and publish a stream over RTMP, and the <live-player> component to play a stream back over RTMP. The components' RTC mode enables real‑time audio‑video uplink and downlink.

WeChat mini‑programs only provide client‑side capabilities; a media server is required to achieve inter‑mini‑program calls and cross‑platform interoperability.

RTMPGateway Architecture

Mini‑programs connect to the edge media gateway (RTMPGateway) via RTMP.

RTMPGateway forwards RTMP streams between mini‑programs.

RTMPGateway converts RTMP to RTP and forwards it to the NetEase Cloud‑sign edge media server, enabling interoperability with Cloud‑sign SDK and standard WebRTC endpoints.

RTMP Connection Details

RTMP (Real‑Time Messaging Protocol) is an Adobe‑originated TCP‑based application‑layer protocol widely used in live streaming.

Handshake process:

Client sends C0 and C1; server replies with S0 and S1.

Client sends C2 after receiving S0/S1; server sends S2 after receiving C0/C1.

The handshake completes once the client has received S2 and the server has received C2.
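The client side of this exchange can be sketched as follows. This is a minimal illustration of the plain (unencrypted) handshake layout from the public RTMP specification, not RTMPGateway's code; the function names are hypothetical.

```python
import os
import struct

RTMP_VERSION = 3        # version byte carried in C0/S0
HANDSHAKE_SIZE = 1536   # size in bytes of C1, S1, C2 and S2


def make_c0_c1(now_ms: int = 0) -> bytes:
    """Build C0 (1 byte) followed by C1 (4-byte time, 4 zero bytes, random data)."""
    c0 = bytes([RTMP_VERSION])
    c1 = struct.pack(">II", now_ms & 0xFFFFFFFF, 0) + os.urandom(HANDSHAKE_SIZE - 8)
    return c0 + c1


def make_c2(s1: bytes, read_time_ms: int = 0) -> bytes:
    """Build C2 by echoing S1: S1's time, the time S1 was read, S1's random data."""
    if len(s1) != HANDSHAKE_SIZE:
        raise ValueError("S1 must be 1536 bytes")
    return s1[:4] + struct.pack(">I", read_time_ms & 0xFFFFFFFF) + s1[8:]


def s2_matches_c1(c1: bytes, s2: bytes) -> bool:
    """A strict client can check that S2 echoes the random section of its C1."""
    return len(s2) == HANDSHAKE_SIZE and s2[8:] == c1[8:]
```

The server builds S0/S1 and S2 with the same layouts, so `make_c2` applied to C1 yields a valid S2.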

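NetConnection establishment involves a series of messages: the client's connect command, the server's Window Acknowledgement Size and Set Peer Bandwidth protocol control messages, a Stream Begin user control message, and the _result response to connect.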

CreateStream follows a similar request‑response pattern.
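As an illustration of that request‑response pattern, the sketch below encodes the AMF0 payloads of a createStream command and its _result reply. It covers only the command message body; RTMP chunking and the message header are left out, and the helper names are hypothetical.

```python
import struct

AMF0_NULL = b"\x05"  # AMF0 null marker


def amf0_string(text: str) -> bytes:
    """AMF0 string: marker 0x02, 2-byte big-endian length, UTF-8 bytes."""
    data = text.encode("utf-8")
    return b"\x02" + struct.pack(">H", len(data)) + data


def amf0_number(value: float) -> bytes:
    """AMF0 number: marker 0x00 followed by a big-endian IEEE 754 double."""
    return b"\x00" + struct.pack(">d", value)


def create_stream_command(transaction_id: int) -> bytes:
    """Request body: command name, transaction ID, null command object."""
    return amf0_string("createStream") + amf0_number(transaction_id) + AMF0_NULL


def create_stream_result(transaction_id: int, stream_id: int) -> bytes:
    """Reply body: "_result", the same transaction ID, null, the new stream ID."""
    return (amf0_string("_result") + amf0_number(transaction_id)
            + AMF0_NULL + amf0_number(stream_id))
```

The transaction ID is what lets the client match a _result to the request that caused it.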

Asynchronous RTMP Stack

To handle many connections efficiently, RTMPGateway implements an asynchronous RTMP stack using a state‑machine in a single thread. Each connection’s state is tracked to manage the entire lifecycle.
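The idea can be sketched with a per‑connection object that is fed whatever bytes a non‑blocking read returned and advances its state when enough data has arrived. The state names and the class are assumptions for illustration; the sketch covers only the server side of the handshake, sends an all‑zero S1, and echoes C1 unchanged as S2.

```python
from enum import Enum, auto


class State(Enum):
    WAIT_C0_C1 = auto()
    WAIT_C2 = auto()
    CONNECTED = auto()


class ServerConnection:
    """Hypothetical per-connection state driven by a single-threaded event loop."""

    def __init__(self) -> None:
        self.state = State.WAIT_C0_C1
        self.buffer = bytearray()

    def feed(self, data: bytes) -> bytes:
        """Consume newly read bytes; return the bytes to write back (may be empty)."""
        self.buffer += data
        out = bytearray()
        while True:
            if self.state is State.WAIT_C0_C1 and len(self.buffer) >= 1537:
                c1 = bytes(self.buffer[1:1537])
                del self.buffer[:1537]
                # S0 (version 3) + S1 (zeros, a simplification) + S2 (echo of C1)
                out += b"\x03" + bytes(1536) + c1
                self.state = State.WAIT_C2
            elif self.state is State.WAIT_C2 and len(self.buffer) >= 1536:
                del self.buffer[:1536]
                self.state = State.CONNECTED
            else:
                break
        return bytes(out)
```

Because `feed` never blocks, one thread can call it for every connection in turn as data becomes readable.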

Media Protocol Encapsulation

RTMPGateway must convert between RTMP and RTP to interoperate with Cloud‑sign SDK and WebRTC terminals.

RTMP Encapsulation of AAC

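WeChat mini‑programs use AAC for audio. An AAC sequence header carrying the AudioSpecificConfig must be sent before any audio data. Each RTMP audio message starts with a one‑byte audio tag header (sound format, rate, sample size, and channel type), followed for AAC by a one‑byte AACPacketType that distinguishes the sequence header (0) from raw AAC frames (1).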
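A minimal sketch of this layout, assuming the common two‑byte AudioSpecificConfig (object type below 31 and a sample rate from the standard index table). The function names are illustrative.

```python
# Sampling-frequency index table used by AudioSpecificConfig.
AAC_SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000,
                    24000, 22050, 16000, 12000, 11025, 8000, 7350]

# Audio tag header: SoundFormat=10 (AAC), rate=3, 16-bit samples, stereo flag.
AAC_TAG_HEADER = 0xAF


def audio_specific_config(object_type: int, sample_rate: int, channels: int) -> bytes:
    """Pack 5 bits object type, 4 bits frequency index, 4 bits channel config."""
    freq_index = AAC_SAMPLE_RATES.index(sample_rate)
    value = (object_type << 11) | (freq_index << 7) | (channels << 3)
    return value.to_bytes(2, "big")


def aac_sequence_header(config: bytes) -> bytes:
    """RTMP audio payload announcing the codec configuration (AACPacketType 0)."""
    return bytes([AAC_TAG_HEADER, 0x00]) + config


def aac_raw_frame(frame: bytes) -> bytes:
    """RTMP audio payload carrying one raw AAC frame (AACPacketType 1)."""
    return bytes([AAC_TAG_HEADER, 0x01]) + frame
```

For AAC‑LC (object type 2) at 44.1 kHz stereo, `audio_specific_config(2, 44100, 2)` yields the bytes `12 10`.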

RTMP Encapsulation of H.264

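Video uses H.264. An AVC sequence header carrying the AVCDecoderConfigurationRecord (SPS and PPS) is required before any video data. Each RTMP video message starts with a one‑byte video tag header (frame type and codec ID), followed for AVC by a one‑byte AVCPacketType (0 for the sequence header, 1 for NAL units) and a three‑byte composition time offset.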
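The sequence header can be sketched as below, assuming exactly one SPS and one PPS, both passed without Annex B start codes, and 4‑byte NAL length prefixes. The function names are illustrative.

```python
import struct


def avc_decoder_configuration_record(sps: bytes, pps: bytes) -> bytes:
    """Build the record from one SPS and one PPS (NAL header byte included)."""
    return (
        bytes([
            1,        # configurationVersion
            sps[1],   # AVCProfileIndication
            sps[2],   # profile_compatibility
            sps[3],   # AVCLevelIndication
            0xFF,     # reserved bits + lengthSizeMinusOne = 3 (4-byte lengths)
            0xE1,     # reserved bits + number of SPS = 1
        ])
        + struct.pack(">H", len(sps)) + sps
        + bytes([1])  # number of PPS
        + struct.pack(">H", len(pps)) + pps
    )


def avc_sequence_header(sps: bytes, pps: bytes) -> bytes:
    """RTMP video payload: 0x17 (key frame, AVC), AVCPacketType 0, zero offset."""
    return bytes([0x17, 0x00, 0, 0, 0]) + avc_decoder_configuration_record(sps, pps)
```

Later video messages use AVCPacketType 1 and carry each NAL unit prefixed with its 4‑byte length.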

RTP Protocol

RTP provides end‑to‑end real‑time transport for audio and video, with the companion protocol RTCP carrying quality feedback. An RTP packet consists of a 12‑byte fixed header, an optional CSRC list, an optional header extension, and the payload.

RTP and the profiles commonly used with it are specified in RFC 3550 (RTP), RFC 3551 (RTP/AVP), RFC 3711 (SRTP), RFC 4585 (RTP/AVPF), and RFC 5124 (RTP/SAVPF).
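To make the packet layout concrete, here is a small parser for the RTP header as laid out in RFC 3550. It is a sketch for illustration: the type and function names are made up, and padding is not handled.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    header_len: int  # offset at which the payload starts


def parse_rtp_header(packet: bytes) -> RtpHeader:
    if len(packet) < 12:
        raise ValueError("shorter than the 12-byte fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack(">BBHII", packet[:12])
    if b0 >> 6 != 2:
        raise ValueError("unsupported RTP version")
    header_len = 12 + 4 * (b0 & 0x0F)   # fixed header plus CSRC list
    if b0 & 0x10:                        # X bit: a header extension follows
        if len(packet) < header_len + 4:
            raise ValueError("truncated header extension")
        (ext_words,) = struct.unpack(">H", packet[header_len + 2:header_len + 4])
        header_len += 4 + 4 * ext_words
    return RtpHeader(bool(b1 & 0x80), b1 & 0x7F, seq, ts, ssrc, header_len)
```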

RTP Encapsulation of H.264

WebRTC uses the non‑interleaved mode (packetization‑mode=1) of the H.264 RTP payload format defined in RFC 3984, which has since been obsoleted by RFC 6184. Three packet types are used:

Single NAL Unit Packet: a basic RTP packet containing a single NAL unit.

STAP‑A: aggregates multiple NAL units (e.g., SPS and PPS) into one RTP packet.

FU‑A: fragments a large NAL unit across several RTP packets, with start (S), end (E), and reserved (R) bits in the FU header.
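The three packet types can be sketched as follows. The code produces RTP payloads only (no RTP header); the 1200‑byte payload limit and the function names are assumptions chosen for illustration.

```python
def packetize_nal(nal: bytes, max_payload: int = 1200) -> list[bytes]:
    """Return one Single NAL Unit payload, or FU-A fragments for a large NAL unit."""
    if len(nal) <= max_payload:
        return [nal]
    indicator = (nal[0] & 0xE0) | 28   # keep F and NRI bits, type 28 = FU-A
    nal_type = nal[0] & 0x1F
    body = nal[1:]                      # the original NAL header is not repeated
    chunk = max_payload - 2             # room for FU indicator + FU header
    payloads = []
    for offset in range(0, len(body), chunk):
        start = 0x80 if offset == 0 else 0                 # S bit
        end = 0x40 if offset + chunk >= len(body) else 0   # E bit
        fu_header = start | end | nal_type                 # R bit stays 0
        payloads.append(bytes([indicator, fu_header]) + body[offset:offset + chunk])
    return payloads


def stap_a(nals: list[bytes]) -> bytes:
    """Aggregate small NAL units into one STAP-A payload (type 24)."""
    nri = max(n[0] & 0x60 for n in nals)
    out = bytearray([nri | 24])
    for n in nals:
        out += len(n).to_bytes(2, "big") + n
    return bytes(out)
```

A receiver rebuilds a fragmented NAL unit by combining the F and NRI bits of the FU indicator with the type bits of the FU header, then appending the fragment bodies in order.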

Frame Integrity Judgment

When converting RTP video packets back to RTMP, RTMPGateway assembles complete frames before re‑encapsulation. Frame completeness is verified by checking that every packet from the first to the last packet of a frame has arrived, not solely by the RTP marker bit, since a marker bit can arrive while earlier packets of the frame are still missing.
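One way to express such a check, assuming the first and last sequence numbers of the frame are already known (for example from an FU‑A start bit or timestamp change, and from the marker bit). The function is a hypothetical sketch and handles 16‑bit sequence number wraparound.

```python
def frame_is_complete(received_seqs: list[int], first_seq: int, last_seq: int) -> bool:
    """True if every sequence number from first_seq to last_seq was received."""
    expected = ((last_seq - first_seq) & 0xFFFF) + 1
    # Distances from the first packet, modulo 2**16 so wraparound is handled.
    offsets = {(seq - first_seq) & 0xFFFF for seq in received_seqs}
    return sum(1 for off in offsets if off < expected) == expected
```

Duplicated packets are collapsed by the set, so retransmissions do not make an incomplete frame look complete.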

Audio Transcoding

WeChat mini‑programs output AAC, while the Cloud‑sign edge server prefers Opus. RTMPGateway runs a dedicated audio‑transcoding thread pool to convert AAC to Opus, balancing load across threads.
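The article does not describe how load is balanced, so the following is only one plausible arrangement: each stream is pinned to the worker that currently serves the fewest streams, which keeps the frames of one stream in order on a single thread. The class is hypothetical, and `transcode` stands in for a real AAC‑to‑Opus routine supplied by the caller.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable


class TranscodeDispatcher:
    """Hypothetical dispatcher: one single-threaded executor per worker."""

    def __init__(self, transcode: Callable[[str, bytes], bytes], workers: int = 4) -> None:
        self._transcode = transcode
        self._pools = [ThreadPoolExecutor(max_workers=1) for _ in range(workers)]
        self._load = [0] * workers             # streams assigned to each worker
        self._assignment: dict[str, int] = {}

    def add_stream(self, stream_id: str) -> int:
        """Assign the stream to the least-loaded worker and return its index."""
        index = min(range(len(self._pools)), key=self._load.__getitem__)
        self._assignment[stream_id] = index
        self._load[index] += 1
        return index

    def remove_stream(self, stream_id: str) -> None:
        self._load[self._assignment.pop(stream_id)] -= 1

    def submit(self, stream_id: str, aac_frame: bytes) -> Future:
        """Queue one frame on the worker that owns this stream."""
        pool = self._pools[self._assignment[stream_id]]
        return pool.submit(self._transcode, stream_id, aac_frame)

    def shutdown(self) -> None:
        for pool in self._pools:
            pool.shutdown(wait=True)
```

Pinning matters because audio decoders and encoders keep state between frames, so frames of one stream should not be processed concurrently.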

Signaling Between Mini‑Program and Media Server

RTMPGateway uses WebSocket for signaling with both the mini‑program client and the media server. It creates virtual users (mini‑fakeClient, room‑fakeClient) to manage conference rooms and de‑duplicate signaling.

Additional Considerations

RTMP runs over TCP; on weak networks, loss‑based TCP congestion control backs off aggressively and throughput drops. RTMPGateway adopts the BBR congestion control algorithm on the server side to improve bandwidth utilization.
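The article does not say how BBR is enabled. On Linux it can be selected system‑wide or per socket; the sketch below shows the per‑socket route through the `TCP_CONGESTION` socket option and simply reports failure when the option or the BBR module is not available. The function name is illustrative.

```python
import socket


def enable_bbr(sock: socket.socket) -> bool:
    """Try to select BBR for this TCP socket; return False if it is unavailable."""
    option = getattr(socket, "TCP_CONGESTION", None)  # only defined on Linux
    if option is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, option, b"bbr")
    except OSError:
        # Kernel without BBR, or insufficient privileges to select it.
        return False
    return True
```

A server would call this on each accepted connection, or on the listening socket before accepting.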

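RTP timestamps can be mapped in two ways. One is to use the local system time until the first RTCP Sender Report (SR) arrives, then correct the mapping with the NTP timestamp carried in the SR. The other is to treat the first RTP packet as time zero and derive later timestamps from the fixed sampling (clock) rate.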
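Both approaches reduce to simple arithmetic on the 32‑bit RTP timestamp. The helpers below are a sketch with hypothetical names; `sr_ntp_seconds` is assumed to be the SR's NTP timestamp already converted to seconds.

```python
def rtp_to_wallclock(rtp_ts: int, sr_rtp_ts: int, sr_ntp_seconds: float,
                     clock_rate: int) -> float:
    """Map an RTP timestamp to wall-clock time using the latest Sender Report."""
    delta = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if delta >= 0x80000000:        # the packet was sampled before the SR
        delta -= 0x100000000
    return sr_ntp_seconds + delta / clock_rate


def rtp_to_rtmp_ms(rtp_ts: int, first_rtp_ts: int, clock_rate: int) -> int:
    """Milliseconds since the first packet, derived from the fixed clock rate."""
    elapsed = (rtp_ts - first_rtp_ts) & 0xFFFFFFFF   # tolerates one wraparound
    return elapsed * 1000 // clock_rate
```

With a 90 kHz video clock, a packet 90000 ticks after the first one maps to an RTMP timestamp of 1000 ms. Mapping audio and video to the same wall clock through their SRs is what allows the two streams to be aligned.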

These techniques, together with buffer management for audio and video streams, help achieve audio‑video synchronization and control latency.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
