Building a WebRTC Video Call System: Signaling, Direct Connection, and Selective Forwarding
This article explains how to build a WebRTC video-call system on the standard APIs. It covers signaling through SDP exchange, direct peer-to-peer connections, the move to a selective forwarding server that forwards streams efficiently, and the use of a data channel for RPC-based room and stream management across web, Android, iOS, and Windows clients.
Introduction
This article is the first part of a series that explains how to build a video‑call (video‑link) system based on WebRTC from three perspectives: client, server, and audio‑video encoding optimization. It aims to help developers understand the core WebRTC technologies and practical applications.
Background
In a previous article about Bilibili's real-time audio and video technology, the company described a video-link system that prefers UDP and falls back to TCP when necessary, combines forward error correction (FEC) with backward error correction (retransmission), and dynamically adjusts the bitrate and sending rate to network conditions. However, that system reused only parts of WebRTC, which made it costly to maintain and poorly compatible with the high-level browser APIs. The system is therefore being refactored onto the standard WebRTC API.
Signaling and Direct Connection
WebRTC handshaking is performed through "signaling exchange". Two broadcasters (peers) must exchange information about codecs, transport protocols, IP addresses, and ports so that each side knows how to send and receive media. This information is expressed as an SDP (Session Description Protocol) string: one side creates an Offer SDP, the other side replies with an Answer SDP, and through this offer/answer negotiation both sides agree on the media parameters they will use.
In the example SDP, the negotiated details include IP address 10.0.0.2, port 17723, SRTP as the media transport, H.264 as the video codec, Opus as the audio codec, and SSRC identifiers for the streams.
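SDP is a line-oriented text format in which every line has the form <type>=<value>. The sketch below pulls out the fields named above: the connection address (c=), the media kind, port, and transport profile (m=), codec names (a=rtpmap), and SSRCs (a=ssrc). The sample SDP is hypothetical, written only to carry the example values; real browser SDPs also contain ICE credentials, DTLS fingerprints, and many other attributes. UDP/TLS/RTP/SAVPF is the profile WebRTC uses for SRTP media keyed over DTLS. This is a minimal illustration, not a complete SDP parser.

```typescript
// Minimal SDP summary extractor (sketch): handles only the line types
// discussed in the text, not the full SDP grammar.
interface MediaSummary {
  kind: string;     // "audio" or "video", from the m= line
  port: number;     // port from the m= line
  protocol: string; // transport profile, e.g. "UDP/TLS/RTP/SAVPF" (SRTP)
  codecs: string[]; // encoding names from a=rtpmap lines
  ssrcs: number[];  // unique SSRC identifiers from a=ssrc lines
}

interface SdpSummary {
  address: string | null; // first c= address seen
  media: MediaSummary[];
}

function summarizeSdp(sdp: string): SdpSummary {
  const summary: SdpSummary = { address: null, media: [] };
  let current: MediaSummary | null = null;
  for (const raw of sdp.split(/\r?\n/)) {
    const line = raw.trim();
    if (line.length < 2 || line[1] !== "=") continue; // not "<type>=<value>"
    const type = line[0];
    const value = line.slice(2);
    if (type === "m") {
      // m=<media> <port> <proto> <fmt> ...
      const [kind, port, protocol] = value.split(" ");
      current = { kind, port: Number(port), protocol, codecs: [], ssrcs: [] };
      summary.media.push(current);
    } else if (type === "c" && summary.address === null) {
      // c=IN IP4 <address>
      summary.address = value.split(" ")[2] ?? null;
    } else if (type === "a" && current !== null) {
      if (value.startsWith("rtpmap:")) {
        // a=rtpmap:<payload type> <encoding>/<clock rate>[/<channels>]
        const encoding = value.split(" ")[1] ?? "";
        current.codecs.push(encoding.split("/")[0]);
      } else if (value.startsWith("ssrc:")) {
        // a=ssrc:<ssrc-id> <attribute>:<value>
        const id = Number(value.slice("ssrc:".length).split(" ")[0]);
        if (!current.ssrcs.includes(id)) current.ssrcs.push(id);
      }
    }
  }
  return summary;
}

// Hypothetical SDP carrying the example values from the text.
const sample = [
  "v=0",
  "o=- 0 0 IN IP4 10.0.0.2",
  "s=-",
  "t=0 0",
  "m=audio 17723 UDP/TLS/RTP/SAVPF 111",
  "c=IN IP4 10.0.0.2",
  "a=rtpmap:111 opus/48000/2",
  "a=ssrc:1001 cname:demo",
  "m=video 17723 UDP/TLS/RTP/SAVPF 96",
  "c=IN IP4 10.0.0.2",
  "a=rtpmap:96 H264/90000",
  "a=ssrc:2002 cname:demo",
  "a=ssrc:2002 msid:demo v0",
].join("\r\n");

const parsed = summarizeSdp(sample);
console.log(parsed.address, parsed.media.map((m) => `${m.kind}:${m.codecs.join("/")}`));
```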
Below is a pseudo‑code illustration of how two users establish a WebRTC connection through a signaling server:
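User A {
  pc = new RTCPeerConnection()
  pc.addTransceiver("video")   // video transceiver, used for sending or receiving
  pc.addTransceiver("audio")   // audio transceiver, used for sending or receiving
  offer = await pc.createOffer()   // the offer lists the IP addresses, ports, protocols, and codecs the transceivers can use
  await pc.setLocalDescription(offer)
  wait until gathering of IP addresses and ports (the ICE candidates) is complete
  offer = pc.localDescription   // now includes the gathered candidates
  relay offer to B through the signaling server
}
User B {
  offer = receive the offer
  pc = new RTCPeerConnection()
  listen for pc's "track" event (fired for the new transceivers the remote offer creates)
  await pc.setRemoteDescription(offer)
  answer = await pc.createAnswer()
  await pc.setLocalDescription(answer)
  wait until gathering of IP addresses and ports is complete
  answer = pc.localDescription
  relay answer to A through the signaling server
  wait for pc.connectionState to become "connected"
}
User A {
  answer = receive the answer
  await pc.setRemoteDescription(answer)
  wait for pc.connectionState to become "connected"
}
After the connection succeeds, the transceivers can be used to send and receive audio and video streams.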
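The offer and answer steps can be made concrete with the standard RTCPeerConnection methods. This sketch declares a small interface containing only the members it uses so that it type-checks on its own; in a browser you would pass a real RTCPeerConnection (a type cast may be needed depending on your TypeScript lib settings). Like the pseudocode, it waits for ICE gathering to finish before returning the description (non-trickle ICE). Transport through the signaling server is left out, and the function names are illustrative, not part of any API.

```typescript
// Structural subset of the browser RTCPeerConnection used by this sketch.
interface SessionDescription {
  type: string; // "offer" or "answer" in this flow
  sdp?: string;
}

interface PeerConnectionLike {
  iceGatheringState: string; // "new" | "gathering" | "complete"
  localDescription: SessionDescription | null;
  addTransceiver(kind: string, init?: { direction?: string }): unknown;
  createOffer(): Promise<SessionDescription>;
  createAnswer(): Promise<SessionDescription>;
  setLocalDescription(desc: SessionDescription): Promise<void>;
  setRemoteDescription(desc: SessionDescription): Promise<void>;
  addEventListener(type: string, listener: () => void): void;
  removeEventListener(type: string, listener: () => void): void;
}

// Resolve once ICE gathering has finished, so localDescription already
// lists every candidate (the "wait for IP addresses and ports" step).
function waitForIceGatheringComplete(pc: PeerConnectionLike): Promise<void> {
  return new Promise<void>((resolve) => {
    if (pc.iceGatheringState === "complete") {
      resolve();
      return;
    }
    const onChange = (): void => {
      if (pc.iceGatheringState !== "complete") return;
      pc.removeEventListener("icegatheringstatechange", onChange);
      resolve();
    };
    pc.addEventListener("icegatheringstatechange", onChange);
  });
}

function requireLocalDescription(pc: PeerConnectionLike): SessionDescription {
  if (pc.localDescription === null) throw new Error("local description not set");
  return pc.localDescription;
}

// User A: create an offer with one video and one audio transceiver.
async function makeOffer(pc: PeerConnectionLike): Promise<SessionDescription> {
  pc.addTransceiver("video", { direction: "sendrecv" });
  pc.addTransceiver("audio", { direction: "sendrecv" });
  await pc.setLocalDescription(await pc.createOffer());
  await waitForIceGatheringComplete(pc);
  return requireLocalDescription(pc); // relay to B via the signaling server
}

// User B: apply A's offer, then answer it.
async function makeAnswer(pc: PeerConnectionLike, offer: SessionDescription): Promise<SessionDescription> {
  await pc.setRemoteDescription(offer); // "track" fires for receiving transceivers
  await pc.setLocalDescription(await pc.createAnswer());
  await waitForIceGatheringComplete(pc);
  return requireLocalDescription(pc); // relay back to A
}
```

User A completes the exchange with await pc.setRemoteDescription(answer). Waiting for gathering keeps signaling to a single round trip at the cost of extra setup latency; trickle ICE, which sends candidates as they are discovered, is the usual alternative.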
Selective Forwarding Server
When many participants join a live stream, direct peer-to-peer connections become inefficient: in a full mesh of N participants, each broadcaster must upload N − 1 copies of the same media, one per peer. A selective forwarding server (known in WebRTC as a selective forwarding unit, SFU) solves this by receiving a single upstream stream from each participant and forwarding it only to the participants that subscribe to it. Because the server also runs a WebRTC stack, signaling with the server works exactly like signaling between two browsers.
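The routing idea can be sketched as a table mapping each publisher to the set of participants subscribed to it. The class and method names below are hypothetical, not the API of Bilibili's server or of any SFU library; a production SFU also handles RTCP feedback, bandwidth adaptation, and possibly RTP header rewriting, none of which is modelled here.

```typescript
// Hypothetical SFU routing table: publisher id -> subscriber ids.
// Each publisher uploads one stream; the server copies each incoming
// packet only to the participants that subscribed to that publisher.
type ParticipantId = string;
type SendFn = (to: ParticipantId, packet: Uint8Array) => void;

class SelectiveForwarder {
  private readonly routes = new Map<ParticipantId, Set<ParticipantId>>();

  subscribe(subscriber: ParticipantId, publisher: ParticipantId): void {
    if (subscriber === publisher) return; // never echo a stream to its sender
    let subscribers = this.routes.get(publisher);
    if (subscribers === undefined) {
      subscribers = new Set<ParticipantId>();
      this.routes.set(publisher, subscribers);
    }
    subscribers.add(subscriber);
  }

  unsubscribe(subscriber: ParticipantId, publisher: ParticipantId): void {
    const subscribers = this.routes.get(publisher);
    if (subscribers === undefined) return;
    subscribers.delete(subscriber);
    if (subscribers.size === 0) this.routes.delete(publisher);
  }

  // Forward one packet from a publisher; returns how many copies were sent.
  forward(publisher: ParticipantId, packet: Uint8Array, send: SendFn): number {
    const subscribers = this.routes.get(publisher);
    if (subscribers === undefined) return 0;
    subscribers.forEach((to) => send(to, packet));
    return subscribers.size;
  }
}

// Streams each participant must upload when n participants share a room.
function uploadsPerParticipant(n: number, topology: "mesh" | "sfu"): number {
  return topology === "mesh" ? n - 1 : 1;
}
```

With five participants, uploadsPerParticipant(5, "mesh") is 4 while uploadsPerParticipant(5, "sfu") is 1; the fan-out cost moves from each client's uplink to the server.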
Signaling State
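After switching from direct peer connections to an SFU, a single RTCPeerConnection may carry multiple transceivers belonging to different participants, and transceivers change as participants come and go, so SDP is renegotiated repeatedly. Each negotiation moves the signaling state through a defined sequence. On the side that creates the offer, the state goes stable → have-local-offer (after setLocalDescription(offer)) → stable (after setRemoteDescription(answer)); on the answering side it goes stable → have-remote-offer (after setRemoteDescription(offer)) → stable (after setLocalDescription(answer)). Provisional answers add the have-local-pranswer and have-remote-pranswer states. The current state can be read from the signalingState property and observed through the signalingstatechange event. The negotiationneeded event signals that a new SDP exchange is required, for example after a transceiver is added, stopped, or has its direction changed.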
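The offerer and answerer paths can be modelled as a small state machine. This sketch covers only offers and answers; provisional answers (pranswer), rollback, and glare handling are deliberately left out, so it is a simplified model of what browsers implement, useful for reasoning or testing signaling logic but not a substitute for reading pc.signalingState.

```typescript
// Simplified RTCPeerConnection signaling-state machine: offers and answers
// only (no pranswer, rollback, or glare handling).
type SignalingState = "stable" | "have-local-offer" | "have-remote-offer";
type Side = "local" | "remote"; // setLocalDescription vs setRemoteDescription
type SdpType = "offer" | "answer";

function nextSignalingState(state: SignalingState, side: Side, type: SdpType): SignalingState {
  if (state === "stable" && type === "offer") {
    // Applying an offer starts a negotiation.
    return side === "local" ? "have-local-offer" : "have-remote-offer";
  }
  if (type === "offer") {
    // A new offer from the same side replaces the pending one.
    if (state === "have-local-offer" && side === "local") return state;
    if (state === "have-remote-offer" && side === "remote") return state;
  } else {
    // An answer must come from the side opposite to the pending offer.
    if (state === "have-local-offer" && side === "remote") return "stable";
    if (state === "have-remote-offer" && side === "local") return "stable";
  }
  throw new Error(`invalid: ${side} ${type} in state "${state}"`);
}

// Apply a sequence of (side, type) steps and record every state visited.
function trace(steps: Array<[Side, SdpType]>): SignalingState[] {
  const states: SignalingState[] = ["stable"];
  for (const [side, type] of steps) {
    states.push(nextSignalingState(states[states.length - 1], side, type));
  }
  return states;
}

// Offerer's path: prints "stable -> have-local-offer -> stable"
console.log(trace([["local", "offer"], ["remote", "answer"]]).join(" -> "));
```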
Data Channel
WebRTC also provides a data channel for transmitting arbitrary non-media data. Once the first offer/answer exchange has established the connection, the channel can carry the SDP strings of later renegotiations as well as any custom protocol messages, so no separate signaling transport is needed from that point on.
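Renegotiation over an open data channel can be sketched as two handlers. The assumptions here are: the {type, sdp} JSON message shape is our own illustrative format; only one side starts a renegotiation at a time (no glare handling); and media stays bundled on the already-connected transport, so no new ICE candidates need to be exchanged. The interfaces mirror the RTCPeerConnection and RTCDataChannel members used; in a browser you would pass the real objects, possibly with a type cast.

```typescript
// Sketch: SDP renegotiation carried over an open data channel.
interface SessionDescription {
  type: string; // "offer" or "answer"
  sdp?: string;
}

interface NegotiatingPeer {
  createOffer(): Promise<SessionDescription>;
  createAnswer(): Promise<SessionDescription>;
  setLocalDescription(desc: SessionDescription): Promise<void>;
  setRemoteDescription(desc: SessionDescription): Promise<void>;
}

interface TextChannel {
  send(data: string): void;
}

// Call from the peer connection's "negotiationneeded" handler.
async function sendOffer(pc: NegotiatingPeer, channel: TextChannel): Promise<void> {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  channel.send(JSON.stringify({ type: offer.type, sdp: offer.sdp }));
}

// Call from the data channel's "message" handler with event.data.
async function handleSignal(pc: NegotiatingPeer, channel: TextChannel, data: string): Promise<void> {
  const msg = JSON.parse(data) as SessionDescription;
  if (msg.type === "offer") {
    await pc.setRemoteDescription(msg);
    const answer = await pc.createAnswer();
    await pc.setLocalDescription(answer);
    channel.send(JSON.stringify({ type: answer.type, sdp: answer.sdp }));
  } else if (msg.type === "answer") {
    await pc.setRemoteDescription(msg);
  } else {
    throw new Error(`unexpected signaling message type: ${msg.type}`);
  }
}
```

In a browser, sendOffer would be wired to pc.onnegotiationneeded and handleSignal to channel.onmessage using event.data.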
Business Actions
In a production system, remote procedure calls (RPCs) are sent over the data channel to join a room, publish or subscribe to streams, and manage the session. These messages can be serialized with Protocol Buffers, MessagePack, JSON, or a similar format.
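A common way to run RPC over a data channel is to tag each request with an id and match responses by that id. The sketch below uses JSON for brevity; the envelope fields (id, method, params, result, error) follow the spirit of JSON-RPC and, like method names such as joinRoom, are assumptions for illustration, not Bilibili's actual protocol.

```typescript
// Sketch of request/response RPC over a data channel with JSON framing.
interface TextChannel {
  send(data: string): void;
}

interface RpcRequest {
  id: number;
  method: string; // e.g. "joinRoom", "publish", "subscribe" (hypothetical)
  params: unknown;
}

interface RpcResponse {
  id: number; // echoes the request id
  result?: unknown;
  error?: string;
}

interface PendingCall {
  resolve: (value: unknown) => void;
  reject: (reason: Error) => void;
}

class RpcClient {
  private nextId = 1;
  private readonly pending = new Map<number, PendingCall>();

  constructor(private readonly channel: TextChannel) {}

  // Send a request and return a promise settled by the matching response.
  call(method: string, params: unknown): Promise<unknown> {
    const id = this.nextId++;
    const request: RpcRequest = { id, method, params };
    return new Promise<unknown>((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      try {
        this.channel.send(JSON.stringify(request));
      } catch (err) {
        this.pending.delete(id); // the request never left, so drop it
        reject(err instanceof Error ? err : new Error(String(err)));
      }
    });
  }

  // Feed every incoming data channel message (event.data) into this method.
  handleMessage(data: string): void {
    const response = JSON.parse(data) as RpcResponse;
    const call = this.pending.get(response.id);
    if (call === undefined) return; // unknown or already-settled id
    this.pending.delete(response.id);
    if (response.error !== undefined) {
      call.reject(new Error(response.error));
    } else {
      call.resolve(response.result);
    }
  }
}
```

In a browser, the client would wrap an open RTCDataChannel, with channel.onmessage = (e) => client.handleMessage(e.data); a call such as client.call("joinRoom", { roomId }) then resolves when the server's matching response arrives. Swapping JSON for protobuf or MessagePack changes only the encode and decode steps.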
Summary
The second‑generation video‑link system at Bilibili uses the standard WebRTC API. It obtains server information, creates a single data channel for SDP negotiation, and connects only to the server. Media is forwarded selectively by the server, and all business logic (room management, stream control) is carried over the data channel. This architecture works uniformly across web, Android, iOS, and Windows clients.
Preview
The next article will detail how the selective forwarding server accepts these connections, handles publish/subscribe requests, performs data forwarding, recording, and provides backend RPC interfaces.
Bilibili Tech
Provides introductions and tutorials on Bilibili-related technologies.