
Building a WebRTC Video Call System: Signaling, Direct Connection, and Selective Forwarding

The article explains how to build a WebRTC video-call system with the standard APIs: signaling via SDP exchange, direct peer connections, the move to a selective forwarding server that forwards streams efficiently, and a data channel that carries RPC-based room and stream management across web, Android, iOS, and Windows clients.

Bilibili Tech

Feb 7, 2025


Introduction

This article is the first in a series that explains how to build a WebRTC-based video-call (co-streaming) system from three perspectives: the client, the server, and audio/video encoding optimization. It aims to help developers understand WebRTC's core technologies and how they are applied in practice.

Background

In a previous article on Bilibili's real-time audio and video technology, the team described a co-streaming system that prefers UDP and falls back to TCP when necessary, combines forward error correction with retransmission-based (backward) error correction, and dynamically adjusts bitrate and sending rate to network conditions. However, that system used only parts of WebRTC, which led to high maintenance costs and poor compatibility with the browsers' high-level WebRTC APIs. The system is therefore being refactored onto the standard WebRTC API.

Signaling and Direct Connection

The WebRTC handshake is carried out by exchanging signaling messages. Before media can flow, the two broadcasters (peers) must agree on codecs, transport protocols, IP addresses, and ports, so that each side knows how to send and receive media. This information is encoded as an SDP (Session Description Protocol) string: one side creates an Offer SDP, the other replies with an Answer SDP, and this offer/answer negotiation aligns the media parameters on both sides.
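To make the offer/answer idea concrete, here is a minimal TypeScript sketch of the codec part of the negotiation. In the offer/answer model the answer may only use formats that appeared in the offer, so the answerer effectively intersects the offered codecs with the ones it supports. The `Codec` type and `negotiateCodecs` function are hypothetical; real SDP negotiation also compares payload types, fmtp parameters, RTCP feedback, and transport details, and the answerer may reorder codecs by its own preference (this sketch simply keeps the offerer's order).

```typescript
// Hypothetical codec descriptor: just a name and an RTP clock rate.
type Codec = { name: string; clockRate: number };

// Keep only the offered codecs the answerer also supports.
// Codec names are compared case-insensitively ("opus" vs "OPUS").
function negotiateCodecs(offered: Codec[], supported: Codec[]): Codec[] {
  return offered.filter((o) =>
    supported.some(
      (s) =>
        s.name.toLowerCase() === o.name.toLowerCase() &&
        s.clockRate === o.clockRate,
    ),
  );
}

// Illustrative capabilities: A offers VP8, H.264 and Opus; B lacks VP8.
const offeredByA: Codec[] = [
  { name: "VP8", clockRate: 90000 },
  { name: "H264", clockRate: 90000 },
  { name: "opus", clockRate: 48000 },
];
const supportedByB: Codec[] = [
  { name: "H264", clockRate: 90000 },
  { name: "opus", clockRate: 48000 },
];

const answerCodecs = negotiateCodecs(offeredByA, supportedByB);
console.log(answerCodecs.map((c) => c.name).join(",")); // H264,opus
```

Both sides then send and receive with the codecs that survive this intersection, which is what "aligning the media parameters" amounts to for codecs.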

For example, an SDP might specify IP address 10.0.0.2, port 17723, SRTP as the transport, H.264 as the video codec, Opus as the audio codec, and SSRC identifiers for the individual streams.
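Each of those fields lives on a specific SDP line: `m=` (media kind, port, and transport profile such as `UDP/TLS/RTP/SAVPF`, where the S marks SRTP), `c=` (connection address), `a=rtpmap` (codec), and `a=ssrc`. The sketch below parses a trimmed, made-up SDP carrying those values. The payload types, SSRC numbers, and `cname` are illustrative, and `parseSdp` is a hypothetical helper that ignores much of what a real parser must handle (ICE credentials, DTLS fingerprints, candidate lines, fmtp parameters).

```typescript
// One parsed media section (everything from an m= line up to the next one).
interface MediaSection {
  kind: string; // "audio" or "video"
  port: number;
  protocol: string; // e.g. UDP/TLS/RTP/SAVPF (RTP secured as SRTP)
  address?: string; // from the c= line
  codecs: string[]; // from a=rtpmap lines
  ssrcs: number[]; // from a=ssrc lines
}

function parseSdp(sdp: string): MediaSection[] {
  const sections: MediaSection[] = [];
  for (const line of sdp.split(/\r?\n/)) {
    if (line.startsWith("m=")) {
      // "m=video 17723 UDP/TLS/RTP/SAVPF 102"
      const [kind, port, protocol] = line.slice(2).split(" ");
      sections.push({ kind, port: Number(port), protocol, codecs: [], ssrcs: [] });
      continue;
    }
    const current = sections[sections.length - 1];
    if (!current) continue; // session-level line before the first m= line
    if (line.startsWith("c=")) {
      current.address = line.split(" ")[2]; // "c=IN IP4 10.0.0.2"
    } else if (line.startsWith("a=rtpmap:")) {
      current.codecs.push(line.split(" ")[1].split("/")[0]); // "opus/48000/2" -> "opus"
    } else if (line.startsWith("a=ssrc:")) {
      const ssrc = Number(line.slice("a=ssrc:".length).split(" ")[0]);
      if (!current.ssrcs.includes(ssrc)) current.ssrcs.push(ssrc);
    }
  }
  return sections;
}

// Trimmed, illustrative SDP (lines are CRLF-separated in real SDP).
const exampleSdp = [
  "v=0",
  "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 17723 UDP/TLS/RTP/SAVPF 111",
  "c=IN IP4 10.0.0.2",
  "a=rtpmap:111 opus/48000/2",
  "a=ssrc:1001 cname:user-a",
  "m=video 17723 UDP/TLS/RTP/SAVPF 102",
  "c=IN IP4 10.0.0.2",
  "a=rtpmap:102 H264/90000",
  "a=ssrc:2002 cname:user-a",
].join("\r\n");

for (const s of parseSdp(exampleSdp)) {
  console.log(`${s.kind}: ${s.address}:${s.port} ${s.protocol} codecs=${s.codecs.join(",")} ssrcs=${s.ssrcs.join(",")}`);
}
// audio: 10.0.0.2:17723 UDP/TLS/RTP/SAVPF codecs=opus ssrcs=1001
// video: 10.0.0.2:17723 UDP/TLS/RTP/SAVPF codecs=H264 ssrcs=2002
```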

Below is an illustration of how two users, A and B, establish a WebRTC connection through a signaling server:

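// Hypothetical relay through the signaling server; WebRTC does not define this part.
interface Signaling {
  send(desc: RTCSessionDescriptionInit): void;
  receive(): Promise<RTCSessionDescriptionInit>;
}

// Resolves once done() is true, re-checking on every `event` fired by pc.
function waitFor(pc: RTCPeerConnection, event: string, done: () => boolean): Promise<void> {
  return new Promise((resolve) => {
    if (done()) return resolve();
    const check = () => {
      if (done()) { pc.removeEventListener(event, check); resolve(); }
    };
    pc.addEventListener(event, check);
  });
}

// User A: creates the offer.
async function userA(signaling: Signaling): Promise<RTCPeerConnection> {
  const pc = new RTCPeerConnection();
  pc.addTransceiver("video", { direction: "sendrecv" }); // video transceiver for sending/receiving
  pc.addTransceiver("audio", { direction: "sendrecv" }); // audio transceiver for sending/receiving
  // The offer lists the protocols, codecs, etc. that the transceivers can use.
  await pc.setLocalDescription(await pc.createOffer());
  // Wait until the IP/port candidates have been gathered into the local description.
  await waitFor(pc, "icegatheringstatechange", () => pc.iceGatheringState === "complete");
  signaling.send(pc.localDescription!.toJSON()); // relay the offer to B via the server
  await pc.setRemoteDescription(await signaling.receive()); // apply B's answer
  await waitFor(pc, "connectionstatechange", () => pc.connectionState === "connected");
  return pc;
}

// User B: answers the offer.
async function userB(signaling: Signaling): Promise<RTCPeerConnection> {
  const offer = await signaling.receive();
  const pc = new RTCPeerConnection();
  // Fires for each transceiver created to receive the remote side's media.
  pc.addEventListener("track", (e) => console.log("new transceiver", e.transceiver.mid));
  await pc.setRemoteDescription(offer);
  await pc.setLocalDescription(await pc.createAnswer());
  await waitFor(pc, "icegatheringstatechange", () => pc.iceGatheringState === "complete");
  signaling.send(pc.localDescription!.toJSON()); // relay the answer to A via the server
  await waitFor(pc, "connectionstatechange", () => pc.connectionState === "connected");
  return pc;
}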

After the connection succeeds, media transceivers can be used to send and receive audio/video streams.

Selective Forwarding Server

When many participants join a live stream, direct peer-to-peer connections become inefficient: in a full mesh of N participants, each broadcaster has to send N - 1 copies of the same media over N - 1 separate connections. A selective forwarding server (an SFU, or Selective Forwarding Unit) solves this by receiving a single stream from each participant and forwarding it only to the participants that request it. The server also runs a WebRTC module, so the signaling process with the server is identical to that between two browsers.
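A minimal sketch of the forwarding decision an SFU makes, assuming streams and participants are identified by plain strings (a hypothetical naming scheme such as `"alice/video"`). A real SFU forwards RTP packets and also handles SSRC rewriting, RTCP feedback, and bandwidth estimation; this only shows the publish/subscribe routing and the upload cost compared with a mesh.

```typescript
// Publish/subscribe routing table kept by the SFU.
class ForwardingTable {
  // streamId -> IDs of participants subscribed to that stream
  private readonly subscribers = new Map<string, Set<string>>();

  // A participant starts sending one stream to the server.
  publish(streamId: string): void {
    if (!this.subscribers.has(streamId)) this.subscribers.set(streamId, new Set<string>());
  }

  subscribe(streamId: string, participantId: string): void {
    const subs = this.subscribers.get(streamId);
    if (!subs) throw new Error(`unknown stream: ${streamId}`);
    subs.add(participantId);
  }

  unsubscribe(streamId: string, participantId: string): void {
    this.subscribers.get(streamId)?.delete(participantId);
  }

  // Who should receive a packet that arrived on `streamId`.
  recipients(streamId: string): string[] {
    const subs = this.subscribers.get(streamId);
    return subs ? Array.from(subs) : [];
  }
}

// Copies of its own stream each participant must upload, per published track.
const uploadsPerParticipant = (n: number, topology: "mesh" | "sfu"): number =>
  topology === "mesh" ? n - 1 : 1;

const table = new ForwardingTable();
table.publish("alice/video");
table.subscribe("alice/video", "bob");
table.subscribe("alice/video", "carol");
table.unsubscribe("alice/video", "carol");
console.log(table.recipients("alice/video").join(",")); // bob
console.log(uploadsPerParticipant(5, "mesh"), uploadsPerParticipant(5, "sfu")); // 4 1
```

The downlink is unchanged (each viewer still receives one stream per subscribed participant); the saving is on every broadcaster's uplink, which is usually the scarcer resource.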

Signaling State

After switching from direct peer connections to an SFU, a single RTCPeerConnection to the server may contain multiple transceivers for different participants. The signaling state of each side changes in a defined order, which depends on whether that side created the offer or received it:

Offerer: stable → have-local-offer → stable (after setting the remote answer)
Answerer: stable → have-remote-offer → stable (after setting the local answer)

These states can be observed via the signalingstatechange event and the signalingState property. The negotiationneeded event indicates when a new SDP exchange is required, for example after adding or removing a transceiver.
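These transitions can be modelled as a small state machine. This is a simplified sketch of the rules the browser enforces when `setLocalDescription`/`setRemoteDescription` is called; it omits the `pranswer` and `rollback` cases, and `nextState` is a hypothetical name. The usage walks through one renegotiation round, taking as an example the client as offerer and the SFU as answerer.

```typescript
type SignalingState = "stable" | "have-local-offer" | "have-remote-offer";
type Side = "local" | "remote";
type SdpType = "offer" | "answer";

// Simplified model of how signalingState changes when a description is set.
// Throws on every other transition (the real API accepts a few more, such as
// re-setting a local offer, plus pranswer and rollback).
function nextState(state: SignalingState, side: Side, type: SdpType): SignalingState {
  if (state === "stable" && type === "offer") {
    return side === "local" ? "have-local-offer" : "have-remote-offer";
  }
  if (state === "have-local-offer" && side === "remote" && type === "answer") return "stable";
  if (state === "have-remote-offer" && side === "local" && type === "answer") return "stable";
  throw new Error(`cannot set ${side} ${type} in state ${state}`);
}

// One renegotiation round, e.g. the client adds a transceiver and offers to the SFU.
let client: SignalingState = "stable";
let sfu: SignalingState = "stable";
client = nextState(client, "local", "offer"); // have-local-offer
sfu = nextState(sfu, "remote", "offer"); // have-remote-offer
sfu = nextState(sfu, "local", "answer"); // stable
client = nextState(client, "remote", "answer"); // stable
console.log(client, sfu); // stable stable
```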

Data Channel

WebRTC also provides a data channel for transmitting arbitrary non‑media data. It can be used to exchange SDP strings or any custom protocol messages, eliminating the need for a separate signaling transport after the initial connection is established.
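A sketch of renegotiation carried over the data channel, assuming a hypothetical JSON message format `{ kind, sdp }`. The `Negotiator` interface lists only the three `RTCPeerConnection` methods the handler needs, typed after `RTCSessionDescriptionInit`, so a browser `RTCPeerConnection` should fit it structurally while tests can pass in a fake.

```typescript
// Mirrors the shape of RTCSessionDescriptionInit.
interface Description {
  type: "offer" | "answer" | "pranswer" | "rollback";
  sdp?: string;
}

// The subset of RTCPeerConnection this sketch relies on.
interface Negotiator {
  setRemoteDescription(d: Description): Promise<void>;
  createAnswer(): Promise<Description>;
  setLocalDescription(d: Description): Promise<void>;
}

// Hypothetical wire format for SDP sent over the data channel.
type SignalMessage = { kind: "offer" | "answer"; sdp: string };

// Handle one data-channel message; if it carries an offer, reply with an answer.
async function onSignal(pc: Negotiator, send: (text: string) => void, raw: string): Promise<void> {
  const msg = JSON.parse(raw) as SignalMessage;
  await pc.setRemoteDescription({ type: msg.kind, sdp: msg.sdp });
  if (msg.kind === "offer") {
    const answer = await pc.createAnswer();
    await pc.setLocalDescription(answer);
    const reply: SignalMessage = { kind: "answer", sdp: answer.sdp ?? "" };
    send(JSON.stringify(reply));
  }
}
```

In a browser this would be wired up roughly as `channel.onmessage = (e) => onSignal(pc, (t) => channel.send(t), e.data)`, where `channel` comes from `pc.createDataChannel(...)` or the `datachannel` event. Glare (both sides sending offers at the same time) is not handled here.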

Business Actions

In a production system, remote procedure calls (RPC) are sent over the data channel to perform actions such as joining a room, publishing or subscribing to streams, and managing the session. These messages are typically serialized with protobuf, MessagePack, JSON, etc.
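Because a data channel only delivers messages, an RPC layer has to pair each reply with its request. The following is a minimal sketch that uses JSON and a numeric `id`, assuming hypothetical envelopes and method names (`joinRoom` is illustrative, not Bilibili's actual interface); protobuf or MessagePack would slot in by swapping the `JSON.stringify`/`JSON.parse` calls. A loopback function stands in for the server so the example runs without WebRTC.

```typescript
// Hypothetical JSON envelopes for requests and responses.
type RpcRequest = { id: number; method: string; params: unknown };
type RpcResponse = { id: number; result?: unknown; error?: string };
type Pending = { resolve: (value: unknown) => void; reject: (err: Error) => void };

// Pairs each response with its request by id: the data channel itself is just
// a message pipe with no notion of request and reply.
class RpcClient {
  private nextId = 1;
  private readonly pending = new Map<number, Pending>();
  private readonly send: (text: string) => void;

  constructor(send: (text: string) => void) {
    this.send = send; // e.g. (text) => dataChannel.send(text) in a browser
  }

  call(method: string, params: unknown): Promise<unknown> {
    const request: RpcRequest = { id: this.nextId++, method, params };
    return new Promise((resolve, reject) => {
      this.pending.set(request.id, { resolve, reject });
      this.send(JSON.stringify(request));
    });
  }

  // Feed every incoming data-channel message here.
  onMessage(text: string): void {
    const response = JSON.parse(text) as RpcResponse;
    const entry = this.pending.get(response.id);
    if (!entry) return; // unknown or already-settled id
    this.pending.delete(response.id);
    if (response.error !== undefined) entry.reject(new Error(response.error));
    else entry.resolve(response.result);
  }
}

// Loopback stand-in for the server: accepts "joinRoom", rejects anything else.
const client: RpcClient = new RpcClient((text) => {
  const req = JSON.parse(text) as RpcRequest;
  const res: RpcResponse =
    req.method === "joinRoom"
      ? { id: req.id, result: { joined: true } }
      : { id: req.id, error: `unknown method: ${req.method}` };
  // Deliver the reply asynchronously, as a real data channel would.
  Promise.resolve().then(() => client.onMessage(JSON.stringify(res)));
});

client.call("joinRoom", { roomId: "demo" }).then((r) => console.log("joinRoom ->", r));
```

Keeping the pending map keyed by id lets several calls (join, publish, subscribe) be in flight at once over the same channel, even if the server answers them out of order.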

Summary

The second-generation co-streaming system at Bilibili uses the standard WebRTC API. A client obtains the server's information, connects only to the server, and creates a single data channel for SDP negotiation. The server forwards media selectively, and all business logic (room management, stream control) is carried over the data channel. This architecture works uniformly across web, Android, iOS, and Windows clients.

Preview

The next article will detail how the selective forwarding server accepts these connections, handles publish/subscribe requests, forwards and records media, and exposes backend RPC interfaces.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
