What Is WebRTC? Overview, Architecture, Signaling, and Demo Implementation
This article explains WebRTC as a cross‑platform, low‑latency real‑time communication technology, covering its definition, three‑layer architecture, JavaScript APIs, signaling process, NAT traversal mechanisms, a complete demo code example, and a practical Douyin business use case.
What Is WebRTC
Cross‑platform, low‑latency, end‑to‑end audio‑video real‑time communication technology
Overview
WebRTC (Web Real‑Time Communication) generally refers to audio‑video real‑time communication, but the broader RTC concept also covers IM, images, whiteboards, file sharing and other rich‑media interactions. WebRTC is both a set of browser APIs and a collection of protocols.
Typical scenarios include P2P video calls, conference calls, live streaming, remote access, online education, tele‑medicine, IoT devices (drones, cameras, smart speakers) and more.
Any endpoint—browser, desktop app, Android/iOS device or IoT—can interoperate as long as it follows the WebRTC specifications and has IP connectivity.
The goal of WebRTC is to let web developers create rich real‑time multimedia applications in the browser with simple JavaScript, without installing plugins or handling low‑level signal processing.
WebRTC Principles
Three‑Layer Architecture
Your Web App Layer
Implements the real‑time communication application.
Web API Layer
This layer exposes the WebRTC JavaScript APIs to developers. The main APIs are:
API categories: Media Stream API, RTCPeerConnection, Peer‑to‑peer Data API
Media Stream API: Accesses the camera and microphone via MediaStream to obtain synchronized audio‑video streams.
RTCPeerConnection: Represents a WebRTC connection between the local machine and a remote peer, providing methods to create, maintain, monitor and close the connection.
Peer‑to‑peer Data API: Creates a high‑throughput, low‑latency data channel (RTCDataChannel) for arbitrary data transfer.
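To make the three API categories concrete, here is a minimal sketch of how they fit together in browser code. The helper names (`captureMedia`, `attachStream`, `openDataChannel`) are invented for this example and are not part of the WebRTC API surface:

```javascript
// Media Stream API: capture a synchronized audio/video stream.
async function captureMedia(constraints = { video: true, audio: true }) {
  return navigator.mediaDevices.getUserMedia(constraints);
}

// RTCPeerConnection: attach the captured tracks to a connection.
// Returns one RTCRtpSender per track.
function attachStream(pc, stream) {
  return stream.getTracks().map(track => pc.addTrack(track, stream));
}

// Peer-to-peer Data API: open a channel for arbitrary application data.
function openDataChannel(pc, label = 'chat') {
  const channel = pc.createDataChannel(label, { ordered: true });
  channel.onmessage = e => console.log('received:', e.data);
  return channel;
}
```

Because `attachStream` is pure glue over the connection object, it also works against a stubbed connection, which is convenient for unit testing.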
WebRTC Core Layer (Four Layers)
WebRTC C/C++ API (PeerConnection): Implements P2P connection, audio/video capture, transmission, and non‑media data.
Session Management / Abstract Signaling: Manages sessions for audio, video and data streams.
Audio Engine, Video Engine, Transport: Core processing modules.
Hardware Adaptation Layer: Handles device‑level capture/rendering, video capture, and network I/O; these modules can be overridden for custom implementations.
WebRTC Communication
WebRTC uses RTCPeerConnection to exchange media streams between browsers. After creating an RTCPeerConnection instance, two negotiation steps are required:
Media negotiation – determine stream characteristics (resolution, codecs, etc.) via SDP.
Network negotiation – exchange ICE candidates to discover reachable network addresses.
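One peer's side of these two negotiations can be sketched as follows. This is browser‑only code; `sendToPeer` stands in for whatever signaling transport the application uses and is not a WebRTC API:

```javascript
// Sketch: one peer's side of media + network negotiation.
async function negotiate(pc, sendToPeer) {
  // Network negotiation: forward each ICE candidate the browser discovers.
  pc.onicecandidate = e => {
    if (e.candidate) sendToPeer({ type: 'candidate', candidate: e.candidate });
  };
  // Media negotiation: describe local capabilities in SDP and ship the offer.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  sendToPeer({ type: 'offer', sdp: offer.sdp });
}
```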
Signaling
Before a WebRTC connection can be established, a signaling process exchanges metadata so that the two peers can locate each other. Signaling messages are plain text and can be transported via WebSockets or any other channel.
Signaling purposes include:
Control messages to open/close the connection.
Error notifications.
Media adaptation data (codec, bandwidth, etc.).
Security key exchange.
Network configuration (IP, ports).
The signaling server acts as an intermediary that helps both peers establish a connection while minimizing privacy exposure.
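A client‑side dispatcher for such signaling messages can be sketched in a few lines. The `{ type, payload }` envelope here is an assumed application‑level format; WebRTC deliberately leaves the signaling protocol up to the application:

```javascript
// Returns an onMessage function that routes raw signaling text
// to the handler registered for its message type.
function makeSignalingHandler(handlers) {
  return function onMessage(raw) {
    const msg = JSON.parse(raw);
    const handler = handlers[msg.type];
    if (!handler) throw new Error(`unknown signaling message: ${msg.type}`);
    return handler(msg.payload);
  };
}

// Wiring it into a WebSocket transport (browser):
// const ws = new WebSocket('wss://signaling.example');
// ws.onmessage = e => onMessage(e.data);
```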
Session Description Protocol (SDP)
Media negotiation details are described using SDP, a line‑based <type>=<value> text format reminiscent of INI files. An SDP blob contains one session‑level section followed by one or more media descriptions, each typically mapping to a single media stream.
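The line‑based layout can be illustrated with a toy parser that splits an SDP blob into its session section and its media descriptions. This is deliberately not spec‑complete; real parsing should follow RFC 8866:

```javascript
// Everything before the first m= line is the session section;
// each m= line opens a new media description.
function parseSdp(sdp) {
  const session = [];
  const media = [];
  for (const line of sdp.trim().split(/\r?\n/)) {
    if (line.startsWith('m=')) media.push([line]);
    else if (media.length) media[media.length - 1].push(line);
    else session.push(line);
  }
  return { session, media };
}
```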
SDP Handshake
Similar in spirit to TCP's three‑way handshake, WebRTC performs an SDP offer/answer exchange: the caller creates and sends an offer, the callee replies with an answer, and both sides then exchange ICE candidates over the same signaling channel before media can flow.
NAT Traversal (Hole Punching)
To establish a direct peer‑to‑peer channel across NATs and firewalls, WebRTC uses ICE, which integrates STUN and TURN protocols.
STUN : Discovers public IP/port and NAT type.
TURN : Relays traffic through a server when direct connection fails.
ICE: Gathers candidates of every type (host, server‑reflexive via STUN, relayed via TURN), tries direct paths first and falls back to a TURN relay, maximizing the chance of connectivity.
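The candidate lines exchanged during network negotiation encode which traversal path each address came from. A simplified parser makes the fields visible (illustrative only; real candidate attributes carry additional fields):

```javascript
// Parses the candidate-attribute format ICE uses:
// "candidate:<foundation> <component> <protocol> <priority> <ip> <port> typ <type> ...".
// host = local interface, srflx = public address discovered via STUN,
// relay = address allocated on a TURN server.
function parseCandidate(line) {
  const parts = line.replace(/^a=/, '').split(' ');
  const typIndex = parts.indexOf('typ');
  return {
    foundation: parts[0].replace('candidate:', ''),
    protocol: parts[2].toLowerCase(),
    priority: Number(parts[3]),
    address: parts[4],
    port: Number(parts[5]),
    type: parts[typIndex + 1], // 'host' | 'srflx' | 'prflx' | 'relay'
  };
}
```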
WebRTC Demo
Below is a minimal demo that creates a local video element, a remote video element, and uses JavaScript to establish a peer connection.
Demo HTML
<!DOCTYPE html>
<html>
<head>
<title>WebRTC Demo</title>
<style type="text/css">
#remote{position:absolute;top:100px;left:100px;width:500px;}
#local{position:absolute;top:120px;left:480px;width:100px;z-index:9999;border:1px solid #ddd;}
</style>
</head>
<body>
<video id="local" autoplay></video>
<video id="remote" autoplay></video>
<script type="text/javascript" src="./main.js"></script>
</body>
</html>
Demo JavaScript (main.js)
// main.js — loopback demo: both peers live in the same page, so SDP and ICE
// candidates are handed over directly instead of through a signaling server.
const rtcConfig = {
  iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
};

async function startPeerConnection(stream) {
  const remoteVideo = document.getElementById('remote');
  const localConnection = new RTCPeerConnection(rtcConfig);
  const remoteConnection = new RTCPeerConnection(rtcConfig);

  // In a real application these candidates would travel over the signaling channel.
  localConnection.onicecandidate = e => { if (e.candidate) remoteConnection.addIceCandidate(e.candidate); };
  remoteConnection.onicecandidate = e => { if (e.candidate) localConnection.addIceCandidate(e.candidate); };

  // ontrack replaces the deprecated onaddstream event.
  remoteConnection.ontrack = e => { remoteVideo.srcObject = e.streams[0]; };

  // addTrack replaces the deprecated addStream method.
  stream.getTracks().forEach(track => localConnection.addTrack(track, stream));

  // Offer/answer exchange; normally each description is relayed by the signaling server.
  const offer = await localConnection.createOffer();
  await localConnection.setLocalDescription(offer);
  await remoteConnection.setRemoteDescription(offer);
  const answer = await remoteConnection.createAnswer();
  await remoteConnection.setLocalDescription(answer);
  await localConnection.setRemoteDescription(answer);
}

async function main() {
  if (!navigator.mediaDevices || !navigator.mediaDevices.getUserMedia) {
    alert('getUserMedia is not supported in this browser');
    return;
  }
  if (!window.RTCPeerConnection) {
    alert('RTCPeerConnection is not supported in this browser');
    return;
  }
  try {
    const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: false });
    document.getElementById('local').srcObject = stream;
    await startPeerConnection(stream);
  } catch (err) {
    console.error('getUserMedia failed:', err);
  }
}

main();
Business Scenario: Douyin "Xiao An" Human‑initiated Call
Douyin’s security assistant uses a WebRTC SDK provided by Volcano Engine. The flow is:
Create and initialize the client engine via createEngine .
Join an RTC room with engine.joinRoom , configuring auto‑publish/subscribe.
Capture local audio/video with startAudioCapture / startVideoCapture , publish and play locally.
Subscribe to and play remote streams.
Leave the room with leaveRoom when the call ends.
Key parameters for establishing the connection are appId , token , roomId and uid .
SDK Call Flow (Code Snippet)
// Step 1: Fetch RTC init config
const createVoip = async () => {
const data = await InterveneServer.createVoip({
ToUserId: callParams?.ToUserId,
VoipType: 1,
BizScene: callParams?.BizScene,
Desc: callParams?.Desc,
Command: callParams?.Command
});
setToken(data?.Token);
config.current = {appId: data?.AppId, roomId: data?.RoomId, uid: data?.MyAppUserId, voipUUid: data?.VoipUUid};
setJoin(true);
setCallStatus(CallStatus.WAITING);
};
// Step 2: Create RTC instance
export default class RtcComponent extends React.Component {
rtc = new RtcClient(this.props);
componentDidMount(){ this.props.onRef(this.rtc); }
render(){ return <></>; }
}
// Step 3: Init RTC, bind events, join room
const initRTC = async () => {
const {roomId, uid} = config.current || {};
if(!roomId || !uid || !rtc.current) return;
rtc.current.bindEngineEvents();
await rtc.current.join(token, roomId, uid);
await rtc.current.createLocalStream(res => {
const {code, devicesStatus} = res;
if(code === ERROR_CODE || devicesStatus.audio === FAILED){ setMicOn(false); return; }
});
};
// Step 4: Handle remote stream addition
const handleStreamAdd = useCallback(event => {
const stream = event.stream;
const userId = stream.userId;
if(count.current < 3 && !remoteStreams[userId]){
remoteStreams[userId] = stream;
stream.playerComp = (
/* player component elided in the original snippet */ null
);
setRemoteStreams({...remoteStreams});
count.current += 1;
}
}, [remoteStreams]);
Appendix
WebRTC vs RTMP
WebRTC vs WebSocket
TikTok Frontend Technology Team
We are the TikTok Frontend Technology Team, serving TikTok and multiple ByteDance product lines, focused on building frontend infrastructure and exploring community technologies.