
Understanding the WebRTC Video Capture Pipeline: From Capture Module to Encoder

This article explains how WebRTC builds the video processing pipeline by detailing the capture module, internal data flow, VideoTrack construction, rendering, and encoder integration, and it outlines the key API calls such as AddTrack, CreateOffer, and SetLocalDescription that establish the end‑to‑end video stream.

TAL Education Technology

WebRTC video sessions transmit local audio/video data through three stages: capture, encoding, and transmission, forming a video pipeline that this series of articles will dissect.

2. Capture – The video capture module initiates the pipeline by acquiring raw frames from sources such as cameras, desktop screens, or video files. Platform‑specific implementations (AVFoundation on macOS/iOS, V4L2 on Linux, Camera2 on Android, DirectShow/MediaFoundation on Windows) reside under modules/video_capture , with platform‑independent code in the same directory and platform‑specific code in subfolders.

The DeviceInfo interface enumerates devices and their capabilities, while the abstract VideoCaptureModule defines common operations like StartCapture , StopCapture , CaptureStarted , RegisterCaptureDataCallback , and rotation handling.

VideoCaptureImpl implements the module, providing a static Create factory that returns a platform‑specific subclass (e.g., VideoCaptureDS on Windows, VideoCaptureV4L2 on Linux). It forwards captured frames to IncomingFrame , which rotates the frame, converts it to I420 via libyuv, timestamps it, and then calls DeliverCapturedFrame . The latter invokes VideoSinkInterface::OnFrame on the registered callback, pushing the frame downstream.

2.2 Internal Data Flow – After ProcessCapturedFrame (platform‑specific) calls VideoCaptureImpl::IncomingFrame , the frame travels through rotation, format conversion, and timestamping before reaching VideoCaptureImpl::DeliverCapturedFrame , which finally hands it to the next sink.

3. Pipeline Establishment – The capture module collaborates with higher‑level components to deliver frames to rendering or encoding modules. Initialization creates the module; termination stops and destroys it. Data flows via callbacks to the next stage.

The VideoCapture → VideoTrack path creates a VideoTrack object that aggregates a VideoSource . The VideoSource holds a VideoBroadcaster which implements VideoSinkInterface and registers downstream sinks. Adding a sink registers it in VideoBroadcaster::sinks_ , allowing broadcast to multiple consumers.

The VideoTrackSource composes the VideoSource and forwards sink registration to it, while also exposing source-state notifications via NotifierInterface .

The VideoTrack itself does not implement VideoSinkInterface ; instead, sink registration through VideoTrack::AddOrUpdateSink is forwarded via the VideoTrackSource to the underlying VideoSource , completing the source‑to‑sink chain.

3.2 VideoTrack to Rendering – Rendering classes implement VideoSinkInterface and are registered with VideoTrack::AddOrUpdateSink . Both the renderers shipped with WebRTC and custom renderers receive frames via OnFrame and display them.

3.3 VideoTrack to Encoder – The encoder is represented by VideoStreamEncoder (found in src/video/video_stream_encoder.h ). The pipeline from VideoTrack to the encoder is established through a series of objects: VideoRtpSender , WebRtcVideoChannel , WebRtcVideoSendStream , VideoSendStream , and finally VideoStreamEncoder . Key API calls such as PeerConnection::AddTrack , CreateOffer , and SetLocalDescription trigger the creation and wiring of these objects.

AddTrack() creates a VideoRtpSender (and, under Unified Plan, an RtpTransceiver ) that links the VideoTrack to the RTP sending path. Later, VideoRtpSender::SetSsrc binds the track to a media channel.

CreateOffer() gathers local capabilities and generates SDP, assigning unique SSRCs to each track but not yet linking them to media channels.

SetLocalDescription() parses the SDP, creates media channels via WebRtcVideoEngine::CreateMediaChannel , adds send streams ( WebRtcVideoSendStream ), which instantiate VideoSendStream and its VideoStreamEncoder . The encoder is then registered as a sink through VideoSourceProxy::SetSource and WebRtcVideoSendStream::AddOrUpdateSink , completing the flow from capture to encoder.

4. Summary – From the API perspective, the four steps CreateVideoTrack() , AddTrack() , CreateOffer() , and SetLocalDescription() establish the full video pipeline on the sending side. Although many classes are involved, the actual frame path is short: source → sink interfaces propagate frames from the capture module to the encoder, adhering to the source‑to‑sink data flow model of WebRTC.

Written by

TAL Education Technology

TAL Education is a technology-driven education company committed to the mission of 'making education better through love and technology'. The TAL technology team has always been dedicated to educational technology research and innovation. This is the external platform of the TAL technology team, sharing weekly curated technical articles and recruitment information.
