Mastering WebRTC: From RTMP/HLS Basics to Real-Time Audio‑Video Communication
This article explains common audio‑video streaming protocols such as RTMP and HLS and compares their use cases, then dives into WebRTC fundamentals: device detection, media capture, recording, connection setup, codec considerations, and displaying remote streams, providing a practical guide to building real‑time communication in the browser.
Common Audio‑Video Network Communication Protocols
Standard Live Streaming Protocols
Traditional Live Streaming Protocols
These streams prioritize picture quality over low latency, are distributed through a CDN, and typically use RTMP and HLS.
Basic Concepts
RTMP (Real-Time Messaging Protocol) – TCP‑based, widely supported by CDNs, and easy to implement, but not supported by browsers or iOS; Adobe has stopped maintaining it.
HLS (HTTP Live Streaming) – Apple‑defined HTTP‑based protocol that splits the stream into TS segments, which introduces at least one segment of latency; excellent mobile compatibility (iOS, Android), and usable in other browsers via hls.js.
Choosing Between RTMP and HLS
Use RTMP for pushing streams (ingest) from the broadcaster.
Use HLS for mobile web players because browsers do not support RTMP.
iOS requires HLS.
On‑demand video prefers HLS.
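On the player side, a page can prefer native HLS playback (Safari/iOS) and fall back to hls.js elsewhere. The sketch below assumes hls.js has already been loaded on the page; attachHlsSource is a hypothetical helper name, not part of any library.

```javascript
// Attach an HLS manifest URL to a <video> element, preferring native
// HLS support (Safari, iOS WebView) and falling back to hls.js, which
// plays HLS over Media Source Extensions in other browsers.
// `attachHlsSource` is a hypothetical helper name.
function attachHlsSource(video, manifestUrl) {
  if (video.canPlayType('application/vnd.apple.mpegurl')) {
    // Native HLS: just point the element at the manifest.
    video.src = manifestUrl;
    return 'native';
  }
  if (typeof Hls !== 'undefined' && Hls.isSupported()) {
    const hls = new Hls();
    hls.loadSource(manifestUrl);
    hls.attachMedia(video);
    return 'hlsjs';
  }
  return 'unsupported';
}
```

The return value lets the caller report which playback path was chosen, or show an error when neither is available.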
Typical Architecture of Traditional Live Streaming
Consists of a live client, a signaling server, and a CDN network.
The live client handles capture, encoding, pushing, pulling, decoding and playback; the broadcaster pushes encoded streams to the CDN, while viewers pull streams from the CDN and render them.
The signaling server manages room creation, joining, leaving and text chat.
The CDN distributes media data to users.
Real‑Time Live Protocols
WebRTC was created to meet the growing demand for low‑latency, interactive communication.
WebRTC Overview
WebRTC (Web Real‑Time Communication) is an open standard supported by all major browsers, enabling peer‑to‑peer audio‑video communication without plugins.
It abstracts complex media handling (codec, transport, echo cancellation, etc.) behind a simple API.
WebRTC Audio‑Video Communication Process
Audio‑Video Device Detection
Fundamentals of Audio Devices
Audio input devices perform A/D conversion, quantization and encoding to produce digital signals.
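As a toy illustration of the quantization step, mapping a normalized "analog" sample in the range -1.0..1.0 onto a signed 16‑bit PCM value could look like this (a conceptual sketch, not how an actual audio driver works):

```javascript
// Quantize a normalized analog sample in [-1, 1] to a signed 16-bit
// PCM integer, clamping out-of-range input. Conceptual illustration of
// the A/D quantization step, not production audio code.
function quantize16(sample) {
  const clamped = Math.max(-1, Math.min(1, sample));
  return Math.round(clamped * 32767);
}

// A tiny "analog" waveform becomes a sequence of integers:
const pcm = [0, 0.5, -0.5, 1, -1].map(quantize16);
```

Encoding (e.g., compressing these samples with Opus or AAC) then happens on top of this digital representation.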
Fundamentals of Video Devices
Video devices use optical sensors to convert light to RGB data, then DSP processing, conversion to YUV, and compression for transmission.
Getting Device List
navigator.mediaDevices.enumerateDevices() returns the available media input and output devices.

navigator.mediaDevices.enumerateDevices().then(function(deviceInfos) {
  deviceInfos.forEach(function(deviceInfo) {
    console.log(deviceInfo);
  });
});

Device labels are empty unless the page is served over HTTPS and the user has granted media permission.
Device Detection Methods
Inspect deviceInfo.kind to distinguish audio vs video devices.
Default devices are selected automatically; specifying a device ID overrides the default.
Use getUserMedia to test whether a device can provide a usable stream.
Video detection: call getUserMedia for video and display it; if visible, the device works.
Audio detection: capture audio with getUserMedia and visualize the waveform or level changes.
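Putting the device list and the kind check together, a detection pass might be sketched as follows; groupDevicesByKind is a hypothetical helper name:

```javascript
// Group MediaDeviceInfo-like records by kind so a UI can populate
// separate microphone, camera, and speaker pickers.
// `groupDevicesByKind` is a hypothetical helper name.
function groupDevicesByKind(deviceInfos) {
  const groups = { audioinput: [], videoinput: [], audiooutput: [] };
  deviceInfos.forEach(info => {
    if (groups[info.kind]) groups[info.kind].push(info);
  });
  return groups;
}

// In the browser (guarded so the sketch is inert elsewhere):
if (typeof navigator !== 'undefined' && navigator.mediaDevices) {
  navigator.mediaDevices.enumerateDevices()
    .then(deviceInfos => console.log(groupDevicesByKind(deviceInfos)));
}
```

Each group's deviceId values can then be passed to getUserMedia to test whether that specific device yields a usable stream.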
Audio‑Video Capture
Key Concepts
Frame rate – number of frames per second; typical acceptable range is 10‑30 fps, with 60 fps for smoother interaction.
Track – independent media stream component (audio track, video track) that does not intersect with other tracks.
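The frame-rate range above can be expressed directly in getUserMedia constraints; the specific numbers below are illustrative, not required values:

```javascript
// Illustrative capture constraints: accept 10-30 fps, prefer 30,
// and allow up to 60 for smoother interaction.
const captureConstraints = {
  video: {
    frameRate: { min: 10, ideal: 30, max: 60 }
  },
  audio: true
};

// In the browser (guarded so the sketch is inert elsewhere):
if (typeof navigator !== 'undefined' && navigator.mediaDevices) {
  navigator.mediaDevices.getUserMedia(captureConstraints)
    .then(stream => {
      // Audio and video arrive as independent tracks on one stream.
      console.log(stream.getVideoTracks().length,
                  stream.getAudioTracks().length);
    });
}
```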
Capture API
mediaDevices.getUserMedia prompts the user for permission and resolves with a MediaStream:

const mediaStreamConstraints = {
  video: true,
  audio: true
};

navigator.mediaDevices.getUserMedia(mediaStreamConstraints)
  .then(gotLocalMediaStream)
  .catch(err => console.error('getUserMedia error:', err));

function gotLocalMediaStream(mediaStream) {
  // The srcObject property of a media element can be set to a MediaStream.
  document.querySelector('video').srcObject = mediaStream;
}
Taking a Snapshot
Use a canvas drawImage call with the video element as the source, then download the resulting data URL.

const canvas = document.querySelector('canvas');
const video = document.querySelector('video');
canvas.getContext('2d').drawImage(video, 0, 0, canvas.width, canvas.height);

function downLoad(url) {
  const a = document.createElement("a");
  a.download = 'photo';
  a.href = url;
  document.body.appendChild(a);
  a.click();
  a.remove();
}

downLoad(canvas.toDataURL("image/jpeg"));

Audio‑Video Recording
Key Concepts
ArrayBuffer – fixed‑length binary data buffer.
ArrayBufferView – typed array views such as Uint32Array.
Blob – binary large object used to store recorded media.
Recording API
new MediaRecorder(stream[, options]) creates a recorder for a MediaStream. Use the ondataavailable event to collect Blob chunks, then build a Blob and an object URL from them for playback or download.

const buffer = [];
const mediaRecorder = new MediaRecorder(stream);
mediaRecorder.ondataavailable = e => buffer.push(e.data);
mediaRecorder.start(2000); // emit a data chunk every 2000 ms

Establishing a Connection
After capturing media, create an RTCPeerConnection on each side, exchange SDP offers/answers via a signaling server, and exchange ICE candidates.
RTCPeerConnection Workflow
Obtain local media stream with getUserMedia and add it to the peer connection.
Create an SDP offer (A), set local description, send via signaling.
Remote peer (B) sets remote description, creates an answer, sets local description, and sends back.
Both peers exchange ICE candidates and add them with addIceCandidate.
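The candidate exchange in step 4 can be sketched as below. wireIceExchange is a hypothetical helper; in a real application each candidate is serialized and relayed through the signaling server rather than added directly, but wiring two local peer connections together is a common way to demo the flow:

```javascript
// Forward each peer's ICE candidates to the other side. In production
// the candidate travels through the signaling server; here the two
// ends are connected directly for illustration.
// `wireIceExchange` is a hypothetical helper name.
function wireIceExchange(localPc, remotePc) {
  localPc.onicecandidate = event => {
    // A null candidate signals end-of-candidates and is not forwarded.
    if (event.candidate) remotePc.addIceCandidate(event.candidate);
  };
  remotePc.onicecandidate = event => {
    if (event.candidate) localPc.addIceCandidate(event.candidate);
  };
}
```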
// A creates an offer and sets it as its local description,
// then sends the offer to B through the signaling server.
localPeerConnection.createOffer().then(description => {
  return localPeerConnection.setLocalDescription(description);
});

// B receives the offer, sets it as the remote description, creates an
// answer, sets it as its local description, and sends it back to A.
remotePeerConnection.setRemoteDescription(offer)
  .then(() => remotePeerConnection.createAnswer())
  .then(answer => remotePeerConnection.setLocalDescription(answer));

Audio‑Video Codec Overview
Video is a sequence of frames; codecs in the H.26x and MPEG families reduce spatial redundancy (within a frame) and temporal redundancy (between frames) so that video fits within bandwidth constraints.
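As a toy illustration of temporal redundancy: storing only the pixels that changed between consecutive frames is the intuition behind inter-frame (P-frame) compression. This is a conceptual sketch, not a real codec:

```javascript
// Encode a frame as a sparse diff against the previous frame:
// only positions whose pixel value changed are stored.
// Conceptual illustration of temporal redundancy, not a real codec.
function diffFrame(prev, curr) {
  const delta = [];
  curr.forEach((pixel, i) => {
    if (pixel !== prev[i]) delta.push([i, pixel]);
  });
  return delta;
}

// Reconstruct the current frame from the previous frame plus the diff.
function applyDiff(prev, delta) {
  const frame = prev.slice();
  delta.forEach(([i, pixel]) => { frame[i] = pixel; });
  return frame;
}
```

When most pixels are static between frames, the diff is far smaller than the full frame, which is why mostly-still scenes compress so well.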
Displaying Remote Media
When a remote track arrives, the ontrack handler (which replaces the deprecated onaddstream) assigns the associated stream to a video element's srcObject.

localPeerConnection.ontrack = function(event) {
  $remoteVideo.srcObject = event.streams[0];
};

Conclusion
WebRTC encompasses many components; this article provides a high‑level overview of the typical workflow for real‑time audio‑video communication.
