WebRTC MediaStream and RTCPeerConnection API Overview and Usage
This article provides a comprehensive overview of WebRTC's MediaStream and RTCPeerConnection APIs, explaining core concepts such as tracks, sources, sinks, device enumeration, media constraints, bitrate and resolution settings, compatibility considerations, screen sharing, content hints, and the step‑by‑step workflow for establishing a peer‑to‑peer connection in the browser.
Preface
WebRTC is a key technology for real‑time communication, enabling low‑latency, high‑quality audio, video, and data transmission for scenarios such as peer‑to‑peer calls, multi‑party conferences, and screen sharing, and it is widely applicable in remote collaboration, online education, telemedicine, and IoT.
MediaStream API
The MediaStream API consists of two main objects: MediaStreamTrack and MediaStream. A MediaStreamTrack represents a single type of media (audio or video) generated from a source, while a MediaStream aggregates multiple tracks, allowing synchronized playback of audio and video.
Each track is composed of a source that provides data and a sink that consumes it.
A MediaStream can contain zero or more tracks; all tracks within a stream are rendered in sync, similar to playing a multimedia file.
Source and Sink
In the source code, a MediaStreamTrack is built from a corresponding source and sink. In the browser, the source produces media data (e.g., microphone audio, camera video, static files), while the sink consumes it for rendering or for transmission via RTCPeerConnection. The connection can act as both source and sink, performing operations such as bitrate reduction, scaling, or frame-rate adjustment before forwarding the stream.
Detecting Audio/Video Devices
The MediaDevices interface provides access to input devices like cameras and microphones. Its enumerateDevices() method returns a Promise that resolves to an array of MediaDeviceInfo objects describing each available device.
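A minimal sketch of enumerating devices (browser-only; the `describeDevice` helper and its label placeholder text are our own additions, not part of the API):

```javascript
// List all available media input/output devices.
// Runs in a browser context where navigator.mediaDevices exists.
async function listDevices() {
  const devices = await navigator.mediaDevices.enumerateDevices();
  return devices.map(describeDevice);
}

// Pure helper: summarize one MediaDeviceInfo-like object.
function describeDevice(d) {
  // label may be empty until the page has been granted media permissions
  return `${d.kind}: ${d.label || "(permission not granted)"} [${d.deviceId}]`;
}
```

Each MediaDeviceInfo carries a kind ("audioinput", "audiooutput", or "videoinput"), a label, and a deviceId that can later be fed back into constraints.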
Capturing Local Audio/Video
MediaDevices.getUserMedia() prompts the user for permission and, upon approval, returns a Promise that resolves to a MediaStream containing the requested audio and/or video tracks. If permission is denied or the device is unavailable, the promise is rejected with an error.
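A basic capture sketch (browser-only; the `#preview` element id and the `startCapture` name are our own assumptions):

```javascript
// Request camera + microphone and attach the resulting stream to a <video>.
// Assumes a page containing <video id="preview" autoplay playsinline>.
async function startCapture(constraints = { audio: true, video: true }) {
  try {
    const stream = await navigator.mediaDevices.getUserMedia(constraints);
    document.querySelector("#preview").srcObject = stream;
    return stream;
  } catch (err) {
    // NotAllowedError: permission denied; NotFoundError: no matching device
    console.error(`getUserMedia failed: ${err.name}`);
    throw err;
  }
}
```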
Video Constraints
The getUserMedia() call accepts a MediaStreamConstraints object to specify required tracks and optional constraints such as video and audio flags, as well as resolution preferences using width and height. The browser treats these values as "ideal" and may fall back to the nearest supported resolution.
Constraints can also use the keywords min, max, and exact to bound the resolution range, and the deviceId property to select a specific capture device.
MediaDevices.getSupportedConstraints() returns a dictionary of all constraints supported by the user agent.
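A sketch of building such a constraints object (the helper name and the specific numbers are illustrative, not prescribed by the spec):

```javascript
// Build a MediaStreamConstraints object. Plain numbers are treated as
// "ideal" hints; min/max/exact bound the allowed range.
function buildVideoConstraints(deviceId) {
  const video = {
    width: { min: 320, ideal: 1280, max: 1920 },
    height: { min: 240, ideal: 720, max: 1080 },
    frameRate: { ideal: 30 },
  };
  if (deviceId) {
    // exact: fail with OverconstrainedError rather than use another camera
    video.deviceId = { exact: deviceId };
  }
  return { audio: true, video };
}

// Browser-only: feature-detect which constraint keys this UA understands,
// e.g. { width: true, height: true, frameRate: true, ... }
function supportedConstraints() {
  return navigator.mediaDevices.getSupportedConstraints();
}
```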
Bitrate Settings
During a call, the video bitrate can be adjusted in real time by setting RTCRtpEncodingParameters.maxBitrate (in bits per second) and applying the updated parameters with RTCRtpSender.setParameters().
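A minimal sketch, assuming an established RTCPeerConnection `pc` with a video sender:

```javascript
// Cap the outgoing video bitrate mid-call via RTCRtpSender.setParameters.
// bitrateBps is in bits per second, e.g. 500_000 for 500 kbps.
async function setMaxBitrate(pc, bitrateBps) {
  const sender = pc.getSenders().find(s => s.track && s.track.kind === "video");
  if (!sender) return;
  const params = sender.getParameters();
  if (!params.encodings || params.encodings.length === 0) {
    params.encodings = [{}];
  }
  params.encodings[0].maxBitrate = bitrateBps;
  await sender.setParameters(params);
}
```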
Compatibility Issues
Older browsers may not support navigator.mediaDevices.getUserMedia. In such cases, vendor-specific fallbacks are required. Safari also has limitations: it does not support multiple tabs using getUserMedia simultaneously, and its screen-sharing API only allows sharing the entire screen.
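A typical fallback wrapper over the legacy prefixed, callback-style APIs (browser-only; the wrapper name is ours):

```javascript
// Fallback for older browsers that only expose the prefixed,
// callback-style getUserMedia. Returns a Promise either way.
function getUserMediaCompat(constraints) {
  if (navigator.mediaDevices && navigator.mediaDevices.getUserMedia) {
    return navigator.mediaDevices.getUserMedia(constraints);
  }
  const legacy = navigator.getUserMedia ||
                 navigator.webkitGetUserMedia ||
                 navigator.mozGetUserMedia;
  if (!legacy) {
    return Promise.reject(new Error("getUserMedia is not supported"));
  }
  return new Promise((resolve, reject) =>
    legacy.call(navigator, constraints, resolve, reject));
}
```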
Screen Sharing
The MediaDevices.getDisplayMedia() method prompts the user to select a screen, window, or tab to capture, returning a MediaStream similar to getUserMedia. Safari currently only permits full-screen sharing.
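A screen-sharing sketch (browser-only; the 15 fps hint is our choice, not a requirement):

```javascript
// Start screen sharing; the browser shows a picker for screen/window/tab.
async function startScreenShare() {
  const stream = await navigator.mediaDevices.getDisplayMedia({
    video: { frameRate: { ideal: 15 } }, // screen content rarely needs 30fps
    audio: false,
  });
  // Detect the user pressing the browser's native "stop sharing" button
  stream.getVideoTracks()[0].addEventListener("ended", () => {
    console.log("screen sharing stopped");
  });
  return stream;
}
```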
Device Change Monitoring
WebRTC exposes a devicechange event (and the corresponding ondevicechange handler) on navigator.mediaDevices to detect hot‑plugging of media devices.
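A sketch of reacting to hot-plug events by re-enumerating (browser-only; `watchDevices` and its unsubscribe pattern are our own convention):

```javascript
// Re-enumerate media devices whenever one is plugged in or removed.
function watchDevices(onChange) {
  const handler = async () => {
    const devices = await navigator.mediaDevices.enumerateDevices();
    onChange(devices);
  };
  navigator.mediaDevices.addEventListener("devicechange", handler);
  // Return an unsubscribe function so callers can stop watching
  return () =>
    navigator.mediaDevices.removeEventListener("devicechange", handler);
}
```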
Video Quality Adaptation Strategies
WebRTC provides three degradation preferences:
maintain-framerate (MAINTAIN_FRAMERATE in the native code): keep frame rate, reduce resolution (suitable for video calls).
maintain-resolution (MAINTAIN_RESOLUTION): keep resolution, reduce frame rate (ideal for screen sharing or document view).
balanced (BALANCED): a balanced approach, disabled by default and enabled via the WebRTC-Video-BalancedDegradation field trial.
The preference can be set directly through the degradationPreference field of the sender's parameters; in addition, the contentHint property of a MediaStreamTrack lets the WebRTC stack infer an appropriate DegradationPreference from the content type.
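A minimal sketch of setting the preference on a video sender, assuming an established RTCPeerConnection `pc` (browser-only):

```javascript
// Set a degradation preference on the video sender.
// Spec string values: "maintain-framerate", "maintain-resolution", "balanced".
async function setDegradationPreference(pc, preference) {
  const sender = pc.getSenders().find(s => s.track && s.track.kind === "video");
  if (!sender) return;
  const params = sender.getParameters();
  params.degradationPreference = preference;
  await sender.setParameters(params);
}
```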
Audio Content Hints
"speech": optimize for spoken voice, possibly applying noise suppression.
"speech-recognition": prioritize clarity for transcription, disabling consumer‑oriented processing.
"music": preserve musical fidelity by disabling voice‑centric processing.
Video Content Hints
"motion": lower resolution to maintain frame rate during fast motion.
"detail": lower frame rate to preserve resolution, default for screen sharing.
"text": lower frame rate while preserving resolution for text‑heavy content; enables text‑optimized encoding tools when supported (e.g., AV1).
RTCPeerConnection API
The RTCPeerConnection interface represents a WebRTC connection between a local and a remote peer, providing methods to create, maintain, monitor, and close the connection.
Its optional RTCConfiguration object includes properties such as:
iceServers: an array of RTCIceServer objects containing STUN/TURN server URLs.
iceTransportPolicy: controls which ICE candidates are considered ("all", "public" – deprecated, "relay").
rtcpMuxPolicy: determines the RTCP multiplexing strategy ("negotiate" or "require").
bundlePolicy: selects how media tracks share transport channels ("balanced", "max-compat", "max-bundle").
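A typical configuration object might look as follows (the server URLs and credentials are placeholders, not real endpoints):

```javascript
// An example RTCConfiguration for new RTCPeerConnection(rtcConfig).
const rtcConfig = {
  iceServers: [
    { urls: "stun:stun.example.com:3478" },
    {
      urls: "turn:turn.example.com:3478",
      username: "user",
      credential: "secret",
    },
  ],
  iceTransportPolicy: "all",  // or "relay" to force traffic through TURN
  bundlePolicy: "max-bundle", // bundle all media on one transport
  rtcpMuxPolicy: "require",   // multiplex RTP and RTCP on one port
};
```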
Typical usage steps include creating a new RTCPeerConnection, adding tracks with addTrack, exchanging SDP offers/answers, and handling ICE candidates.
Main Methods
addIceCandidate: provide remote ICE candidates.
addStream / removeStream: add or remove a whole MediaStream (deprecated; prefer addTrack / removeTrack).
addTrack / removeTrack: add or remove individual MediaStreamTrack objects.
createOffer / createAnswer: generate SDP for negotiation.
setLocalDescription / setRemoteDescription: set the local or remote SDP.
getStats: retrieve connection statistics.
close: terminate the connection.
Main Events
onaddstream: fired when a remote stream is added (deprecated; prefer ontrack).
ontrack: fired when a remote track is received.
ondatachannel: fired when a data channel is created.
onicecandidate: fired when a new ICE candidate is gathered.
oniceconnectionstatechange, onsignalingstatechange, onremovestream (deprecated), etc.: provide state-change notifications.
Typical Workflow
Create a new RTCPeerConnection and add local audio/video tracks via addTrack. The remote side listens for ontrack to receive and render the media.
Exchange SDP offers and answers so that each peer learns the other's capabilities and negotiates codecs, parameters, and transport details.
Exchange ICE candidates. When the initiator receives a remote candidate, it calls addIceCandidate . Likewise, the responder adds the initiator's candidates:
pc2.addIceCandidate(new RTCIceCandidate(ice))
After these steps, a basic WebRTC connection is established.
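The three steps can be sketched as an in-page loopback, where both peer connections live in the same page and stand in for a real signaling channel (browser-only; `localStream` and `remoteVideo` are assumed to exist):

```javascript
// Loopback sketch: pc1 sends localStream, pc2 renders it into remoteVideo.
async function connectLoopback(localStream, remoteVideo) {
  const pc1 = new RTCPeerConnection();
  const pc2 = new RTCPeerConnection();

  // 1. Add local tracks; remote side renders whatever arrives via ontrack.
  localStream.getTracks().forEach(t => pc1.addTrack(t, localStream));
  pc2.ontrack = ({ streams }) => { remoteVideo.srcObject = streams[0]; };

  // 2. Trickle ICE candidates in both directions as they are gathered.
  pc1.onicecandidate = e => e.candidate && pc2.addIceCandidate(e.candidate);
  pc2.onicecandidate = e => e.candidate && pc1.addIceCandidate(e.candidate);

  // 3. SDP offer/answer exchange.
  const offer = await pc1.createOffer();
  await pc1.setLocalDescription(offer);
  await pc2.setRemoteDescription(offer);
  const answer = await pc2.createAnswer();
  await pc2.setLocalDescription(answer);
  await pc1.setRemoteDescription(answer);

  return [pc1, pc2];
}
```

In a real deployment the offer, answer, and candidates travel over a signaling channel (e.g., WebSocket) instead of direct function calls.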
360 Smart Cloud
Official service account of 360 Smart Cloud, dedicated to building a high-quality, secure, highly available, convenient, and stable one‑stop cloud service platform.