
Implementing WebRTC Media Capture, Recording, and Real‑Time Speech Recognition in Web Applications

This article is a comprehensive guide to using WebRTC, getUserMedia, and MediaRecorder to capture camera and microphone streams, record the screen, visualize audio, run device and network checks, and convert media formats, along with integrating real‑time speech‑to‑text services. It also shares practical pitfalls and their fixes.


The article begins with a brief overview of WebRTC, describing it as a peer‑to‑peer communication technology that enables real‑time audio and video transmission directly between browsers.

Getting User Devices

It introduces the getUserMedia API, shows compatibility tables, and provides a sample implementation that checks for support, configures media constraints, and obtains a MediaStream object.

const isSupportMediaDevicesMedia = () => {
  return !!(navigator.mediaDevices && navigator.mediaDevices.getUserMedia);
};

if (isSupportMediaDevicesMedia()) {
  const mediaOption = { audio: true, video: true };
  navigator.mediaDevices.getUserMedia(mediaOption)
    .then(stream => console.log('[Log] stream-->', stream))
    .catch(err => console.error('[Log] Failed to get camera/microphone permission-->', err));
} else {
  // Legacy browsers only expose a vendor-prefixed, callback-style API
  navigator.getUserMedia = navigator.getUserMedia || navigator.webkitGetUserMedia || navigator.mozGetUserMedia || navigator.msGetUserMedia;
  // fallback handling via navigator.getUserMedia(constraints, onSuccess, onError)
}

A more detailed mediaOption example demonstrates how to specify resolution, frame rate, device ID, and facing mode.

const mediaOption = {
  audio: true,
  video: {
    width: { min: 980, ideal: 980, max: 1920 },
    height: { min: 560, ideal: 560, max: 1080 },
    frameRate: { ideal: 12, max: 15 },
    deviceId: { exact: 'your-device-id' },
    facingMode: 'user'
  }
};

Simple Playback Case

Using a React component, the captured stream is assigned to a video element via its srcObject property, with proper cleanup of tracks on unmount.

import React, { useEffect, useRef } from 'react';
function RecordInfo() {
  const streamRef = useRef();
  const videoRef = useRef();
  useEffect(() => {
    const constraints = { video: { width: { min: 980, ideal: 1920, max: 1920 }, height: { min: 560, ideal: 1080, max: 1080 }, frameRate: { ideal: 12, max: 15 } }, audio: true };
    navigator.mediaDevices.getUserMedia(constraints)
      .then(stream => {
        streamRef.current && streamRef.current.getTracks().forEach(t => t.stop());
        streamRef.current = stream;
        videoRef.current.srcObject = stream;
      })
      .catch(err => console.error('[Log] User denied camera and microphone access', err));
    return () => {
      streamRef.current && streamRef.current.getTracks().forEach(t => t.stop());
    };
  }, []);
  return (
    <video ref={videoRef} autoPlay playsInline muted />
  );
}
export default RecordInfo;

Selecting Specific Devices

The enumerateDevices API returns a list of MediaDeviceInfo objects (kind, label, deviceId), allowing developers to pick a particular camera or microphone by ID.
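The selection logic can be sketched as a pure helper plus the browser call. The helper name `pickDeviceId` and the sample device labels are illustrative, not from the original article; only the `kind`/`label`/`deviceId` fields mirror what `enumerateDevices` returns.

```javascript
// Find the deviceId of the first device matching a kind
// ('videoinput' | 'audioinput' | 'audiooutput') and an optional label substring.
function pickDeviceId(devices, kind, labelPart = '') {
  const match = devices.find(
    d => d.kind === kind && d.label.includes(labelPart)
  );
  return match ? match.deviceId : null;
}

// Browser usage (labels are empty until the user grants permission):
//   const devices = await navigator.mediaDevices.enumerateDevices();
//   const camId = pickDeviceId(devices, 'videoinput', 'USB');
//   const stream = await navigator.mediaDevices.getUserMedia({
//     video: { deviceId: { exact: camId } }
//   });

// Sample data shaped like MediaDeviceInfo entries:
const sampleDevices = [
  { kind: 'audioinput', label: 'Built-in Microphone', deviceId: 'mic-1' },
  { kind: 'videoinput', label: 'Built-in Camera', deviceId: 'cam-1' },
  { kind: 'videoinput', label: 'USB Webcam', deviceId: 'cam-2' },
];
console.log(pickDeviceId(sampleDevices, 'videoinput', 'USB')); // → 'cam-2'
```

Note that `label` is an empty string before the user has granted a media permission, so matching by label only works after a successful `getUserMedia` call.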

Screen Capture

Screen recording uses getDisplayMedia, which works like getUserMedia but captures a screen, window, or browser tab instead of a physical device.
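A minimal sketch of that call, with the constraints-building logic split out (the helper names here are my own, and system-audio capture is only available on some platforms, so requesting `audio` may be silently ignored):

```javascript
// getDisplayMedia takes the same constraints shape as getUserMedia.
function buildDisplayConstraints({ withAudio = false, maxFrameRate = 30 } = {}) {
  return {
    video: { frameRate: { max: maxFrameRate } },
    ...(withAudio ? { audio: true } : {}),
  };
}

// Browser-only usage, guarded so it fails cleanly elsewhere.
async function captureScreen() {
  if (typeof navigator === 'undefined' ||
      !navigator.mediaDevices || !navigator.mediaDevices.getDisplayMedia) {
    throw new Error('Screen capture is not supported in this environment');
  }
  // The user picks a screen/window/tab in the browser's own picker dialog.
  return navigator.mediaDevices.getDisplayMedia(
    buildDisplayConstraints({ withAudio: true })
  );
}
```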

Device and Network Detection

Network speed is measured by loading a series of images with random query strings to avoid caching, while hardware detection simply attempts to call getUserMedia and checks for failures.
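The arithmetic behind the image probe can be separated from the browser-only loading code. The probe URL and known image size below are hypothetical; only the bytes-over-time calculation is concrete:

```javascript
// Convert a download of `bytes` bytes in `elapsedMs` milliseconds
// to kilobits per second.
function computeKbps(bytes, elapsedMs) {
  if (elapsedMs <= 0) return Infinity;
  return (bytes * 8) / (elapsedMs / 1000) / 1000;
}

// Browser-only probe: load a known-size image with a random query
// string so the cache cannot answer, and time the load.
// function probeSpeed(url, knownBytes) {
//   return new Promise((resolve, reject) => {
//     const img = new Image();
//     const start = Date.now();
//     img.onload = () => resolve(computeKbps(knownBytes, Date.now() - start));
//     img.onerror = reject;
//     img.src = `${url}?r=${Math.random()}`; // defeat caching
//   });
// }

console.log(computeKbps(125000, 1000)); // → 1000 (kbps)
```

Averaging several probes smooths out bursts; a single image load can be badly skewed by connection setup time.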

MediaRecorder for Video/Audio Recording

The article explains creating a MediaRecorder with an optional mimeType, handling dataavailable events, starting/stopping recording, and playing back the collected blobs via URL.createObjectURL.

const recorder = new MediaRecorder(stream, { mimeType: 'video/webm' });
let chunks = [];
recorder.ondataavailable = e => { if (e.data.size > 0) chunks.push(e.data); };
recorder.onstop = () => {
  // Build the blob only after the final dataavailable event has fired
  const blob = new Blob(chunks, { type: 'video/webm' });
  video.src = URL.createObjectURL(blob);
};
recorder.start(); // optional timeslice argument emits periodic chunks
// ... later
recorder.stop();

It also discusses supported mime types using a helper function that iterates over common codecs.
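A sketch of such a helper, with the support check injected as a predicate so the filtering logic is testable anywhere (in the browser the predicate is `MediaRecorder.isTypeSupported`; the candidate list here is a typical but not exhaustive selection):

```javascript
const CANDIDATE_TYPES = [
  'video/webm;codecs=vp9,opus',
  'video/webm;codecs=vp8,opus',
  'video/webm',
  'video/mp4',
  'audio/webm;codecs=opus',
  'audio/webm',
];

// Keep only the candidates the current environment can record.
function getSupportedMimeTypes(isTypeSupported, candidates = CANDIDATE_TYPES) {
  return candidates.filter(t => isTypeSupported(t));
}

// Browser usage: pick the first supported type as the recorder's mimeType.
// const supported = getSupportedMimeTypes(t => MediaRecorder.isTypeSupported(t));
// const recorder = new MediaRecorder(stream, { mimeType: supported[0] });
```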

Video Screenshot

Capturing a frame involves drawing the video onto a canvas and converting it to a Blob.

export function takeScreenshot(video) {
  return new Promise((resolve, reject) => {
    const canvas = document.createElement('canvas');
    const ctx = canvas.getContext('2d');
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
    canvas.toBlob(
      blob => (blob ? resolve(blob) : reject(new Error('toBlob produced no data'))),
      'image/jpeg',
      1
    );
  });
}

Audio Visualization (Spectrum)

Using the Web Audio API, an AnalyserNode extracts frequency data, which can be summed to compute a simple volume level.

const audioContext = new AudioContext();
const source = audioContext.createMediaElementSource(audio);
const analyser = audioContext.createAnalyser();
source.connect(analyser);
analyser.connect(audioContext.destination);
analyser.fftSize = 256;
function analyzeAudio() {
  const bufferLength = analyser.frequencyBinCount;
  const dataArray = new Uint8Array(bufferLength);
  analyser.getByteFrequencyData(dataArray);
  const sum = dataArray.reduce((p, c) => p + c, 0);
  const scale = Math.min(100, Math.floor((sum * 100) / bufferLength / 128));
  console.log('[Log] volume level->', scale);
  requestAnimationFrame(analyzeAudio);
}
audioContext.resume().then(analyzeAudio);

Real‑Time Speech‑to‑Text

The guide references Tencent Cloud’s JavaScript SDK, showing three steps: creating a recorder (webrecorder.js), creating a recognizer (speechrecognizer.js), and wiring them together (webaudiospeechrecognizer.js) to stream PCM data to the cloud service.
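The recorder layer's core job is converting the Web Audio API's Float32 samples into the 16-bit PCM that streaming recognition services typically consume. A minimal sketch of that conversion (resampling and channel mixing are omitted, and the exact sample rate the SDK expects should be taken from its documentation):

```javascript
// Convert Float32 samples in [-1, 1] (as produced by the Web Audio API)
// to 16-bit signed PCM, clamping out-of-range values.
function floatTo16BitPCM(float32Samples) {
  const out = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    // Negative range is one step wider (-32768) than positive (32767)
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

console.log(floatTo16BitPCM(new Float32Array([0, 1, -1]))); // 0, 32767, -32768
```

Each converted chunk is then handed to the recognizer layer, which streams it to the cloud service over a WebSocket.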

Common Pitfalls and Fixes

Recorded WebM files lack duration and seek cues, making playback controls incomplete. The article suggests using libraries such as fix-webm-duration or ts‑ebml to rewrite the file header and make the blob seekable.
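Those libraries need the real duration, which the page has to track itself since the blob does not contain it. A sketch of that wiring (the `fixWebmDuration(blob, durationMs, callback)` shape follows the fix-webm-duration package's documented interface, but verify it against the version you install; `chunks` and `play` are placeholders):

```javascript
// Track wall-clock recording time so it can be written into the header.
function createDurationTracker() {
  let startedAt = null;
  return {
    start() { startedAt = Date.now(); },
    stop() {
      if (startedAt === null) throw new Error('tracker was never started');
      const durationMs = Date.now() - startedAt;
      startedAt = null;
      return durationMs;
    },
  };
}

// Usage with MediaRecorder (browser-only):
// const tracker = createDurationTracker();
// recorder.onstart = () => tracker.start();
// recorder.onstop = () => {
//   const blob = new Blob(chunks, { type: 'video/webm' });
//   fixWebmDuration(blob, tracker.stop(), seekable => play(seekable));
// };
```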

Conversion from WebM to MP3 is handled with recorder-core (which internally uses lamejs ) to produce smaller, widely‑supported audio files.

For video format conversion to MP4, the author mentions WebAV (WebCodecs) for future work, noting its limited browser support, and suggests server‑side ffmpeg or ffmpeg.wasm as alternatives.

References

Links to articles on Web Audio visualization, Electron/Chromium screen recording, and other related resources are provided for further reading.

Tags: Frontend, WebRTC, MediaRecorder, getUserMedia, Audio Visualization, Video Capture
Written by

Rare Earth Juejin Tech Community

Juejin, a tech community that helps developers grow.
