Developing Real‑Time Interactive AI Video Applications with WebRTC and JSMpeg
This article explains how to build a browser‑based AI video interaction system by comparing two approaches—streaming RTSP video via a JSMpeg‑powered WebSocket relay and capturing local media directly with WebRTC’s getUserMedia API—along with code samples, constraints handling, and frame‑extraction techniques.
The article introduces a project that visualizes face and gesture recognition on a large screen. The front‑end must capture a live camera stream, display it on a canvas, extract a frame every 1000 ms, compress it, and send it to a back‑end AI model via WebSocket.
Solution 1 – WebSocket + JSMpeg: An IP camera (IPC) provides an RTSP stream. Because the native video tag cannot play RTSP, a server‑side transcoder such as FFmpeg converts the HEVC‑encoded stream to MPEG‑1 video in an MPEG‑TS container, and a WebSocket relay server broadcasts the TS packets. In the browser, the lightweight pure‑JS decoder JSMpeg receives the TS fragments and renders them on a canvas; it can handle 720p video at 30 fps.
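var http = require('http');
// HTTP endpoint that receives the MPEG-TS stream pushed by the encoder.
// socketServer (the WebSocket server) and STREAM_PORT are defined elsewhere.
var streamServer = http.createServer(function(request, response) {
  request.on('data', function(data){
    // Forward each incoming chunk to every connected WebSocket client.
    socketServer.broadcast(data);
  });
  request.on('end', function(){
    console.log('close');
  });
}).listen(STREAM_PORT);
FFmpeg (or GStreamer) pushes the transcoded stream to this HTTP endpoint. The example below captures from a local V4L2 device; for an IP camera, the input would be the camera's RTSP URL instead: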
ffmpeg -f v4l2 -framerate 25 -video_size 640x480 -i /dev/video0 \
  -f mpegts -codec:v mpeg1video -s 640x480 -b:v 1000k -bf 0 \
  http://127.0.0.1:8081
While functional, this approach has about 1000 ms of latency, limited video quality (MPEG‑1), and high CPU usage on the client, since JSMpeg decodes the video in software.
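The relay snippet above calls socketServer.broadcast without showing it. A minimal sketch of such a fan‑out follows, assuming each client is a WebSocket‑like object with a readyState property and a send method; the function name broadcast and its return value are illustrative, not taken from the article.

```javascript
// Hypothetical helper: fan one chunk of MPEG-TS data out to every open client.
// `clients` is any iterable of WebSocket-like objects (readyState, send).
const OPEN = 1; // numeric value of WebSocket.OPEN

function broadcast(clients, data) {
  let delivered = 0;
  for (const client of clients) {
    // Skip sockets that are still connecting, closing, or closed.
    if (client.readyState !== OPEN) continue;
    try {
      client.send(data);
      delivered += 1;
    } catch (err) {
      // A failing client should not stop delivery to the others.
      console.error('send failed:', err.message);
    }
  }
  // Number of clients that accepted the chunk.
  return delivered;
}
```

Checking readyState before sending keeps a single stale connection from interrupting the stream for everyone else.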
Solution 2 – WebRTC getUserMedia : By using a USB or built‑in camera, the front‑end can directly obtain a MediaStream via navigator.mediaDevices.getUserMedia . The stream is drawn onto the canvas, and only the frame‑extraction logic remains, eliminating the need for network transport and decoding.
if (!navigator.mediaDevices || !navigator.mediaDevices.enumerateDevices) {
  console.log("enumerateDevices() is not supported.");
} else {
  // List every media input and output device known to the browser.
  navigator.mediaDevices.enumerateDevices()
    .then(function(devices){
      devices.forEach(function(device){
        // device.label is empty until the user has granted media permission.
        console.log(device.kind + ": " + device.label + " id = " + device.deviceId);
      });
    })
    .catch(function(err){
      console.log(err.name + ": " + err.message);
    });
}
Typical constraints for video capture:
{
  audio: false,
  video: true
}
To request a specific resolution or camera, the constraints object can use the width, height, facingMode, or deviceId fields, e.g.:
{ video: { width: { ideal: 1280 }, height: { ideal: 720 }, facingMode: "user" } }
Errors such as NotFoundError, OverconstrainedError, or NotAllowedError must be handled when the requested device is missing, the constraints cannot be satisfied, or the user denies permission.
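The constraint fields and error names above can be wrapped in two small helpers. This is only a sketch: buildVideoConstraints, describeMediaError, and their option names are hypothetical, and the choice of ideal for the resolution and exact for deviceId is an assumption about what an application would want.

```javascript
// Hypothetical helper: build a getUserMedia constraints object.
function buildVideoConstraints(options = {}) {
  const { width, height, facingMode, deviceId } = options;
  const video = {};
  // `ideal` lets the browser pick the closest supported value instead of failing.
  if (width) video.width = { ideal: width };
  if (height) video.height = { ideal: height };
  if (facingMode) video.facingMode = facingMode;
  // `exact` rejects the request with OverconstrainedError if that device is unavailable.
  if (deviceId) video.deviceId = { exact: deviceId };
  // With no specific requirements, fall back to the plain `video: true` form.
  return { audio: false, video: Object.keys(video).length > 0 ? video : true };
}

// Hypothetical helper: turn a getUserMedia rejection into a readable message.
function describeMediaError(err) {
  switch (err.name) {
    case 'NotFoundError': return 'no matching camera was found';
    case 'NotAllowedError': return 'permission was denied';
    case 'OverconstrainedError': return 'constraints cannot be satisfied';
    default: return 'unexpected error: ' + err.name;
  }
}
```

A call such as buildVideoConstraints({ width: 1280, height: 720, facingMode: 'user' }) produces the same object as the example above, with audio: false added.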
var promise = navigator.mediaDevices.getUserMedia({ video: true, audio: false });
promise.then(function(stream){
  // Attach the live stream to the <video> element.
  video.srcObject = stream;
}).catch(function(err){
  // The second name in each test is a legacy alias used by older browsers.
  if (err.name === 'NotFoundError' || err.name === 'DeviceNotFoundError') {
    console.error(err.name, 'required track is missing');
  } else if (err.name === 'NotAllowedError' || err.name === 'PermissionDeniedError') {
    console.error(err.name, 'permission denied in browser');
  } else {
    console.error(err.name, 'other errors');
  }
});
Screen capture can be performed with navigator.mediaDevices.getDisplayMedia, which always returns a video track. Constraints passed to it cannot restrict what the user is allowed to share; the user chooses the screen, window, or tab in the browser's picker.
navigator.mediaDevices.getDisplayMedia({ video: true })
  .then(stream => { videoElement.srcObject = stream; })
  .catch(error => { console.log("Unable to acquire screen capture", error); });
After obtaining a video stream, frame extraction is achieved by drawing the current frame onto a canvas and sending the image (as Base64 or Blob) over the WebSocket at a fixed interval:
function drawImage(){
  // Copy the current video frame onto the canvas.
  context.drawImage(video, 0, 0, width, height);
  // Encode as JPEG; the second argument is quality (0 to 1), so lower it to compress more.
  let base64Image = canvas.toDataURL('image/jpeg', 1);
  window.rws.send(JSON.stringify({ image: base64Image }));
}
// Extract and send one frame every 1000 ms.
window.drawInter = setInterval(drawImage, 1000);
Finally, the article lists the advantages of WebRTC (browser‑based real‑time communication, open source, low cost, cross‑platform) and its drawbacks (browser compatibility, unstable transmission under poor network conditions, and mobile adaptation challenges).
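For the fixed‑interval frame upload described earlier, one possible refinement is to drop a frame when the connection is not open or is falling behind, rather than letting frames pile up. The sketch below assumes the socket exposes the standard WebSocket readyState and bufferedAmount properties; createFrameSender and the 1 MiB default threshold are hypothetical choices, not part of the article.

```javascript
// Hypothetical helper: wrap a WebSocket-like object so that frames are dropped
// rather than queued when the connection cannot keep up.
function createFrameSender(socket, maxBufferedBytes = 1024 * 1024) {
  return function sendFrame(dataUrl) {
    // readyState 1 is WebSocket.OPEN.
    if (socket.readyState !== 1) return false;
    // bufferedAmount is the number of bytes queued but not yet transmitted.
    if (socket.bufferedAmount > maxBufferedBytes) return false;
    // Same message shape as the drawImage function: { image: <data URL> }.
    socket.send(JSON.stringify({ image: dataUrl }));
    return true;
  };
}
```

Inside drawImage, the direct window.rws.send call could then be replaced by a sender created once with createFrameSender(window.rws), so a slow network costs a skipped frame instead of growing latency.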
References to official specifications and documentation are provided at the end.
360 Tech Engineering
Official tech channel of 360, building the most professional technology aggregation platform for the brand.