How to Implement Dynamic Video Frame Preview in the Browser with WebAssembly and FFmpeg
This article explains a front‑end solution for generating on‑hover video frame previews by extracting frames from HLS streams using WebAssembly‑compiled FFmpeg, covering playlist parsing, TS decryption, WASM integration, canvas rendering, caching strategies, and performance considerations.
Browser Methods for Getting Video Frames
Current browsers support two main approaches for extracting video frames:
Canvas + Video: The video element is drawn onto a canvas using drawImage. This works only with formats the browser can decode natively (e.g., MP4 with H.264, WebM with VP8/VP9). For other formats, demuxing plus Media Source Extensions (MSE) is required.
WebAssembly + FFmpeg : WebAssembly enables front‑end decoding of video data. FFmpeg is compiled to a WASM module, called from JavaScript to extract frame data, which is then drawn on a canvas.
HLS Dynamic Decryption, TS Fragment Loading, and Frame Extraction
The overall technical flow is:
Parse the HLS master playlist and level playlists to obtain an array of low‑resolution TS segment URLs.
Detect HLS encryption, fetch the decryption key, and AES‑decrypt TS files.
Load the TS file as an ArrayBuffer, allocate memory in the WASM module, and write the buffer into WASM memory.
Call the exported _getFrame function with the memory pointer, size, and target timestamp to obtain RGB frame data.
Convert the RGB data to a canvas ImageData object and display or cache the frame.
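The steps above can be sketched as one orchestration function. Everything here is illustrative rather than a real API: parseLevelPlaylist, loadSegment, decryptSegment, and decodeFrame stand in for the playlist, network, crypto, and WASM layers, and are injected so the control flow is visible on its own.

```javascript
// Orchestrates the flow: find the segment covering `time`,
// load it, decrypt it if needed, and decode a frame from it.
// All four `deps` functions are hypothetical stand-ins.
async function previewFrameAt(time, deps) {
  const segments = await deps.parseLevelPlaylist();            // step 1
  const seg = segments.find(s => time >= s.start && time < s.end);
  if (!seg) return null;                                       // time past the end
  let buffer = await deps.loadSegment(seg.uri);                // step 3
  if (seg.keyInfo) {
    buffer = await deps.decryptSegment(buffer, seg.keyInfo);   // step 2
  }
  // steps 4-5: hand the bytes to the WASM module, get RGB data back
  return deps.decodeFrame(buffer, time - seg.start);
}
```

Injecting the dependencies keeps the flow testable without a network or a WASM runtime.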
FFmpeg Compilation to WebAssembly
Prerequisites
Install the Emscripten SDK (emsdk) following the official guide. Verify the installation with emcc -v.
<code># Get the emsdk repo
git clone https://github.com/emscripten-core/emsdk.git
# Enter that directory
cd emsdk
# Download and install the latest SDK tools.
./emsdk install latest
# Make the "latest" SDK "active" for the current user.
./emsdk activate latest
# Activate PATH and other environment variables in the current terminal
source ./emsdk_env.sh
</code>
FFmpeg Build Configuration
FFmpeg provides extensive audio‑video processing capabilities. For this use‑case we disable most features and enable only the required components (H.264 decoder, MPEG‑TS demuxer, file protocol, etc.).
<code>emconfigure ./configure \
--prefix=$WEB_CAPTURE_PATH/lib/ffmpeg-emcc \
--cc="emcc" \
--cxx="em++" \
--ar="emar" \
--cpu=generic \
--target-os=none \
--arch=x86_32 \
--enable-gpl \
--enable-version3 \
--enable-cross-compile \
--disable-ffmpeg \
--disable-ffplay \
--disable-ffprobe \
--disable-doc \
--disable-ffserver \
--disable-swresample \
--disable-postproc \
--disable-programs \
--disable-avfilter \
--disable-pthreads \
--disable-w32threads \
--disable-os2threads \
--disable-network \
--disable-logging \
--disable-everything \
--enable-protocol=file \
--enable-demuxer=mpegts \
--enable-decoder=h264 \
--disable-asm \
--disable-debug
</code>
Analyzing FFmpeg Frame Extraction Flow
Extracting frames involves the following FFmpeg libraries:
libavcodec: Provides codec support (H.264 in this case).
libavformat: Handles demuxing of MPEG‑TS streams.
libswscale: Performs pixel format conversion (YUV → RGB).
libavutil: Utility functions.
Compiling to WASM
Finally, compile the required objects into a WASM module:
<code>emcc ./getframe.c ./ffmpeg/lib/libavformat.a ./ffmpeg/lib/libavcodec.a ./ffmpeg/lib/libswscale.a ./ffmpeg/lib/libavutil.a \
-O3 \
-I "./ffmpeg/include" \
-s WASM=1 \
-s TOTAL_MEMORY=33554432 \
-s EXPORTED_FUNCTIONS='["_main", "_free", "_getFrame", "_setFile"]' \
-s ASSERTIONS=1 \
-s ALLOW_MEMORY_GROWTH=1 \
-s MAXIMUM_MEMORY=4GB \
-o getframe.js
</code>
EXPORTED_FUNCTIONS tells the compiler which functions should be exposed to JavaScript; each name must be prefixed with an underscore.
ASSERTIONS=1 enables runtime checks for memory allocation errors; set it to 2 for additional testing.
ALLOW_MEMORY_GROWTH=1 lets the Emscripten heap grow at runtime, so TOTAL_MEMORY does not need to be over-allocated up front.
Wrapping the API
Register all available formats and codecs once (e.g., in main()) with:
<code>#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libavutil/imgutils.h>
#include <libswscale/swscale.h>
int main(int argc, char const *argv[])
{
    av_register_all();
    return 0;
}
</code>
Typical usage steps:
Open the video file (or TS segment) with avformat_open_input and retrieve stream info.
Find the video decoder, allocate a codec context, and open the codec.
Read packets, decode video frames, convert them to RGB with sws_scale, and save or return the frame data.
<code>AVFormatContext *pFormatCtx = NULL;
if(avformat_open_input(&pFormatCtx, argv[1], NULL, NULL) != 0) return -1;
if(avformat_find_stream_info(pFormatCtx, NULL) < 0) return -1;

/* Locate the first video stream. */
int videoStream = -1;
for(unsigned int s = 0; s < pFormatCtx->nb_streams; s++) {
    if(pFormatCtx->streams[s]->codec->codec_type == AVMEDIA_TYPE_VIDEO) {
        videoStream = s;
        break;
    }
}
if(videoStream == -1) return -1;

AVCodecContext *pCodecCtxOrig = pFormatCtx->streams[videoStream]->codec;
AVCodec *pCodec = avcodec_find_decoder(pCodecCtxOrig->codec_id);
if(!pCodec) { fprintf(stderr, "Unsupported codec!\n"); return -1; }

/* Work on a copy of the codec context so the stream's original stays untouched. */
AVCodecContext *pCodecCtx = avcodec_alloc_context3(pCodec);
if(avcodec_copy_context(pCodecCtx, pCodecCtxOrig) != 0) { fprintf(stderr, "Couldn't copy codec context\n"); return -1; }
if(avcodec_open2(pCodecCtx, pCodec, NULL) < 0) return -1;

AVFrame *pFrame = av_frame_alloc();     /* decoded (YUV) frame */
AVFrame *pFrameRGB = av_frame_alloc();  /* RGB frame; its data buffer is allocated separately */

/* Converter from the decoder's pixel format to RGB24. */
struct SwsContext *sws_ctx = sws_getContext(pCodecCtx->width, pCodecCtx->height, pCodecCtx->pix_fmt,
                                            pCodecCtx->width, pCodecCtx->height, AV_PIX_FMT_RGB24,
                                            SWS_BILINEAR, NULL, NULL, NULL);
AVPacket packet;
int frameFinished = 0;
int i = 0;
while(av_read_frame(pFormatCtx, &packet) >= 0) {
    if(packet.stream_index == videoStream) {
        avcodec_decode_video2(pCodecCtx, pFrame, &frameFinished, &packet);
        if(frameFinished) {
            sws_scale(sws_ctx, (uint8_t const * const *)pFrame->data,
                      pFrame->linesize, 0, pCodecCtx->height,
                      pFrameRGB->data, pFrameRGB->linesize);
            /* Save the first five frames as an example. */
            if(++i <= 5) SaveFrame(pFrameRGB, pCodecCtx->width, pCodecCtx->height, i);
        }
    }
    av_free_packet(&packet);
}
</code>
JavaScript Calling the WASM Module
Typical workflow:
Fetch the TS segment via XHR/fetch and store it as a Uint8Array.
Allocate memory in the WASM heap with Module._malloc and copy the buffer in.
Invoke the exported _getFrame function, passing the pointer, size, and target timestamp.
Convert the returned RGB buffer to a canvas ImageData object and display or cache it.
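The RGB-to-RGBA expansion in the last step can be isolated as a small pure helper (a sketch; the name rgbToRgba is illustrative):

```javascript
// Expand tightly packed RGB24 data into the RGBA layout that
// canvas ImageData expects, setting every alpha byte to 255.
// `rgb` is a Uint8Array of length width * height * 3.
function rgbToRgba(rgb, width, height) {
  const rgba = new Uint8ClampedArray(width * height * 4);
  for (let i = 0, j = 0; i < rgb.length; i += 3) {
    rgba[j++] = rgb[i];     // R
    rgba[j++] = rgb[i + 1]; // G
    rgba[j++] = rgb[i + 2]; // B
    rgba[j++] = 255;        // A (fully opaque)
  }
  return rgba;
}
```

Walking in strides of three bytes per pixel guarantees every pixel, including the last one, gets its alpha byte.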
<code>let tsBuffer = new Uint8Array(tsFileArrayBuffer);
// Allocate space on the Emscripten heap and copy the TS data in.
// HEAPU8 is the unsigned byte view matching Uint8Array.
let tsBufferPtr = Module._malloc(tsBuffer.length);
Module.HEAPU8.set(tsBuffer, tsBufferPtr);
// Assumes _getFrame is wrapped to yield a Uint8Array of RGB24 data
// for the frame nearest `time`.
let imgData = Module._getFrame(tsBufferPtr, tsBuffer.length, time);
Module._free(tsBufferPtr); // the input buffer is no longer needed
const canvas = document.createElement('canvas');
const ctx = canvas.getContext('2d');
canvas.width = width;
canvas.height = height;
// Expand RGB to the RGBA layout ImageData expects (alpha = 255).
let imageData = ctx.createImageData(width, height);
for(let i = 0, j = 0; i < imgData.length; i += 3) {
  imageData.data[j++] = imgData[i];     // R
  imageData.data[j++] = imgData[i + 1]; // G
  imageData.data[j++] = imgData[i + 2]; // B
  imageData.data[j++] = 255;            // A
}
ctx.putImageData(imageData, 0, 0);
const finalData = canvas.toDataURL('image/jpeg');
</code>
HLS Dynamic Loading and Decryption
Parsing Master/Level Playlists
An HLS VOD asset is described by an M3U8 master playlist pointing to multiple bitrate levels. We parse the master playlist, select the lowest-resolution level, then parse its segment list to map timestamps to TS files.
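A minimal level picker for a master playlist like the sample that follows might look like this (a sketch, not a full M3U8 parser; it assumes each #EXT-X-STREAM-INF tag is immediately followed by its URI line):

```javascript
// Parse #EXT-X-STREAM-INF entries from a master playlist and
// return the level with the lowest bandwidth (the preview level).
function pickLowestLevel(masterPlaylist) {
  const lines = masterPlaylist.split('\n').map(l => l.trim());
  const levels = [];
  for (let i = 0; i < lines.length; i++) {
    if (lines[i].startsWith('#EXT-X-STREAM-INF:')) {
      const bw = /BANDWIDTH=(\d+)/.exec(lines[i]);
      const res = /RESOLUTION=(\d+x\d+)/.exec(lines[i]);
      levels.push({
        bandwidth: bw ? Number(bw[1]) : 0,
        resolution: res ? res[1] : null,
        uri: lines[i + 1], // the URI follows the tag on the next line
      });
    }
  }
  levels.sort((a, b) => a.bandwidth - b.bandwidth);
  return levels[0] || null;
}
```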
<code>#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:PROGRAM-ID=0,BANDWIDTH=2099325,RESOLUTION=1920x1080
v.f124099.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=0,BANDWIDTH=197642,RESOLUTION=1280x720
v.f22239.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=0,BANDWIDTH=142162,RESOLUTION=960x540
v.f22240.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=0,BANDWIDTH=95767,RESOLUTION=480x270
v.f22241.m3u8
</code>
Mapping Segments to Time Ranges
Using #EXTINF we compute each segment's duration, accumulate the total time, and derive start/end times for each segment. When the user hovers at a specific timestamp, we locate the nearest segment.
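Under the assumption of a simple VOD level playlist (each #EXTINF line followed by its segment URI), the duration accumulation and lookup can be sketched as:

```javascript
// Build [start, end) time ranges from #EXTINF durations.
function buildSegmentIndex(levelPlaylist) {
  const lines = levelPlaylist.split('\n').map(l => l.trim());
  const segments = [];
  let t = 0; // accumulated playback time in seconds
  for (let i = 0; i < lines.length; i++) {
    const m = /^#EXTINF:([\d.]+)/.exec(lines[i]);
    if (m) {
      const duration = parseFloat(m[1]);
      segments.push({ start: t, end: t + duration, uri: lines[i + 1] });
      t += duration;
    }
  }
  return segments;
}

// Find the segment covering a given hover time (in seconds).
function findSegment(segments, time) {
  return segments.find(s => time >= s.start && time < s.end) || null;
}
```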
<code>#EXTINF:10.000000,
v.f22241.ts?start=260400&end=382047&type=mpegts
#EXT-X-KEY:METHOD=AES-128,URI="http://getkeyurl",IV=0x00000000000000000000000000000000
</code>
AES Decryption of TS Files
If the playlist contains #EXT-X-KEY:METHOD=AES-128, fetch the key from the URI, use the IV from the playlist, and decrypt the TS fragment with WebCrypto (or a helper library such as videojs/aes-decrypter).
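The hex IV string from #EXT-X-KEY must first be turned into raw bytes; AES-128 in HLS means AES-CBC. A sketch using the standard SubtleCrypto API (the names parseIv and decryptSegment are illustrative):

```javascript
// Convert a playlist IV such as "0x00000000000000000000000000000000"
// into the 16-byte Uint8Array WebCrypto expects.
function parseIv(ivString) {
  const hex = ivString.replace(/^0x/i, '').padStart(32, '0');
  const bytes = new Uint8Array(16);
  for (let i = 0; i < 16; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}

// Decrypt one TS segment with AES-128-CBC via WebCrypto.
// `keyBytes` is the 16-byte key fetched from the #EXT-X-KEY URI,
// `encrypted` an ArrayBuffer holding the segment.
async function decryptSegment(encrypted, keyBytes, ivString) {
  const key = await crypto.subtle.importKey(
    'raw', keyBytes, { name: 'AES-CBC' }, false, ['decrypt']);
  return crypto.subtle.decrypt(
    { name: 'AES-CBC', iv: parseIv(ivString) }, key, encrypted);
}
```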
<code>let decrypter = new Decrypter();
const { key, iv } = levelKey;
// The callback receives the decrypted segment bytes.
decrypter.decrypt(data, key, iv, (decrypted) => {
  const finalSegmentFile = new Blob([decrypted], { type: 'video/mp2t' });
  // Pass finalSegmentFile to the WASM module for decoding
});
</code>
Progress Frame Preview Logic and Caching Strategy
Because mouse movements on the progress bar fire frequently, we throttle the mousemove event (e.g., to at most one lookup per 500 ms). For each hover event we:
Parse the playlist, load and decrypt the relevant TS segment.
Pass the segment to the WASM decoder to obtain a frame.
Cache the frame image data keyed by its time interval.
When the user hovers near a previously cached interval, we display the cached frame immediately, reducing latency.
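A sketch of the throttle and the interval-keyed cache (names such as frameCache, loadFrameFor, and SEGMENT_DURATION are illustrative, not from the article's code):

```javascript
// Cache frames by segment index so repeated hovers over the same
// interval are served instantly.
const frameCache = new Map();

function cachedFrame(segmentIndex) {
  return frameCache.get(segmentIndex) || null;
}

function storeFrame(segmentIndex, imageDataUrl) {
  frameCache.set(segmentIndex, imageDataUrl);
}

// Leading-edge throttle: run `fn` at most once per `wait` ms.
function throttle(fn, wait) {
  let last = 0;
  return function (...args) {
    const now = Date.now();
    if (now - last >= wait) {
      last = now;
      fn.apply(this, args);
    }
  };
}

// Hypothetical wiring: loadFrameFor would do the playlist/WASM work.
// progressBar.addEventListener('mousemove', throttle(e => {
//   const time = positionToTime(e.offsetX);
//   const idx = Math.floor(time / SEGMENT_DURATION);
//   const hit = cachedFrame(idx);
//   if (hit) showPreview(hit);
//   else loadFrameFor(time).then(url => { storeFrame(idx, url); showPreview(url); });
// }, 500));
```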
Issues and Summary
Using WASM for front‑end frame preview is now in gray‑release. The compiled WASM binary is about 2.6 MB. Network latency dominates: from hover to preview image takes roughly 1.1 seconds, while the actual decoding takes under 60 ms. The solution works on Chrome, Firefox, and Safari on desktop.
For a 600 MB HD video, even downloading the entire lowest-resolution level for preview would consume 30-50 MB of traffic; in practice only a few megabytes are needed, because we fetch just the low-resolution segments the user actually hovers over.
WASM has moved from demos to production use cases. Features once only possible in native apps are now feasible in browsers, opening many interesting possibilities for future development.
Tencent IMWeb Frontend Team