Implementing AAC Audio Encoding in Web Browsers Using FFmpeg and WebAssembly
This article explains how to implement AAC audio encoding directly in the browser by compiling FFmpeg’s libavcodec and libswresample to WebAssembly with Emscripten, detailing the required FFmpeg modules, AAC encoding workflow, compilation options, and JavaScript integration for real‑time media streaming.
1. Introduction
Current PC web live-streaming implementations use WebRTC with H.264/VP8 video and Opus audio, while downstream distribution protocols (FLV, HLS) require AAC audio, which forces media servers to transcode Opus to AAC. To avoid this overhead, this article proposes implementing AAC push-streaming directly on the web side.
Streaming consists of three parts: capture, encoding, and uplink transmission. Capture uses the browser's camera and microphone APIs, video encoding uses WebCodecs, and transport uses WebSocket or WebTransport. Because WebCodecs did not support AAC encoding at the time of writing, audio needs a different solution.
Audio/video encoding is CPU‑intensive and is often implemented in C/C++. FFmpeg provides a powerful API for multimedia processing, including AAC encoding, and can be compiled to WebAssembly using mature toolchains.
2. FFmpeg Core Modules
FFmpeg is an open‑source multimedia framework used by browsers, mobile players, and streaming servers. Its modules are clearly separated:
libavformat – container format and I/O handling.
libavcodec – decoding and encoding.
libswscale – video scaling and pixel format conversion.
libswresample – audio resampling and format conversion.
libavfilter – audio/video filter processing.
In the presented scenario, only libavcodec (for AAC encoding) and libswresample (for resampling) are required.
3. AAC Encoding Workflow
The audio parameters that matter here are sample rate, channel count, and bitrate. The implementation assumes 48 kHz mono PCM input and outputs AAC-LC frames of 1024 samples each. The encoding pipeline uses the following FFmpeg structures:
SwrContext – resamples PCM to the target sample rate/channel.
AVAudioFifo – buffers audio samples until a full frame is ready.
AVPacket – stores encoded AAC data.
AVFrame – holds PCM data before encoding.
AVCodecContext – drives the encoding process via avcodec_send_frame and avcodec_receive_packet.
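For a sense of scale, the frame parameters stated above (48 kHz input, 1024 samples per frame) can be turned into frame timing with a few lines of arithmetic. The sketch below is illustrative only: the function name aacFrameStats and the 64 kbit/s bitrate are assumptions chosen for the example, not values from the original implementation.

```javascript
// Frame timing for an encoder that consumes a fixed number of samples per frame.
// sampleRate and samplesPerFrame come from the text; bitrate is a hypothetical example value.
function aacFrameStats(sampleRate, samplesPerFrame, bitrate) {
  const frameDurationMs = (samplesPerFrame / sampleRate) * 1000;
  const framesPerSecond = sampleRate / samplesPerFrame;
  // Average encoded payload per frame in bytes at the given bitrate (bits per second).
  const avgBytesPerFrame = bitrate / 8 / framesPerSecond;
  return { frameDurationMs, framesPerSecond, avgBytesPerFrame };
}

const frameStats = aacFrameStats(48000, 1024, 64000);
console.log(frameStats.frameDurationMs.toFixed(2));  // 21.33
console.log(frameStats.framesPerSecond);             // 46.875
console.log(frameStats.avgBytesPerFrame.toFixed(1)); // 170.7
```

Each encoded frame therefore covers roughly 21 ms of audio, which is the granularity at which packets would reach the uplink.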
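When resampling from 48 kHz to 44.1 kHz, 1024 input samples yield only about 940 output samples (1024 × 44100 / 48000 ≈ 940.8), which is less than one 1024-sample AAC frame. The resampled output therefore has to be buffered in the AVAudioFifo until a full frame has accumulated.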
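The buffering behaviour can be illustrated by simulating sample counts alone. This is a minimal sketch under simplifying assumptions: simulateFifo is a hypothetical helper, it tracks counts rather than audio data, and it ignores the internal filter delay a real resampler such as SwrContext introduces, so actual per-call counts may differ slightly.

```javascript
// Simulates sample counts only: a resampler feeding a FIFO that releases
// fixed-size frames, in the spirit of SwrContext + AVAudioFifo.
function simulateFifo(inRate, outRate, chunkSize, frameSize, chunks) {
  let fifo = 0;      // samples currently buffered
  let produced = 0;  // total resampled samples produced so far
  const framesPerChunk = [];
  for (let i = 1; i <= chunks; i++) {
    // A cumulative floor carries the fractional remainder over to later chunks.
    const totalOut = Math.floor((i * chunkSize * outRate) / inRate);
    fifo += totalOut - produced;
    produced = totalOut;
    let frames = 0;
    while (fifo >= frameSize) {
      fifo -= frameSize; // one full frame is handed to the encoder
      frames++;
    }
    framesPerChunk.push(frames);
  }
  return { framesPerChunk, leftover: fifo };
}

const fifoResult = simulateFifo(48000, 44100, 1024, 1024, 5);
console.log(fifoResult.framesPerChunk); // [ 0, 1, 1, 1, 1 ]
console.log(fifoResult.leftover);       // 608
```

The first 1024-sample input chunk produces no encoded frame at all, and the FIFO never drains completely, which is why a final flush step is needed at the end of the stream.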
4. Compiling with Emscripten
FFmpeg is written in C, and Emscripten is the standard toolchain for converting C/C++ projects to WebAssembly. The compilation produces a WebAssembly binary and a JavaScript “glue” file that loads the module.
4.1 Building the Static Libraries
The FFmpeg source is configured to include only the AAC encoder and the necessary protocols, disabling all other components to keep the binary small:
emconfigure ./configure --prefix=$(pwd)/libsoutputsdir \
--cc="emcc" --cxx="em++" --ar="emar" --ranlib="emranlib" --cpu=generic --target-os=none \
--enable-small \
--extra-cflags=-Os \
--enable-cross-compile \
--disable-inline-asm \
--disable-x86asm \
--disable-ffmpeg \
--disable-ffplay \
--disable-ffprobe \
--disable-programs \
--disable-doc \
--disable-htmlpages \
--disable-manpages \
--disable-podpages \
--disable-txtpages \
--disable-swscale \
--disable-devices \
--disable-avdevice \
--disable-avformat \
--disable-avfilter \
--disable-logging \
--disable-videotoolbox \
--disable-postproc \
--disable-pthreads \
--disable-os2threads \
--disable-w32threads \
--disable-network \
--disable-debug \
--disable-everything \
--enable-protocol=data \
--enable-encoder=aac
Running make produces the static libraries libavcodec, libswresample, and libavutil.
4.2 Compiling the AAC Encoder Program
The C encoder program is then compiled to WebAssembly with the following command:
emcc pcm2aac.c -Os -lavcodec -lavutil -lswresample \
-L../fflibs/lib -I../fflibs/include \
-Wno-implicit-function-declaration \
-s TOTAL_MEMORY=33554432 \
-s MODULARIZE=1 \
-s EXPORT_NAME=m \
-s EXPORTED_FUNCTIONS='["_init_callback", "_encode_one_frame", "_init_encoder", "_free_encoder", "_flush", "_malloc"]' \
-s EXPORTED_RUNTIME_METHODS='["addFunction"]' \
-s RESERVED_FUNCTION_POINTERS=20 \
-o pcm2aac.js
The output consists of pcm2aac.js (glue) and pcm2aac.wasm.
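The TOTAL_MEMORY value in the command is easier to read once converted to other units. The following sketch does only that arithmetic. The 16-bit sample width is an assumption made for illustration, and the heap also has to hold the encoder's own state, so the last figure is an upper bound rather than usable capacity.

```javascript
// Arithmetic on the heap size passed via TOTAL_MEMORY in the emcc command.
const WASM_PAGE_SIZE = 64 * 1024; // WebAssembly linear memory is sized in 64 KiB pages
const totalMemory = 33554432;     // value from the emcc command

const heapMiB = totalMemory / (1024 * 1024);
const heapPages = totalMemory / WASM_PAGE_SIZE;

// Assumption for illustration: 16-bit mono PCM at 48 kHz (2 bytes per sample).
const pcmBytesPerSecond = 48000 * 2;
const pcmSecondsInHeap = totalMemory / pcmBytesPerSecond;

console.log(heapMiB);                     // 32
console.log(heapPages);                   // 512
console.log(pcmSecondsInHeap.toFixed(1)); // 349.5
```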
5. Using the WebAssembly Module from JavaScript
Loading and initializing the module typically involves fetching the glue script, executing it, and then calling the exported factory function:
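const text = await fetch(`path/to/pcm2aac.js`).then(res => res.text());
// Run the glue with a global `exports` object so it can attach its factory as exports.m (EXPORT_NAME=m).
new Function(`self.exports={};${text}`)();
const WasmModule = self.exports.m;
// Call the factory; in recent Emscripten releases it returns a Promise for the module instance.
const module = await WasmModule();
When instantiated, the module exposes functions such as _init_callback and _init_encoder. A JavaScript callback can be registered to receive the encoded AAC data: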
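function aacOutput(ptr, size) {
  // Use `module` directly: `this` is not bound when the callback is invoked from WebAssembly.
  // subarray() is only a view into the heap; copy it with slice() if it must outlive this call.
  const buf = module.HEAPU8.subarray(ptr, ptr + size);
  // process or download the AAC buffer
}
// 'vii' = returns void, takes two i32 arguments (pointer, size).
const fnPtr = module.addFunction(aacOutput, 'vii');
module._init_callback(fnPtr);
this._module = module;
In the WebAssembly code the callback is stored as: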
typedef void (*OutputCallback)(uint8_t *buff, int size);
OutputCallback callback = NULL;
// fn is the function-table index returned by addFunction on the JavaScript side.
void init_callback(long fn) {
  callback = (void (*)(unsigned char *, int))fn;
}
Encoding a PCM buffer then looks like:
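function encode(pcmBuffer) {
  // pcmBuffer is assumed to be a Uint8Array of raw PCM bytes, so .length is a byte count.
  const ptr = this._module._malloc(pcmBuffer.length);
  this._module.HEAPU8.set(pcmBuffer, ptr);
  const ret = this._module._encode_one_frame(ptr);
  // Note: ptr is never released here; freeing it would require adding "_free" to EXPORTED_FUNCTIONS.
  if (ret < 0) {
    // handle error
  }
}
6. Conclusion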
WebAssembly provides a portable byte‑code format that, together with Emscripten, enables the reuse of mature C libraries such as FFmpeg directly in the browser. By compiling only the needed AAC encoder and resampler, developers can perform real‑time audio encoding without server‑side transcoding, achieving lower latency and reduced infrastructure costs.
ByteFE
Cutting‑edge tech, article sharing, and practical insights from the ByteDance frontend team.