
How to Build High‑Performance Bass Boost and Voice Clarity Filters with FFmpeg

This article explains the architecture, key techniques, and implementation details of audio post‑processing in a media player, covering bass‑enhancement and voice‑clarity filters, frequency‑range design, device constraints, FFmpeg filter chains, and sample code for a high‑performance, low‑latency solution.

Baidu App Technology

1. Introduction

In modern player architectures, audio post‑processing is a core component for creating differentiated listening experiences across scenarios such as phone speakers, headphones, and TV sound systems. This series introduces the technical solutions and engineering implementation of audio post‑processing modules, primarily targeting audio‑video developers.

2. What Is Audio Post‑Processing?

Audio post‑processing refers to the digital signal processing applied after audio decoding (PCM/DSD) to enhance quality, add effects, or correct defects. The main goals are to improve perceived sound quality and adapt to various playback environments.

2.1 Technical Classification

Low‑frequency boost (bass enhancement)

Voice clarity (dialogue enhancement)

Stereo widening, noise reduction, etc.

3. Player Core Audio Post‑Processing Framework

The audio post‑processing module sits between the decoder and the audio frame queue in the playback pipeline. It is linked only when a task explicitly requires post‑processing, keeping latency and power consumption under control.

The framework supports multiple effects such as volume boost, clear voice, heavy bass, surround, and noise reduction, allowing dynamic enable/disable or parameter changes at runtime.
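The runtime enable/disable behavior can be sketched with a simple dispatch table. This is an illustrative toy, not the player's actual API; names like `process_block` and `FX_VOLUME` are invented for this sketch. Each effect is a per-sample function, and a bitmask chosen at runtime decides which stages run:

```c
#include <stddef.h>

/* Illustrative sketch (not the player's real API): each enabled effect
   runs in sequence over the sample buffer. */
enum { FX_VOLUME = 1 << 0, FX_BASS = 1 << 1, FX_VOICE = 1 << 2 };

typedef double (*EffectFn)(double sample);

static double fx_volume(double s) { return s * 1.5; }   /* ~ +3.5 dB boost */
static double fx_bass(double s)   { return s; }         /* placeholder      */

static void process_block(double *buf, size_t n, unsigned enabled,
                          const EffectFn fns[], const unsigned masks[],
                          size_t nfx) {
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < nfx; k++)
            if (enabled & masks[k])       /* runtime toggle per effect */
                buf[i] = fns[k](buf[i]);
}
```

Because the mask is checked per stage, flipping a bit at runtime enables or disables an effect without rebuilding the chain, which mirrors the dynamic behavior described above.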

4. Effect 1 – Heavy Bass

4.1 Definition

Heavy bass enhances frequencies roughly between 20 Hz and 250 Hz, making drums, bass guitars, and explosions more powerful.

4.2 Frequency Ranges

Sub‑bass (20‑60 Hz): deep rumble

Bass (60‑250 Hz): body and weight

Low‑mid (250‑500 Hz): warmth

Midrange (500‑2000 Hz): core of vocals and instruments

Upper midrange (2‑4 kHz): clarity

Treble (4‑6 kHz): brightness

Brilliance (6‑20 kHz): air and detail

4.3 Device Limitations

Small speakers in phones and earbuds cannot reproduce very low frequencies effectively, unlike large cinema sound systems that can generate 20 Hz tones.

Speaker size: a small diaphragm cannot displace enough air to reproduce the long wavelengths of very low frequencies.

Power output: limited amplifier power reduces bass impact, since low frequencies demand the most energy.

4.4 Implementation Options

4.4.1 EQ‑Based Low‑Frequency Adjustment

Increase gain in the 20-200 Hz band (e.g., +3 dB to +6 dB). Because EQ is a linear operation it adds no harmonic distortion by itself, but the boosted signal needs headroom (or a downstream limiter) to avoid clipping.

4.4.2 Harmonic Generation

Generate odd-order harmonics of the low band with a soft-clip polynomial (shaped = x − x³/6) plus anti-aliasing. Because the ear reconstructs a missing fundamental from its harmonics, the bass is still perceived even on small speakers that cannot reproduce the fundamental physically.

// Crossover coefficient calculation (design_crossover)
const double w0 = 2.0 * M_PI * s->cutoff / s->sample_rate;
const double alpha = sin(w0) / (2.0 * sqrt(2.0)); // Butterworth Q = 1/sqrt(2)
// Low-pass: b0 = b2 = (1 - cos(w0)) / 2, b1 = 1 - cos(w0)
// High-pass via spectral inversion: b0 = b2 = (1 + cos(w0)) / 2, b1 = -(1 + cos(w0))
// Shared denominator: a0 = 1 + alpha, a1 = -2 * cos(w0), a2 = 1 - alpha

4.4.3 Pre‑EQ Module

// Pre-EQ low shelf sits slightly below the crossover point (cutoff * 0.8)
const double A = pow(10.0, s->pre_gain / 40.0); // RBJ shelf amplitude, hence /40
const double omega = 2.0 * M_PI * s->cutoff * 0.8 / s->sample_rate;
// Low-shelf filter coefficient calculation (includes sqrt(A))

4.4.4 Harmonic Generator

double generate_harmonics(double input, double drive) {
    double x = input * drive;
    // Clamp, then apply the cubic soft clip from 4.4.2: shaped = x - x^3/6
    if (x > 1.0) x = 1.0;
    else if (x < -1.0) x = -1.0;
    return x - (x * x * x) / 6.0;
}

4.4.5 Post‑EQ Module

const double A = pow(10.0, s->post_gain / 20.0); // note /20: simplified (linear-gain) form
const double omega = 2.0 * M_PI * s->cutoff * 1.2 / s->sample_rate; // slightly above the crossover
// High-shelf filter coefficient calculation (simplified)

4.4.6 Dynamic Compressor

double compressor_process(...) {
    // Envelope follower: attack coefficient while signal power rises,
    // release coefficient while it falls (coefficient calculation elided)
    const double coeff = (input * input > envelope) ? attack_coeff : release_coeff;
    envelope = coeff * envelope + (1.0 - coeff) * input * input;
    // ... compute target_gain from the envelope vs. threshold/ratio
    // One-pole gain smoothing avoids audible "zipper" steps
    s->ch_state[ch].gain = 0.2 * old_gain + 0.8 * target_gain;
}
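Separating the compressor's static gain computer makes the attack/release machinery easier to reason about. A minimal sketch with invented names (not the article's code): a downward compressor above threshold reduces gain by (1 - 1/ratio) dB per dB of overshoot, and the result can be smoothed with the same 0.2/0.8 one-pole weighting shown in the snippet above:

```c
#include <math.h>

/* Static gain computer for a downward compressor (illustrative): above
   threshold_db the output rises only 1/ratio dB per input dB, so the
   applied gain in dB is negative. */
static double compressor_gain_db(double level_db, double threshold_db, double ratio) {
    if (level_db <= threshold_db)
        return 0.0;  /* below threshold: unity gain */
    return (threshold_db - level_db) * (1.0 - 1.0 / ratio);
}

/* One-pole gain smoothing with the 0.2/0.8 weighting from the snippet above. */
static double smooth_gain(double old_gain, double target_gain) {
    return 0.2 * old_gain + 0.8 * target_gain;
}
```

For example, a signal 12 dB over an -18 dB threshold at ratio 4:1 should be attenuated by 12 * (1 - 1/4) = 9 dB.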

5. Effect 2 – Clear Voice

Clear‑voice processing combines band‑enhancement, voice masking, and background‑noise suppression to improve dialogue intelligibility in noisy environments.

5.1 DialogueEnhance Filter

The FFmpeg dialoguenhance filter converts stereo input into a 3‑channel output, strengthening the center channel where dialogue resides.

Options: original (how much of the untouched center signal to keep), enhance (dialogue enhancement strength), and voice (voice detection factor).
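Before integrating the filter into a player, the same options can be auditioned from the command line (this assumes an FFmpeg build recent enough to include dialoguenhance; file names are placeholders):

```shell
# Same parameter values as the player integration in this section:
# keep none of the dry center, enhancement factor 2, voice factor 16.
ffmpeg -i input.mp4 \
       -af "dialoguenhance=original=0:enhance=2:voice=16" \
       -c:v copy output.mp4
```

This is a quick way to A/B parameter values on real content before committing them to the player's configuration.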

5.2 Implementation Details

The filter’s core function filter_frame follows these steps: receive frame, windowing, FFT, algorithm processing, IFFT, stereo reconstruction, and output.

static int filter_frame(AVFilterLink *inlink, AVFrame *in) {
    AVFilterContext *ctx = inlink->dst;
    AVFilterLink *outlink = ctx->outputs[0];
    AudioDialogueEnhanceContext *s = ctx->priv;
    AVFrame *out;
    int ret;

    // Output buffer is sized to the overlap window, not the full input frame
    out = ff_get_audio_buffer(outlink, s->overlap);
    if (!out) { ret = AVERROR(ENOMEM); goto fail; }

    s->in = in;
    s->de_stereo(ctx, out);        // windowing -> FFT -> enhancement -> IFFT -> stereo reconstruction
    av_frame_copy_props(out, in);  // preserve timestamps and metadata
    out->nb_samples = in->nb_samples;
    ret = ff_filter_frame(outlink, out);
fail:
    av_frame_free(&in);            // input frame is consumed on both paths
    s->in = NULL;
    return ret < 0 ? ret : 0;
}

Integration into the player core involves initializing the filter and inserting it into the audio filter chain.

#include "dialoguenhance_filter.h"
AVFilterContext* DialoguenhanceFilter::getAVFilterContext() { return _avFilterContext; }
int DialoguenhanceFilter::initFilter(AVFilterGraph* graph, AudioBaseInfo* info, const JsonUtils::Value* param) {
    _avFilter = (AVFilter*)avfilter_get_by_name("dialoguenhance");
    if (!_avFilter) { LOGD("dialoguenhance filter not found"); return -1; }
    _avFilterContext = avfilter_graph_alloc_filter(graph, _avFilter, "dialoguenhance");
    if (!_avFilterContext) { LOGD("dialoguenhance filter context alloc failed"); return -1; }
    // Keep none of the dry center (original=0); enhancement factor 2, voice factor 16
    av_opt_set_double(_avFilterContext, "original", 0, AV_OPT_SEARCH_CHILDREN);
    av_opt_set_double(_avFilterContext, "enhance", 2, AV_OPT_SEARCH_CHILDREN);
    av_opt_set_double(_avFilterContext, "voice", 16, AV_OPT_SEARCH_CHILDREN);
    int result = avfilter_init_str(_avFilterContext, nullptr);
    if (result < 0) { LOGD("dialoguenhance filter init failed"); return -1; }
    return 0;
}

6. Summary

The article presented two typical audio post-processing effects, heavy bass and clear voice, detailing their signal flow, key parameters, and integration into a playback pipeline with FFmpeg. Both techniques apply to most content scenarios and lay a foundation for further audio-processing work.
