Why G711 and MPEG‑1 Audio Codecs Compress Sound Efficiently

This article explores the acoustic foundations of audio compression, detailing how human perception models enable non‑uniform quantization in G711 (PCMA/PCMU) and how MPEG‑1 layer‑III (MP3) leverages frequency‑domain subband analysis, masking effects, and psychoacoustic thresholds to achieve high compression ratios without sacrificing perceived audio quality.

Seewo Tech Circle

Sep 10, 2019

To improve compression ratios while preserving audio quality, audio codecs apply models of human speech production and perception. This article introduces two such codecs, G711 and MPEG‑1 audio, and the acoustic principles behind them.

1. G711 (PCMA/PCMU)

1.1 Non‑uniform Quantization

Human hearing spans a dynamic range of roughly 0 dB to 120 dB, a factor of about one million in sound‑pressure amplitude, but cannot distinguish level differences smaller than about 1 dB. Consequently, the ear perceives only about 120 distinct loudness levels, spaced logarithmically across the amplitude range. Codecs exploit this non‑linear relationship with non‑uniform quantization: uniform quantization needs about 12 bits per sample for telephone‑quality speech, whereas non‑uniform quantization reduces this to 8 bits.
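
The decibel arithmetic behind these figures can be checked with a short sketch. The function names below are illustrative only; the one assumption is the standard amplitude convention, in which a level difference of L dB corresponds to an amplitude ratio of 10^(L/20), so each bit of a uniform quantizer adds about 6 dB of dynamic range.

```python
import math


def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level difference in dB to a ratio of amplitudes."""
    return 10 ** (db / 20)


def dynamic_range_db(bits: int) -> float:
    """Ratio, in dB, between full scale and one step of a uniform quantizer."""
    return 20 * math.log10(2 ** bits)


if __name__ == "__main__":
    # 120 dB of dynamic range expressed as an amplitude ratio
    print(db_to_amplitude_ratio(120.0))
    # A 1 dB step expressed as a relative change in amplitude
    print(db_to_amplitude_ratio(1.0) - 1.0)
    # Dynamic range covered by 12-bit and 8-bit uniform quantizers
    print(dynamic_range_db(12), dynamic_range_db(8))
```

Under this convention a 1 dB step is a change of roughly 12% in amplitude, which is why steps of constant relative size, rather than constant absolute size, match what the ear can resolve.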

1.2 G711 Implementation

PCMA uses A‑law and PCMU uses μ‑law companding to implement the non‑uniform quantization. The two curves are nearly identical: both give small‑amplitude signals finer quantization steps and large‑amplitude signals coarser ones, because the ear cannot detect slight changes at high levels.
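
As a rough illustration, the continuous μ‑law and A‑law companding curves can be sketched as follows, using the usual parameter values μ = 255 and A = 87.6. This is not a bit‑exact G711 encoder: the standard uses piecewise‑linear segment approximations of these curves, and the helper `linear_step_at` is a hypothetical name introduced here only to show how the step size grows with amplitude.

```python
import math

MU = 255.0  # mu-law parameter (PCMU)
A = 87.6    # A-law parameter (PCMA)


def mu_law_compress(x: float) -> float:
    """Map a sample in [-1, 1] through the continuous mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(((1.0 + MU) ** abs(y) - 1.0) / MU, y)


def a_law_compress(x: float) -> float:
    """Map a sample in [-1, 1] through the continuous A-law curve."""
    ax = abs(x)
    if ax < 1.0 / A:
        y = A * ax / (1.0 + math.log(A))              # linear segment near zero
    else:
        y = (1.0 + math.log(A * ax)) / (1.0 + math.log(A))  # logarithmic segment
    return math.copysign(y, x)


def linear_step_at(x: float, levels: int = 256) -> float:
    """Approximate linear-domain width of one mu-law quantization step at x >= 0."""
    dy = 2.0 / (levels - 1)  # uniform step in the compressed domain
    y = mu_law_compress(x)
    return mu_law_expand(min(y + dy, 1.0)) - mu_law_expand(y)


if __name__ == "__main__":
    # The two curves stay close to each other over most of the range
    for x in (0.01, 0.1, 0.5, 1.0):
        print(x, mu_law_compress(x), a_law_compress(x))
    # Steps are much finer for quiet samples than for loud ones
    print(linear_step_at(0.01), linear_step_at(0.9))
```

Quantizing the compressed value uniformly with 8 bits therefore yields small steps near zero and large steps near full scale in the original amplitude domain.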

2. MPEG‑1 Audio Coding

2.1 Hearing Threshold

The quietest sound a listener can detect defines the hearing threshold, and the level at which sound becomes painful defines the pain threshold. The hearing threshold varies with frequency: the ear is most sensitive around 3–3.5 kHz and least sensitive at very low (≈20 Hz) and very high (≈20 kHz) frequencies.
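
The shape of this curve can be sketched with a widely used closed‑form approximation of the threshold in quiet (often attributed to Terhardt). The article does not say which model it has in mind, so the formula and the function name below are assumptions made for illustration.

```python
import math


def threshold_in_quiet_db(f_hz: float) -> float:
    """Approximate hearing threshold in dB SPL at frequency f_hz (assumed model)."""
    f = f_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


if __name__ == "__main__":
    # Scan the audible range in 10 Hz steps and locate the most sensitive point
    freqs = range(20, 20001, 10)
    most_sensitive = min(freqs, key=threshold_in_quiet_db)
    print(most_sensitive, threshold_in_quiet_db(most_sensitive))
    # The threshold rises steeply toward both ends of the audible range
    print(threshold_in_quiet_db(20), threshold_in_quiet_db(20000))
```

With this approximation the minimum falls a little above 3 kHz, consistent with the 3–3.5 kHz range quoted above.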

2.2 Masking Effect

Masking occurs when a sound becomes inaudible in the presence of another, louder sound. For example, an airport announcement may be masked by background noise. In the frequency domain, a masker raises the hearing threshold at nearby frequencies: a 250 Hz masker at 66 dB SPL makes a weaker 160 Hz signal at 39 dB SPL imperceptible.
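
A toy model of frequency masking is sketched below. It assumes Zwicker's Bark‑scale approximation and Schroeder's spreading function, two common textbook choices that the article itself does not name. The 15 dB offset between masker level and masked threshold is a purely illustrative constant, not a value taken from the article or from the MPEG‑1 psychoacoustic models.

```python
import math

# Hypothetical gap between the spread masker level and the masked threshold
ASSUMED_MASKING_OFFSET_DB = 15.0


def hz_to_bark(f_hz: float) -> float:
    """Zwicker's approximation of the critical-band (Bark) scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def spreading_db(dz: float) -> float:
    """Schroeder spreading function; dz = probe Bark minus masker Bark."""
    return 15.81 + 7.5 * (dz + 0.474) - 17.5 * math.sqrt(1.0 + (dz + 0.474) ** 2)


def masked_threshold_db(masker_hz: float, masker_db: float, probe_hz: float) -> float:
    """Threshold raised at probe_hz by a single masker, under the assumptions above."""
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    return masker_db + spreading_db(dz) - ASSUMED_MASKING_OFFSET_DB


def is_masked(masker_hz: float, masker_db: float, probe_hz: float, probe_db: float) -> bool:
    """True when the probe lies below the masked threshold."""
    return probe_db < masked_threshold_db(masker_hz, masker_db, probe_hz)


if __name__ == "__main__":
    # The example from the text: 250 Hz masker at 66 dB, 160 Hz probe at 39 dB
    print(masked_threshold_db(250.0, 66.0, 160.0))
    print(is_masked(250.0, 66.0, 160.0, 39.0))
    # A probe far from the masker in frequency is barely affected by it
    print(is_masked(250.0, 66.0, 4000.0, 39.0))
```

The point of the sketch is the shape of the effect: masking is strong within about one critical band of the masker and falls off quickly with distance on the Bark scale.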

Temporal masking also exists: a brief loud sound can make weaker sounds that occur shortly before or after it inaudible. The masking effect is strongest at the masker and decays roughly exponentially in time on both sides of it.

2.3 MPEG‑1 Audio Encoder

MPEG‑1 is an ISO/IEC standard whose audio part defines three layers; Layer III is the format commonly known as MP3. The encoder works in the frequency domain: a filter bank splits the signal into 32 sub‑bands of equal bandwidth (for a 44.1 kHz sample rate, each band spans about 689 Hz, i.e. 22.05 kHz / 32). Alongside the sub‑band analysis, a short‑time Fourier transform of the input provides a finer spectrum, from which a psychoacoustic model computes masking thresholds.
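
The sub‑band layout follows directly from the sample rate: the usable spectrum runs from 0 Hz to half the sample rate and is divided into 32 equal bands. A minimal sketch, with `subband_edges` as an illustrative helper name:

```python
def subband_edges(sample_rate_hz: float, num_bands: int = 32) -> list[tuple[float, float]]:
    """Return (low, high) frequency edges in Hz for equal-width sub-bands."""
    nyquist = sample_rate_hz / 2.0      # highest representable frequency
    width = nyquist / num_bands         # bandwidth of each sub-band
    return [(k * width, (k + 1) * width) for k in range(num_bands)]


if __name__ == "__main__":
    bands = subband_edges(44100.0)
    low, high = bands[0]
    print(high - low)   # bandwidth of one sub-band at 44.1 kHz
    print(bands[-1])    # edges of the highest sub-band
```

Because the bands are equally wide in Hz while the ear's critical bands widen with frequency, a single sub‑band covers several critical bands at low frequencies, which is one reason a finer spectral analysis is used for the psychoacoustic model.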

The global masking threshold, which combines the hearing threshold in quiet with the masking effects above, guides quantization: spectral components below the threshold can be set to zero without affecting perceived quality. For the remaining components, the codec can choose the largest quantization step that keeps the quantization noise under the threshold, maximizing compression while maintaining perceived fidelity.
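
The idea can be sketched in a few lines, assuming the standard result that a uniform quantizer with step Δ produces noise with RMS value Δ/√12, and that each bit buys about 6 dB of signal‑to‑noise ratio. The band levels and thresholds in the example are made‑up numbers, and real MPEG‑1 encoders use more elaborate allocation procedures than this.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB of SNR per quantizer bit


def max_step_for_noise(noise_rms: float) -> float:
    """Largest uniform step whose quantization noise RMS stays at noise_rms."""
    return math.sqrt(12.0) * noise_rms


def bits_for_band(level_db: float, threshold_db: float) -> int:
    """Bits needed so quantization noise falls under the masking threshold."""
    signal_to_mask = level_db - threshold_db
    if signal_to_mask <= 0:
        return 0  # the band is inaudible and can be set to zero
    return math.ceil(signal_to_mask / DB_PER_BIT)


if __name__ == "__main__":
    # Hypothetical band levels and masking thresholds in dB SPL
    levels = [60.0, 30.0, 45.0]
    thresholds = [40.0, 35.0, 45.0]
    print([bits_for_band(l, t) for l, t in zip(levels, thresholds)])
```

Bands whose level does not exceed the threshold receive no bits at all, which is where most of the savings come from in quiet or heavily masked regions of the spectrum.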
