Understanding CBR and VBR Encoding in MP3 Files

The article explains that MP3 files can use Constant Bit‑Rate (CBR) for simple, uniform frame sizes or Variable Bit‑Rate (VBR) for size‑efficient quality, detailing how VBR’s Xing header provides frame counts, file length, and a TOC to compute exact duration and enable accurate seek operations despite added complexity.

Tencent Music Tech Team

MP3, the most common audio file format, can be encoded using two bitrate strategies: Constant Bit‑Rate (CBR) where every frame has the same bitrate, and Variable Bit‑Rate (VBR) where each frame may have a different bitrate. The choice between them affects how audio information is retrieved and how playback progress is controlled.

Basic Concepts

Bitrate (measured in kbps) indicates how many bits are transmitted per second. Higher bitrate generally means better audio quality but larger file size. Typical MP3 bitrates are 128 kbps (default), 192 kbps (common on the web), and up to 320 kbps for high‑quality audio.

In CBR encoding, every frame is encoded at the same bitrate regardless of the actual audio content, which wastes space on simple passages such as silence or sustained tones. VBR addresses this by assigning a lower bitrate to simple passages and a higher bitrate to complex ones, reducing overall file size at comparable perceived quality.
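To make the link between bitrate and frame size concrete, here is a minimal sketch of the usual frame-length calculation for MPEG-1 Layer III. The function name and parameters are hypothetical and not taken from the article; the constant 144 is 1152 samples per frame divided by 8 bits per byte.

```cpp
#include <cstdint>

// Size in bytes of one MPEG-1 Layer III frame (hypothetical helper).
// bitrateKbps:  bitrate of this frame in kbps
// sampleRateHz: sampling rate in Hz
// padding:      true if the frame header's padding bit is set (one extra byte)
std::uint32_t mpeg1Layer3FrameBytes(std::uint32_t bitrateKbps,
                                    std::uint32_t sampleRateHz,
                                    bool padding) {
    if (sampleRateHz == 0) {
        return 0;  // invalid input, avoid dividing by zero
    }
    // Integer division truncates; the padding byte compensates over time.
    return 144u * bitrateKbps * 1000u / sampleRateHz + (padding ? 1u : 0u);
}
```

At 44.1 kHz a 128 kbps frame comes out at 417 or 418 bytes, while a 320 kbps frame needs 1044 or 1045 bytes. In a CBR file every frame has (almost) the same size, whereas in a VBR file the value changes from frame to frame.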

Another, less common method is Average Bit‑Rate (ABR). Individual frames may vary in bitrate as in VBR, but the encoder steers toward a target average, so the final file size stays predictable, much as with CBR.

Drawbacks of VBR

Because VBR frames have variable sizes, calculating the total duration and performing seek operations become more complex than with CBR.

For CBR the duration can be computed directly:

duration (s) = (total file size (bytes) - ID3 tag size (if present)) * 8 / (bitrate (kbps) * 1000)

The seek offset for a target time is also straightforward:

file position (bytes) = target time (s) * bitrate (kbps) * 1000 / 8 + ID3v2 tag size (if present)
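The two CBR formulas above translate almost directly into code. The following is a minimal sketch; the function and parameter names are hypothetical, and the caller is assumed to already know the ID3 tag sizes and the bitrate.

```cpp
#include <cstdint>

// Duration in seconds of a CBR file (hypothetical helper).
// fileBytes:   total file size in bytes
// id3Bytes:    total size of the ID3 tags, 0 if the file has none
// bitrateKbps: constant bitrate in kbps
double cbrDurationSeconds(std::uint64_t fileBytes, std::uint64_t id3Bytes,
                          std::uint32_t bitrateKbps) {
    if (bitrateKbps == 0 || id3Bytes > fileBytes) {
        return 0.0;  // invalid input
    }
    return static_cast<double>(fileBytes - id3Bytes) * 8.0 /
           (bitrateKbps * 1000.0);
}

// Byte offset in the file for a target playback time (hypothetical helper).
// id3v2Bytes: size of the ID3v2 tag at the start of the file, 0 if absent
std::uint64_t cbrSeekOffset(double targetSeconds, std::uint32_t bitrateKbps,
                            std::uint64_t id3v2Bytes) {
    if (targetSeconds < 0.0) {
        targetSeconds = 0.0;
    }
    return static_cast<std::uint64_t>(targetSeconds * bitrateKbps * 1000.0 / 8.0) +
           id3v2Bytes;
}
```

The offset computed this way usually lands in the middle of a frame, so a player would normally scan forward from it to the next frame header before decoding.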

VBR Solution: Xing Header

Most VBR MP3s use the Xing header (or the "Info" marker) placed in the first audio frame. The header stores auxiliary information such as total frame count, file length, a TOC (Table Of Contents) index, and a quality indicator.

Key fields include:

Marker (position 0, 4 bytes): "Xing" or "Info".

Flags (Big‑Endian, 4 bytes): indicate which of the optional fields below are present (total frames, file length, TOC, quality).

Total frame count (Big‑Endian, 4 bytes).

File length in bytes (Big‑Endian, 4 bytes).

TOC: a 100‑byte array mapping time percentages to relative file positions.

Quality indicator (Big‑Endian, 4 bytes): 0 (best) to 100 (worst).
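The field list above can be read with a small parser. This is a sketch under stated assumptions, not a full implementation: the struct and function names are hypothetical, the buffer is assumed to start exactly at the marker (locating the marker inside the first frame is not shown), and the flag bits 0x1, 0x2, 0x4 and 0x8 are assumed to signal frames, bytes, TOC and quality in that order, as in the commonly documented layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Parsed Xing/Info fields (hypothetical container type).
struct XingInfo {
    bool hasFrames = false, hasBytes = false, hasToc = false, hasQuality = false;
    std::uint32_t frames = 0;   // total frame count
    std::uint32_t bytes = 0;    // file length in bytes
    std::uint8_t toc[100] = {}; // seek table
    std::uint32_t quality = 0;  // quality indicator
};

// Reads a 4-byte Big-Endian unsigned integer.
static std::uint32_t readBE32(const std::uint8_t* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// Parses the header starting at the "Xing"/"Info" marker.
// Returns false if the marker is missing or the buffer is too short.
bool parseXing(const std::uint8_t* data, std::size_t size, XingInfo* out) {
    if (size < 8) return false;
    if (std::memcmp(data, "Xing", 4) != 0 && std::memcmp(data, "Info", 4) != 0) {
        return false;
    }
    const std::uint32_t flags = readBE32(data + 4);
    std::size_t pos = 8;
    XingInfo info;
    if (flags & 0x1) {  // total frame count present
        if (size - pos < 4) return false;
        info.frames = readBE32(data + pos);
        info.hasFrames = true;
        pos += 4;
    }
    if (flags & 0x2) {  // file length present
        if (size - pos < 4) return false;
        info.bytes = readBE32(data + pos);
        info.hasBytes = true;
        pos += 4;
    }
    if (flags & 0x4) {  // TOC present
        if (size - pos < 100) return false;
        std::memcpy(info.toc, data + pos, 100);
        info.hasToc = true;
        pos += 100;
    }
    if (flags & 0x8) {  // quality indicator present
        if (size - pos < 4) return false;
        info.quality = readBE32(data + pos);
        info.hasQuality = true;
    }
    *out = info;  // only written on success
    return true;
}
```

Because every field after the flags is optional, the parser advances its read position only for fields whose flag bit is set.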

To compute the audio duration, read the total frame count, multiply by the number of samples per frame (which depends on MPEG version and layer), then divide by the sampling rate.

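For example, with 7344 frames of 1152 samples each (MPEG‑1 Layer III), the total number of samples is:

7344 * 1152 = 8460288

Assuming a sampling rate of 44.1 kHz:

8460288 / 44100 ≈ 191.8 (s)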

Thus, if the Xing header provides the total frame count, the exact duration can be derived.
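The same calculation can be sketched in code. The helper names are hypothetical; the samples-per-frame values are the standard ones for MPEG audio (384 for Layer I, 1152 for Layer II, and 1152 or 576 for Layer III depending on whether the stream is MPEG-1 or MPEG-2/2.5).

```cpp
#include <cstdint>

// Samples per frame for a given MPEG version and layer (hypothetical helper).
// isMpeg1: true for MPEG-1, false for MPEG-2 or MPEG-2.5
// layer:   1, 2 or 3; any other value yields 0
std::uint32_t samplesPerFrame(bool isMpeg1, int layer) {
    switch (layer) {
        case 1: return 384;
        case 2: return 1152;
        case 3: return isMpeg1 ? 1152 : 576;
        default: return 0;
    }
}

// Duration in seconds from the Xing total frame count (hypothetical helper).
double xingDurationSeconds(std::uint32_t totalFrames,
                           std::uint32_t samplesPerFrameValue,
                           std::uint32_t sampleRateHz) {
    if (sampleRateHz == 0) {
        return 0.0;  // invalid input, avoid dividing by zero
    }
    return static_cast<double>(totalFrames) * samplesPerFrameValue / sampleRateHz;
}
```

For the figures used above, `xingDurationSeconds(7344, samplesPerFrame(true, 3), 44100)` gives roughly 191.84 seconds.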

Seek Operation Using the TOC

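The TOC is a 100‑byte array. Entry i holds the relative file position, as a value from 0 to 255 that is read as a fraction of 256, at which i % of the total duration has elapsed. To seek to a target time, first convert the time to a percentage of the total duration and use it as the TOC index. For example, seeking to 60 s in a 240 s file selects:

TOC[(60 / 240) * 100] = TOC[25]

Then scale that entry by the file length to get the byte offset. For a 5000000‑byte file:

(TOC[25] / 256) * 5000000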

If the exact time does not correspond to one of the 100 points, linear interpolation between the two nearest TOC entries is used.

The Android source implements this logic as follows:

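bool XINGSeeker::getOffsetForTime(int64_t *timeUs, off64_t *pos) {
    // Seeking needs the file size, a valid TOC, and a known duration.
    if (mSizeBytes == 0 || !mTOCValid || mDurationUs < 0) {
        return false;
    }

    // Target time as a percentage of the total duration.
    float percent = (float)(*timeUs) * 100 / mDurationUs;
    float fx;  // interpolated TOC value on a 0..256 scale
    if (percent <= 0.0f) {
        fx = 0.0f;
    } else if (percent >= 100.0f) {
        fx = 256.0f;
    } else {
        int a = (int)percent;
        float fa, fb;
        // The indexing implies that mTOC starts at TOC entry 1 (entry 0 is
        // always 0), so mTOC[a-1] is the value at a % and mTOC[a] at (a+1) %.
        if (a == 0) {
            fa = 0.0f;
        } else {
            fa = (float)mTOC[a-1];
        }
        if (a < 99) {
            fb = (float)mTOC[a];
        } else {
            fb = 256.0f;
        }
        // Linear interpolation between the two neighbouring entries.
        fx = fa + (fb-fa)*(percent-a);
    }

    // Scale to bytes and add the offset of the first audio frame.
    *pos = (int)((1.0f/256.0f)*fx*mSizeBytes) + mFirstFramePos;
    return true;
}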
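For experimenting outside Android, the same lookup and interpolation can be written as a free function. This is a sketch, not the Android implementation: the function name and parameters are hypothetical, it takes the full 100‑entry TOC (so entry a is read as toc[a]), and it works in seconds instead of microseconds.

```cpp
#include <cstdint>

// Byte offset for a target time, using a full 100-entry Xing TOC
// (hypothetical helper).
// toc:             the 100 TOC bytes, toc[i] = position at i % of the duration
// targetSeconds:   time to seek to
// durationSeconds: total duration of the audio
// sizeBytes:       file length taken from the Xing header
// firstFramePos:   offset of the first audio frame in the file
std::uint64_t xingSeekOffset(const std::uint8_t toc[100], double targetSeconds,
                             double durationSeconds, std::uint64_t sizeBytes,
                             std::uint64_t firstFramePos) {
    if (durationSeconds <= 0.0) {
        return firstFramePos;  // unknown duration: fall back to the start
    }
    const double percent = targetSeconds * 100.0 / durationSeconds;
    double fx;  // interpolated TOC value on a 0..256 scale
    if (percent <= 0.0) {
        fx = 0.0;
    } else if (percent >= 100.0) {
        fx = 256.0;
    } else {
        const int a = static_cast<int>(percent);
        const double fa = toc[a];
        // Past the last entry, interpolate toward the end of the file.
        const double fb = (a < 99) ? toc[a + 1] : 256.0;
        fx = fa + (fb - fa) * (percent - a);
    }
    return static_cast<std::uint64_t>(fx / 256.0 * sizeBytes) + firstFramePos;
}
```

With toc[25] = 64, seeking to 60 s in a 240 s, 5000000‑byte file resolves to 64 / 256 of the file, that is byte 1250000 past the first frame.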

Conclusion

CBR is simple to decode and makes duration and seek offsets easy to calculate, while VBR achieves better storage efficiency at the cost of additional metadata handling (the Xing header) and more complex duration and seek calculations. Understanding both formats is essential for developing robust MP3 playback functionality.
