
Understanding iOS Core Audio: Definitions of Sample, Frame, and Packet

The article clarifies Apple’s Core Audio terminology—defining a sample as a single channel value, a frame as simultaneous samples, and a packet as one or more contiguous frames—explains why these terms are often confused across audio, networking, and codec contexts, and demonstrates the definitions with an MP3 parsing example.

Tencent Music Tech Team

When learning about iOS Core Audio, you will encounter concepts such as bitrate, sample, frame, and packet. Because the industry uses the terms "packet" and "frame" differently in various contexts, it is easy to get confused.

(1) iOS Core Audio Audio Concept Definitions

To discuss iOS Core Audio, you must follow Apple’s definitions of audio‑related concepts.

Apple’s API documentation defines the following:

A sample is a single numerical value for a single audio channel in an audio stream.

A frame is a collection of time‑coincident samples. For example, a linear PCM stereo file has two samples per frame, one for the left channel and one for the right channel.

A packet is a collection of one or more contiguous frames. A packet defines the smallest meaningful set of frames for a given audio data format and is the smallest data unit for which time can be measured. In linear PCM audio, a packet holds a single frame; in compressed formats it usually holds more, and in some formats the number of frames per packet varies.

In plain language:

Sample – one sample of one channel.

Frame – the set of samples that occur at the same instant (e.g., two samples for a stereo frame).

Packet – one or more frames grouped together; the number of frames per packet is determined by the file format (e.g., 1 frame per packet for PCM, 1152 frames per packet for MP3).
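The definitions above can be turned into simple arithmetic. The concrete numbers below are illustrative assumptions (16-bit stereo linear PCM at 44.1 kHz, plus the 1152 frames-per-packet MP3 figure mentioned above), not measurements from any particular file:

```swift
// Worked numbers for the sample/frame/packet definitions.
let channels = 2                                  // stereo: 2 samples per frame
let bytesPerSample = 2                            // 16-bit linear PCM
let bytesPerFrame = channels * bytesPerSample     // 4 bytes per PCM frame

let framesPerPacket = 1152                        // MPEG-1 Layer III
let sampleRate = 44100.0
let packetDuration = Double(framesPerPacket) / sampleRate  // ~26 ms of audio per MP3 packet
```

This also shows why the packet is "the smallest data unit for which time can be measured": dividing frames per packet by the sample rate directly yields a duration.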

(2) Why These Concepts Are Easily Mixed Up

In everyday discussions the words frame and packet are used in many scenarios, each with a different meaning, which leads to confusion.

Examples:

When discussing MPEG formats, a "frame" often refers to an MPEG header + payload structure, while iOS Core Audio documentation calls that same structure a "packet".

During network transmission of audio, data is "packaged" into packets that have their own packet header, which is unrelated to the Core Audio packet definition.

In computer networking, a hardware data frame is wrapped into a network packet for higher‑layer use, again different from audio terminology.

FFmpeg defines two structures: AVPacket (compressed audio/video data) and AVFrame (decompressed data). Their usage aligns closely with the Core Audio definitions.

Because the Chinese word "帧" can refer to a data frame, an audio frame, a packet, or a network frame, the ambiguity becomes even greater.

(3) Demo: Audio Data Frames of a QQ Music Song

Using the definitions above, we clarify the Core Audio meaning of packet by analyzing an MP3 file from QQ Music (song "最长的电影").

When parsing a stream opened with AudioFileStreamOpen, you must register two callbacks: AudioFileStream_PropertyListenerProc (invoked when property data is parsed) and AudioFileStream_PacketsProc (invoked when packet data is parsed).

We feed the file to AudioFileStreamParseBytes in 1000-byte chunks. In the first chunk, the first 496 bytes are property data; the next 417 bytes form the first packet (one MP3 audio data frame). The remaining 87 bytes are not enough for another complete frame, so the parser buffers them until the next call.
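The bookkeeping in this chunked parse can be sketched as a toy model. This is pure Swift with no AudioToolbox; simulateParse and its explicit frame-size list are hypothetical stand-ins for the parser's internal buffering, using the sizes reported above:

```swift
// Toy model of chunked parsing: consume a fixed-size property section,
// then emit as many complete frames (packets) as the buffered bytes allow,
// carrying any partial frame over to the next chunk.
struct ChunkResult { let packets: Int; let leftover: Int }

func simulateParse(chunks: [Int], headerBytes: Int, frameSizes: [Int]) -> [ChunkResult] {
    var results: [ChunkResult] = []
    var buffered = 0              // bytes carried over from the previous chunk
    var headerLeft = headerBytes  // property-section bytes still to consume
    var frameIndex = 0
    for chunk in chunks {
        buffered += chunk
        // Consume the property (header) section first.
        let consumedHeader = min(headerLeft, buffered)
        headerLeft -= consumedHeader
        buffered -= consumedHeader
        // Emit every complete frame that fits in the buffered bytes.
        var packets = 0
        while headerLeft == 0, frameIndex < frameSizes.count,
              buffered >= frameSizes[frameIndex] {
            buffered -= frameSizes[frameIndex]
            frameIndex += 1
            packets += 1
        }
        results.append(ChunkResult(packets: packets, leftover: buffered))
    }
    return results
}

// Numbers from the article: 496-byte property section, then 417/418-byte frames.
let r = simulateParse(chunks: [1000, 1000],
                      headerBytes: 496,
                      frameSizes: [417, 418, 418, 418])
```

Under these sizes the model reproduces the observed behavior: one packet and 87 leftover bytes after the first chunk, two packets after the second.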

On the second 1000‑byte chunk, two packets are returned (each 418 bytes). The binary contents correspond to MP3 frame headers (e.g., fffb9044 0008024e… at file offset 496).
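The observed header bytes can be decoded by hand. The sketch below is a simplified MPEG-1 Layer III header parser (mp3FrameLength is a hypothetical helper, and the lookup tables are abbreviated to the MPEG-1 Layer III case only); applied to the bytes at offset 496 (ff fb 90 44) it reproduces the 417-byte frame length:

```swift
// Abbreviated tables: MPEG-1 Layer III bitrates (kbps) and sample rates (Hz).
let bitrateKbps = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]
let sampleRates = [44100, 48000, 32000]

// Decode the frame length from a 4-byte MPEG-1 Layer III frame header.
func mp3FrameLength(header: [UInt8]) -> Int? {
    guard header.count >= 4,
          header[0] == 0xff, (header[1] & 0xe0) == 0xe0 else { return nil }  // 11-bit sync
    let bitrateIndex = Int(header[2] >> 4)             // upper nibble of byte 2
    let sampleRateIndex = Int((header[2] >> 2) & 0x3)  // next two bits
    let padding = Int((header[2] >> 1) & 0x1)          // padding bit
    guard bitrateIndex > 0, bitrateIndex < 15, sampleRateIndex < 3 else { return nil }
    // MPEG-1 Layer III: frame length = 144 * bitrate / sampleRate + padding
    return 144 * bitrateKbps[bitrateIndex] * 1000 / sampleRates[sampleRateIndex] + padding
}

let len = mp3FrameLength(header: [0xff, 0xfb, 0x90, 0x44])  // 417 bytes
```

Here ff fb 90 44 decodes to 128 kbps at 44100 Hz with the padding bit clear, giving 144 × 128000 / 44100 = 417 bytes; a set padding bit yields the 418-byte frames seen in the second chunk.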

This demonstrates that what Core Audio calls a packet (the unit described by AudioStreamPacketDescription) corresponds to one MP3 audio data frame; that packet in turn contains many Core Audio frames (1152 for MP3).

(4) Q&A

Q: What is the precise definition of a packet in iOS Core Audio? A: See section (1) above.

Q: Why do I often confuse packet and frame? A: See sections (2) and (3); the same terms have different meanings in different contexts.

Q: Can a packet contain half a frame? A: In iOS Core Audio, a packet always contains at least one whole frame (e.g., PCM: 1 packet = 1 frame). Network‑level packets may be defined differently, and some protocols could split frames, but that is outside Core Audio’s definition.

Q: How many packets does AudioFileStreamParseBytes parse per call? Can it be controlled? A: For CBR formats, the number of packets returned is roughly proportional to the number of bytes you feed; any incomplete trailing frame is buffered until the next call. For VBR formats, a single call may trigger multiple packet callbacks. See the AudioFileStream_PacketsProc API documentation for details.

References:

1. Apple AudioStreamBasicDescription API
2. MPEG File Format Introduction
3. MPEG Transport Stream
4. FFmpeg Official Site
5. FFmpeg AVPacket Documentation
6. FFmpeg AVFrame Documentation
7. AudioFileStream_PacketsProc API Documentation


Written by Tencent Music Tech Team

Public account of Tencent Music's development team, focusing on technology sharing and communication.
