Understanding Video Transmission: GOP, Frame Types, and DTS/PTS Explained
Video transmission relies on compressing frames into groups (GOP) composed of I, P, and B frames, with decoding and presentation timestamps (DTS/PTS) coordinating playback order, ensuring efficient storage and smooth streaming despite differing frame dependencies.
Video Transmission Principles
Video consists of a sequence of image frames and an audio track; playback is the sequential display of frames over time. To reduce size, not every frame is stored in full; instead, video streams are compressed (encoded) before transmission or storage.
Encoders group multiple images into GOPs (Groups of Pictures), which the decoder reads and decodes frame by frame for rendering. A GOP is a contiguous run of pictures containing one I‑frame and several B/P frames, and this pattern repeats until the video ends.
Depending on how they are compressed, frames are classified as I‑frames (intra‑coded frames, also called key frames), P‑frames (forward‑predicted frames), and B‑frames (bidirectionally predicted frames). I‑frames contain a complete picture; P‑ and B‑frames store only the changes relative to their reference frames. Without I‑frames, P‑ and B‑frames cannot be decoded.
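As a minimal sketch of the GOP idea, the hypothetical Python helper below splits a string of frame types into GOPs. It assumes the frames are listed in display order, that every GOP begins at an I‑frame, and that the stream itself starts with one; `split_into_gops` is an illustrative name, not a function from any codec library.

```python
def split_into_gops(frame_types: str) -> list[str]:
    """Split a sequence such as "IBBPIBBP" into GOPs, one per I-frame.

    Assumption for this sketch: each I-frame opens a new GOP.
    """
    gops: list[str] = []
    for kind in frame_types:
        if kind not in "IPB":
            raise ValueError(f"unknown frame type: {kind!r}")
        if kind == "I":
            gops.append(kind)      # an I-frame starts a new GOP
        elif not gops:
            # P/B frames have nothing to reference without a preceding I-frame
            raise ValueError("stream must start with an I-frame")
        else:
            gops[-1] += kind       # P/B frames belong to the current GOP
    return gops


if __name__ == "__main__":
    print(split_into_gops("IBBPBBPIBBP"))
```

Rejecting a stream that opens with a P‑ or B‑frame mirrors the point above: those frames are useless to a decoder that has not yet seen an I‑frame.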
I‑frame
I‑frames use intra‑frame coding, relying only on spatial correlation within the frame, which makes them random‑access entry points and the base reference for decoding the rest of the GOP. They appear periodically, and the loss of an I‑frame can leave the frames that depend on it undecodable until the next I‑frame arrives, resulting in a black screen or stutter.
P‑frame
P‑frames employ inter‑frame coding, using both spatial and temporal correlation. They contain only the difference from the previous reference frame, which improves compression efficiency. If a P‑frame is lost, visual artifacts such as blockiness appear in the frames that depend on it.
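To illustrate how a loss propagates, here is a deliberately simplified model. It assumes the stream contains only I‑ and P‑frames and that each P‑frame references the frame immediately before it; real decoders also apply error concealment, which this sketch ignores. The function name `frames_affected_by_loss` is hypothetical.

```python
def frames_affected_by_loss(frame_types: str, lost_index: int) -> list[int]:
    """Return the indices of frames that cannot be decoded cleanly.

    Simplified model: every P-frame depends on the frame right before it,
    so damage spreads forward until the next I-frame resets the chain.
    """
    if not 0 <= lost_index < len(frame_types):
        raise IndexError("lost_index is outside the stream")
    affected = [lost_index]
    for i in range(lost_index + 1, len(frame_types)):
        if frame_types[i] == "I":
            break                  # a new I-frame is decodable on its own
        affected.append(i)         # this P-frame chains back to the lost frame
    return affected


if __name__ == "__main__":
    stream = "IPPPIPP"
    print(frames_affected_by_loss(stream, 0))  # losing the first I-frame
    print(frames_affected_by_loss(stream, 2))  # losing a P-frame mid-GOP
```

In this model a lost I‑frame takes the whole GOP with it, while a lost P‑frame only damages the remainder of its GOP.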
B‑frame
B‑frames use bidirectional prediction, referencing both past and future frames, which further improves compression. In streams that contain B‑frames, such as MPEG‑2, the transmission order therefore differs from the display order, and frames must be reordered during playback.
Because B‑frames depend on a later I‑ or P‑frame, that reference frame must be encoded, transmitted, and decoded before the B‑frames that come ahead of it in display order. A frame's decoding time and presentation time can therefore differ, which is why streams carry two timestamps: DTS (Decoding Time Stamp) and PTS (Presentation Time Stamp).
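The reordering can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than how any particular encoder works: each run of B‑frames is taken to reference the nearest I/P frame on either side, B‑frames are never used as references themselves, and `display_to_decode_order` is a hypothetical name.

```python
def display_to_decode_order(display: str) -> str:
    """Move each I/P reference frame ahead of the B-frames that wait on it."""
    decode: list[str] = []
    waiting_b: list[str] = []      # B-frames whose future reference has not appeared yet
    for kind in display:
        if kind not in "IPB":
            raise ValueError(f"unknown frame type: {kind!r}")
        if kind == "B":
            waiting_b.append(kind)
        else:
            decode.append(kind)        # the reference frame is decoded first
            decode.extend(waiting_b)   # then the B-frames that needed it
            waiting_b.clear()
    decode.extend(waiting_b)       # trailing B-frames with no later reference (simplification)
    return "".join(decode)


if __name__ == "__main__":
    print(display_to_decode_order("IBBBP"))
```

For the display sequence I B B B P, the P‑frame jumps ahead of the three B‑frames, which is exactly why a frame's position in the stream no longer tells the player when to show it.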
DTS and PTS
DTS indicates when a decoder should decode a frame, while PTS indicates when the frame should be presented to the viewer. Both timestamps are generated by the encoder.
For example, a GOP with capture order I B B B P has the following PTS order:
<code>Frame: I B B B P
PTS:   1 2 3 4 5</code>
The encoder’s encoding order is:
<code>Frame: I P B B B
DTS:   1 2 3 4 5
PTS:   1 5 2 3 4</code>
Streaming follows the encoding order (I P B B B). The receiver gets the frames in the same order and decodes them one by one:
Decoding order: I P B B B
Since decoding order differs from display order, frames must be reordered according to PTS:
<code>Frame: I B B B P
PTS:   1 2 3 4 5
DTS:   1 3 4 5 2</code>
Audio streams also carry DTS and PTS, but because audio has no B‑frames, its decoding and presentation orders are identical.
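The I B B B P example can be reproduced with a short Python sketch. The `Frame` class and the `encode_order` and `presentation_order` helpers are hypothetical names introduced for illustration, and the timestamps are simple 1‑based counters rather than real clock values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    kind: str  # "I", "P" or "B"
    pts: int   # presentation timestamp (position in capture order)
    dts: int   # decoding timestamp (position in encoding order)


def encode_order(capture: str) -> list[Frame]:
    """Stamp frames with PTS in capture order, then DTS in encoding order."""
    ordered: list[tuple[str, int]] = []
    waiting_b: list[tuple[str, int]] = []
    for pts, kind in enumerate(capture, start=1):
        if kind == "B":
            waiting_b.append((kind, pts))  # must wait for the next I/P frame
        else:
            ordered.append((kind, pts))    # reference frame goes out first
            ordered.extend(waiting_b)
            waiting_b = []
    ordered.extend(waiting_b)
    return [Frame(kind, pts, dts) for dts, (kind, pts) in enumerate(ordered, start=1)]


def presentation_order(decoded: list[Frame]) -> list[Frame]:
    """Reorder decoded frames for display by sorting on PTS."""
    return sorted(decoded, key=lambda frame: frame.pts)


if __name__ == "__main__":
    stream = encode_order("IBBBP")
    for frame in stream:
        print("decode ", frame)
    for frame in presentation_order(stream):
        print("display", frame)
```

Sorting the whole list is only workable for a toy example; a real player would hold just a few decoded frames in a small reorder buffer and release each one when its PTS comes due.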
Tencent IMWeb Frontend Team