Understanding Video Transmission: GOP, Frame Types, and DTS/PTS Explained

Video is compressed into groups of pictures (GOPs) made up of I‑, P‑, and B‑frames. Decoding and presentation timestamps (DTS/PTS) tell the player when to decode and when to display each frame, so playback stays in the correct order even though frames are transmitted in dependency order.

Tencent IMWeb Frontend Team

Jan 24, 2022

Video Transmission Principles

Video consists of a sequence of image frames and an audio track; playback is the sequential display of frames over time. To reduce size, not every frame is stored in full; instead, video streams are compressed (encoded) before transmission or storage.

Encoders group consecutive images into GOPs (Groups of Pictures), which the decoder reads and renders frame by frame. A GOP is a consecutive run of pictures that contains one I‑frame and several P‑ and B‑frames, and this pattern repeats until the video ends.
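To make the grouping concrete, here is a minimal sketch that splits a sequence of frame types into GOPs. It assumes frames are listed in display order and that every GOP starts at an I‑frame; the function name split_into_gops is hypothetical and chosen only for illustration.

```python
def split_into_gops(frame_types: str) -> list[str]:
    """Split a string of frame types (e.g. "IBBPIBBP") into GOPs.

    Assumption: each I-frame opens a new GOP. Leading frames that appear
    before the first I-frame are kept together as a partial group.
    """
    gops: list[str] = []
    for kind in frame_types:
        if kind == "I" or not gops:
            gops.append(kind)   # start a new group
        else:
            gops[-1] += kind    # extend the current group
    return gops


print(split_into_gops("IBBPBBPIBBP"))  # ['IBBPBBP', 'IBBP']
```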

Depending on the compression method, frames are classified as I‑frames (intra‑coded frames, also called key frames), P‑frames (forward‑predicted frames), and B‑frames (bidirectionally predicted frames). I‑frames contain a complete picture; P‑ and B‑frames store only the changes relative to their reference frames. Without the I‑frame they ultimately depend on, P‑ and B‑frames cannot be decoded.

I‑frame

I‑frames use intra‑frame coding, relying only on spatial correlation within the frame. Because they need no other frame, they serve as random‑access entry points and as the reference from which the rest of the GOP is decoded. They appear periodically, and if an I‑frame is lost, the frames that depend on it cannot be decoded, resulting in black screens or stutter.
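As an illustration of the random‑access role, the sketch below finds the frame where decoding has to start when a player seeks to an arbitrary position. The function name nearest_keyframe and the string representation of the stream are assumptions made for this example, not part of any real player API.

```python
def nearest_keyframe(frame_types: str, target: int) -> int:
    """Return the index of the latest I-frame at or before `target`.

    Decoding must start there, because P- and B-frames after it depend
    (directly or indirectly) on that I-frame.
    """
    for index in range(target, -1, -1):
        if frame_types[index] == "I":
            return index
    raise ValueError("no I-frame at or before the target position")


stream = "IBBPIBBP"
print(nearest_keyframe(stream, 6))  # 4
print(nearest_keyframe(stream, 3))  # 0
```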

P‑frame

P‑frames employ inter‑frame coding, using both spatial and temporal correlation. They store only the difference from the previous reference frame, which improves compression efficiency. If a P‑frame is lost, the frames that reference it decode incorrectly, and visual artifacts such as blockiness appear.
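The idea of storing only the difference can be shown with a toy example. This is a deliberately simplified sketch: real codecs use motion‑compensated prediction and transform coding rather than plain per‑pixel subtraction, and the helper names encode_delta and decode_delta are hypothetical.

```python
def encode_delta(reference: list[int], frame: list[int]) -> list[int]:
    """Toy 'P-frame': keep only the per-pixel change from the reference."""
    return [cur - ref for ref, cur in zip(reference, frame)]


def decode_delta(reference: list[int], delta: list[int]) -> list[int]:
    """Rebuild the frame by applying the stored change to the reference."""
    return [ref + d for ref, d in zip(reference, delta)]


i_frame = [10, 10, 12, 200]      # complete picture (toy pixel values)
next_frame = [10, 11, 12, 201]   # nearly identical following picture

delta = encode_delta(i_frame, next_frame)
print(delta)                         # [0, 1, 0, 1]
print(decode_delta(i_frame, delta))  # [10, 11, 12, 201]
```

Most entries of the delta are zero or small, which is what makes it cheaper to compress than the full picture. It also shows why the reference is indispensable: without i_frame, the delta alone cannot be turned back into a picture.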

B‑frame

B‑frames use bidirectional prediction, referencing both past and future frames, which greatly increases compression. When a stream contains B‑frames, as in MPEG‑2, the transmission order differs from the display order, so frames must be reordered during playback.

Because a B‑frame depends on a future I‑ or P‑frame, that reference frame must be transmitted and decoded before the B‑frame, even though it is displayed after it. Decoding order and presentation order therefore differ, which introduces the concepts of DTS (Decoding Time Stamp) and PTS (Presentation Time Stamp).

DTS and PTS

DTS indicates when a decoder should decode a frame, while PTS indicates when the frame should be presented to the viewer. Both timestamps are generated by the encoder.

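For example, a GOP captured in the order I B B B P has the PTS values shown here. The numbers in this example are ordinal positions, not real clock values; in a real stream the PTS values are offset so that no frame is presented before it has been decoded.

Frame: I B B B P
PTS:   1 2 3 4 5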

The encoder’s encoding order is:

Frame: I P B B B
DTS:   1 2 3 4 5
PTS:   1 5 2 3 4
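The move from capture order to encoding order can be sketched in a few lines. The code below is a simplified model, assuming that every run of B‑frames references the next I‑ or P‑frame and that timestamps are plain ordinal positions as in the example; the names Frame and assign_timestamps are hypothetical.

```python
from typing import NamedTuple


class Frame(NamedTuple):
    kind: str  # "I", "P" or "B"
    pts: int   # position in display order
    dts: int   # position in decoding order


def assign_timestamps(display_order: str) -> list[Frame]:
    """Return frames in decoding order with ordinal PTS and DTS values."""
    decode: list[tuple[str, int]] = []     # (kind, pts) in decoding order
    pending_b: list[tuple[str, int]] = []  # B-frames waiting for their future reference
    for pts, kind in enumerate(display_order, start=1):
        if kind == "B":
            pending_b.append((kind, pts))
        else:
            decode.append((kind, pts))     # the reference frame goes first
            decode.extend(pending_b)       # then the B-frames that needed it
            pending_b.clear()
    decode.extend(pending_b)               # trailing B-frames (simplification)
    return [Frame(kind, pts, dts) for dts, (kind, pts) in enumerate(decode, start=1)]


for frame in assign_timestamps("IBBBP"):
    print(frame.kind, "DTS", frame.dts, "PTS", frame.pts)
# I DTS 1 PTS 1
# P DTS 2 PTS 5
# B DTS 3 PTS 2
# B DTS 4 PTS 3
# B DTS 5 PTS 4
```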

Streaming follows the encoding order (I P B B B). The receiver gets the stream in the same order and decodes frame by frame:

Decoding order: I P B B B

Since decoding order differs from display order, frames must be reordered according to PTS:

Frame: I B B B P
PTS:   1 2 3 4 5
DTS:   1 3 4 5 2
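On the receiving side, the reordering can be sketched with a small buffer that always releases the frame with the lowest PTS. This is a minimal illustration rather than how any particular player is implemented; the function name reorder_by_pts and the depth parameter are assumptions, and a depth of 1 happens to be enough for the I P B B B example.

```python
import heapq
from typing import Iterable, Iterator


def reorder_by_pts(
    decoded: Iterable[tuple[int, str]], depth: int = 1
) -> Iterator[tuple[int, str]]:
    """Yield (pts, kind) pairs in presentation order.

    `decoded` arrives in decoding order. Up to `depth` frames are held back
    so that a reference frame decoded early can wait for its turn on screen.
    """
    heap: list[tuple[int, str]] = []
    for frame in decoded:
        heapq.heappush(heap, frame)
        if len(heap) > depth:
            yield heapq.heappop(heap)  # lowest PTS leaves the buffer first
    while heap:                        # flush what is left at end of stream
        yield heapq.heappop(heap)


decoding_order = [(1, "I"), (5, "P"), (2, "B"), (3, "B"), (4, "B")]
print(list(reorder_by_pts(decoding_order)))
# [(1, 'I'), (2, 'B'), (3, 'B'), (4, 'B'), (5, 'P')]
```

A streaming buffer is used here instead of sorting the whole list, because a player receives frames continuously and cannot wait for the end of the stream before displaying anything.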

Audio streams also carry DTS and PTS, but because audio has no B‑frames, its decoding order and presentation order are identical.
