
Common Color Representation Methods and Image/Video Fundamentals

The article explains common color models such as grayscale, RGB and YUV, describes image fundamentals like resolution and aspect ratio, outlines typical storage formats (RGB, YUV420P, NV12/NV21) and their bit‑depth considerations, and introduces video basics including frame rate, compression stages and HDR mapping.

Bilibili Tech

Sep 20, 2022


This article introduces the most common ways to represent colors, the basics of image sampling, and fundamental concepts of video processing.

1. Common color representation methods

1.1 Grayscale – A grayscale (black‑and‑white) image can be described by a single value per pixel ranging from 0 (black) to 1 (white). Values between 0 and 1 represent shades of gray.

1.2 RGB – Human eyes have three types of cone cells that are most sensitive to wavelengths around 560 nm (red), 530 nm (green) and 420 nm (blue). By recording the intensity of these three components, a color can be expressed as three numbers (R, G, B). Different standards (e.g., BT.601, BT.709, sRGB) define the exact chromaticities of the primaries and the white point.

1.3 YUV – YUV separates luminance (Y) from chrominance (U, V). Because the human visual system is more sensitive to luminance than to chrominance, Y can be stored with higher precision while U and V are stored with less, typically at reduced resolution. The conversion formulas are:

Y = Kr * R + Kg * G + Kb * B
U = (B - Y) / (1 - Kb)
V = (R - Y) / (1 - Kr)

R, G, B and Y are in the range 0 to 1; U and V are in the range −1 to 1. Kr, Kg and Kb are constants that sum to 1 and are defined by the chosen standard (e.g., BT.601, BT.709, BT.2020).
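The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a production converter: the function names and the `LUMA_COEFFS` table are hypothetical, the Kr and Kb values are the ones commonly published for each standard, and inputs are assumed to be normalized values without any integer range offset.

```python
# Luma coefficients (Kr, Kb) per standard; Kg follows from Kr + Kg + Kb = 1.
LUMA_COEFFS = {
    "BT.601": (0.299, 0.114),
    "BT.709": (0.2126, 0.0722),
    "BT.2020": (0.2627, 0.0593),
}


def rgb_to_yuv(r, g, b, standard="BT.709"):
    """Convert normalized R, G, B (0..1) to Y (0..1) and U, V (-1..1)."""
    kr, kb = LUMA_COEFFS[standard]
    kg = 1.0 - kr - kb
    y = kr * r + kg * g + kb * b
    u = (b - y) / (1.0 - kb)
    v = (r - y) / (1.0 - kr)
    return y, u, v


def yuv_to_rgb(y, u, v, standard="BT.709"):
    """Invert rgb_to_yuv by solving the same three equations for R, G, B."""
    kr, kb = LUMA_COEFFS[standard]
    kg = 1.0 - kr - kb
    b = y + u * (1.0 - kb)
    r = y + v * (1.0 - kr)
    g = (y - kr * r - kb * b) / kg
    return r, g, b
```

Pure white `(1, 1, 1)` maps to Y = 1 with U = V = 0, and pure blue maps to U = 1, which matches the stated ranges.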

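1.4 Opto‑electronic and electro‑optical transfer functions (OETF/EOTF) – The OETF describes how light intensity is mapped to an electrical signal at capture, and the EOTF describes how the signal is mapped back to light at the display. For SDR video the reference EOTF is BT.1886; for HDR video the transfer functions are defined in BT.2100.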
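As an illustration of what an EOTF looks like in practice, the sketch below follows the BT.1886 reference formula, a power function with exponent 2.4 adjusted for the display's white and black luminance. The function name and the default white level of 100 cd/m² are assumptions made for this example, and HDR curves from BT.2100 are not covered.

```python
GAMMA = 2.4  # exponent used by the BT.1886 reference EOTF


def bt1886_eotf(v, lw=100.0, lb=0.0):
    """Map a normalized signal v (0..1) to display luminance in cd/m^2.

    lw is the luminance of white and lb the luminance of black; both
    defaults are assumptions chosen for illustration.
    """
    root_w = lw ** (1.0 / GAMMA)
    root_b = lb ** (1.0 / GAMMA)
    a = (root_w - root_b) ** GAMMA   # overall gain
    b = root_b / (root_w - root_b)   # black-level lift
    return a * max(v + b, 0.0) ** GAMMA
```

With a black level of zero this reduces to `lw * v ** 2.4`, so a mid-level signal of 0.5 yields well under half of peak luminance.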

1.5 Sampling precision (bit depth) – The number of bits used to store a sample determines how many distinct levels can be represented (e.g., 8 bit → 256 levels, 10 bit → 1024 levels). Higher bit depth provides finer gradations, especially important for HDR content.
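The relationship between bit depth and the number of levels can be sketched as follows. The helper names are hypothetical, and the mapping assumes full-range codes (0 up to 2^n − 1); broadcast video often uses a narrower "limited" range, which is not modeled here.

```python
def levels(bit_depth):
    """Number of distinct codes that fit in bit_depth bits."""
    return 1 << bit_depth


def quantize(value, bit_depth):
    """Map a normalized value (0..1) to the nearest full-range integer code."""
    max_code = (1 << bit_depth) - 1
    clamped = min(max(value, 0.0), 1.0)
    return int(clamped * max_code + 0.5)


def dequantize(code, bit_depth):
    """Map an integer code back to a normalized value (0..1)."""
    return code / ((1 << bit_depth) - 1)
```

The spacing between adjacent codes is 1/255 at 8 bits and 1/1023 at 10 bits, which is why smooth gradients show less banding at higher bit depths.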

2. Image basics

2.1 Resolution – Usually expressed in pixels (width × height). The perceived detail depends on the pixel density of the display.

2.2 Aspect ratio – The ratio of displayed width to displayed height. It may differ from the ratio of the pixel counts when pixels are not square; for example, DVD video often stores 720 × 480 pixels but is displayed at an aspect ratio of either 4:3 or 16:9.
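The DVD example can be worked through with exact fractions. This is a simplified calculation that divides the display aspect ratio by the stored width-to-height ratio; the function names are hypothetical, and real DVD conventions that treat only part of the 720-pixel width as active picture are ignored.

```python
from fractions import Fraction


def pixel_aspect_ratio(width, height, display_aspect):
    """Shape of one pixel needed for width x height to fill display_aspect."""
    storage_aspect = Fraction(width, height)
    return display_aspect / storage_aspect


def square_pixel_width(height, display_aspect):
    """Width in square pixels that shows the same picture at this height."""
    return round(height * display_aspect)


par_4_3 = pixel_aspect_ratio(720, 480, Fraction(4, 3))     # Fraction(8, 9)
par_16_9 = pixel_aspect_ratio(720, 480, Fraction(16, 9))   # Fraction(32, 27)
```

A pixel aspect ratio below 1 means each pixel is narrower than it is tall, so the 720 stored columns are squeezed into a 4:3 picture; a value above 1 stretches them to 16:9.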

3. Storage formats (examples)

3.1 RGB – The most common format stores three 8‑bit values per pixel (R, G, B), i.e., 24 bits per pixel. Data are usually stored row‑by‑row:

RGB RGB RGB RGB
RGB RGB RGB RGB
RGB RGB RGB RGB
RGB RGB RGB RGB

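Padding may be added at the end of each row so that rows align to a specific byte boundary. The resulting distance in bytes between the starts of consecutive rows is called the stride.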
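A minimal sketch of how row alignment affects addressing in a packed 24-bit RGB buffer is shown below. The function names and the default 4-byte alignment are assumptions for illustration; actual alignment requirements depend on the API or hardware in use.

```python
def aligned_stride(width, bytes_per_pixel=3, alignment=4):
    """Bytes from the start of one row to the next, rounded up to alignment."""
    row_bytes = width * bytes_per_pixel
    return (row_bytes + alignment - 1) // alignment * alignment


def pixel_offset(x, y, stride, bytes_per_pixel=3):
    """Byte offset of the first component of pixel (x, y)."""
    return y * stride + x * bytes_per_pixel
```

A 5-pixel-wide row holds 15 bytes of data but occupies 16 bytes with 4-byte alignment, so a reader that assumes `width * 3` bytes per row would drift by one byte on every row.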

3.2 YUV420P (I420) – A planar format that stores all Y samples first, then all U samples, then all V samples. If every plane were kept at full resolution, the layout for a 4 × 4 image would be:

YYYY
YYYY
YYYY
YYYY
UUUU
UUUU
UUUU
UUUU
VVVV
VVVV
VVVV
VVVV

With 4:2:0 subsampling, U and V each have half the horizontal and half the vertical resolution of Y, so the data actually stored for a 4 × 4 YUV420P image are:

YYYY
YYYY
YYYY
YYYY
UU
UU
VV
VV
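The plane sizes and offsets implied by this layout can be computed as in the sketch below. It assumes a tightly packed buffer with no row padding, and the function names are hypothetical; odd dimensions are rounded up for the chroma planes, which is one common convention.

```python
def i420_layout(width, height):
    """Return offsets and sizes (in samples) of the Y, U and V planes."""
    chroma_w = (width + 1) // 2
    chroma_h = (height + 1) // 2
    y_size = width * height
    c_size = chroma_w * chroma_h
    return {
        "Y": (0, y_size),
        "U": (y_size, c_size),
        "V": (y_size + c_size, c_size),
        "total": y_size + 2 * c_size,
    }


def sample_indices(x, y, width, height):
    """Buffer indices of the Y, U and V samples that cover pixel (x, y)."""
    layout = i420_layout(width, height)
    chroma_w = (width + 1) // 2
    y_index = y * width + x
    c_index = (y // 2) * chroma_w + (x // 2)  # one chroma sample per 2x2 block
    return y_index, layout["U"][0] + c_index, layout["V"][0] + c_index
```

For the 4 × 4 example this gives 16 Y samples, 4 U samples and 4 V samples, 24 in total, which is 1.5 samples per pixel.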

3.3 NV12 / NV21 – Similar to YUV420P, but the U and V samples are interleaved in a single plane that follows the Y plane. NV12 stores the chroma as UV pairs (UVUV…), while NV21 stores VU pairs:

NV12:
YYYY
YYYY
YYYY
YYYY
UVUV
UVUV

NV21:
YYYY
YYYY
YYYY
YYYY
VUVU
VUVU

As with RGB, each plane of these formats may carry row padding, so stride must be taken into account when reading or writing them.
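The difference between the semi-planar and planar layouts can be illustrated by converting one to the other. The sketch below assumes a tightly packed buffer (no row padding) with even width and height; the function names are hypothetical.

```python
def nv12_to_i420(frame, width, height):
    """Split the interleaved UV plane of NV12 into separate U and V planes."""
    y_size = width * height
    y_plane = frame[:y_size]
    uv_plane = frame[y_size:y_size + y_size // 2]
    u_plane = uv_plane[0::2]  # even positions hold U in NV12
    v_plane = uv_plane[1::2]  # odd positions hold V in NV12
    return bytes(y_plane) + bytes(u_plane) + bytes(v_plane)


def nv21_to_i420(frame, width, height):
    """Same as nv12_to_i420, but NV21 stores V before U in each pair."""
    y_size = width * height
    y_plane = frame[:y_size]
    vu_plane = frame[y_size:y_size + y_size // 2]
    v_plane = vu_plane[0::2]
    u_plane = vu_plane[1::2]
    return bytes(y_plane) + bytes(u_plane) + bytes(v_plane)
```

The Y plane is copied unchanged in both cases; only the order and grouping of the chroma samples differ between the three formats.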

4. Video fundamentals

4.1 Frame rate – Measured in frames per second (FPS). It determines how many images are shown each second.

4.2 Video compression – Reduces the amount of data needed to store or transmit video. Typical pipelines include:

Color conversion to YUV and chroma subsampling (e.g., 4:2:0).

Spatial compression (e.g., intra‑frame coding like JPEG, using DCT).

Temporal compression (inter‑frame coding) that stores motion vectors and residuals.

For example, a 1280 × 720 frame stored as 8‑bit YUV420P uses 12 bits per pixel, or about 1.38 MB per frame. At 30 FPS this amounts to roughly 41 MB per second of uncompressed data, which motivates the use of sophisticated codecs (H.264, H.265, AV1, etc.).
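The arithmetic behind this figure is short enough to check directly. The function name is hypothetical, and the calculation assumes 8-bit samples and no row padding.

```python
def raw_bytes_per_second(width, height, fps, bits_per_pixel=12):
    """Uncompressed data rate in bytes per second."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * fps


rate = raw_bytes_per_second(1280, 720, 30)
print(rate)                # 41472000 bytes per second
print(rate / 1_000_000)    # 41.472 (decimal megabytes per second)
```

Expressed in binary units (1 MiB = 1,048,576 bytes) the same rate is about 39.6 MiB per second, so the exact "MB" figure depends on which unit convention is used.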

The article also briefly touches on lossless vs. lossy compression, the role of discrete cosine transform in JPEG, and basic concepts such as motion estimation and block copying for reducing temporal redundancy.
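To make motion estimation concrete, the sketch below performs an exhaustive block-matching search using the sum of absolute differences (SAD) as the cost. This is a generic textbook technique chosen for illustration, not the method of any particular codec; the function names, the list-of-rows frame representation and the small search range are all assumptions.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in cur and a shifted block in ref."""
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
    return total


def best_motion_vector(cur, ref, bx, by, size, search=2):
    """Try every offset within +/-search and return (cost, dx, dy) of the best match."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            # Skip candidates that would read outside the reference frame.
            if x0 < 0 or y0 < 0 or x0 + size > width or y0 + size > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

When the best cost is zero, the block can be reproduced by copying from the reference frame at the offset `(dx, dy)`; otherwise an encoder would store the motion vector together with the remaining residual.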

