Fundamentals of Audio and Video: Basics, Encoding, Processing, and Real‑Time Communication
This technical sharing session by a senior audio‑video engineer from 360 Video Cloud explains core concepts of video and audio, their encoding pipelines, media processing techniques, streaming protocols, and the challenges and key technologies behind real‑time communication (RTC).
1. Video Basics – Basic Concepts
Video consists of image frames and accompanying sound. Applications include TV, VCD/Blu‑ray, and internet video such as VOD (e.g., Youku, iQIYI), short video (TikTok, Kuaishou), live streaming (Huajiao, YY), and real‑time calls (Skype, video conferencing). Each frame is a grid of pixels, and each pixel carries three color components. Resolutions continue to rise (4K, 8K). Compression is essential because raw video data (e.g., 1280×720 pixels × 30 fps × 3 bytes per pixel ≈ 80 MB/s) far exceeds typical bandwidth and storage.
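The arithmetic behind the ≈80 MB/s figure can be checked directly:

```python
# Back-of-the-envelope check of the raw data rate quoted above for 720p video.
width, height = 1280, 720   # 720p frame
fps = 30                    # frames per second
bytes_per_pixel = 3         # three color components, one byte each

bytes_per_second = width * height * fps * bytes_per_pixel
print(f"{bytes_per_second / 2**20:.1f} MiB/s")  # ~79.1 MiB/s, i.e. roughly 80 MB/s
```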
2. Video Compression – Why and How
Compression drastically reduces data size: H.264 achieves roughly 500× reduction, H.265 roughly 1000×. It exploits spatial redundancy (neighboring pixels are similar) and temporal redundancy (consecutive frames are similar). Most video compression is lossy, discarding information that is least perceptible. The encoding pipeline consists of prediction, transform (DCT), quantization, and entropy coding. Intra‑prediction removes spatial redundancy; inter‑prediction (motion estimation and compensation) removes temporal redundancy. The DCT concentrates energy in low frequencies, leaving many high‑frequency coefficients near zero. Quantization is where the loss is introduced; entropy coding (e.g., CABAC) is lossless. Frames are typed as I, P, or B and organized into GOP structures.
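The transform-and-quantize steps can be illustrated with a toy sketch. This is not a real encoder: it uses a naive, unnormalized 1‑D DCT‑II and a single flat quantization step, purely to show how a smooth block of pixels collapses to a handful of nonzero coefficients.

```python
import math

def dct_ii(x):
    """Naive 1-D DCT-II (unnormalized), the transform used conceptually in block coding."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5) * k) for n in range(N))
            for k in range(N)]

# A smooth 8-sample block: neighboring pixel values are similar (spatial redundancy).
block = [100, 102, 104, 106, 108, 110, 112, 114]
coeffs = dct_ii(block)

# Quantization: divide by a step size and round. Small high-frequency
# coefficients collapse to zero, which entropy coding then encodes very cheaply.
step = 20
quantized = [round(c / step) for c in coeffs]
print(quantized)  # almost all energy sits in the first (DC) coefficient
```

Eight correlated samples reduce to two nonzero numbers; the rounding is exactly where information is lost.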
3. Audio Basics – Basic Concepts
Sound is a mechanical wave, represented digitally as a one‑dimensional sampled waveform. Volume is measured in decibels (dB), a logarithmic unit.
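The logarithmic nature of the decibel scale is easy to demonstrate; for amplitudes, every factor of 10 corresponds to 20 dB:

```python
import math

def amplitude_db(amplitude, reference=1.0):
    """Amplitude ratio expressed in decibels (20 * log10)."""
    return 20 * math.log10(amplitude / reference)

print(amplitude_db(10))   # 20.0 dB: ten times the reference amplitude
print(amplitude_db(0.5))  # about -6.0 dB: halving amplitude drops ~6 dB
```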
4. Audio Compression – Principles and Codecs
Audio encoding exploits psychoacoustic masking in both the frequency domain and the time domain: sounds close to a louder sound (in frequency or in time) are inaudible and can be discarded. Different scenarios demand different codecs: low‑latency codecs for real‑time calls, high‑fidelity wide‑band codecs for music, and so on. Common codecs include AAC, Opus, and MP3.
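One concrete way latency requirements shape codec design is the frame duration. Opus, for example, supports frame sizes from 2.5 ms up to 60 ms, and always operates internally at 48 kHz; shorter frames mean lower algorithmic latency at some cost in compression efficiency. A quick sketch of the sample counts involved:

```python
# Opus frame durations and the corresponding samples per frame at 48 kHz.
sample_rate = 48_000  # Hz; Opus operates internally at 48 kHz

frame_samples = {ms: int(sample_rate * ms / 1000)
                 for ms in (2.5, 5, 10, 20, 40, 60)}

for ms, samples in frame_samples.items():
    print(f"{ms:>4} ms frame -> {samples} samples")
```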
5. Media Processing
Video processing includes transcoding (adjusting bitrate and resolution), visual effects (filters, stickers, transitions, beauty, collage, speed change), watermarks, and picture‑in‑picture. Audio processing includes pitch/tempo change, reverb, and fade‑in/out.
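Of these, a fade‑in is the simplest to sketch: scale each PCM sample by a gain that ramps linearly from 0 toward 1 (a minimal illustration, not a production effect chain):

```python
def fade_in(samples, fade_len):
    """Apply a linear gain ramp over the first fade_len samples."""
    out = list(samples)
    for i in range(min(fade_len, len(out))):
        out[i] = out[i] * i / fade_len  # gain goes 0, 1/fade_len, 2/fade_len, ...
    return out

print(fade_in([1000, 1000, 1000, 1000, 1000], 4))
# [0.0, 250.0, 500.0, 750.0, 1000]
```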
6. Bitstream, Container, and Protocol
Bitstream: the raw encoded data output by the encoder, usually unsuitable for direct storage or transmission. Container: packages bitstreams together with timing and indexing metadata for storage and transport (e.g., MP4, TS, FLV, AVI). Protocol: defines how the packaged media is transmitted over the network (e.g., RTMP, HLS, HTTP).
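To make the "raw bitstream" layer concrete: an H.264 Annex B byte stream is just NAL units separated by 00 00 01 start codes, with no timestamps or index of its own. A minimal splitter (a sketch; real demuxers handle emulation-prevention bytes and 4‑byte start codes more carefully):

```python
def split_nal_units(data: bytes):
    """Split an Annex B byte stream on 00 00 01 start codes (sketch only)."""
    units, i = [], 0
    while True:
        j = data.find(b"\x00\x00\x01", i)
        if j < 0:
            break
        k = data.find(b"\x00\x00\x01", j + 3)
        end = k if k >= 0 else len(data)
        # Trailing zeros belong to the next start code's 4-byte form, not the payload.
        units.append(data[j + 3:end].rstrip(b"\x00"))
        i = end
    return units

# 0x67/0x68 are the NAL header bytes of SPS/PPS units in H.264.
stream = b"\x00\x00\x00\x01\x67AA\x00\x00\x01\x68BB"
print(split_nal_units(stream))  # [b'gAA', b'hBB']
```

Everything a player needs beyond these payloads — when to show each frame, how to seek — comes from the container layer.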
7. Video Production and Consumption
Production involves recording and editing; consumption involves playback on a variety of devices.
8. Comparison of VOD, Live, and Real‑Time Calls
9. Real‑Time Audio‑Video Communication (RTC) – Challenges and Key Technologies
Challenges: echo cancellation, packet loss, and network jitter.
Key technologies:
• Echo cancellation (adaptive filters)
• Audio signal processing (AGC, noise suppression, NLP)
• Forward Error Correction (FEC)
• Jitter buffer
• Packet loss concealment
• Multi‑Point Control Unit (MCU)
• Scalable Video Coding (SVC)
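The jitter buffer is the piece most easily shown in code. The sketch below is hypothetical and deliberately minimal: it reorders packets by sequence number and releases them in order, signalling a gap when a packet is missing. Real implementations also adapt the buffer depth to measured network jitter.

```python
import heapq

class JitterBuffer:
    """Minimal illustrative jitter buffer: reorder packets, release in sequence."""

    def __init__(self):
        self._heap = []      # min-heap ordered by sequence number
        self._next_seq = 0   # next sequence number the player expects

    def push(self, seq, payload):
        heapq.heappush(self._heap, (seq, payload))

    def pop(self):
        """Return the next in-order payload, or None if it has not arrived yet."""
        while self._heap and self._heap[0][0] < self._next_seq:
            heapq.heappop(self._heap)  # drop duplicates that arrived too late
        if self._heap and self._heap[0][0] == self._next_seq:
            self._next_seq += 1
            return heapq.heappop(self._heap)[1]
        return None  # gap: this is where packet loss concealment would kick in

buf = JitterBuffer()
for seq, data in [(1, "b"), (0, "a"), (2, "c")]:  # network delivered out of order
    buf.push(seq, data)
print([buf.pop() for _ in range(3)])  # ['a', 'b', 'c']
```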
360 Tech Engineering
Official tech channel of 360, building the most professional technology aggregation platform for the brand.