Mastering Android Video Encoding: Choosing Encoders and Optimizing YUV Processing
This article examines Android video recording challenges, compares hardware (MediaCodec) and software (FFmpeg + x264/openh264) encoders, highlights device‑specific pitfalls such as color‑format support and alignment, and presents fast NEON‑based algorithms for scaling, rotation, and mirroring of YUV frames.
Android video development is one of the most fragmented and compatibility‑problematic parts of the Android ecosystem. Recording a 540p MP4 typically follows the flow: camera outputs YUV frames → preprocessing (scaling, rotation, mirroring) → encoder → H.264 stream, then audio is recorded separately and finally multiplexed.
Choosing an Encoder
Two main options are used:
MediaCodec – hardware‑accelerated API introduced in API 16. It requires initializing a MediaCodec encoder, feeding raw YUV buffers via the input queue and retrieving encoded H.264 buffers from the output queue. It supports synchronous and asynchronous (callback) modes.
FFmpeg + x264/openh264 – software encoding. FFmpeg handles frame preprocessing, while x264 (or Cisco’s openh264) performs H.264 encoding.
MediaCodec pitfalls
Color‑format support varies by device. The encoder must be configured with a MediaFormat that specifies width, height, frame rate, bitrate, I‑frame interval, and the YUV color format. Many devices only accept NV21 or YUV420P; mismatched formats cause color distortion.
Feature support (profile, level, bitrate mode) is limited on most phones, especially below Android 7.0 where profiles are hard‑coded to Baseline, reducing compression efficiency.
Input dimensions must be 16‑pixel aligned; otherwise some SoCs produce corrupted video.
Software encoder characteristics
x264 offers the best performance and feature set among software encoders, while openh264 is free but supports only Baseline profile and limited multithreading.
YUV Frame Pre‑processing
Before encoding, YUV frames often need scaling, rotation, and mirroring.
Scaling – using FFmpeg’s sws_scale with SWS_FAST_BILINEAR is slow (≈40 ms per frame on Nexus 6P). A custom “local‑mean” algorithm implemented with NEON reduces scaling time to <5 ms and yields PSNR ≈ 38‑40 dB.
Rotation – rotating 960×540 YUV frames in pure C costs >30 ms per frame. Instead, the rotation matrix can be stored in the MP4 moov.trak.tkhd box, letting the player handle orientation.
Mirroring – front‑camera frames are horizontally flipped. A simple NEON‑based mirror that swaps rows for Y and UV planes processes a 1080×1920 frame in <5 ms.
After encoding the H.264 stream, audio and video are multiplexed into an MP4 file using MediaMuxer, mp4v2, or FFmpeg.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
