
Mastering Android Video Encoding: Choosing Codecs and Optimizing YUV Processing

This article examines Android video encoding challenges, compares hardware (MediaCodec) and software (FFmpeg+x264/openh264) codecs, and presents fast YUV frame preprocessing techniques—including scaling, rotation, and mirroring—using NEON optimizations to achieve high‑quality, low‑latency recordings.

WeChat Client Technology Team

Android video development is one of the most fragmented, compatibility‑plagued areas of the Android ecosystem; the camera and video‑encoding APIs in particular are widely considered among the hardest to use.

Video Encoder Selection

Most apps that record video need to process each frame individually, so they rarely use MediaRecorder. The two common choices are:

MediaCodec

FFmpeg + x264/openh264

MediaCodec

MediaCodec, introduced in API 16, provides low‑level access to hardware‑accelerated video encoding. After initializing MediaCodec as an encoder, raw YUV frames are fed into the input queue and encoded H.264 streams are retrieved from the output queue.

The API uses two queues: an input queue for raw YUV data and an output queue for the encoded H.264 stream. Both synchronous and asynchronous (callback) modes are available, with callbacks introduced in API 21.

Typical usage (synchronous) involves:

1. Calling getInputBuffers to obtain the input queue.

2. Using dequeueInputBuffer to get a free buffer index.

3. Submitting raw YUV data with queueInputBuffer.

4. Retrieving encoded data via getOutputBuffers and dequeueOutputBuffer.

5. Releasing the output buffer with releaseOutputBuffer.

More complex examples can be found in the Android CTS test EncodeDecodeTest.java.

Common issues with MediaCodec:

Color format compatibility: The encoder must be configured with a supported YUV color format. Many devices only support YUV420P, while the camera often outputs NV21. Developers must query supported formats via codecInfo.getCapabilitiesForType and convert NV21 to YUV420P when necessary.

Limited encoder feature support: Settings such as profile (baseline, main, high), level, and bitrate mode (CBR, CQ, VBR) are often ignored on most devices, especially below Android 7.0, leading to lower video quality.

16‑pixel alignment requirement: H.264 macroblocks are 16×16 pixels. Encoding video dimensions that are not multiples of 16 (e.g., 960×540) can cause visual artifacts on some SoCs; aligning width and height to 16‑pixel boundaries solves the problem.
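The first and third issues above can be sketched in plain C. This is a scalar reference only (production code would typically use libyuv or NEON intrinsics), and the function names are illustrative, not from any particular library:

```c
#include <stddef.h>
#include <stdint.h>

/* Convert NV21 (Y plane followed by interleaved V/U pairs) to
 * I420/YUV420P (Y plane, then planar U, then planar V).
 * Scalar reference sketch; real apps usually use libyuv or NEON. */
static void nv21_to_i420(const uint8_t *src, uint8_t *dst,
                         int width, int height) {
    const size_t y_size  = (size_t)width * height;
    const size_t uv_size = y_size / 4;

    /* The Y plane is identical in both layouts. */
    for (size_t i = 0; i < y_size; i++)
        dst[i] = src[i];

    const uint8_t *src_vu = src + y_size;      /* interleaved V,U pairs */
    uint8_t *dst_u = dst + y_size;
    uint8_t *dst_v = dst + y_size + uv_size;

    for (size_t i = 0; i < uv_size; i++) {
        dst_v[i] = src_vu[2 * i];              /* V comes first in NV21 */
        dst_u[i] = src_vu[2 * i + 1];
    }
}

/* Round a dimension up to the next multiple of 16 (the H.264
 * macroblock size), e.g. 540 -> 544, 960 -> 960. */
static int align16(int x) { return (x + 15) & ~15; }
```

With align16, a 960×540 target becomes 960×544, satisfying the macroblock alignment described above.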

FFmpeg + x264/openh264

Software encoding using FFmpeg for preprocessing and x264/openh264 for encoding is another popular approach.

x264 is widely regarded as one of the best and fastest H.264 software encoders, with full feature support. openh264, released by Cisco, is free to use but supports fewer advanced features (baseline profile only, up to level 5.2, slice‑based multithreading).

Hardware vs. Software Encoding

Hardware encoding (MediaCodec) offers speed and no external dependencies but has limited feature support and lower compression efficiency. Software encoding (FFmpeg + x264) is slower but provides higher quality and richer H.264 features. On Android 4.4+, MediaCodec is generally reliable, but capabilities vary across devices, so choosing the encoder based on device performance is recommended.

YUV Frame Pre‑Processing

Before feeding frames to the encoder, YUV data often requires scaling, rotation, and mirroring.

Scaling

When the camera preview is 1080p but the target video is 960×540, real‑time scaling is needed. The common method uses FFmpeg's sws_scale with the SWS_FAST_BILINEAR algorithm, but on devices like the Nexus 6P this can take more than 40 ms per frame, limiting the frame rate.

To achieve sub‑5 ms scaling, a custom "local mean" algorithm implemented with NEON instructions was adopted. This method processes four neighboring pixels to compute each output pixel, dramatically reducing processing time while maintaining a PSNR of 38‑40 dB.
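The local‑mean idea can be illustrated with a scalar 2:1 downscale, where each output pixel is the rounded average of a 2×2 input block; note that 1920×1080 to 960×540 is exactly this ratio. This is only a reference sketch of the technique — the article's version is NEON‑optimized, and the function name is illustrative:

```c
#include <stddef.h>
#include <stdint.h>

/* 2:1 "local mean" downscale of one 8-bit plane: each output pixel
 * is the rounded average of the corresponding 2x2 input block.
 * Call once per plane: Y at full size, U and V at quarter size.
 * Scalar sketch of the NEON version described in the article. */
static void downscale2x_plane(const uint8_t *src, int src_w, int src_h,
                              uint8_t *dst) {
    int dst_w = src_w / 2, dst_h = src_h / 2;
    for (int y = 0; y < dst_h; y++) {
        const uint8_t *r0 = src + (size_t)(2 * y) * src_w; /* top row  */
        const uint8_t *r1 = r0 + src_w;                    /* next row */
        uint8_t *out = dst + (size_t)y * dst_w;
        for (int x = 0; x < dst_w; x++) {
            int sum = r0[2*x] + r0[2*x + 1] + r1[2*x] + r1[2*x + 1];
            out[x] = (uint8_t)((sum + 2) >> 2);  /* rounded mean of 4 */
        }
    }
}
```

A NEON implementation vectorizes the inner loop (e.g. pairwise-add loads of 16 pixels at a time), which is where the sub‑5 ms figure comes from.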

Rotation

Camera frames may be rotated 90° or 270° depending on device orientation. Instead of rotating the YUV data itself (which can cost more than 30 ms per frame), the rotation matrix can be written into the MP4 moov.trak.tkhd box using FFmpeg, allowing the player to handle rotation without extra CPU work.

Mirroring

Front‑camera footage is often mirrored. Since the raw YUV frame is already horizontally flipped, mirroring can be performed by swapping rows around the vertical center, handling Y and UV planes separately. A NEON implementation processes a 1080×1920 frame in under 5 ms.
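The row-swap mirror can be sketched per plane in plain C (scalar reference for the NEON version; the fixed temporary-buffer size is an assumption for illustration):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirror one 8-bit plane by swapping rows around the center.
 * For an I420 frame, call once on the Y plane (width x height) and
 * once each on the U and V planes (width/2 x height/2). Scalar
 * sketch of the NEON row swap described in the article. */
static void flip_plane_rows(uint8_t *plane, int width, int height) {
    uint8_t tmp[4096];                    /* assumes width <= 4096 */
    for (int top = 0, bot = height - 1; top < bot; top++, bot--) {
        uint8_t *a = plane + (size_t)top * width;
        uint8_t *b = plane + (size_t)bot * width;
        memcpy(tmp, a, (size_t)width);    /* three-way row swap */
        memcpy(a, b, (size_t)width);
        memcpy(b, tmp, (size_t)width);
    }
}
```

Because memcpy over whole rows is sequential and cache-friendly, even this scalar version is far cheaper than a per-pixel transform; NEON load/store of 16-byte chunks pushes a 1080×1920 frame under the 5 ms figure quoted above.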

Final Muxing

After encoding the H.264 video stream, the audio and video streams are multiplexed into an MP4 container using MediaMuxer, mp4v2, or FFmpeg.

References

Leixiaohua's blog: http://blog.csdn.net/leixiaohua1020

Android MediaCodec resources: http://bigflake.com/mediacodec/

Coding for NEON tutorial: https://community.arm.com/processors/b/blog/posts/coding-for-neon---part-1-load-and-stores

libyuv library: https://chromium.googlesource.com/libyuv/libyuv/

Written by the WeChat Client Technology Team

Official account of the WeChat mobile client development team, sharing development experience, cutting‑edge tech, and little‑known stories across Android, iOS, macOS, Windows Phone, and Windows.
