Mastering Android Video Encoding: Choosing Codecs and Optimizing YUV Processing
This article examines Android video encoding challenges, compares hardware (MediaCodec) and software (FFmpeg+x264/openh264) codecs, and presents fast YUV frame preprocessing techniques—including scaling, rotation, and mirroring—using NEON optimizations to achieve high‑quality, low‑latency recordings.
Android video development is among the most fragmented and compatibility-prone areas of the Android ecosystem; the camera and video-encoding APIs in particular are widely considered some of the hardest to use.
Video Encoder Selection
Most apps that record video need to process each frame individually, so they rarely use MediaRecorder. The two common choices are:
MediaCodec
FFmpeg + x264/openh264
MediaCodec
MediaCodec, introduced in API 16, provides low‑level access to hardware‑accelerated video encoding. After initializing MediaCodec as an encoder, raw YUV frames are fed into the input queue and encoded H.264 streams are retrieved from the output queue.
The API uses two queues: an input queue for raw YUV data and an output queue for the encoded H.264 stream. Both synchronous and asynchronous (callback) modes are available, with callbacks introduced in API 21.
Typical usage (synchronous) involves:
Calling getInputBuffers to obtain the input buffers.
Using dequeueInputBuffer to get a free buffer index.
Submitting raw YUV data with queueInputBuffer.
Retrieving encoded data via getOutputBuffers and dequeueOutputBuffer.
Releasing the output buffer with releaseOutputBuffer.
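The synchronous flow above can be sketched as follows (simplified pseudocode; configuration details, timestamps, error handling, and end-of-stream signaling are omitted):

```
encoder = MediaCodec.createEncoderByType("video/avc")
encoder.configure(format, null, null, CONFIGURE_FLAG_ENCODE)
encoder.start()

loop:
    inIndex = encoder.dequeueInputBuffer(timeoutUs)
    if inIndex >= 0:
        copy one raw YUV frame into inputBuffers[inIndex]
        encoder.queueInputBuffer(inIndex, 0, frameSize, presentationTimeUs, 0)

    outIndex = encoder.dequeueOutputBuffer(bufferInfo, timeoutUs)
    if outIndex >= 0:
        read encoded H.264 data from outputBuffers[outIndex]
        encoder.releaseOutputBuffer(outIndex, false)
```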
More complex examples can be found in the Android CTS test EncodeDecodeTest.java.
Common issues with MediaCodec:
Color format compatibility: The encoder must be configured with a YUV color format it supports. Many devices only support planar YUV420P, while the camera typically outputs semi-planar NV21. Developers must query supported formats via codecInfo.getCapabilitiesForType and convert NV21 to YUV420P when necessary.
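As a reference for the conversion itself, the following is a minimal scalar sketch (the class and method names are illustrative; production code would typically use libyuv or a NEON routine for speed):

```java
// Convert NV21 (Y plane followed by interleaved V,U bytes) to
// I420/YUV420P (Y plane, then U plane, then V plane).
// Scalar reference version for clarity; not optimized.
public class Nv21ToI420 {
    public static byte[] convert(byte[] nv21, int width, int height) {
        int ySize = width * height;
        int uvCount = ySize / 4;                 // number of U (and V) samples
        byte[] i420 = new byte[ySize + 2 * uvCount];
        // The Y plane is identical in both layouts.
        System.arraycopy(nv21, 0, i420, 0, ySize);
        // De-interleave: NV21 stores V,U,V,U,... after the Y plane.
        for (int i = 0; i < uvCount; i++) {
            i420[ySize + i] = nv21[ySize + 2 * i + 1];        // U plane
            i420[ySize + uvCount + i] = nv21[ySize + 2 * i];  // V plane
        }
        return i420;
    }
}
```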
Limited encoder feature support: Settings such as Profile (baseline, main, high), Level, and Bitrate mode (CBR, CQ, VBR) are often ignored on most devices, especially below Android 7.0, leading to lower video quality.
16‑pixel alignment requirement: H.264 macroblocks are 16×16. Encoding video dimensions that are not multiples of 16 (e.g., 960×540) can cause visual artifacts on some SoCs. Aligning width and height to 16‑pixel boundaries solves the problem.
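Rounding up to the next multiple of 16 is a one-liner; a small sketch (helper name is illustrative, and some pipelines prefer to crop down instead of padding up):

```java
// Round a dimension up to the next multiple of 16 so every H.264
// macroblock is fully covered, e.g. 540 -> 544 while 960 stays 960.
public class Align16 {
    public static int align16(int dim) {
        return (dim + 15) & ~15;
    }
}
```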
FFmpeg + x264/openh264
Software encoding using FFmpeg for preprocessing and x264/openh264 for encoding is another popular approach.
x264 is widely regarded as the fastest production-grade H.264 software encoder, with full feature support. openh264, released by Cisco, is free to use but supports fewer advanced features (only the baseline profile, up to level 5.2, and slice-based multithreading).
Hardware vs. Software Encoding
Hardware encoding (MediaCodec) offers speed and no external dependencies but has limited feature support and lower compression efficiency. Software encoding (FFmpeg + x264) is slower but provides higher quality and richer H.264 features. On Android 4.4+, MediaCodec is generally reliable, but capabilities vary across devices, so choosing the encoder based on device performance is recommended.
YUV Frame Pre‑Processing
Before feeding frames to the encoder, YUV data often requires scaling, rotation, and mirroring.
Scaling
When the camera preview is 1080p but the target video is 960×540, real-time scaling is needed. The common method uses FFmpeg's sws_scale with the SWS_FAST_BILINEAR algorithm, but on devices like the Nexus 6P this can take more than 40 ms per frame, limiting the frame rate.
To achieve sub‑5 ms scaling, a custom "local mean" algorithm implemented with NEON instructions was adopted. This method processes four neighboring pixels to compute each output pixel, dramatically reducing processing time while maintaining a PSNR of 38‑40 dB.
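The article does not include the NEON source, but for an exact 2× downscale (1920×1080 → 960×540 per plane) the "local mean" idea reduces to averaging each 2×2 input block. A scalar sketch, with illustrative names (the production version replaces the inner loop with NEON vector loads and pairwise additions):

```java
// Scalar sketch of "local mean" 2x downscaling for one 8-bit plane:
// each output pixel is the rounded average of a 2x2 input block.
public class HalfScale {
    public static byte[] halve(byte[] src, int srcW, int srcH) {
        int dstW = srcW / 2, dstH = srcH / 2;
        byte[] dst = new byte[dstW * dstH];
        for (int y = 0; y < dstH; y++) {
            for (int x = 0; x < dstW; x++) {
                int i = (2 * y) * srcW + 2 * x;   // top-left of the 2x2 block
                int sum = (src[i] & 0xFF) + (src[i + 1] & 0xFF)
                        + (src[i + srcW] & 0xFF) + (src[i + srcW + 1] & 0xFF);
                dst[y * dstW + x] = (byte) ((sum + 2) >> 2); // rounded mean
            }
        }
        return dst;
    }
}
```

For YUV420, the same routine is applied to the Y plane and to each chroma plane at their respective dimensions.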
Rotation
Camera frames may be rotated 90° or 270° depending on device orientation. Instead of rotating YUV data (which can cost more than 30 ms per frame), the rotation matrix can be written into the MP4 moov.trak.tkhd box using FFmpeg, allowing the player to handle rotation without extra CPU work.
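The same metadata-only trick is available from the FFmpeg command line (shown here as an illustrative command, not the article's exact invocation; newer FFmpeg builds expose the same idea via the -display_rotation option):

```shell
# Tag the video stream with a 90-degree rotation instead of rotating pixels.
# -c copy leaves the encoded frames untouched; the player applies the
# rotation matrix stored in the tkhd box.
ffmpeg -i input.mp4 -c copy -metadata:s:v:0 rotate=90 output.mp4
```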
Mirroring
Front-camera footage is typically expected to look mirrored, matching the preview. Mirroring can be performed in place by swapping symmetric pixels around the frame's center line, handling the Y and UV planes separately. A NEON implementation processes a 1080×1920 frame in under 5 ms.
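A scalar sketch of horizontal mirroring for a single 8-bit plane (illustrative names; the NEON version does the same swaps with reversed vector loads and stores):

```java
// Mirror one 8-bit plane in place: within each row, swap pixels
// around the vertical center line.
public class MirrorPlane {
    public static void mirror(byte[] plane, int width, int height) {
        for (int y = 0; y < height; y++) {
            int row = y * width;
            for (int l = 0, r = width - 1; l < r; l++, r--) {
                byte t = plane[row + l];
                plane[row + l] = plane[row + r];
                plane[row + r] = t;
            }
        }
    }
}
```

For NV21's interleaved VU plane, the swap must operate on 2-byte pairs so each chroma sample keeps its V and U bytes together.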
Final Muxing
After encoding the H.264 video stream, the audio and video streams are multiplexed into an MP4 container using MediaMuxer, mp4v2, or FFmpeg.
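With MediaMuxer, the muxing pass follows a simple pattern (simplified pseudocode; track formats come from the encoders' output, and timestamp and end-of-stream handling are omitted):

```
muxer = new MediaMuxer(outputPath, MUXER_OUTPUT_MPEG_4)
videoTrack = muxer.addTrack(videoFormat)   // encoder's actual output format
audioTrack = muxer.addTrack(audioFormat)
muxer.start()
for each encoded sample:
    muxer.writeSampleData(track, encodedBuffer, bufferInfo)
muxer.stop()
muxer.release()
```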
References
Leixiaohua's blog: http://blog.csdn.net/leixiaohua1020
Android MediaCodec resources: http://bigflake.com/mediacodec/
Coding for NEON tutorial: https://community.arm.com/processors/b/blog/posts/coding-for-neon---part-1-load-and-stores
libyuv library: https://chromium.googlesource.com/libyuv/libyuv/
WeChat Client Technology Team
Official account of the WeChat mobile client development team, sharing development experience, cutting‑edge tech, and little‑known stories across Android, iOS, macOS, Windows Phone, and Windows.