Mastering Android MediaCodec: From Basics to Advanced Video Processing
This article explores Android's MediaCodec API: its role in hardware video encoding and decoding, how it manages buffers, the data types it handles, and its lifecycle states, with practical code examples. It aims to give developers a comprehensive guide to implementing advanced video processing features such as watermarking and transcoding on mobile devices.
In the booming era of AI and short video, the Mobile Security team experimented with AI camera and video products, accumulating technology in the following areas:
Facial detection, tracking, gesture recognition, smart beauty, human segmentation.
Android hardware video encoding/decoding.
Advanced OpenGL rendering.
Self‑developed 3D rendering engine.
Image and graphics processing.
iOS11 ARKit framework.
Senior technical staff member @laoyu has split Android hardware video encoding/decoding into a six-article series; this piece focuses on MediaCodec.
Playing a video on Android is simple, but processing it, such as adding a watermark or transcoding it, requires decoding and re-encoding. Android offers software decoding (typically FFmpeg) and hardware decoding (MediaCodec). This article mainly examines MediaCodec.
What Is MediaCodec
Since API 16 (Android 4.1), Android provides the MediaCodec class so developers can flexibly handle audio‑video encoding and decoding. MediaCodec accesses the low‑level media codec framework (StageFright or OpenMAX) and is typically used together with MediaExtractor, MediaSync, MediaMuxer, MediaCrypto, MediaDrm, Image, Surface, and AudioTrack.
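To illustrate how these companion classes fit together, here is a minimal sketch that uses MediaExtractor to locate a video track and create a matching decoder. It is illustrative only: createVideoDecoder is our own helper name, and error handling is trimmed.

// Minimal sketch: select the first video track on an already-prepared
// MediaExtractor and create a decoder matching its MIME type.
// (createVideoDecoder is our own helper name, not part of the API.)
MediaCodec createVideoDecoder(MediaExtractor extractor) throws IOException {
    for (int i = 0; i < extractor.getTrackCount(); i++) {
        MediaFormat format = extractor.getTrackFormat(i);
        String mime = format.getString(MediaFormat.KEY_MIME);
        if (mime != null && mime.startsWith("video/")) {
            extractor.selectTrack(i);
            MediaCodec decoder = MediaCodec.createDecoderByType(mime);
            // Configure for decoding; null Surface means ByteBuffer output mode.
            decoder.configure(format, null, null, 0);
            return decoder;
        }
    }
    throw new IllegalArgumentException("no video track found");
}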
How It Works
The core of the workflow is the buffer queue. MediaCodec processes data via buffers: the client dequeues an empty input buffer (dequeueInputBuffer), fills it with data, and queues it back (queueInputBuffer); this is the input stage. Once the codec has processed the data, the client dequeues a filled output buffer (dequeueOutputBuffer), reads the result, and releases the buffer (releaseOutputBuffer) so it can be reused for a later frame.
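In synchronous mode, one pass of that handshake looks roughly like the sketch below; fillWithFrameData and consumeProcessedData are hypothetical application callbacks, not MediaCodec API.

// Sketch of one pass of the synchronous buffer handshake.
MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
final long timeoutUs = 10_000; // 10 ms

// Input stage: dequeue an empty buffer, fill it, queue it back.
int inIndex = codec.dequeueInputBuffer(timeoutUs);
if (inIndex >= 0) {
    ByteBuffer inBuf = codec.getInputBuffer(inIndex);
    int size = fillWithFrameData(inBuf); // hypothetical: copy one frame in
    codec.queueInputBuffer(inIndex, 0, size, /* presentationTimeUs */ 0, 0);
}

// Output stage: dequeue a filled buffer, read it, release it for reuse.
int outIndex = codec.dequeueOutputBuffer(info, timeoutUs);
if (outIndex >= 0) {
    ByteBuffer outBuf = codec.getOutputBuffer(outIndex);
    consumeProcessedData(outBuf, info); // hypothetical: read info.size bytes
    codec.releaseOutputBuffer(outIndex, false); // false: not rendering to a Surface
}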
Data Types
A codec handles three kinds of data: compressed data (e.g., H.264/H.265 video, AAC audio), raw audio data, and raw video data. All three can be carried in ByteBuffers, but raw video should use a Surface for better performance: a Surface wraps native video buffers, avoiding copies into and out of ByteBuffers. When using a Surface, raw video frames can still be accessed via ImageReader; in ByteBuffer mode, the Image class provides access to the raw frames.
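To make the difference concrete, here is a hedged sketch of the two configuration options. A codec instance is configured only once, so these are alternatives, and width/height are placeholders supplied by the application.

// Alternative A - ByteBuffer mode: raw frames are returned in output
// ByteBuffers (or Images) and must be copied out by the application.
codec.configure(format, null /* no Surface */, null, 0);

// Alternative B - Surface mode: raw frames stay in native buffers behind
// the Surface, avoiding copies; an ImageReader's Surface still lets the
// application inspect individual frames.
ImageReader reader = ImageReader.newInstance(
        width, height, ImageFormat.YUV_420_888, 2 /* maxImages */);
codec.configure(format, reader.getSurface(), null, 0);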
Compressed Buffer
Input buffers for decoders and output buffers for encoders contain compressed data defined by the media format. Video data consists of individual compressed frames; audio data consists of access units (small encoded audio chunks). Buffers align on frame or access‑unit boundaries, not arbitrary byte boundaries.
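MediaExtractor delivers data at exactly these boundaries, so feeding a decoder amounts to copying one sample into each input buffer. A sketch, reusing the decoder and extractor from the earlier examples:

// Sketch: one compressed frame / access unit per queueInputBuffer() call.
final long timeoutUs = 10_000;
int inIndex = decoder.dequeueInputBuffer(timeoutUs);
if (inIndex >= 0) {
    ByteBuffer inBuf = decoder.getInputBuffer(inIndex);
    int sampleSize = extractor.readSampleData(inBuf, 0);
    if (sampleSize < 0) {
        // No more samples: queue an empty buffer carrying the EOS flag.
        decoder.queueInputBuffer(inIndex, 0, 0, 0,
                MediaCodec.BUFFER_FLAG_END_OF_STREAM);
    } else {
        decoder.queueInputBuffer(inIndex, 0, sampleSize,
                extractor.getSampleTime(), 0);
        extractor.advance(); // move to the next frame / access unit
    }
}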
Raw Audio Buffer
Raw audio buffers contain PCM samples, each channel sample being a 16‑bit signed integer in native byte order.
short[] getSamplesForChannel(MediaCodec codec, int bufferId, int channelIx) {
    ByteBuffer outputBuffer = codec.getOutputBuffer(bufferId);
    MediaFormat format = codec.getOutputFormat(bufferId);
    // Interpret the raw bytes as 16-bit PCM samples in native byte order.
    ShortBuffer samples = outputBuffer.order(ByteOrder.nativeOrder()).asShortBuffer();
    int numChannels = format.getInteger(MediaFormat.KEY_CHANNEL_COUNT);
    if (channelIx < 0 || channelIx >= numChannels) {
        return null;
    }
    // Channel samples are interleaved: frame i's sample for this channel
    // sits at index i * numChannels + channelIx.
    short[] res = new short[samples.remaining() / numChannels];
    for (int i = 0; i < res.length; ++i) {
        res[i] = samples.get(i * numChannels + channelIx);
    }
    return res;
}
Raw Video Buffer
In ByteBuffer mode, video buffers are presented according to their color format. Supported color formats can be queried via getCodecInfo().getCapabilitiesForType(...).colorFormats. Three categories exist:
Native raw video format: marked with COLOR_FormatSurface; used with an input or output Surface.
Flexible YUV buffers (e.g., COLOR_FormatYUV420Flexible): usable with a Surface, or in ByteBuffer mode via getInputImage(int)/getOutputImage(int).
Specific formats: vendor-specific formats, typically supported only in ByteBuffer mode; on API 21 and later they can also be accessed via getInputImage(int)/getOutputImage(int).
Since Android 5.1 (API 22, LOLLIPOP_MR1), all video codecs support flexible YUV 4:2:0 buffers, as sketched below.
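As a hedged sketch of the flexible-YUV path, the following copies the luma (Y) plane of a decoded frame out of an Image, honoring row and pixel strides; outIndex is assumed to come from a prior dequeueOutputBuffer call.

// Sketch: read the Y plane of a flexible YUV frame (API 21+).
Image image = codec.getOutputImage(outIndex);
Image.Plane yPlane = image.getPlanes()[0];   // plane 0 is luma (Y)
ByteBuffer yBuf = yPlane.getBuffer();
int rowStride = yPlane.getRowStride();       // bytes per row, may exceed width
int pixelStride = yPlane.getPixelStride();   // 1 for the Y plane
int width = image.getWidth();
int height = image.getHeight();

byte[] yData = new byte[width * height];
for (int row = 0; row < height; row++) {
    for (int col = 0; col < width; col++) {
        yData[row * width + col] = yBuf.get(row * rowStride + col * pixelStride);
    }
}
image.close();                               // always close the Image
codec.releaseOutputBuffer(outIndex, false);  // then return the buffer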
Lifecycle
A MediaCodec instance conceptually moves through three states: Stopped, Executing, and Released. Stopped comprises the Uninitialized, Configured, and Error sub-states; Executing comprises the Flushed, Running, and End-of-Stream sub-states.
After creating a codec via a factory method, it is Uninitialized. Call configure(...) to move to Configured, then start() to enter Executing. You can then process data via buffer queues.
Executing starts in the Flushed sub‑state; after the first input buffer is dequeued, it moves to Running. When an input buffer marked with end‑of‑stream is queued, the codec enters End‑of‑Stream, after which no new input is accepted but output buffers continue until the end‑of‑stream flag is output. Calling flush() returns the codec to Flushed.
Calling stop() returns the codec to Uninitialized, after which it can be configured again. When you are finished, call release() to free resources and move the codec to the terminal Released state. In rare error cases the codec may enter the Error state; reset() can bring it back to Uninitialized so it can be reused, or release() moves it to Released.
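A hedged skeleton mapping API calls onto these states ("video/avc" is just an example MIME type, and format comes from the application):

MediaCodec codec = MediaCodec.createDecoderByType("video/avc"); // Uninitialized
codec.configure(format, null, null, 0);                         // Configured
codec.start();                                                  // Executing: Flushed

// ... buffer handshake as above; first dequeued input buffer => Running ...
// queueInputBuffer(..., BUFFER_FLAG_END_OF_STREAM)             // End-of-Stream
// codec.flush();                                               // back to Flushed

codec.stop();    // back to Uninitialized; configure() may be called again
codec.release(); // Released: the instance must not be used after this
// On a CodecException, reset() can return the codec to Uninitialized.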
Qizhuo Club
360 Mobile tech channel sharing practical experience and original insights from 360 Mobile Security and other teams across Android, iOS, big data, AI, and more.
