Using Native Android and iOS APIs to Add Transition Effects and Encode Images into Video

This article explains the fundamentals of audio‑video concepts and demonstrates how to use Android's MediaCodec, EGL, MediaMuxer and iOS's AVAssetWriter, GPUImage, and related native APIs to render OpenGL transition effects on images and combine them with audio into a final video file.

Ctrip Technology

In recent years short‑video apps have become extremely popular, prompting developers to create tools that add cool transition effects to images and compose them into videos. The article first introduces basic audio‑video concepts such as frames, frame rate, resolution, bitrate, and color spaces (RGB and YUV).
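The arithmetic behind these concepts is worth making concrete. A minimal sketch (not from the original article) of how resolution, color format, and frame rate determine raw data size — and why compression is unavoidable:

```kotlin
// Raw (uncompressed) size of a single video frame, in bytes.
// RGB stores 3 bytes per pixel; YUV 4:2:0 averages 1.5 bytes per pixel
// because its two chroma planes are subsampled to quarter resolution.
fun rgbFrameBytes(width: Int, height: Int): Long = width.toLong() * height * 3

fun yuv420FrameBytes(width: Int, height: Int): Long = width.toLong() * height * 3 / 2

// Uncompressed bitrate in bits per second at a given frame rate.
fun rawBitrateBps(frameBytes: Long, frameRate: Int): Long = frameBytes * 8 * frameRate

fun main() {
    val w = 720; val h = 1280
    println(rgbFrameBytes(w, h))      // 2764800 bytes per RGB frame
    println(yuv420FrameBytes(w, h))   // 1382400 bytes per YUV 4:2:0 frame
    println(rawBitrateBps(yuv420FrameBytes(w, h), 30)) // 331776000 bps, ~331.8 Mbit/s
}
```

Even at 720×1280 and 30 fps, raw YUV 4:2:0 video exceeds 300 Mbit/s, which is why an encoder such as H.264 is needed before the stream is written to a file.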

It then covers audio basics, including PCM, sampling rate, bit depth, channel count, and audio bitrate calculations, and briefly describes common audio codecs (AAC, WAV, MP3, OGG, APE, FLAC).
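The audio bitrate calculation mentioned above is a simple product of the three PCM parameters. A small sketch (illustrative, not from the article):

```kotlin
// Uncompressed PCM bitrate = sample rate x bit depth x channel count.
fun pcmBitrateBps(sampleRate: Int, bitDepth: Int, channels: Int): Long =
    sampleRate.toLong() * bitDepth * channels

// Size in bytes of `seconds` seconds of raw PCM audio.
fun pcmBytes(sampleRate: Int, bitDepth: Int, channels: Int, seconds: Int): Long =
    pcmBitrateBps(sampleRate, bitDepth, channels) * seconds / 8

fun main() {
    // CD-quality stereo: 44.1 kHz sample rate, 16-bit depth, 2 channels
    println(pcmBitrateBps(44_100, 16, 2)) // 1411200 bps
    println(pcmBytes(44_100, 16, 2, 60))  // 10584000 bytes, ~10.6 MB per minute
}
```

The gap between this raw rate and a typical 128 kbps AAC stream is what the lossy codecs listed above exist to close.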

The video encoding section explains the principle of inter‑frame compression and surveys the common codec families (H.26x and MPEG), focusing on H.264 and highlighting its low bitrate, high quality, and network adaptability.
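The core idea of inter‑frame compression can be shown with a toy example (a deliberate simplification; real H.264 uses motion estimation and entropy coding, not raw per‑pixel deltas):

```kotlin
// Toy inter-frame compression: store the first frame in full (a "key frame")
// and only per-pixel differences for the frames that follow. Because
// consecutive frames are nearly identical, the delta is mostly zeros,
// which compresses far better than the full frame.
fun deltaEncode(previous: IntArray, current: IntArray): IntArray =
    IntArray(current.size) { i -> current[i] - previous[i] }

fun deltaDecode(previous: IntArray, delta: IntArray): IntArray =
    IntArray(delta.size) { i -> previous[i] + delta[i] }

fun main() {
    val keyFrame = intArrayOf(10, 20, 30, 40)
    val nextFrame = intArrayOf(10, 21, 30, 39)        // nearly identical frame
    val delta = deltaEncode(keyFrame, nextFrame)
    println(delta.toList())                            // [0, 1, 0, -1]
    println(deltaDecode(keyFrame, delta).contentEquals(nextFrame)) // lossless round trip
}
```

This is also why the encoder configuration below sets a key-frame interval: decoders periodically need a full frame to recover from, not just an endless chain of deltas.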

Android implementation

To add transition effects and encode images into a video on Android, the workflow uses OpenGL for rendering, a custom transition shader, MediaCodec for hardware encoding, EGL for bridging OpenGL and the codec surface, MediaMuxer for container multiplexing, and MediaExtractor for extracting audio tracks.

val width = 720
val height = 1280
val bitrate = 5_000_000 // KEY_BIT_RATE is in bits per second
val encodeType = "video/avc"
val codec = MediaCodec.createEncoderByType(encodeType)
val outputFormat = MediaFormat.createVideoFormat(encodeType, width, height)
outputFormat.setInteger(MediaFormat.KEY_BIT_RATE, bitrate)
outputFormat.setInteger(MediaFormat.KEY_FRAME_RATE, DEFAULT_ENCODE_FRAME_RATE)
outputFormat.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 1)
// Frames arrive via a Surface rather than ByteBuffers
outputFormat.setInteger(MediaFormat.KEY_COLOR_FORMAT, MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface)
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.LOLLIPOP) {
    outputFormat.setInteger(MediaFormat.KEY_BITRATE_MODE, MediaCodecInfo.EncoderCapabilities.BITRATE_MODE_CQ)
}
codec.configure(outputFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)
val inputSurface = codec.createInputSurface() // handed to EGL below
codec.start()
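Each frame rendered to the encoder's input surface also needs a monotonically increasing presentation timestamp. Since the images are rendered at a fixed frame rate, the timestamp of frame N can be derived from its index alone; a small helper (the function name is illustrative, not from the article):

```kotlin
// MediaCodec presentation timestamps are expressed in microseconds.
const val MICROS_PER_SECOND = 1_000_000L

// Timestamp of the N-th frame at a constant frame rate.
fun presentationTimeUs(frameIndex: Int, frameRate: Int): Long =
    frameIndex * MICROS_PER_SECOND / frameRate

fun main() {
    val fps = 30
    println(presentationTimeUs(0, fps))   // 0
    println(presentationTimeUs(30, fps))  // 1000000 -- exactly one second in
    println(presentationTimeUs(45, fps))  // 1500000
}
```

With EGL, this value is typically passed via `EGLExt.eglPresentationTimeANDROID` (in nanoseconds) before each buffer swap.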

EGL setup creates a display, initializes it, chooses a config, creates a context, and then creates a window surface that uses the MediaCodec input surface:

// 1. Create and initialize the EGLDisplay
mEGLDisplay = EGL14.eglGetDisplay(EGL14.EGL_DEFAULT_DISPLAY)
val version = IntArray(2)
EGL14.eglInitialize(mEGLDisplay, version, 0, version, 1)
// 2. Choose an EGLConfig (attribute list elided) and keep it in mEGLConfig
// ...
// 3. Create the context if one does not exist yet, optionally sharing state
if (mEGLContext === EGL14.EGL_NO_CONTEXT) {
    mEGLContext = EGL14.eglCreateContext(mEGLDisplay, mEGLConfig, sharedContext,
        intArrayOf(EGL14.EGL_CONTEXT_CLIENT_VERSION, 2, EGL14.EGL_NONE), 0)
}
// 4. Wrap the MediaCodec input Surface in an EGL window surface
fun createWindowSurface(surface: Any): EGLSurface {
    val surfaceAttr = intArrayOf(EGL14.EGL_NONE)
    return EGL14.eglCreateWindowSurface(mEGLDisplay, mEGLConfig, surface, surfaceAttr, 0)
        ?: throw RuntimeException("eglCreateWindowSurface returned null")
}

MediaMuxer combines the encoded video and audio streams into an MP4 file:

val mediaMuxer = MediaMuxer(path, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4)
// trackFormat is the MediaFormat the encoder reports via INFO_OUTPUT_FORMAT_CHANGED
val trackIndex = mediaMuxer.addTrack(trackFormat)
mediaMuxer.start()
mediaMuxer.writeSampleData(trackIndex, encodedBuffer, bufferInfo)

MediaExtractor can be used to pull audio samples from an existing track and feed them to MediaMuxer.

val mediaExtractor = MediaExtractor()
mediaExtractor.setDataSource(audioPath)            // path of the source audio file
mediaExtractor.selectTrack(audioTrackIndex)        // index of the audio track found earlier
val inputBuffer = ByteBuffer.allocate(bufferSize)
val sampleSize = mediaExtractor.readSampleData(inputBuffer, 0)
mediaExtractor.advance() // move on to the next sample

iOS implementation

The iOS side relies on AVFoundation. AVAssetWriter creates the container, AVAssetWriterInput defines video and audio tracks, and AVAssetWriterInputPixelBufferAdaptor supplies pixel buffers generated by GPUImage filters.

AVAssetWriter *assetWriter = [[AVAssetWriter alloc] initWithURL:[NSURL fileURLWithPath:outFilePath] fileType:AVFileTypeMPEG4 error:&outError];
NSDictionary *videoSetDic = @{AVVideoCodecKey: AVVideoCodecTypeH264, AVVideoWidthKey: @(size.width), AVVideoHeightKey: @(size.height)};
AVAssetWriterInput *videoWriterInput = [[AVAssetWriterInput alloc] initWithMediaType:AVMediaTypeVideo outputSettings:videoSetDic];
[assetWriter addInput:videoWriterInput];
AVAssetWriterInputPixelBufferAdaptor *adaptor = [AVAssetWriterInputPixelBufferAdaptor assetWriterInputPixelBufferAdaptorWithAssetWriterInput:videoWriterInput pixelBufferAttributes:@{kCVPixelBufferPixelFormatTypeKey: @(kCVPixelFormatType_32BGRA)}];

Audio is read with AVAssetReader and written via a second AVAssetWriterInput configured for AAC.

AVAssetReader *assetReader = [[AVAssetReader alloc] initWithAsset:audioAsset error:&error];
AVAssetReaderTrackOutput *audioOutput = [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:assetAudioTrack outputSettings:@{AVFormatIDKey: @(kAudioFormatLinearPCM)}];
AVAssetWriterInput *audioWriterInput = [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio outputSettings:audioSettings];
[assetWriter addInput:audioWriterInput];

GPUImage is used to implement custom transition shaders. A custom filter subclass overrides the vertex and fragment shaders; the fragment shader mixes two textures based on a progress uniform:

precision highp float;
varying highp vec2 textureCoordinate;
uniform sampler2D imageTexture;   // first image
uniform sampler2D imageTexture2;  // second image
uniform mediump vec4 v4Param1;    // x component carries transition progress (0.0-1.0)

void main() {
    float progress = v4Param1.x;
    vec4 color1 = texture2D(imageTexture, textureCoordinate);
    vec4 color2 = texture2D(imageTexture2, textureCoordinate);
    // Horizontal wipe: once the transition front passes a pixel's x coordinate,
    // step() flips from 0.0 to 1.0 and mix() switches that pixel to the second image.
    gl_FragColor = mix(color1, color2, step(1.0 - textureCoordinate.x, progress));
}
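The wipe logic is easy to verify off the GPU. A CPU re-implementation of the shader's selection math (a sketch for illustration, mirroring the GLSL built-ins):

```kotlin
// GLSL step(edge, x): 0.0 below the edge, 1.0 at or above it.
fun step(edge: Float, x: Float): Float = if (x < edge) 0f else 1f

// GLSL mix(a, b, t): linear blend of a and b.
fun mix(a: Float, b: Float, t: Float): Float = a * (1f - t) + b * t

// Returns 1.0 when the pixel at horizontal position x already shows the
// second image at the given transition progress, 0.0 otherwise.
fun wipeSelector(x: Float, progress: Float): Float = step(1f - x, progress)

fun main() {
    // At 50% progress, pixels on the right half (x > 0.5) show the second image.
    println(wipeSelector(0.8f, 0.5f)) // 1.0 -- second image
    println(wipeSelector(0.2f, 0.5f)) // 0.0 -- still the first image
    println(mix(10f, 20f, wipeSelector(0.8f, 0.5f))) // 20.0
}
```

Because `step` returns only 0.0 or 1.0, each pixel snaps entirely to one image or the other, producing a hard-edged wipe; replacing `step` with `smoothstep` would soften the boundary.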

The filter renders to a texture, reads the pixels into a CVPixelBuffer, and appends them to the AVAssetWriterInputPixelBufferAdaptor:

CVPixelBufferRef pixel_buffer = NULL;
CVReturn status = CVPixelBufferPoolCreatePixelBuffer(NULL, [self.videoPixelBufferAdaptor pixelBufferPool], &pixel_buffer);
// The buffer must be locked before glReadPixels can write into its base address
CVPixelBufferLockBaseAddress(pixel_buffer, 0);
glReadPixels(0, 0, self.sizeOfFBO.width, self.sizeOfFBO.height, GL_RGBA, GL_UNSIGNED_BYTE, CVPixelBufferGetBaseAddress(pixel_buffer));
CVPixelBufferUnlockBaseAddress(pixel_buffer, 0);
[self.videoPixelBufferAdaptor appendPixelBuffer:pixel_buffer withPresentationTime:frameTime];
CVPixelBufferRelease(pixel_buffer);

By repeating this process for each pair of consecutive images, a seamless video with custom transition effects and synchronized audio is produced.

The article concludes that understanding audio‑video fundamentals and leveraging platform‑native APIs enables developers to build powerful media processing pipelines on both Android and iOS.
