
How Meituan Engineered a Scalable Mobile Video Platform: Architecture and Lessons

This article details Meituan's end‑to‑end development of a merchant‑side mobile video feature, covering background needs, architecture design, technology selection, implementation of playback, recording, composition, cutting, processing pipelines, encountered pitfalls, monitoring strategies, and future optimization directions.

Meituan Technology Team

Background

As Meituan Waimai grew rapidly from 2013 onward, static text and images no longer gave merchants enough room to present their products. Short product videos made listings more informative, attracted more users and improved order conversion, which led to a full‑stack video solution on the merchant side.

Solution Overview

The solution provides video capture, processing (mixing, filters, watermarks, animations) and composition. After launch the weekly video sample count and merchant usage grew sharply, achieving a 99.533 % recording success rate, 98.818 % processing success rate and a crash rate of 0.1‰. The feature is now deployed in multiple Meituan apps.

Architecture

A four‑layer, platform‑centric architecture isolates reusable platform capabilities from business logic.

Platform Layer: Native Android APIs (Camera, OpenGL, MediaCodec, MediaMuxer) and third‑party libraries such as ijkplayer and mp4parser.

Core Capability Layer: Audio/video encode‑decode, transcoding engines and filter rendering.

Base Component Layer: Reusable components for playback, trimming, screen recording and customizable UI panels.

Business Layer: Segment capture, free‑form shooting, video space management and template preview.

Key Technical Decisions

Solution Selection

The major cloud VOD SDKs (Alibaba Cloud, Tencent Cloud) were stable, but they were larger than 15 MB and costly. The internal UGC pipeline lacked cropping and aspect‑ratio support. Open‑source projects (Grafika, CTS) either lacked required features or fell short on performance. A custom lightweight solution was therefore built by combining the strengths of Grafika and CTS with internal experience.

Video Format Choice

H.264 was chosen for its mature ecosystem and high compression efficiency; AAC was selected for audio because of its superior quality‑to‑bitrate ratio.

Implementation Details

Playback

Android’s native MediaPlayer was discarded due to fragmentation and limited format support. ijkplayer (FFmpeg‑based) was adopted for cross‑platform capability, soft/hard‑decode switching and a familiar API. Progressive download is enabled by integrating AndroidVideoCache (https://github.com/danikula/AndroidVideoCache), which proxies network requests, caches files locally and allows simultaneous playback and caching. Cache strategy is configurable at runtime and resources are cleaned up on lifecycle events to avoid memory leaks.
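One common cache strategy for a local video cache is a total-size limit with least-recently-used eviction. The sketch below is a standalone illustration of that idea, not the code of AndroidVideoCache or of the Meituan player; the class name `LruCacheIndex` and its methods are hypothetical, and the actual file deletion is left to the caller.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical index of cached video files with a total-size LRU limit. */
final class LruCacheIndex {
    private final long maxBytes;
    private long totalBytes = 0;
    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, Long> sizes = new LinkedHashMap<>(16, 0.75f, true);

    LruCacheIndex(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    /** Marks a file as recently used (for example, when playback starts). */
    void touch(String name) {
        sizes.get(name);
    }

    /** Registers a cached file and returns the names the caller should delete. */
    List<String> put(String name, long bytes) {
        Long old = sizes.put(name, bytes);
        totalBytes += bytes - (old == null ? 0L : old);
        List<String> evicted = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = sizes.entrySet().iterator();
        // Evict oldest entries first, but always keep the entry just added.
        while (totalBytes > maxBytes && sizes.size() > 1 && it.hasNext()) {
            Map.Entry<String, Long> eldest = it.next();
            totalBytes -= eldest.getValue();
            evicted.add(eldest.getKey());
            it.remove();
        }
        return evicted;
    }

    long totalBytes() {
        return totalBytes;
    }
}
```

With a 100-byte limit, adding `a.mp4` and `b.mp4` (40 bytes each), touching `a.mp4`, and then adding a 40-byte `c.mp4` evicts `b.mp4`, because it is the least recently used entry.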

Recording

Two pipelines were evaluated:

1. Camera + AudioRecord + MediaCodec + Surface

2. MediaRecorder + MediaCodec

Pipeline 2 was selected because it carried lower risk. MediaRecorder records the full frame; the result is then cropped by coordinates (accounting for front‑camera mirroring) and re‑encoded with MediaCodec. Extensive device‑specific validation was required, for example on the VIVO Y66 and on Samsung devices. Example helper method (signature only; the body is elided in the source):

public static double correctTimeToSyncSample(Track track, double cutHere, boolean next) { ... }
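The coordinate-based crop described in this section can be sketched as a small calculation. This is a minimal illustration under stated assumptions, not the team's implementation: the crop window is assumed to be given in normalized preview coordinates, the front-camera preview is assumed to be mirrored horizontally relative to the recorded frame, and the names `CropCalculator` and `toFrameRect` are hypothetical. Results are snapped to even values because many encoders require even dimensions.

```java
/** Hypothetical helper: maps a normalized preview crop to pixels in the recorded frame. */
final class CropCalculator {

    /** Pixel rectangle inside the recorded frame. */
    static final class Rect {
        final int x, y, width, height;

        Rect(int x, int y, int width, int height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /**
     * @param left, top, w, h crop window as fractions (0..1) of the preview
     * @param mirrored        true when the preview is a horizontal mirror of the recording
     */
    static Rect toFrameRect(double left, double top, double w, double h,
                            int frameW, int frameH, boolean mirrored) {
        if (left < 0 || top < 0 || w <= 0 || h <= 0 || left + w > 1.0 || top + h > 1.0) {
            throw new IllegalArgumentException("crop window must lie inside the preview");
        }
        // A mirrored preview flips the horizontal axis: the left edge seen by the
        // user corresponds to the right edge of the recorded frame.
        double frameLeft = mirrored ? 1.0 - left - w : left;
        int x = evenDown((int) Math.round(frameLeft * frameW));
        int y = evenDown((int) Math.round(top * frameH));
        int cw = evenDown((int) Math.round(w * frameW));
        int ch = evenDown((int) Math.round(h * frameH));
        // Clamp so that rounding never pushes the rectangle outside the frame.
        cw = Math.min(cw, evenDown(frameW - x));
        ch = Math.min(ch, evenDown(frameH - y));
        return new Rect(x, y, cw, ch);
    }

    /** Rounds a non-negative value down to the nearest even number. */
    static int evenDown(int v) {
        return v & ~1;
    }
}
```

For a 1280x720 frame and a crop window of (0.1, 0.0, 0.5, 1.0), the rear camera yields x = 128 and the mirrored front camera yields x = 512; both crops are 640x720.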

Composition

Segmented recordings are merged using mp4parser, which separates audio and video tracks, concatenates them and remuxes into an MP4 container. Certain MP4 boxes caused OOM crashes; therefore mp4parser is used only in controlled scenarios and a MediaCodec‑based composition fallback is applied when special boxes are detected.
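Deciding when to fall back requires inspecting the file's box structure before handing it to the merger. The sketch below walks the top-level boxes of an MP4 buffer using the standard box header layout (a 32-bit big-endian size followed by a four-character type, with size 1 signalling a 64-bit size and size 0 meaning "to end of file"). The article does not name the boxes that caused problems, so the set of risky types is a caller-supplied placeholder, and `Mp4BoxScanner` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical pre-check that lists top-level MP4 boxes before composition. */
final class Mp4BoxScanner {

    /** Returns the four-character types of the top-level boxes, in file order. */
    static List<String> topLevelBoxTypes(byte[] data) {
        List<String> types = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(data); // ByteBuffer is big-endian by default
        int pos = 0;
        while (pos + 8 <= data.length) {
            long size = buf.getInt(pos) & 0xFFFFFFFFL;
            String type = new String(data, pos + 4, 4, StandardCharsets.ISO_8859_1);
            long headerSize = 8;
            if (size == 1) {
                // A 64-bit "largesize" field follows the type.
                if (pos + 16 > data.length) {
                    break;
                }
                size = buf.getLong(pos + 8);
                headerSize = 16;
            } else if (size == 0) {
                // The box extends to the end of the file.
                size = data.length - pos;
            }
            if (size < headerSize || size > data.length - pos) {
                break; // malformed or truncated box: stop scanning
            }
            types.add(type);
            pos += (int) size;
        }
        return types;
    }

    /** True when any box type is in the caller-supplied set of risky types. */
    static boolean needsFallback(List<String> types, Set<String> riskyTypes) {
        for (String type : types) {
            if (riskyTypes.contains(type)) {
                return true;
            }
        }
        return false;
    }
}
```

A real pre-check would read only the box headers from disk rather than loading the whole file, and would likely descend into container boxes such as `moov`; this sketch stays at the top level to keep the parsing logic visible.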

Cutting

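mp4parser can only trim at key frames, which introduced errors on the order of seconds and violated business rules (e.g., clips must be ≥3 s). A MediaCodec‑driven cutter that operates on per‑frame timestamps was implemented instead, giving frame‑level accuracy (MediaCodec timestamps are expressed in microseconds). Core dequeue logic (signature only; this is the Android framework's BufferQueueProducer entry point, and the body is elided in the source):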

status_t BufferQueueProducer::dequeueBuffer(int *outSlot, sp<android::Fence> *outFence, uint32_t width, uint32_t height, PixelFormat format, uint32_t usage, FrameEventHistoryDelta* outTimestamps) { ... }
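To make the precision difference concrete, the sketch below compares the two ways of resolving a requested cut point. It is an illustration with made-up timestamps, not the production cutter: `snapToSyncSample` models key-frame-only trimming, and `snapToFrame` models a decode-and-re-encode cutter that can start on any frame. Both method names and the class name are hypothetical.

```java
/** Hypothetical comparison of key-frame trimming and frame-level trimming. */
final class CutPrecision {

    /**
     * Key-frame-only trimming: the cut must land on a sync sample.
     * Returns the next key frame after the request, or the previous one.
     */
    static double snapToSyncSample(double[] syncTimesSec, double cutSec, boolean next) {
        double previous = 0;
        for (double t : syncTimesSec) {
            if (t > cutSec) {
                return next ? t : previous;
            }
            previous = t;
        }
        return previous; // the request lies past the last key frame
    }

    /**
     * Frame-level trimming: with decoding and re-encoding, any frame can start
     * the clip, so the cut lands on the first frame at or after the request.
     */
    static long snapToFrame(long[] framePtsUs, long cutUs) {
        for (long pts : framePtsUs) {
            if (pts >= cutUs) {
                return pts;
            }
        }
        return framePtsUs[framePtsUs.length - 1];
    }

    /** Presentation timestamps in microseconds for a constant-frame-rate clip. */
    static long[] constantRatePts(int frameCount, int fps) {
        long[] pts = new long[frameCount];
        for (int i = 0; i < frameCount; i++) {
            pts[i] = i * 1_000_000L / fps;
        }
        return pts;
    }
}
```

With key frames every 2 s (at 0, 2, 4 and 6 s), a requested clip from 1.5 s to 4.5 s becomes 2.0 s to 4.0 s when both ends snap inward: a 2 s clip that breaks a 3 s minimum. At 25 fps, the frame-level cutter instead starts at 1.52 s, an error of at most one frame interval (40 ms).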

Processing Pipeline

Audio and video streams are demuxed, then processed in parallel. Video frames pass through OpenGL‑based filters, animations and watermark rendering before being encoded with hardware codecs (OpenMAX). The final streams are muxed into an MP4 file.
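On the audio branch, mixing comes down to sample-wise arithmetic on decoded PCM. The sketch below assumes 16-bit signed PCM with both inputs already at the same sample rate and channel layout; it shows mono-to-stereo conversion by channel duplication and a gain-weighted mix with clamping. `PcmMixer` and its methods are hypothetical names, and resampling is out of scope.

```java
/** Hypothetical helpers for 16-bit PCM audio mixing. */
final class PcmMixer {

    /** Duplicates each mono sample into the left and right slots of interleaved stereo. */
    static short[] monoToStereo(short[] mono) {
        short[] stereo = new short[mono.length * 2];
        for (int i = 0; i < mono.length; i++) {
            stereo[2 * i] = mono[i];
            stereo[2 * i + 1] = mono[i];
        }
        return stereo;
    }

    /**
     * Mixes two buffers with per-track gain. The sum is clamped to the 16-bit
     * range so that loud passages saturate instead of wrapping around.
     */
    static short[] mix(short[] a, float gainA, short[] b, float gainB) {
        int n = Math.max(a.length, b.length);
        short[] out = new short[n];
        for (int i = 0; i < n; i++) {
            float sa = i < a.length ? a[i] * gainA : 0f;
            float sb = i < b.length ? b[i] * gainB : 0f;
            int sum = Math.round(sa + sb);
            out[i] = (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, sum));
        }
        return out;
    }
}
```

Lowering both gains (for example 0.5 each) is the simplest way to balance volume without clipping; a mono voice-over would first pass through `monoToStereo` before being mixed with a stereo background track.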

Common Pitfalls

Many hardware encoders require even width/height; the implementation enforces even dimensions.

Incorrect YUV color format leads to visual artifacts; the correct format is selected per device.

16‑pixel alignment is needed on devices such as Huawei and Samsung to avoid green borders.

BufferQueue capacity limits can stall frame‑available callbacks; the code guards against exceeding the max dequeued buffer count.

Constant‑quality (CQ) streams are incompatible with Android 9.0; the pipeline falls back to variable‑bitrate modes when CQ is unsupported.

Audio mixing challenges include mono‑to‑stereo conversion and volume balancing.
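The two dimension-related pitfalls above (even sizes and 16-pixel alignment) reduce to rounding the output size before configuring the encoder. The sketch below is one way to do that, under the assumption that cropping a few pixels is acceptable; `EncoderDimensions` and its methods are hypothetical names, and which devices need 16-pixel alignment is left to the caller.

```java
/** Hypothetical helpers that round encoder output sizes to a safe alignment. */
final class EncoderDimensions {

    /** Rounds value down to a multiple of alignment (a power of two). */
    static int alignDown(int value, int alignment) {
        requirePowerOfTwo(alignment);
        return value & ~(alignment - 1);
    }

    /** Rounds value up to a multiple of alignment (a power of two). */
    static int alignUp(int value, int alignment) {
        requirePowerOfTwo(alignment);
        return (value + alignment - 1) & ~(alignment - 1);
    }

    /**
     * Returns {width, height} rounded down to even values, or to multiples
     * of 16 for devices flagged as needing 16-pixel alignment.
     */
    static int[] encoderSize(int width, int height, boolean needs16PixelAlignment) {
        int alignment = needs16PixelAlignment ? 16 : 2;
        return new int[] {alignDown(width, alignment), alignDown(height, alignment)};
    }

    private static void requirePowerOfTwo(int alignment) {
        if (alignment <= 0 || Integer.bitCount(alignment) != 1) {
            throw new IllegalArgumentException("alignment must be a power of two");
        }
    }
}
```

Rounding down trims a little content (1080 becomes 1072 under 16-pixel alignment), whereas rounding up with `alignUp` would require the renderer to fill the extra rows itself; the choice between the two is a design decision this sketch does not settle.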

Monitoring and Reliability

Event logging, metric collection and full‑link tracing are instrumented throughout the video pipeline. Success‑rate thresholds (target 98 %, alarm 92 %) trigger alerts via internal messaging. Problematic device models are isolated by toggling a feature flag, and hot‑fixes are deployed when regressions are detected.
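The threshold check can be sketched as a simple classification of a measured success rate. This assumes that a rate below the alarm threshold raises an alert and that a rate between the alarm and target thresholds is only flagged; the article does not spell out that behaviour, and `SuccessRateMonitor` is a hypothetical name.

```java
/** Hypothetical classification of a success rate against target and alarm thresholds. */
final class SuccessRateMonitor {

    enum Level { OK, BELOW_TARGET, ALARM }

    /**
     * @param target rate at or above which the pipeline is considered healthy (e.g. 0.98)
     * @param alarm  rate below which an alert should fire (e.g. 0.92)
     */
    static Level classify(long successes, long attempts, double target, double alarm) {
        if (attempts <= 0) {
            return Level.OK; // no samples in this window, nothing to judge
        }
        double rate = (double) successes / attempts;
        if (rate < alarm) {
            return Level.ALARM;
        }
        if (rate < target) {
            return Level.BELOW_TARGET;
        }
        return Level.OK;
    }
}
```

Evaluating the rate per device model rather than globally would make it easier to decide which models to isolate behind the feature flag.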

Future Work

Planned improvements include faster playback speeds, encoder efficiency optimizations, soft‑encoding fallbacks and tighter audio‑video joint processing. Ongoing research targets remaining fragmentation issues such as frame‑level green borders and codec‑specific bugs.
