
Meitu's Cloud-Based Image Beautification and Large-Scale Video Processing Architecture

Meitu moved beautification and video processing from the device to a cloud-native architecture. The system routes requests by region, uses a dedicated upload SDK for detailed monitoring, and relies on a configuration-driven plug-in framework with Kubernetes-based elastic scaling; edge computing is the planned next step. Together these provide fast, reliable, globally distributed image and video services.

Meitu Technology

This article describes how Meitu builds a future‑oriented cloud processing system to handle massive image‑beautification and video‑processing workloads for global users.

Background and Motivation

Traditional on‑device processing in Meitu apps suffers from long development cycles, increasing client package size, limited performance on low‑end devices, and insufficient data for AI training. Improvements in 4G networks and user demand for fast, high‑quality beautification drive the shift to cloud processing.

Limitations of Local Processing

1. Long R&D chain – algorithms must be adapted, tested, and released to devices.
2. Client package bloat – each new algorithm inflates the app size, which is a problem in overseas markets.
3. Insufficient performance on low-end devices.
4. Lack of AI training data, because images stay on the device.

Cloud Image Beautification Architecture

The cloud solution must satisfy three basic requirements: high speed, high success rate, and comprehensive quality monitoring. The simplified workflow is: the client uploads an image to a storage service and then calls an API service, which synchronously calls the processing service. If the synchronous call times out, the request falls back to a polling mechanism to retrieve the result.
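The synchronous-call-then-poll flow described above can be sketched as follows. This is a minimal illustration, not Meitu's implementation: `process_sync`, `fetch_result`, `ProcessingTimeout`, and the polling parameters are all hypothetical names introduced here, and the article does not say which component does the polling.

```python
import time
from typing import Callable, Optional


class ProcessingTimeout(Exception):
    """Raised by the (hypothetical) synchronous call when it times out."""


def beautify(
    process_sync: Callable[[str], str],
    fetch_result: Callable[[str], Optional[str]],
    image_key: str,
    poll_interval: float = 0.5,
    max_polls: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try the synchronous path first; fall back to polling on timeout."""
    try:
        return process_sync(image_key)
    except ProcessingTimeout:
        pass  # assumption: the job may still be running on the server side

    for _ in range(max_polls):
        result = fetch_result(image_key)  # returns None while not ready
        if result is not None:
            return result
        sleep(poll_interval)
    raise TimeoutError(f"no result for {image_key} after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real delays; a production client would more likely add backoff and an overall deadline.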

Images are retained for six months for AI training and then deleted to address privacy concerns.

Global Deployment Challenges

Meitu's products have large overseas user bases, so the cloud must handle cross-region routing. Failures occur when, for example, an image is stored in a China region but DNS resolves the processing service to Singapore. The solution adds a region field to both storage and API requests, which enables precise routing and faster disaster-recovery switching.
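As a rough sketch of region-tagged routing: the request carries the region where the image was stored, and the endpoint is chosen from that field rather than from DNS. The hostnames, region codes, and failover table below are assumptions made for illustration; the article does not publish them.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical endpoints; the article does not list real hostnames.
PROCESSING_ENDPOINTS = {
    "cn": "https://process.cn.example.com",
    "sg": "https://process.sg.example.com",
}
# Hypothetical failover table used when a region is marked as down.
FAILOVER = {"cn": "sg", "sg": "cn"}


@dataclass(frozen=True)
class BeautifyRequest:
    image_key: str
    region: str  # region where the image was stored, set at upload time


def resolve_endpoint(req: BeautifyRequest, down: FrozenSet[str] = frozenset()) -> str:
    """Pick the processing endpoint from the request's region field."""
    region = req.region
    if region not in PROCESSING_ENDPOINTS:
        raise ValueError(f"unknown region: {region!r}")
    if region in down:
        region = FAILOVER[region]  # explicit disaster-recovery switch
        if region in down:
            raise RuntimeError("no healthy region available")
    return PROCESSING_ENDPOINTS[region]
```

Because the region travels with the request, switching traffic away from a failed region becomes a table update instead of waiting for DNS changes to propagate.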

Comprehensive Quality Monitoring

Meitu uses a dedicated file‑upload SDK that reports metrics such as effect ID, upload time, processing time, download time, and failure points. The SDK also monitors upload latency, failure rate, and speed. API service monitoring tracks request latency and processing duration, providing detailed reports for optimization.
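One way to picture such a per-request report is a small record that times each stage and remembers where a failure occurred. This is only a sketch: `RequestReport`, its field names, and the JSON payload shape are assumptions, not the actual SDK's schema.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class RequestReport:
    effect_id: str
    timings_ms: Dict[str, float] = field(default_factory=dict)
    failed_stage: Optional[str] = None  # e.g. "upload", "process", "download"

    def run_stage(self, name: str, fn: Callable[[], object],
                  clock: Callable[[], float] = time.perf_counter):
        """Run one stage, record its duration, and note it if it fails."""
        start = clock()
        try:
            return fn()
        except Exception:
            self.failed_stage = name
            raise
        finally:
            self.timings_ms[name] = (clock() - start) * 1000.0

    def to_json(self) -> str:
        """Serialize the report for upload to a metrics collector."""
        return json.dumps(asdict(self), sort_keys=True)
```

A client would wrap upload, processing, and download in `run_stage` calls and send `to_json()` to the collector whether or not the request succeeded, so that failure points are visible alongside latency.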

Edge Computing Direction

The planned next step is to move processing to edge nodes: data is uploaded to a CDN edge node and processed there, and the result is stored back to the origin asynchronously. This reduces latency and improves the user experience.

Large‑Scale Video Processing Architecture

Typical video processing tasks include watermarking, frame extraction, transcoding, and adding effects. The traditional approach ties each business requirement to a separate script, leading to tangled logic and poor scalability.

Problems

Increasing business demands cause chaotic script management, difficulty in controlling business logic, lack of workflow standardization, and inability to elastically schedule resources.

Solution

1. Shift decision-making to the processing service (downstream decision).
2. Use a configuration-driven command framework (Poseidon) with a template-method pattern to encapsulate common steps (receive message, download, process, callback).
3. Implement a plug-in system (Trident) where new functions are added as Python scripts and packaged into Docker images, orchestrated by Kubernetes for elastic scaling.
4. Deploy storage and processing services across regions, using region tags to route requests appropriately.
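The template-method and plug-in ideas in the list above can be sketched together. The code below is illustrative only: it is not the Poseidon or Trident API, and the class names, the `PLUGINS` registry, the message layout, and the byte-string "processing" are stand-ins invented for this example.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict

# Hypothetical plug-in registry: name -> function(data, params) -> data.
PLUGINS: Dict[str, Callable[[bytes, dict], bytes]] = {}


def plugin(name: str):
    """Register a processing function under a name used in task config."""
    def register(fn):
        PLUGINS[name] = fn
        return fn
    return register


@plugin("watermark")
def watermark(data: bytes, params: dict) -> bytes:
    return data + b"|wm:" + params.get("text", "").encode()  # stand-in for real work


@plugin("transcode")
def transcode(data: bytes, params: dict) -> bytes:
    return data + b"|fmt:" + params["format"].encode()  # stand-in for real work


class VideoTask(ABC):
    """Template method: the fixed skeleton lives in run()."""

    def run(self, message: dict) -> dict:
        data = self.download(message["source"])
        for step in message["steps"]:  # steps come from configuration
            data = PLUGINS[step["plugin"]](data, step.get("params", {}))
        return self.callback(message, data)

    @abstractmethod
    def download(self, source: str) -> bytes: ...

    @abstractmethod
    def callback(self, message: dict, data: bytes) -> dict: ...


class InMemoryTask(VideoTask):
    """Toy subclass that reads from a dict instead of real storage."""

    def __init__(self, store: Dict[str, bytes]):
        self.store = store

    def download(self, source: str) -> bytes:
        return self.store[source]

    def callback(self, message: dict, data: bytes) -> dict:
        return {"id": message["id"], "output": data}
```

With this shape, a new business requirement becomes a new registered function plus a change to the `steps` configuration, rather than a new standalone script.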

The architecture’s three key features are: decision‑making shift, workflow templating, and elastic scheduling via K8s.

Q&A

Question: Are the monitoring and APM solutions commercial or open‑source? How is APM impact on business performance evaluated?

Answer (Wang Jingbo): APM is a passive side‑channel; business code pushes metrics via HTTP. We also use a proprietary SDK called “Hubble” that reports dozens of metrics (TCP time, DNS time, success/failure, total processing time). This data helps us identify performance bottlenecks and develop optimizations such as a FastDNS cache.
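The answer does not describe how the FastDNS cache works. As a generic illustration of the idea of caching DNS lookups on the client, here is a small time-to-live cache; the `DnsCache` class, its TTL policy, and its interface are assumptions and should not be read as Meitu's design.

```python
import socket
import time
from typing import Callable, Dict, Tuple


class DnsCache:
    """Cache hostname lookups for a fixed time-to-live (in seconds)."""

    def __init__(
        self,
        ttl: float = 60.0,
        resolver: Callable[[str], str] = socket.gethostbyname,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._ttl = ttl
        self._resolver = resolver
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # host -> (ip, expiry)

    def resolve(self, host: str) -> str:
        now = self._clock()
        hit = self._entries.get(host)
        if hit is not None and hit[1] > now:
            return hit[0]  # still fresh: skip the network lookup
        ip = self._resolver(host)
        self._entries[host] = (ip, now + self._ttl)
        return ip
```

Repeated requests to the same host within the TTL avoid the DNS round trip, which is the kind of saving a DNS-time metric would make visible.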

Author Bio

Wang Jingbo, Technical Director at Meitu, joined the company in September 2015. He leads projects including cloud image beautification, media fusion scheduling, the live-comment system, the feed service, and quality monitoring. He previously worked at NetEase and Sina Weibo and has over ten years of backend development experience.
