
AI-Powered Face Swapping for the Spring Festival Gala: System Design and Deployment

This article details the design and deployment of an AI‑driven face‑swap platform for the 2025 CCTV Spring Festival Gala. The platform combines a dual‑model SDXL pipeline (with ControlNet and LoRA fine‑tuning), optimized preprocessing, and GPU‑specific acceleration to achieve sub‑3‑second latency at more than 10 k QPS. Scaling, throttling, and multi‑region load balancing supported the launch, which ultimately served ten million users and generated hundreds of millions of personalized gala images.

DaTaobao Tech

Feb 21, 2025


This article presents a comprehensive technical case study of an AIGC (generative AI) system built for the 2025 CCTV Spring Festival Gala, enabling users to upload a personal photo and instantly appear as a digital performer in nine classic gala program categories.

Project Overview – The solution combines AI face‑swap, scene generation, and immersive role‑play to give millions of users a high‑engagement experience with no barrier to entry. The system generated hundreds of millions of personalized images, achieved a user export rate of about 90%, and sustained tens of thousands of QPS during peak periods.

Algorithm Architecture – The pipeline uses two models: an SDXL‑based text‑to‑image (T2I) model and an SDXL‑Inpaint model. Identity features are injected through a ControlNet branch and an IP‑Adapter that merges facial embeddings with textual prompts. LoRA fine‑tuning improves facial realism and aesthetic quality. Training proceeds in four stages: (a) facial feature guidance, (b) initial ControlNet adaptation, (c) similarity‑driven refinement with ArcFace loss, and (d) LoRA‑based portrait quality enhancement.
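As a rough illustration of the similarity‑driven stage (c), the sketch below computes an identity term as one minus the cosine similarity between two face embeddings. It assumes the "ArcFace loss" compares an embedding of the reference photo with an embedding of the generated face; the source does not give the exact formulation, and the function names here are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def identity_loss(reference: Sequence[float], generated: Sequence[float]) -> float:
    """Hypothetical identity term (an assumption, not the authors' loss).

    Returns 0.0 when the two embeddings point the same way and grows
    toward 2.0 as they point in opposite directions.
    """
    return 1.0 - cosine_similarity(reference, generated)
```

In a real training loop the embeddings would come from a face‑recognition network and the term would be computed on tensors so that gradients can flow; plain Python lists are used here only to keep the arithmetic visible.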

Pre‑processing and Front‑end Services – Uploaded images are downloaded and decoded in parallel, corrected for orientation, and passed through multi‑scale face detection. The pipeline selects the best face based on size, angle, and similarity, then estimates gender and age (including whether the subject is a minor) and extracts a 1 × 1024 face embedding for downstream processing.
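The face‑selection step can be pictured as a weighted score over the three criteria named above. The sketch below is only one plausible reading: the field names, the 60‑degree yaw cutoff, and the weights are all assumptions made for illustration, and the "similarity" input is treated as an opaque score in the range 0 to 1.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FaceCandidate:
    width: int          # bounding-box width in pixels
    height: int         # bounding-box height in pixels
    yaw_degrees: float  # rotation away from frontal; 0 means facing the camera
    similarity: float   # precomputed score in [0, 1] (meaning assumed)


def score_face(
    face: FaceCandidate,
    image_area: int,
    max_yaw: float = 60.0,                             # hypothetical cutoff
    weights: Tuple[float, float, float] = (0.4, 0.3, 0.3),  # hypothetical weights
) -> float:
    """Combine size, angle, and similarity into a single score in [0, 1]."""
    if image_area <= 0:
        raise ValueError("image_area must be positive")
    size_term = min(1.0, (face.width * face.height) / image_area)
    angle_term = max(0.0, 1.0 - abs(face.yaw_degrees) / max_yaw)
    sim_term = min(1.0, max(0.0, face.similarity))
    w_size, w_angle, w_sim = weights
    return w_size * size_term + w_angle * angle_term + w_sim * sim_term


def select_best_face(faces: List[FaceCandidate], image_area: int) -> Optional[FaceCandidate]:
    """Return the highest-scoring face, or None if no face was detected."""
    if not faces:
        return None
    return max(faces, key=lambda f: score_face(f, image_area))
```

With these weights, a large frontal face beats a small, turned one; a production system would presumably tune the weights against labeled examples.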

Performance Optimizations – To meet a 3 s latency target at more than 10 k QPS, inference was cut to 10 diffusion steps. Models were accelerated separately for each GPU type: NVIDIA cards (TensorRT plus custom kernels) achieved 1.1–1.7 s latency, with the A100 at 1.1 s; the AMD MI308X (torch.compile) achieved 2.7 s; and domestic inference cards, optimized via QKV fusion, reached 1.2–1.3 s. The system maintains a >99.99% SLA with token‑bucket rate limiting and multi‑region load balancing.
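For readers unfamiliar with token‑bucket rate limiting, a minimal single‑process sketch follows. It is a generic textbook version rather than the system's actual limiter; the class name, parameters, and the injectable clock (used to make the behavior testable) are assumptions.

```python
import threading
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained `rate` of requests per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity      # start with a full bucket
        self._last = clock()
        self._lock = threading.Lock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Take `cost` tokens if available; return False without blocking otherwise."""
        with self._lock:
            now = self._clock()
            # Refill in proportion to elapsed time, never above capacity.
            self._tokens = min(self.capacity,
                               self._tokens + (now - self._last) * self.rate)
            self._last = now
            if self._tokens >= cost:
                self._tokens -= cost
                return True
            return False
```

A worker would call `try_acquire()` before invoking inference and requeue or reject the task when it returns `False`. A limiter shared across many hosts would need a shared store for the token count, which this sketch does not attempt.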

System Architecture – A gateway routes requests to a message queue; workers pull tasks, apply token‑bucket throttling, and invoke the appropriate GPU‑specific inference service. Horizontal scaling and disaster‑domain isolation are achieved by deploying separate clusters per GPU family. Comprehensive stress testing validated end‑to‑end latency and reliability.
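The queue‑and‑worker flow can be sketched as follows. The source does not say how a worker chooses among GPU clusters, so this example assumes a simple policy: send each task to the healthy cluster with the most free capacity and leave tasks queued when every cluster is full. The `Cluster` fields and the `drain` helper are hypothetical.

```python
import queue
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Cluster:
    name: str             # GPU family label (illustrative)
    max_in_flight: int    # concurrent requests the cluster can absorb
    in_flight: int = 0
    healthy: bool = True  # an unhealthy cluster is isolated from new traffic

    def free_slots(self) -> int:
        return self.max_in_flight - self.in_flight if self.healthy else 0


def pick_cluster(clusters: List[Cluster]) -> Optional[Cluster]:
    """Return the cluster with the most free slots, or None if all are full or down."""
    best = max(clusters, key=lambda c: c.free_slots(), default=None)
    return best if best is not None and best.free_slots() > 0 else None


def drain(tasks: "queue.Queue[str]", clusters: List[Cluster]) -> Dict[str, str]:
    """Assign queued task ids to clusters until the queue empties or capacity runs out."""
    assignments: Dict[str, str] = {}
    while True:
        cluster = pick_cluster(clusters)
        if cluster is None:
            break  # no capacity: remaining tasks stay in the queue
        try:
            task_id = tasks.get_nowait()
        except queue.Empty:
            break
        cluster.in_flight += 1
        assignments[task_id] = cluster.name
    return assignments
```

Checking capacity before pulling a task means work is never removed from the queue unless some cluster can take it, so a failed or saturated cluster family slows intake rather than dropping requests.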

Results and Impact – Within one week of launch, the service served over ten million users, generated hundreds of millions of images, and maintained sub‑3 s response times. Social platforms reported massive user‑generated content, demonstrating the cultural and commercial success of the AI‑driven interactive experience.


