Kuaishou Y-tech AI SDK Framework: Secrets Behind Mass Production of Special Effects
The article details Kuaishou's Y-tech AI SDK (YKit) architecture, covering its design for computer vision capabilities, performance optimization strategies for mobile devices, and real-world case studies such as GAN-based effects and intelligent matting, outlining challenges and future directions.
The Kuaishou Y-tech AI engineering team presented at GMTC 2021 on its on-device AI SDK framework, which enables mass production of special effects.
Kuaishou’s platform averages 379 million daily active users and reaches 1 billion monthly active users worldwide; its Y-tech department develops computer vision, computer graphics, AR/VR and other AI capabilities that are deployed across all Kuaishou apps to empower intelligent creation.
The talk is organized into four parts: background, AI SDK architecture design, performance challenges, and future outlook.
The background section showcases magic effects such as "Fairy Princess", "Invisibility Cloak", "Van Gogh Starry Sky", and "All Things AR", which rely on GAN, AR, facial keypoints, portrait and sky segmentation, and similar capabilities; the GAN-based "Fairy Magic" effect, launched in November 2020, generated millions of user creations and billions of views.
Other AI capabilities include intelligent matting (extracting portrait, head or sky from video clips), smart object recognition (scan‑and‑identify objects), and one‑click video generation from user albums.
Behind these capabilities lies a complete AI architecture from model training to on‑device inference.
The main challenges fall into three areas: effect quality (algorithm reuse, configurability, visual perception, packaging), performance (memory usage, crash rate, and package size, especially on low-end devices), and cost (development effort, QA cycles, integration efficiency, iteration speed).
YKit's framework consists of three parts: a core library, functional modules, and a toolchain. The core library provides a unified call interface; a generic data interface that combines the advantages of JSON and ProtoBuf; an optimized graphics/image library for pre- and post-processing; a module factory for plugin-style features; model management that delivers tiered models per hardware; data conversion utilities; an engine interface to the KwaiNN inference engine; and logging/reporting of usage and performance metrics.
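The "module factory for plugin-style features" can be pictured as a registry that creates feature modules by name. The sketch below is purely illustrative: YKit's real interface is not public, so every class and method name here (ModuleFactory, FaceKeypointsModule, run) is a hypothetical stand-in.

```python
# Hypothetical sketch of a plugin-style module factory; names are
# illustrative and do not reflect the actual YKit API.
from typing import Callable, Dict


class ModuleFactory:
    """Registry that creates feature modules by name (plugin style)."""

    _creators: Dict[str, Callable[[], object]] = {}

    @classmethod
    def register(cls, name: str):
        # Decorator: register a module class under a feature name.
        def wrapper(creator):
            cls._creators[name] = creator
            return creator
        return wrapper

    @classmethod
    def create(cls, name: str):
        if name not in cls._creators:
            raise KeyError(f"module '{name}' not registered")
        return cls._creators[name]()


@ModuleFactory.register("face_keypoints")
class FaceKeypointsModule:
    def run(self, frame):
        # Placeholder: real modules would call into the inference engine.
        return {"keypoints": []}


module = ModuleFactory.create("face_keypoints")
print(type(module).__name__)  # -> FaceKeypointsModule
```

A registry like this lets new effects plug in feature modules without touching the core library, which matches the article's "plugin-style features" description.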
Each functional module offers compile-time switches and configuration and implements the secondary features within its category. The toolchain includes multi-end demos, automated documentation generated by Doxygen from algorithm and demo code, a local material library for quick debugging, unit testing, and a packaging platform.
For the computation flow, YKit defines two paths: a CPU chain for flexible pre/post-processing that can link multiple feature points in sequence, and a GPU chain that runs shader-based post-processing. Algorithm integration is layered: the bottom-level KwaiNN engine, a framework layer with module factories and macros, core modules (high-frequency capabilities such as facial keypoints, bundled with YKit), and dynamic modules that can be delivered on demand, reducing package size by up to 50% on Android.
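The CPU chain described above amounts to running a sequence of per-frame stages, each consuming and producing frame data. The sketch below assumes a simple dict-based frame representation; the stage names (resize, detect_face, segment_sky) and the run_chain helper are hypothetical illustrations, not YKit functions.

```python
# Illustrative CPU processing chain: each stage takes a frame dict and
# returns an enriched copy. All names here are hypothetical.
from functools import reduce


def resize(frame):
    # Downscale before analysis to cut CPU cost.
    return dict(frame, width=frame["width"] // 2, height=frame["height"] // 2)


def detect_face(frame):
    return dict(frame, faces=[{"box": (0, 0, 10, 10)}])


def segment_sky(frame):
    return dict(frame, sky_mask="mask")


def run_chain(frame, stages):
    """Run feature points sequentially along the CPU path."""
    return reduce(lambda f, stage: stage(f), stages, frame)


out = run_chain({"width": 1280, "height": 720},
                [resize, detect_face, segment_sky])
print(out["width"], "faces" in out)  # -> 640 True
```

Keeping each stage a pure frame-in/frame-out function is what makes the chain freely composable, which is the flexibility the CPU path is meant to provide.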
Performance optimization centers on a high‑efficiency graphics/image library with over 50 optimized operators supporting CPU, Neon, OpenGL, Metal backends, and a model tiered distribution platform that defaults to ten device tiers (iOS/Android high/medium/low, Huawei HiAI, iPhone CoreML, etc.), automatically selecting the best model for each hardware while leveraging KwaiNN’s CPU/GPU/NPU optimizations.
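Tiered model distribution boils down to a lookup from (platform, device tier) to the model variant best suited for that hardware, with a safe fallback. The table and file names below are invented for illustration; the real platform reportedly defaults to ten tiers, while this sketch shows only a few.

```python
# Hedged sketch of tiered model selection; tier keys and model file
# names are made up and do not correspond to real YKit assets.
MODEL_TIERS = {
    ("ios", "high"): "gan_fp16_large.kmodel",
    ("ios", "medium"): "gan_fp16_medium.kmodel",
    ("ios", "low"): "gan_int8_small.kmodel",
    ("android", "high"): "gan_fp16_large.kmodel",
    ("android", "medium"): "gan_int8_medium.kmodel",
    ("android", "low"): "gan_int8_small.kmodel",
}


def select_model(platform: str, tier: str) -> str:
    # Fall back to the smallest model for unknown tiers, so low-end or
    # unrecognized devices still get a working (if lower-quality) effect.
    return MODEL_TIERS.get((platform, tier), "gan_int8_small.kmodel")


print(select_model("android", "low"))      # -> gan_int8_small.kmodel
print(select_model("android", "unknown"))  # -> gan_int8_small.kmodel
```

Defaulting unknown devices to the smallest model trades quality for reliability, which fits the article's emphasis on low-end device coverage.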
Three case studies illustrate the optimizations. (1) GAN-type effects use a shared eight-model structure covering a wide MAC range, three runtime modes to balance quality and fluency, and over twenty configurable parameters for rapid effect iteration. (2) A complex facial dynamic effect (keypoints → 3D reconstruction → rendering → GAN inference → a second keypoint pass) is parallelized across three CPU threads and one GPU thread by combining the capture SDK's multithreading with YKit's internal multithreading. (3) Intelligent matting in the editing SDK adopts a two-stage design of CPU pre-analysis plus GPU rendering, supplemented by YKit's inference cache interface, which can cache final or intermediate results, with cache sizes tuned per device tier, data validation on restore, and automatic re-inference for missing frames, yielding performance that surpasses competitors.
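The matting cache in case (3) combines three behaviors named in the article: per-tier cache sizing, data validation on restore, and automatic re-inference for missing frames. The sketch below shows one way those behaviors could fit together; the class, its checksum scheme, and the eviction policy are all assumptions, not the actual YKit cache interface.

```python
# Illustrative per-frame inference cache with checksum validation and
# re-inference on miss. Everything here is a hypothetical sketch.
import hashlib


class InferenceCache:
    def __init__(self, max_entries: int):
        self.max_entries = max_entries  # tuned per device tier
        self._store = {}  # frame_id -> (mask_bytes, checksum)

    @staticmethod
    def _checksum(mask: bytes) -> str:
        return hashlib.md5(mask).hexdigest()

    def put(self, frame_id: int, mask: bytes):
        if len(self._store) >= self.max_entries:
            # Simple eviction: drop the oldest entry (insertion order).
            self._store.pop(next(iter(self._store)))
        self._store[frame_id] = (mask, self._checksum(mask))

    def get(self, frame_id: int, infer):
        entry = self._store.get(frame_id)
        if entry is not None:
            mask, checksum = entry
            if self._checksum(mask) == checksum:  # validate on restore
                return mask
        mask = infer(frame_id)  # re-infer missing or corrupted frames
        self.put(frame_id, mask)
        return mask


cache = InferenceCache(max_entries=2)
calls = []


def infer(frame_id):
    calls.append(frame_id)
    return b"mask-%d" % frame_id


cache.get(0, infer)
cache.get(0, infer)  # second call is served from cache, no re-inference
print(calls)  # -> [0]
```

Because `get` always falls back to `infer`, a cache miss degrades to one extra inference rather than a visible glitch, which is the behavior the article attributes to the editing-SDK matting pipeline.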
Future work aims to unify server and mobile ends (output effects and engineering code), continually improve performance for the diverse device base of Kuaishou’s users, boost development efficiency, and ultimately deliver better AI services that enhance users’ sense of happiness.
Kuaishou Tech
Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.